Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3Qs to get the support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up situation like Frontier, for example).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

moinmoin · Nov 17, 2023

Or Surface management invaded Azure.

MS is no stranger to making, erm, strange hardware decisions.

darkswordsman17 · Nov 17, 2023

Mopetar said:
Actually quite interesting. Any idea if AMD starts planning to offer 4P solutions for Epyc as well, or is this just for these parts specifically?

Didn't they announce the 4P when they announced these? Actually I think they touted 8 way? Wasn't it a big part of their new InfinityFabric or...whatever they're calling their Infinity____ interconnect?

I'd guess both probably. Enterprise wants density and I think they've been wanting 4P EPYC since its inception, because they want both more cores per socket and more sockets per server too.

Joe NYC said:
AMD is leaning in favor of 1P for Epyc, and adding extra cores to that single processor.

Also, AMD was so concentrated on getting El Capitan finished on time and Mi300x to market that the CPU solutions based on Mi300 were likely on a side track. But we will see what 2024 brings...

Not sure about that as they've pushed 2P the whole time with EPYC haven't they? And I think they either announced or there's evidence that the next gen of EPYC is going to offer 4P.

Joe NYC said:
Could also be a timing issue.

NVidia probably offers a reference design for the 8x H100 that is using Sapphire Rapids. Maybe Microsoft found out that they can just swap the cards with 8xMi300x and everything just worked. So no reason to start validating a new platform, since everyone is racing to deployment.

Gonna say Nvidia wouldn't be ok with that, and no way Microsoft would be able to keep that secret.

As far as why they're pairing with Sapphire Rapids, I'm guessing it's a mix of 2 things. Rumor is that Microsoft was already buying up about as much of EPYC production as they could, so it's possible it's simply production constraints (and AMD probably had to fulfill other contracts as well). I think Intel's chips have some software optimizations that help them in AI workloads, meaning it probably makes more sense to put them there (where the CPU is likely a much smaller part of the overall performance compared to the GPU based stuff), leaving EPYC for other workloads where its advantages shine more. I'm not sure if people realize how much AI hardware is being bought by these companies, it's a ridiculous amount (it really is like how consumers and everyone but enterprise went after GPUs for crypto-mining), so they're probably going with whatever they can get and I'd guess they can get a lot of CPUs from Intel.

Guess it could also be that Microsoft is wanting to try and boost ROCm to help break NVidia's stranglehold. But ultimately I'd guess production capacity is simply the biggest factor.

blackangus · Nov 17, 2023

If MS is installing SuperPod reference architecture, I know SR is used in that. Not sure if SuperPod has an Epyc option.

DeathReborn · Nov 17, 2023

Joe NYC said:
Could also be a timing issue.

NVidia probably offers a reference design for the 8x H100 that is using Sapphire Rapids. Maybe Microsoft found out that they can just swap the cards with 8xMi300x and everything just worked. So no reason to start validating a new platform, since everyone is racing to deployment.

Given that H100 (SXM5) & Mi300x (SH5) use different sockets I doubt Microsoft has been able to swap them out unless they are using the PCIe versions.

adroc_thurston · Nov 17, 2023

DeathReborn said:
Given that H100 (SXM5) & Mi300x (SH5) use different sockets I doubt Microsoft has been able to swap them out unless they are using the PCIe versions.

SH5 is 300A only thing.
300X is bog standard OAM2.

randomhero · Nov 18, 2023

Here is my take on RDNA4 and the lack of MCM high-end SKUs, for what it's worth.
I think AMD scrapped those not because of trouble on the software side but because of money. Or, to be more precise, revenue opportunity. There is a finite amount of packaging capacity, and due to the AI craze they chose to spend it on MI300 and MI400. Those bring revenue in multiples of $10k per unit, in contrast to at best $1k per unit in the case of high-end RDNA4.

So IMHO, all rumours of MCM being problematic (not working efficiently, etc.) are false; AMD just went for the money, as any sane business should.

adroc_thurston · Nov 18, 2023

randomhero said:
I think AMD scrapped those not because of trouble on the software side but because of money. Or, to be more precise, revenue opportunity

No.
It wasn't about software really.

randomhero said:
There is a finite amount of packaging capacity, and due to the AI craze they chose to spend it on MI300 and MI400.

Navi4c did not use CoWoS or anything 2.5D.

darkswordsman17 · Nov 18, 2023

randomhero said:
Here is my take on RDNA4 and the lack of MCM high-end SKUs, for what it's worth.
I think AMD scrapped those not because of trouble on the software side but because of money. Or, to be more precise, revenue opportunity. There is a finite amount of packaging capacity, and due to the AI craze they chose to spend it on MI300 and MI400. Those bring revenue in multiples of $10k per unit, in contrast to at best $1k per unit in the case of high-end RDNA4.

So IMHO, all rumours of MCM being problematic (not working efficiently, etc.) are false; AMD just went for the money, as any sane business should.

I'm sure prioritizing enterprise production capacity in a buying spree could play some role, but I also doubt AMD has sorted out MCM graphics rendering issues. They still haven't actually built a split graphics chiplet (the GPU is still single chip, they split off the memory into sub-chiplets) design yet and that's the hardest part.

adroc_thurston said:
No.
It wasn't about software really.

Navi4c did not use CoWoS or anything 2.5D.

What is the reason for it? By it, I mean the rumored RDNA4 being just, what, 2 lower-end chips. I forget what else was said, like whether they're also monolithic or still similar to how RDNA3 is?

No, but it would potentially take away from their overall wafers that could be going towards enterprise?

PJVol · Nov 18, 2023

darkswordsman17 said:
the GPU is still single chip, they split off the memory into sub-chiplets

Which they also weren't particularly successful at, to put it mildly.

Ajay · Nov 18, 2023

adroc_thurston said:
It wasn't about software really.

Well, not on AMD's part as you've pointed out in the past. MickeySoft never considered the possible utility of distributed compute for consumer graphics.

adroc_thurston · Nov 18, 2023

darkswordsman17 said:
What is the reason for it?

Making anything tiled work for 25yo weirdly serial graphics APIs is hard.

darkswordsman17 said:
No, but it would potentially take away from their overall wafers that could be going towards enterprise?

TSMC has a ton of spare bleeding edge capacity.

Ajay said:
MickeySoft never considered the possible utility of distributed compute for consumer graphics.

MS has nothing to do with RDNA4.

Ajay · Nov 18, 2023

adroc_thurston said:
MS has nothing to do with RDNA4.

You have previously pointed out the difficulty of dealing with serial portions of the DX APIs as the most difficult problem to solve wrt RDNA5/4 - hence the difficulty in getting it right and on time. Perhaps this is the problem with the terse answers you usually provide: you don't give a full enough picture and we have to try and assemble the pieces. Apparently, that's not working in my case - and I suspect in others' as well.

adroc_thurston · Nov 18, 2023

Ajay said:
You have previously pointed out the difficulty of dealing with serial portions of the DX APIs as the most difficult problem to solve wrt RDNA5/4 - hence the difficulty in getting it right and on time.

Yeah, but that's not an MS issue, it's a legacy-designs-from-ca.-1995 issue.
OGL and derivatives are the same.

randomhero · Nov 19, 2023

adroc_thurston said:
No.
It wasn't about software really.

Navi4c did not use CoWoS or anything 2.5D.

So what is it?
According to your statements, the latency-sensitive part of the code is the problem, not the hardware. But to solve those issues in tiled architectures, or to be precise multi-chiplet architectures, you need to use advanced packaging. Which again, is not used according to your statements.
Well, no wonder the high-end SKUs were a total failure. According to your statements RTG is led by a bunch of lunatics and complete idiots.
Sorry if this post sounds aggressive but there is no other way to put it.

GodisanAtheist · Nov 19, 2023

adroc_thurston said:
Making anything tiled work for 25yo weirdly serial graphics APIs is hard.

- I don't get this. So AMD designed an entire arch/product stack/whatever around a problem that hasn't been solved?

It's like ok guys I am making this incredible interplanetary spaceship that will take people to other star systems in style/luxury/comfort, we just have to figure out FTL travel first...

branch_suggestion · Nov 19, 2023

randomhero said:
So what is it?
According to your statements, the latency-sensitive part of the code is the problem, not the hardware. But to solve those issues in tiled architectures, or to be precise multi-chiplet architectures, you need to use advanced packaging. Which again, is not used according to your statements.
Well, no wonder the high-end SKUs were a total failure. According to your statements RTG is led by a bunch of lunatics and complete idiots.
Sorry if this post sounds aggressive but there is no other way to put it.

It uses active-Si bridges between base dies (AIDs) and SoIC to stack the SEDs atop the AIDs. There is a leaked diagram showing the packaging layout earlier ITT. Also refer to the patent Spec was showing ages ago, which sure enough is the final design.
Just to add, AMD tries to avoid CoWoS-S like the plague as it is expensive and hard to make at high volume, but it just works, which is why companies without much experience with advanced packaging all use it. AMD tried to make CoWoS-R work for MI300 but it would've missed TTM due to the extra work needed to make the packaging thermally stable. MI400 will likely use different packaging.

Joe NYC · Nov 19, 2023

branch_suggestion said:
It uses active-Si bridges between base dies (AIDs) and SoIC to stack the SEDs atop the AIDs. There is a leaked diagram showing the packaging layout earlier ITT. Also refer to the patent Spec was showing ages ago, which sure enough is the final design.
Just to add, AMD tries to avoid CoWoS-S like the plague as it is expensive and hard to make at high volume, but it just works, which is why companies without much experience with advanced packaging all use it. AMD tried to make CoWoS-R work for MI300 but it would've missed TTM due to the extra work needed to make the packaging thermally stable. MI400 will likely use different packaging.

If the SoIC Active Silicon Bridges are feasible, then CoWoS becomes redundant.

randomhero · Nov 19, 2023

branch_suggestion said:
It uses active-Si bridges between base dies (AIDs) and SoIC to stack the SEDs atop the AIDs. There is a leaked diagram showing the packaging layout earlier ITT. Also refer to the patent Spec was showing ages ago, which sure enough is the final design.
Just to add, AMD tries to avoid CoWoS-S like the plague as it is expensive and hard to make at high volume, but it just works, which is why companies without much experience with advanced packaging all use it. AMD tried to make CoWoS-R work for MI300 but it would've missed TTM due to the extra work needed to make the packaging thermally stable. MI400 will likely use different packaging.

So we return back to my original post on the topic. It is cost and revenue opportunity. Everything is a finite resource, hence you use that limited resource as efficiently as you can - produce MI300 and MI400 as much and as fast as you can. There is an AI craze going on out there.

PJVol · Nov 19, 2023

randomhero said:
There is an AI craze going on out there.

What is this in essence? Who are the end consumers of all these AI things, and what do they need them for?

randomhero · Nov 19, 2023

Behavioral analysis, urban development models, logistics optimisation, chip design, marketing, materials research, production process optimisation, etc.

adroc_thurston · Nov 19, 2023

randomhero said:
you need to use advanced packaging. Which again, is not used according to your statements.

Advanced packaging is not just 2.5D garbage.
There are other options.

GodisanAtheist said:
So AMD designed an entire arch/product stack/whatever around a problem that hasn't been solved?

Yeah I mean that's how one innovates.
You have a problem and you devise a solution.

Joe NYC said:
If the SoIC Active Silicon Bridges are feasible, then CoWoS becomes redundant.

Unless you need HBM.

GodisanAtheist · Nov 19, 2023

adroc_thurston said:
Yeah I mean that's how one innovates.
You have a problem and you devise a solution.

-100% agreed, now do you take that non-functional solution and plan to bring a product to market with it before the problem itself has actually been solved?

adroc_thurston · Nov 19, 2023

GodisanAtheist said:
before the problem itself has actually been solved?

They are solving it, that's the idea.

GodisanAtheist · Nov 19, 2023

adroc_thurston said:
They are solving it, that's the idea.

-But they haven't solved it. That's the other idea.

adroc_thurston · Nov 19, 2023

GodisanAtheist said:
But they haven't solved it.

That's how R&D cycles go.
You have the problem and you make a solution.
