Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136





With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around 3 quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe it is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 available in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Mar 11, 2004
23,280
5,722
146
Actually quite interesting. Any idea if AMD is planning to offer 4P solutions for Epyc as well, or is this just for these parts specifically?

Didn't they announce the 4P when they announced these? Actually I think they touted 8 way? Wasn't it a big part of their new InfinityFabric or...whatever they're calling their Infinity____ interconnect?

I'd guess both probably. Enterprise wants density, and I think they've been wanting 4P EPYC since its inception, because they want both more cores per socket and more sockets per server.

AMD is leaning in favor of 1P for Epyc, and adding extra cores to that single processor.

Also, AMD was so focused on getting El Capitan finished on time and Mi300x to market that the CPU solutions based on Mi300 were likely sidetracked. But we will see what 2024 brings...

Not sure about that as they've pushed 2P the whole time with EPYC haven't they? And I think they either announced or there's evidence that the next gen of EPYC is going to offer 4P.

Could also be a timing issue.

NVidia probably offers a reference design for the 8x H100 that is using Sapphire Rapids. Maybe Microsoft found out that they can just swap the cards with 8xMi300x and everything just worked. So no reason to start validating a new platform, since everyone is racing to deployment.

Gonna say Nvidia wouldn't be ok with that, and no way Microsoft would be able to keep that secret.

As far as why they're pairing with Sapphire Rapids, I'm guessing it's a mix of 2 things. Rumor is that Microsoft was already buying up about as much of EPYC production as they could, so it's possible it's simply production constraints (and AMD probably had to fulfill other contracts as well). I think Intel's chips have some software optimizations that help them in AI workloads, meaning it probably makes more sense to put them there (where the CPU is likely a much smaller part of the overall performance compared to the GPU-based stuff), leaving EPYC for other workloads where its advantages shine more. I'm not sure if people realize how much AI hardware is being bought by these companies; it's a ridiculous amount (it really is like how consumers and everyone but enterprise went after GPUs for crypto-mining), so they're probably going with whatever they can get, and I'd guess they can get a lot of CPUs from Intel.

Guess it could also be that Microsoft wants to boost ROCm to help break NVidia's stranglehold. But ultimately I'd guess production capacity is the biggest factor.
 
Reactions: Mopetar

DeathReborn

Platinum Member
Oct 11, 2005
2,770
775
136
Could also be a timing issue.

NVidia probably offers a reference design for the 8x H100 that is using Sapphire Rapids. Maybe Microsoft found out that they can just swap the cards with 8xMi300x and everything just worked. So no reason to start validating a new platform, since everyone is racing to deployment.
Given that H100 (SXM5) & Mi300x (SH5) use different sockets I doubt Microsoft has been able to swap them out unless they are using the PCIe versions.
 

randomhero

Member
Apr 28, 2020
184
251
136
Here is my take on RDNA4 and the lack of MCM high-end SKUs, for what it's worth.
I think AMD scrapped those not because of trouble on the software side but because of money. Or, to be more precise, revenue opportunity. There is a finite amount of packaging capacity, and due to the AI craze they chose to spend it on MI300 and MI400. Those bring revenue in multiples of $10k per unit, in contrast to at best $1k per unit for high-end RDNA4.

So IMHO, all rumours of MCM being problematic (not working efficiently, etc.) are false, AMD just went for money, as any sane business should.
 
Mar 11, 2004
23,280
5,722
146
Here is my take on RDNA4 and the lack of MCM high-end SKUs, for what it's worth.
I think AMD scrapped those not because of trouble on the software side but because of money. Or, to be more precise, revenue opportunity. There is a finite amount of packaging capacity, and due to the AI craze they chose to spend it on MI300 and MI400. Those bring revenue in multiples of $10k per unit, in contrast to at best $1k per unit for high-end RDNA4.

So IMHO, all rumours of MCM being problematic (not working efficiently, etc.) are false, AMD just went for money, as any sane business should.

I'm sure prioritizing enterprise production capacity in a buying spree could play some role, but I also doubt AMD has sorted out MCM graphics rendering issues. They still haven't actually built a split graphics chiplet (the GPU is still single chip, they split off the memory into sub-chiplets) design yet and that's the hardest part.

No.
It wasn't about software really.

Navi4c did not use CoWoS or anything 2.5D.

What is the reason for it? By it, I mean the rumored RDNA4 being just, what, 2 lower-end chips. I forget what else was said, like they're also monolithic or still similar to how RDNA3 is?

No, but it would potentially take away from their overall wafers that could be going towards enterprise?
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,549
5,116
96

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
MS has nothing to do with RDNA4.
You have previously pointed out the difficulty of dealing with serial portions of the DX APIs as the most difficult problem to solve wrt/RDNA5/4 - hence the difficulty in getting it right and on time. Perhaps this is the problem with the terse answers you usually provide, you don't give a full enough picture and we have to try and assemble the pieces. Apparently, that's not working in my case - and I suspect in others as well.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,549
5,116
96
You have previously pointed out the difficulty of dealing with serial portions of the DX APIs as the most difficult problem to solve wrt/RDNA5/4 - hence the difficulty in getting it right and on time.
Yeah, but that's not an MS issue, it's a legacy-design issue from ca. 1995.
OGL and derivatives are the same.
 

randomhero

Member
Apr 28, 2020
184
251
136
No.
It wasn't about software really.

Navi4c did not use CoWoS or anything 2.5D.
So what is it?
According to your statements, the latency-sensitive part of the code is the problem, not the hardware. But to solve those issues in tiled architectures, or to be precise multi-chiplet architectures, you need to use advanced packaging. Which again, is not used according to your statements.
Well, no wonder the high-end SKUs were a total failure. According to your statements RTG is led by a bunch of lunatics and complete idiots.
Sorry if this post sounds aggressive but there is no other way to put it.
 
Reactions: GodisanAtheist

GodisanAtheist

Diamond Member
Nov 16, 2006
7,160
7,657
136
Making anything tiled work for 25yo weirdly serial graphics APIs is hard.

- I don't get this. So AMD designed an entire arch/product stack/whatever around a problem that hasn't been solved?

It's like ok guys I am making this incredible interplanetary spaceship that will take people to other star systems in style/luxury/comfort, we just have to figure out FTL travel first...
 

branch_suggestion

Senior member
Aug 4, 2023
391
869
96
So what is it?
According to your statements, the latency-sensitive part of the code is the problem, not the hardware. But to solve those issues in tiled architectures, or to be precise multi-chiplet architectures, you need to use advanced packaging. Which again, is not used according to your statements.
Well, no wonder the high-end SKUs were a total failure. According to your statements RTG is led by a bunch of lunatics and complete idiots.
Sorry if this post sounds aggressive but there is no other way to put it.
It uses active-Si bridges between base dies (AIDs) and SoIC to stack the SEDs atop the AIDs. There is a leaked diagram showing the packaging layout earlier ITT. Also refer to the patent Spec was showing ages ago, which sure enough is the final design.
Just to add, AMD tries to avoid CoWoS-S like the plague as it is expensive and hard to make at high volume, but it just works, which is why companies without much experience with advanced packaging all use it. AMD tried to make CoWoS-R work for MI300 but it would've missed TTM due to the extra work needed to make the packaging thermally stable. MI400 will likely use different packaging.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,487
3,386
106
It uses active-Si bridges between base dies (AIDs) and SoIC to stack the SEDs atop the AIDs. There is a leaked diagram showing the packaging layout earlier ITT. Also refer to the patent Spec was showing ages ago, which sure enough is the final design.
Just to add, AMD tries to avoid CoWoS-S like the plague as it is expensive and hard to make at high volume, but it just works, which is why companies without much experience with advanced packaging all use it. AMD tried to make CoWoS-R work for MI300 but it would've missed TTM due to the extra work needed to make the packaging thermally stable. MI400 will likely use different packaging.
If the SoIC Active Silicon Bridges are feasible, then CoWoS becomes redundant.
 

randomhero

Member
Apr 28, 2020
184
251
136
It uses active-Si bridges between base dies (AIDs) and SoIC to stack the SEDs atop the AIDs. There is a leaked diagram showing the packaging layout earlier ITT. Also refer to the patent Spec was showing ages ago, which sure enough is the final design.
Just to add, AMD tries to avoid CoWoS-S like the plague as it is expensive and hard to make at high volume, but it just works, which is why companies without much experience with advanced packaging all use it. AMD tried to make CoWoS-R work for MI300 but it would've missed TTM due to the extra work needed to make the packaging thermally stable. MI400 will likely use different packaging.
So we return back to my original post on the topic. It is cost and revenue opportunity. Everything is a finite resource, hence use that limited resource as efficiently as you can: produce as many MI300 and MI400 as you can, as fast as you can. There is an AI craze going on out there.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,549
5,116
96
you need to use advanced packaging. Which again, is not used according to your statements.
Advanced packaging is not just 2.5D garbage.
There are other options.
So AMD designed an entire arch/product stack/whatever around a problem that hasn't been solved?
Yeah I mean that's how one innovates.
You have a problem and you devise a solution.
If the SoIC Active Silicon Bridges are feasible, then CoWoS becomes redundant.
Unless you need HBM.
 
Reactions: Joe NYC