Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146

Joe NYC

Platinum Member
Jun 26, 2021
2,324
2,929
106
A decently clocked N32 would have more flops but less bandwidth, so it could be 5600XT vs 5700 all over again. The former has more flops, but at 4K the bandwidth difference matters and creates differentiation.

AMD can increase bandwidth (for higher end model of Navi 32) with stacked SRAM Infinity Cache, if needed.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
AMD can increase bandwidth (for higher end model of Navi 32) with stacked SRAM Infinity Cache, if needed.

They don't need to for N32. The point I am making is that AMD has had a lower-end part with more TFLOPS than a higher-end card, differentiated on bandwidth and VRAM amount. If N32 clocks high enough that it has more TFLOPS than the 7900XT, it is not the end of the world, provided the extra bandwidth gives the 7900XT an advantage at 4K.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,324
2,929
106
That's a costly solution in my opinion. I would rather prefer more IC per MCD, 16MB IC is not a lot per MCD and +50% more shouldn't make a big difference in size.

Most people assume that stacked SRAM on an MCD would have the same amount of SRAM as the base die, which I don't think is a given. A die the size of the MCD would actually fit 64MB of SRAM, or 32MB safely.

AMD must have run multiple cost models to come up with this plan. Biggest variable would be the percentage of plain, non-stacked MCD's that will be manufactured.

If it is, let's say > 80%, then the size of base MCD is extremely important.

Going from base MCD to MCDs that need more Infinity Cache: Stacked MCD maintains the same size, same layout of the chip, and stacked MCD can be replaced as a Lego piece.

Differently sized MCDs with more cache and no stacking would need a different layout of the RDL level connecting the chips (or whatever is being used). And the cost of mask sets for 2 different sizes of MCD would exceed the cost of an MCD mask set + a stacked SRAM mask set. And if the rumor of 2 layers is right, then 2 sets of masks (MCD, SRAM) can make 3 different chips.

So, the solution AMD has is quite elegant and economical.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,324
2,929
106
Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not, and is 'fixed' according to that rumor.
So a high-boosting N32 at 3.2 GHz+ will match it in perf while costing less. Not sure if BW will be an issue for these chips; they have way too much BW.
Will AMD launch a gimped 84CU 7900XT at 2.4 GHz to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?

It's bad enough when a product does not quite reach the intended target.

But it would be outright stupid to be held back in the future for fear of hurting the feelings of past buyers.

Because that's actually what the golden age of PC was all about. Being able to buy same level of performance, sometime later, for 1/2 the price.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not, and is 'fixed' according to that rumor.
So a high-boosting N32 at 3.2 GHz+ will match it in perf while costing less. Not sure if BW will be an issue for these chips; they have way too much BW.
Will AMD launch a gimped 84CU 7900XT at 2.4 GHz to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?

Well 84 CUs @ 2.4Ghz is about 51.6 tflops and 60 CUs @ 3.2 is just under 50.
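
For anyone who wants to check those numbers, here is a rough sketch of the arithmetic. It assumes 64 FP32 lanes per CU, dual-issue counted as 2x, and an FMA counted as 2 FLOPs; those per-CU factors are my assumptions for how the figures were derived, and the function name is made up for illustration.

```python
def fp32_tflops(cus, clock_ghz, lanes_per_cu=64, issue_width=2, flops_per_fma=2):
    """Peak FP32 throughput in TFLOPS under the assumed per-CU factors."""
    # FLOPs the whole chip can retire per clock cycle (assumed factors).
    flops_per_clock = cus * lanes_per_cu * issue_width * flops_per_fma
    # GHz * FLOPs/clock gives GFLOPS; divide by 1000 for TFLOPS.
    return flops_per_clock * clock_ghz / 1000.0

n31_cut = fp32_tflops(84, 2.4)   # rumored 84 CU part at 2.4 GHz
n32_full = fp32_tflops(60, 3.2)  # hypothetical 60 CU part at 3.2 GHz

print(f"84 CUs @ 2.4 GHz: {n31_cut:.1f} TFLOPS")
print(f"60 CUs @ 3.2 GHz: {n32_full:.1f} TFLOPS")
```

This is peak theoretical throughput only; it says nothing about ROPs, bandwidth, or how often dual-issue is actually achieved.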

Also with 3 SEs vs 6 there will be half the ROPs as well which will impact high resolution performance.

So even if the theoretical compute is close there are enough cuts elsewhere for the 7900XT to differentiate itself.

The other option is a quick refresh: release a 7950 with N32 and drop the 7900 entirely.
 
Reactions: Tlh97

Panino Manino

Senior member
Jan 28, 2017
846
1,061
136
Unfortunately these uncertainties left the door open to more ludicrous predictions...

Even if there will be a higher-tier card that can actually compete against the 4090 (Ti), it doesn't matter: that card isn't here now and won't be here for the next few months.

But meanwhile AMD may try to win some mind share on the side by playing Nvidia's game:

 

Kepler_L2

Senior member
Sep 6, 2020
460
1,895
106
Most people assume that stacked SRAM on an MCD would have the same amount of SRAM as the base die, which I don't think is a given. A die the size of the MCD would actually fit 64MB of SRAM, or 32MB safely.

AMD must have run multiple cost models to come up with this plan. Biggest variable would be the percentage of plain, non-stacked MCD's that will be manufactured.

If it is, let's say > 80%, then the size of base MCD is extremely important.

Going from base MCD to MCDs that need more Infinity Cache: Stacked MCD maintains the same size, same layout of the chip, and stacked MCD can be replaced as a Lego piece.

Differently sized MCDs with more cache and no stacking would need a different layout of the RDL level connecting the chips (or whatever is being used). And the cost of mask sets for 2 different sizes of MCD would exceed the cost of an MCD mask set + a stacked SRAM mask set. And if the rumor of 2 layers is right, then 2 sets of masks (MCD, SRAM) can make 3 different chips.

So, the solution AMD has is quite elegant and economical.
Not happening, at least not for this generation.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
From what I've been reading, the expectation was 3.1 - 3.4 GHz. Not a power problem really. A hard wall. Bad libraries. Identified and being corrected. Shades of R520 (X1800 > X1900 ?).

Could the metal layers be the problem in causing signal integrity failures above a certain frequency? No amount of additional power will allow much higher clocks?
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,227
136
Well 84 CUs @ 2.4Ghz is about 51.6 tflops and 60 CUs @ 3.2 is just under 50.

Also with 3 SEs vs 6 there will be half the ROPs as well which will impact high resolution performance.

So even if the theoretical compute is close there are enough cuts elsewhere for the 7900XT to differentiate itself.
Perhaps, or it could be something like XSX vs PS5. But I digress, it is all speculation at this point.
Adored is back...

Hmmm, doubtful. The whole point of chiplets is to eventually not have to tape out individual monolithic dies, GCDs included.
Instead of tinkering with a bigger GCD, they could just try to solve the problems of synchronizing driver-submitted job scheduling across multiple CPs and letting the CPs across multiple GCDs coordinate the export from the shader arrays.
But we can see this happening as we speak: the front end is decoupled from the shaders so it can work independently of the SEs and has more flexibility to operate at higher clocks.
The patents describe this as necessary because the CP needs to do a lot more latency-sensitive work to sync across GCDs.
What is also needed is a low-latency, high-bandwidth interconnect to create a crossbar across the SEs for all the pixel pipelines, geometry and the other sequential stages, which is now available in the form of EFB.

I believe N31 will be the biggest RDNA3 chip but let's see.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,227
136
From what I've been reading, the expectation was 3.1 - 3.4 GHz. Not a power problem really. A hard wall. Bad libraries. Identified and being corrected. Shades of R520 (X1800 > X1900 ?).

Could the metal layers be the problem in causing signal integrity failures above a certain frequency? No amount of additional power will allow much higher clocks?
I find it odd that AMD would make such a mistake with bad libraries. This is the very first thing that someone needs to get right before doing anything else.
AMD has already launched products on an N5-based process, so they would have known its characteristics when they started customizing things, and the 3.x GHz frequency range can use the standard PDK and doesn't really need customization of any kind.

Been wondering about this, but alas we will never know whether it was never designed to hit 3+ GHz in the first place or whether there was a problem and they had to respin.
Unless the information comes from AMD directly, we will also never know if somebody is just humoring us with good storytelling, and it's not like AMD will tell the world "yes, we have a problem and had to go for a redesign, thank you."

Which makes the slide at VCZ on "Designed to exceed 3 GHz" and Jarred Walton's 3 GHz slip-up even more bizarre.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,324
2,929
106
I find it odd that AMD would make such a mistake with bad libraries. This is the very first thing that someone needs to get right before doing anything else.
AMD has already launched products on an N5-based process, so they would have known its characteristics when they started customizing things, and the 3.x GHz frequency range can use the standard PDK and doesn't really need customization of any kind.

Been wondering about this, but alas we will never know whether it was never designed to hit 3+ GHz in the first place or whether there was a problem and they had to respin.
Unless the information comes from AMD directly, we will also never know if somebody is just humoring us with good storytelling, and it's not like AMD will tell the world "yes, we have a problem and had to go for a redesign, thank you."

Which makes the slide at VCZ on "Designed to exceed 3 GHz" and Jarred Walton's 3 GHz slip-up even more bizarre.

Transistor density of the GCD is almost on the very high end of the spectrum, more likely associated with much lower frequency mobile designs.

It's not clear if AMD got too cocky or saw an opening to have both clock speed and density, which in the end did not pan out as expected...
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,227
136
Transistor density of the GCD is almost on the very high end of the spectrum, more likely associated with much lower frequency mobile designs.
It's in the range of mobile SoC frequencies, like the A15, which also carries 130+ MTr/mm2 at 3.2GHz.
The TSMC Shmoo plot puts the N5 HD top end at around 4 GHz. Heat density and power should be the primary concern here, not a clock wall.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
I find it odd that AMD would make such a mistake with bad libraries. This is the very first thing that someone needs to get right before doing anything else.
AMD has already launched products on an N5-based process, so they would have known its characteristics when they started customizing things, and the 3.x GHz frequency range can use the standard PDK and doesn't really need customization of any kind.

Been wondering about this, but alas we will never know whether it was never designed to hit 3+ GHz in the first place or whether there was a problem and they had to respin.
Unless the information comes from AMD directly, we will also never know if somebody is just humoring us with good storytelling, and it's not like AMD will tell the world "yes, we have a problem and had to go for a redesign, thank you."

Which makes the slide at VCZ on "Designed to exceed 3 GHz" and Jarred Walton's 3 GHz slip-up even more bizarre.
Maybe I should have written misapplied libraries. The Zen4 chiplets are a lot less dense, aren't they?
 
Reactions: Tlh97 and Leeea

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,599
136
I wonder if it's truly only a clock wall on some critical section of the chip, because the power draw at these frequencies is still quite high. So maybe there is a clock wall, but the power draw also seems to be higher than expected. Other than that, the product in itself is not bad at all at that price. Only, if N32 indeed has the problem corrected, the performance difference might not be so big. Maybe a respin is already on the way.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
I wonder if it's truly only a clock wall on some critical section of the chip, because the power draw at these frequencies is still quite high. So maybe there is a clock wall, but the power draw also seems to be higher than expected. Other than that, the product in itself is not bad at all at that price. Only, if N32 indeed has the problem corrected, the performance difference might not be so big. Maybe a respin is already on the way.
Yep, if N32 fixes whatever is wrong, is this a good buy?

Collector's card. 1st ever released multi-die gamer GPU. Maybe prices of new cards will keep this one acceptable and not lead to buyer's remorse.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,227
136
Maybe I should have written misapplied libraries. The Zen4 chiplets are a lot less dense, aren't they?
Yeah, those Zen4 chiplets are around 92.5 MTr/mm2 and obviously can work all the way to 5.8+ GHz, far beyond standard N5 HD operating range.
And the scaling boosters don't come "out of the box" from TSMC, AMD optimized this process with many additional scaling boosters and customization of metal layers etc. etc.
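
As a quick sanity check on the density figure, here is a small sketch of how MTr/mm2 is computed. The transistor count (6.57 billion) and die area (71 mm2) for the Zen4 CCD are commonly reported figures that I am assuming here; they are not stated in this thread, and the function name is just for illustration.

```python
def density_mtr_per_mm2(transistors, area_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors / 1e6 / area_mm2

# Assumed inputs (not from this thread): Zen4 CCD transistor count and die area.
zen4_ccd = density_mtr_per_mm2(6.57e9, 71.0)

print(f"Zen4 CCD: {zen4_ccd:.1f} MTr/mm2")
```

The same function can be pointed at any die once you have a transistor count and area you trust.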

N5 HD standard cells will hit a clock wall at ~4 GHz, but they come "out of the box" in the standard TSMC PDK and are used by everybody else, with no customization needed, at a density similar to that reached by the N31 GCD.


Obviously other mobile SoCs don't have as many transistors packed into one die, so the thermal and power constraints are something AMD has to address.
So all of this sounds weird, but we can speculate all day without really knowing what is going on.
 

Kepler_L2

Senior member
Sep 6, 2020
460
1,895
106
Perhaps, or it could be something like XSX vs PS5. But I digress, it is all speculation at this point.

Hmmm, doubtful. Whole point of chiplets is to not have to tape out individual monolithic dies eventually including GCDs.
Instead of tinkering with a bigger GCD they could just try solving problems with synchronizing the driver submitted job scheduling across multiple CPs and to allow the CPs across multiple GCDs to coordinate the export from the Shader arrays.
But we can see this is happening as we speak, the Front End is decoupled from shaders to allow it to work independently from the SEs to have more flexibility to operate at higher clocks.
From patents this is described as necessary as the CP need to do a lot more latency sensitive work to perform sync across GCDs.
What is needed is also a low latency high bandwidth interconnect to create a cross bar across the SEs for all the Pixel Pipelines, Geometry and all the other sequential stages. Which is now available in the form of EFB.

I believe N31 will be the biggest RDNA3 chip but let's see.
Wild speculation but I believe the de-coupling of FE/SE clocks is due to future generations further splitting them up. Instead of multi-GCD (which is hard) we'll likely see a FED (Front-End Die) containing GCP, HWS, DCN, VCN and PCIe, while the GCD is literally just Shader Engines.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,227
136
At what Vth?
I attached the Shmoo plot above you can check it.

Can Cac overhead be the reason for bad perf/power scaling?
It should already have been addressed by TSMC for N5 HD, since a lot of designers are already shipping N5 HD silicon operating in this frequency range.
You can even look at AD102, with 126+ MTr/mm2 on N4, which can supposedly operate at 3GHz with 80 billion transistors with some overclock.
AD102 likely has lower density due to PHYs, MCs and more SRAM, but should be similar to N31 process-wise, save for the additional efficiency of N4, and it still has a lot more transistors actively switching.

Looking through the SMU driver interface for gfx11, some parts of it (power play) point to a new AVFS model based on CPO measurement data collection.
GFX11 has a new block, IMU, which manages power for all GFX blocks.

I think they are design issues, if we believe there are any, not process issues.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Wild speculation but I believe the de-coupling of FE/SE clocks is due to future generations further splitting them up. Instead of multi-GCD (which is hard) we'll likely see a FED (Front-End Die) containing GCP, HWS, DCN, VCN and PCIe, while the GCD is literally just Shader Engines.
Is there that much traffic between the shader engines? If not, then as you wrote, but with multiple SE chiplets, each sized to match the low-end model.
 