Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

Joe NYC · Oct 26, 2022

Kaluan said:
Anyone else think Angstronomics' supposed N32 leak makes little sense?

Why would they suddenly change GPU hardware allocation from N31 to N32 (and funnily enough N33 goes back to N31-style as well).

I would be quite shocked if N32 really is 30WGP and 2560 SIMD per Shader Engine, when N31 and N33 are both 2048 SIMD per SE.

Also why does everyone believe V-stacked RDNA3 won't use specialized SRAM libraries on the MCDs? Doesn't 16+32MB make more sense than 16+16MB? That's how ZenX3D does it at least. Why the departure?

The part about stacking such a tiny die, half the size of the base does not make sense to me either.

I am guessing that TSMC would be using Wafer on Wafer for stacking for this, as opposed to stacking the tiny parts individually, to get the cost down, and if it is Wafer on Wafer stacking, why leave > half of the top wafer area unused?

SteinFG · Oct 26, 2022

Joe NYC said:
The part about stacking such a tiny die, half the size of the base does not make sense to me either.

I am guessing that TSMC would be using Wafer on Wafer for stacking for this, as opposed to stacking the tiny parts individually, to get the cost down, and if it is Wafer on Wafer stacking, why leave > half of the top wafer area unused?

Because taping out 1 design is cheaper than 2. Each mask costs millions. The fewer designs the better.
Look at first gen Ryzen for the extreme example. AMD used 5 different ways to package their first Zen 1 die: Naples, Whitehaven, Summit Ridge, Snowy Owl (1-die and 2-die variants).

Joe NYC · Oct 26, 2022

SteinFG said:
Because taping out 1 design is cheaper than 2. Each mask costs millions. The fewer designs the better.
Look at first gen Ryzen for the extreme example. AMD used 5 different ways to package their first Zen 1 die: Naples, Whitehaven, Summit Ridge, Snowy Owl (1-die and 2-die variants).

It is interesting where the cost crossover would be, between making new masks and just stacking 2 identical MCD dies, over life of RDNA3.

An SRAM-only wafer should be quite a bit cheaper to make the masks for. There should be fewer metal layers compared to logic.

And, importantly, for the user, there would be more performance from doubling the stacked SRAM.

Kaluan · Oct 27, 2022

SteinFG said:
Because taping out 1 design is cheaper than 2. Each mask costs millions. The fewer designs the better.
Look at first gen Ryzen for the extreme example. AMD used 5 different ways to package their first Zen 1 die: Naples, Whitehaven, Summit Ridge, Snowy Owl (1-die and 2-die variants).

If that were the case, AMD may as well go ahead and market "7950XT" as a 768bit GPU 😅

Tuna-Fish · Oct 27, 2022

RnR_au said:
I don't understand how it can't be ready yet, since N32 and N31 share an arch?

There is substantial work in physical design in bringing a product to market even if the logical design (the arch) is the same. AMD GPU side, to the best of my knowledge, has one physical design team that does all the products in their lineup in series, so they get completed one at a time.

amenx · Oct 27, 2022

From a quick search I see N32 expected in Q1 2023. That could be 2 to 5 months off. Anything more specific than that?

jpiniero · Oct 27, 2022

amenx said:
From a quick search I see N32 expected in Q1 2023. That could be 2 to 5 months off. Anything more specific than that?

Only speculation at this point. March was when the full N22 was released, so that's as good a guess as any.

biostud · Oct 28, 2022

I think N32 and zen4 3D will release simultaneously.

Aapje · Oct 28, 2022

biostud said:
I think N32 and zen4 3D will release simultaneously.

That would be dumb, because then they will compete for media attention.

GodisanAtheist · Oct 28, 2022

Aapje said:
That would be dumb, because then they will compete for media attention.

- They will probably tease one in the presentation of the other though.

AMD's marketing team seems to have grown the minimum required 3rd neuron to have what can be considered a brain over the last couple years so I anticipate they'll roll the products out for as long as possible to keep themselves in the news.

Edit: So long as they have competitive products. You do a simultaneous launch when you want to hide a weaker product behind a more competitive one.

Kaluan · Oct 29, 2022

So
7900XT 20GB/320bit cut N31
7900XTX 24GB/384bit full(?) N31
7950XT 24GB/384bit full(?) N31+3D
Or?

BTW, I've seen some leaked specs listing 10752 SIMD on (one of) the cut N31. Where does the "leak" originate, and what do people think of its validity?
Weird cut at first glance, but I suppose it's based on 1 WGP being cut or defective from each Shader Engine block, or 1 every 2 Shader Array blocks (or 1 CU in each SA)?
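
A quick sanity check of the three cut options above, as a rough sketch only. It assumes the rumoured full N31 layout (6 Shader Engines, 2 Shader Arrays per SE, 8 WGPs per SE, 2 CUs per WGP, 256 "SIMD" per WGP with dual-issue counted); none of that is confirmed, and the names in the code are made up for illustration.

```python
# Assumed (rumoured, unconfirmed) full Navi 31 layout.
SHADER_ENGINES = 6
ARRAYS_PER_SE = 2
WGPS_PER_SE = 8
CUS_PER_WGP = 2
LANES_PER_WGP = 256  # "SIMD" count per WGP, dual-issue counted


def lanes_after_cut(wgps_disabled: int) -> int:
    """SIMD count left after disabling the given number of WGPs."""
    total_wgps = SHADER_ENGINES * WGPS_PER_SE
    return (total_wgps - wgps_disabled) * LANES_PER_WGP


total_arrays = SHADER_ENGINES * ARRAYS_PER_SE

# Each option expressed as the number of WGPs (or WGP equivalents) disabled.
cuts = {
    "1 WGP per Shader Engine": SHADER_ENGINES,
    "1 WGP per 2 Shader Arrays": total_arrays // 2,
    "1 CU per Shader Array": total_arrays // CUS_PER_WGP,
}

full = lanes_after_cut(0)
print("full N31:", full)
for name, wgps in cuts.items():
    print(f"{name}: {wgps} WGPs off -> {lanes_after_cut(wgps)} SIMD")
```

Under those assumptions all three options disable the equivalent of 6 WGPs, so they are indistinguishable from the SIMD count alone.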

biostud · Oct 29, 2022

So performance wise do you think +100% Navi31 and +50% Navi32 over 6950XT?

TESKATLIPOKA · Oct 29, 2022

https://twitter.com/x/status/1585984908412928001

The theoretical performance is 2x, but it's not necessarily reflected on the game frame.

Performance in a synthetic benchmark according to Greymon55.

Performance is not bad, but more shaders brings only a limited improvement.
WGP: +20%
Frequency: +25-30%(2.9-3GHz)
Just this would mean at best 100*1.2*(1.25 or 1.3)=150-156
200/(150 or 156)=1.28-1.33 or 28-33% better performance per WGP.
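
The same estimate as a small sketch, assuming performance scales linearly with WGP count and clock (a simplification) and taking the 2x figure as the target. Note that 1.2 × 1.25 is 1.5, so the lower bound from scaling alone is 150. The function name is made up for illustration.

```python
def implied_per_wgp_gain(target: float, wgp_scale: float, clock_scale: float) -> float:
    """Per-WGP, per-clock gain needed to reach `target` once the extra
    WGPs and the higher clock have been accounted for (linear scaling assumed)."""
    return target / (wgp_scale * clock_scale)


TARGET = 2.0      # rumoured 2x theoretical performance
WGP_SCALE = 1.20  # +20% WGPs

for clock_scale in (1.25, 1.30):  # +25% to +30% frequency
    scaling_only = 100 * WGP_SCALE * clock_scale
    gain = implied_per_wgp_gain(TARGET, WGP_SCALE, clock_scale)
    print(f"clock x{clock_scale:.2f}: scaling alone -> {scaling_only:.0f}, "
          f"implied per-WGP gain x{gain:.2f}")
```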

jpiniero · Oct 29, 2022

I should point out that the 4090 is theoretically 2x of 3090 Ti. Well except for the bandwidth. I'm interested to see how much better the vcache model would be over the normal one.

maddie · Oct 29, 2022

TESKATLIPOKA said:
https://twitter.com/x/status/1585984908412928001

Performance in a synthetic benchmark according to Greymon55.

Performance is not bad, but more shaders brings only a limited improvement.
WGP: +20%
Frequency: +25-30%(2.9-3GHz)
Just this would mean at best 100*1.2*(1.25 or 1.3)=150-156
200/(150 or 156)=1.28-1.33 or 28-33% better performance per WGP.

Do we ignore the > 50% perf/W increase?

As a very high level estimate, this subsumes all of the internal improvements into one box and avoids any blind analysis of the specific architectural improvements.

Kepler_L2 · Oct 29, 2022

TESKATLIPOKA said:
https://twitter.com/x/status/1585984908412928001

Performance in a synthetic benchmark according to Greymon55.

Performance is not bad, but more shaders brings only a limited improvement.
WGP: +20%
Frequency: +25-30%(2.9-3GHz)
Just this would mean at best 100*1.2*(1.25 or 1.3)=150-156
200/(150 or 156)=1.28-1.33 or 28-33% better performance per WGP.

FMA throughput is around 3.5x
Pixel fillrate is around 2x
L2/L3 cache bandwidth is around 2x
Memory bandwidth is around 2x

If they can't hit 2x performance in games it's due to driver overhead/CPU bottleneck or there's an issue with scaling with the architecture.

alexruiz · Oct 30, 2022

If the performance is indeed that spectacular, AMD will price them accordingly
My take:

Price
RX 7900XTX: ~$1400
RX 7900XT: ~$1100
RX 7800XT: ~$900
RX 7700XT: ~$600
RX 7600XT: ~$400

Performance
RX 7900XTX: 180% of RX 6950XT
RX 7900XT: 150% of RX 6950XT
RX 7800XT: 130% of RX 6950XT
RX 7700XT: 100% of RX 6950XT
RX 7600XT: 70% of RX 6950XT (In between 6750XT and 6800)
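
For comparison, the guesses above can be turned into performance per dollar. This is only a sketch over the guessed numbers in this post, not real prices or benchmarks, and the helper name is made up.

```python
# Guessed (price in USD, performance as % of RX 6950XT) from the lists above.
guesses = {
    "RX 7900XTX": (1400, 180),
    "RX 7900XT": (1100, 150),
    "RX 7800XT": (900, 130),
    "RX 7700XT": (600, 100),
    "RX 7600XT": (400, 70),
}


def perf_per_100_usd(price: float, perf: float) -> float:
    """Performance points (6950XT = 100) bought per 100 USD."""
    return perf / price * 100


value = {name: perf_per_100_usd(price, perf) for name, (price, perf) in guesses.items()}

for name, v in value.items():
    print(f"{name}: {v:.1f} points per $100")
```

With these guesses, value per dollar improves steadily going down the stack.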

TESKATLIPOKA · Oct 30, 2022

maddie said:
Do we ignore the > 50% perf/W increase?

As a very high level estimate, this subsumes all of the internal improvements into one box and avoids any blind analysis of the specific architectural improvements.

How does the improved perf/W affect what I wrote?

Kepler_L2 said:
FMA throughput is around 3.5x
Pixel fillrate is around 2x
L2/L3 cache bandwidth is around 2x
Memory bandwidth is around 2x

If FMA throughput is ~3.5x better, then considering N31 has 12288SP it would also mean ~45% higher clockspeed.
That would be ~3350MHz.
ROPs can be 192 (+50%), and with this clockspeed it would be ~2.18x pixel fillrate.
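
Spelled out as a sketch: the N21 figures (5120 SP, 128 ROPs) are the known RX 6900 XT / 6950XT configuration, while the ~2300MHz baseline clock is an assumption picked to be close to typical N21 boost clocks. The N31 numbers are the rumoured ones used above.

```python
N21_SP = 5120          # Navi 21 shader count
N31_SP = 12288         # rumoured Navi 31 count (dual-issue counted)
N21_ROPS = 128
N31_ROPS = 192         # the +50% guess above
FMA_SCALE = 3.5        # Kepler_L2's figure
N21_CLOCK_MHZ = 2300   # assumed baseline clock, not a measured value

# FMA throughput scales with shader count times clock, so whatever the
# shader count does not explain has to come from clockspeed.
sp_scale = N31_SP / N21_SP
clock_scale = FMA_SCALE / sp_scale
n31_clock_mhz = N21_CLOCK_MHZ * clock_scale

# Pixel fillrate scales with ROP count times clock.
fill_scale = (N31_ROPS / N21_ROPS) * clock_scale

print(f"shader scale: x{sp_scale:.2f}")
print(f"implied clock scale: x{clock_scale:.3f} -> about {n31_clock_mhz:.0f} MHz")
print(f"implied pixel fillrate scale: x{fill_scale:.3f}")
```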

I don't think ROPs are a limiting factor. My bet is memory, then I have to ask why they didn't add more cache but instead regressed.
It will be interesting to see N33 vs N23 when you set the same clockspeed for both of them.

RnR_au · Oct 30, 2022

alexruiz said:
If the performance is indeed that spectacular, AMD will price them accordingly

They can't. They don't have CUDA nor DLSS 3. AMD's noise suppression is also betaware in quality, at least from what I have read.

Tup3x · Oct 30, 2022

RnR_au said:
They can't. They don't have CUDA nor DLSS 3. AMD's noise suppression is also betaware in quality, at least from what I have read.

I think it comes down to ray tracing performance. If it's lacking then it obviously has to be cheaper. If it is a no-compromise product this time around, then it will show in the price. In that case I hope for some kind of price war instead of a price cartel.

DisEnchantment · Oct 30, 2022

Kepler_L2 said:
FMA throughput is around 3.5x
Pixel fillrate is around 2x
L2/L3 cache bandwidth is around 2x
Memory bandwidth is around 2x

If they can't hit 2x performance in games it's due to driver overhead/CPU bottleneck or there's an issue with scaling with the architecture.

Lots of resources doubled or more but are there any architectural gains?
So much rework around the CU/WGP and SoC architecture, I don't think they would have a regression there architecturally, but we will know in a few days.

Tup3x · Oct 30, 2022

DisEnchantment said:
Lots of resources doubled or more but are there any architectural gains?
So much rework around the CU/WGP and SoC architecture, I don't think they would have a regression there architecturally, but we will know in a few days.

I'd be surprised if the chiplet approach doesn't have any negative impact. Interesting to see how things turn out.

Yosar · Oct 30, 2022

RnR_au said:
They can't. They don't have CUDA nor DLSS 3. AMD's noise suppression is also betaware in quality, at least from what I have read.

DLSS 3 is pure crap (any technology giving such horrible artifacts is crap). If anyone cares, they'd be better off buying a TV with this 'revolutionary' technology. It will be cheaper and work on every game from the start. Either way your latency goes to hell, so who cares.
CUDA is worthless on gaming cards because it is gimped in drivers by nVidia. nVidia prefers to sell professional cards with the same configuration as gaming cards but much more expensive, just for not gimping CUDA in drivers.

Not that I want more expensive cards from AMD. On the contrary, but the only argument for these cards to be cheaper than nVidia's is the unlimited mindshare nVidia has, and how AMD will choose to deal with it (if at all).
Of course, that's assuming they are on par with nVidia cards or even better.

RnR_au · Oct 30, 2022

Yosar said:
DLSS 3 is pure crap (any technology giving such horrible artifacts is crap). If anyone cares, they'd be better off buying a TV with this 'revolutionary' technology. It will be cheaper and work on every game from the start. Either way your latency goes to hell, so who cares.

DLSS 3 is fine for single player games where latency is not so important. And not everyone wants a TV to game on. As for artifacts, I'm sure nvidia will release DLSS 3.1 with fixes.

DLSS 3 is not for me since I play twitchy games, but I can see the appeal for those that want to immerse themselves sliding everything to max.

Yosar said:
CUDA is worthless on gaming cards because it is gimped in drivers by nVidia. nVidia prefers to sell professional cards with the same configuration as gaming cards but much more expensive, just for not gimping CUDA in drivers.

I have a friend who runs an nvidia consumer gaming card and CUDA support is fine for his needs. It enables him to do his stable diffusion AI art thing to his heart's content.

Edit: found a recent set of instructions for running on AMD GPUs - https://www.travelneil.com/stable-diffusion-windows-amd.html

...anecdotal observations that this seems to be anywhere from 3x to 8x slower than it is for people on similar-specced Nvidia hardware.

This is the reason why I have argued that AMD needs to be price conscious. They need market share more than anything else in this gpu space.

poke01 · Oct 31, 2022

AMD is in a great position now to beat Nvidia in gaming. They need to go all out and get market share and mind share.
