Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38 · Apr 28, 2020

All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to saying 505mm^2 for Navi21, which other people have backed up. Even still though, take the following with a pich of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html

kurosaki · Sep 23, 2020

TESKATLIPOKA said:
Techpowerup average performance in 4K.

😂

Yeah, well at those resolutions it's not all arch that's comparable. it's literally like comparing apples to pears. Thank you for the clarification.

Glo. · Sep 23, 2020

I love how things are in discussion twisted.

I was talking about 40 CU GPU being 10% above RTX 2080 Super or 35% above RX 5700 XT.

I've just checked the performance comparison with RTX 2080 Super in TPU suite. Well, if we look at TechPowerUp charts, RTX 2080 Super in 4K is 25% above RX 5700 XT.

Secondly. Thats the performance level that RTX 3070 will achieve, even according to Galax:

GALAX's internal roadmap confirms GeForce RTX 3080 20GB, GeForce RTX 3060 - VideoCardz.com

A leaked Galax roadmap confirms the upcoming GeForce RTX 30 series graphics cards. GALAX confirms GeForce RTX 3080 20GB, GeForce RTX 3060 The company has held a meeting with either employees or key partners in China. The logo in the upper left corner confirms that the slide belongs to Galax. The...

videocardz.com

So lets get back to the discussion. How come 40 CU GPU chip cannot compete with RTX 3070, while clocked at 2.3 GHz and having 10% higher IPC than RDNA1 GPUs?

Can anyone explain this to me? From the start, Im trying to tell people that RTX 3070 WILL NOT ACHIEVE RTX 2080 Ti performance levels. I don't know why people believe in this, and spin the discussion that in order to compete with RTX 3070, 40 CU GPU has to beat RTX 2080 Ti.

Its absolutely ridiculous how overestimated Nvidia is, and their Ampere GPUs.

dzoni2k2 · Sep 23, 2020

Glo. said:
Can anyone explain this to me? From the start, Im trying to tell people that RTX 3070 WILL NOT ACHIEVE RTX 2080 Ti performance levels. I don't know why people believe in this, and spin the discussion that in order to compete with RTX 3070, 40 CU GPU has to beat RTX 2080 Ti.

Nvidias marketing is probably just behind Apples in brainwashing power. Some people actually believe Nvidia more than independent benchmarks. They are still quoting ridiculous performance and perf/W claims from marketing slides that were proven to be complete horse*. It's quite mind boggling.

Glo. · Sep 23, 2020

dzoni2k2 said:
Nvidias marketing is probably just behind Apples in brainwashing power. Some people actually believe Nvidia more than independent benchmarks. They are still quoting ridiculous performance and perf/W claims from marketing slides that were proven to be complete horse*. It's quite mind boggling.

Simplest possible calculations.

RTX 3080 is 25-30% faster than RTX 2080 Ti.

RTX 3080 has 68 SMs, massive bandwidth.

So how come suddenly 44 CU GPU, will achieve RTX 2080 Ti performance, considering that RTX 3080 has 54%(!) more SMs? And does not use GDDR6X, but only GDDR6?

How will it mitigate the undeniable lack of hardware? Nvidia's magic?

So maybe that 40CU has way smaller hill to climb, despite what people want to believe?

blckgrffn · Sep 23, 2020

TESKATLIPOKA said:
Techpowerup average performance in 4K.

NVM, @kurosaki got it covered for me.

HurleyBird · Sep 23, 2020

Glo. said:
And it never will achieve its potential.

Nvidia simply just GCN'd their gaming architecture. It behaves EXACTLY like GCN in games, and EXACTLY like GCN in compute. Mosterous in compute, mediocre in games, with insane inefficiency. The performance increase in compute is not reflected in gaming. For the same exact reasons why GCN never reflected its performance in games.

I'm not sure. GA100 is only designed for compute, but doesn't have the doubling of fp32 resources. fp32 doubling in Ampere seems to be aimed at gaming specifically.

But to play devil's advocate, it's obvious that Nvidia's implementation of fp32 doubling could be better for gaming too.

The 50% ratio between fp32 and int/fp32 hybrid resources seems arbitrary. How many games are going to utilize 50% int? One or two edge cases? Zero? And you can ask the same question about how many games will use 0% int, because if you use any int some of those extra fp32 transistors will go to waste.

No, what you want, and I wouldn't be surprised to see with Hopper, is some percentage of pure fp32 cores, some percentage of hybrid cores, and some percentage of pure int cores. Maybe something like 60%, 25%, and 15% respectively.

Saylick · Sep 23, 2020

HurleyBird said:
I'm not sure. GA100 is only designed for compute, but doesn't have the doubling of fp32 resources. fp32 doubling in Ampere seems to be aimed at gaming specifically.

But to play devil's advocate, it's obvious that Nvidia's implementation of fp32 doubling could be better for gaming too.

The 50% ratio between fp32 and int/fp32 hybrid resources seems arbitrary. How many games are going to utilize 50% int? One or two edge cases? Zero? And you can ask the same question about how many games will use 0% int, because any time you use any int some of those extra fp32 transistors are going to waste.

No, what you want, and I wouldn't be surprised to see with Hopper, is some percentage of pure fp32 cores, some percentage of hybrid cores, and some percentage of pure int cores. Maybe somewhere around 60%, 25%, and 15% respectively.

You've hit the nail on the head. The tricky part is that on a given clock cycle, it's hard to pinpoint how much INT and FP are used. Nvidia says that for every 100 FP operations, there's 36 INT operations, but that's clearly an average over some timeframe of the average workload. Some cycles the GPU doesn't need as much INT and on others it needs more INT. I'm not sure what the ideal balance of purely dedicated FP, INT, and FP/INT cores would be, but a logical approach seems to be one where you profile the typical workload and for each clock cycle, you determine the INT/FP ratio. Then you plot a histogram to see what the distribution looks like, and then you create a GPU where the hybrid cores covers like 1 or 2 standard deviations out from the mean INT/FP ratio is so that you can swing towards more INT or more FP in the vast majority of cases. A parametric study can be conducted to determine the optimal balance of perf to area of using more hybrid cores vs dedicated FP and INT cores.

Heartbreaker · Sep 23, 2020

kurosaki said:
The 2080 Ti is mostly 25% faster. Are we on the same forum? Anand bench Or, are you solely looking at the 1080p figures in certain games?

Where the heck are you seeing the 2080 Ti only 25% faster than the 5700XT?

Heartbreaker · Sep 23, 2020

HurleyBird said:
I'm not sure. GA100 is only designed for compute, but doesn't have the doubling of fp32 resources. fp32 doubling in Ampere seems to be aimed at gaming specifically.

It's really over-provisioned in FP32 for common gaming resolutions, since they didn't really double ROPs/memory BW and other functional units to support all those extra FP32 cores.

A couple of reviewers have noted that this is the reason why it does so much better at 4K (even when not CPU limited), FP32 needs increase disproportionately at higher resolutions. Probably why NVidia is marketing 8K with 3090.

While it is FP32 overkill, it is still efficient design. They took the INT32 unit and added FP32 capability, so there will be significant reuse of die area, like register and cache space. It's cheaper than doing a complete separate FP32 unit.

Zstream · Sep 23, 2020

guidryp said:
Where the heck are you seeing the 2080 Ti only 25% faster than the 5700XT?

Outside of 4k, which is much different than arch discussions as there are more limiting factors.

Heartbreaker · Sep 23, 2020

Zstream said:
Outside of 4k, which is much different than arch discussions as there are more limiting factors.

It's odd but some people seem to want benchmarks done at:

4K for CPU tests.
Under 4K for GPU tests.

Which is precisely backwards to where the differences emerge.

Zstream · Sep 23, 2020

guidryp said:
It's odd but some people seem to want benchmarks done at:

4K for CPU tests.
Under 4K for GPU tests.

Which is precisely backwards to where the differences emerge.

Heck if I know, I frankly don't care anymore about lusting over hardware. I'm sure AMD will have plenty of SKU's to choose from, and compete just fine with Nvidia.

dzoni2k2 · Sep 23, 2020

Zstream said:
Heck if I know, I frankly don't care anymore about lusting over hardware. I'm sure AMD will have plenty of SKU's to choose from, and compete just fine with Nvidia.

They sure set the bar low cheaping out on 7nm. Everyone at AMD is probably grinning from ear to ear right now. 1st Intel and the 10nm debacle now Nvidia with Fermi v2 on 8nm. Mobile is going to be an absolute massacre for Nvidia.

Qwertilot · Sep 24, 2020

Mobile is certainly going to be rather interesting, and much more market relevant than the very top cards.

I would be very surprised if it was a massacre. NV started genuinely significantly ahead & they did get the die shrink and do know how important the mobile market is.

Fairly confident they'll get a decent performance improvement out for mobile.

AMD might well get some market share back though.

TESKATLIPOKA · Sep 24, 2020

Glo. said:
I love how things are in discussion twisted.

I was talking about 40 CU GPU being 10% above RTX 2080 Super or 35% above RX 5700 XT.

I've just checked the performance comparison with RTX 2080 Super in TPU suite. Well, if we look at TechPowerUp charts, RTX 2080 Super in 4K is 25% above RX 5700 XT.

Secondly. Thats the performance level that RTX 3070 will achieve, even according to Galax:

GALAX's internal roadmap confirms GeForce RTX 3080 20GB, GeForce RTX 3060 - VideoCardz.com

A leaked Galax roadmap confirms the upcoming GeForce RTX 30 series graphics cards. GALAX confirms GeForce RTX 3080 20GB, GeForce RTX 3060 The company has held a meeting with either employees or key partners in China. The logo in the upper left corner confirms that the slide belongs to Galax. The...

videocardz.com

So lets get back to the discussion. How come 40 CU GPU chip cannot compete with RTX 3070, while clocked at 2.3 GHz and having 10% higher IPC than RDNA1 GPUs?

Can anyone explain this to me? From the start, Im trying to tell people that RTX 3070 WILL NOT ACHIEVE RTX 2080 Ti performance levels. I don't know why people believe in this, and spin the discussion that in order to compete with RTX 3070, 40 CU GPU has to beat RTX 2080 Ti.

Its absolutely ridiculous how overestimated Nvidia is, and their Ampere GPUs.

RTX 2080 vs RTX 2080 Ti -> Ti is 26% faster
RTX 2080 vs RTX 3070 -> pretty much the same specs(SMs, TMUs, ROPs, Bandwidth) except the number of FP32 cores.
RTX 3070 is then only 14.4% faster than 2080, 8% faster than Super and 2080Ti is only 10% faster. Ok, this is possible, but Bandwidth is a bottleneck in my opinion, 16Ghz would bring extra performance.
Back to Navi2x with 40CU.
RX 5700XT has 1887Mhz clockspeed on average.
2.3GHz is actually 21.9% higher and with 10% IPC It's actually 100*1.1*1.219=134%, let's say 35% faster as a combination of clocks and IPC. This is not impossible, but the bigger problem for me is only 150W TBP and bandwidth If It's only 192bit. 192bit bus with 14-16GHz GDDR6 provides 336-384GB/s which is 14-25% less compared to 448GB/s of RX5700 XT while feeding a 35% faster chip. I would expect 256bit bus with 14-16GHz memory. TBP I won't comment, I did It more than once in the past.

TESKATLIPOKA · Sep 24, 2020

Glo. said:
Simplest possible calculations.

RTX 3080 is 25-30% faster than RTX 2080 Ti.

RTX 3080 has 68 SMs, massive bandwidth.

So how come suddenly 44 CU GPU, will achieve RTX 2080 Ti performance, considering that RTX 3080 has 54%(!) more SMs? And does not use GDDR6X, but only GDDR6?

How will it mitigate the undeniable lack of hardware? Nvidia's magic?

So maybe that 40CU has way smaller hill to climb, despite what people want to believe?

RTX 3070 does not have 44CU but 46SM.

Timorous · Sep 24, 2020

TESKATLIPOKA said:
RTX 2080 vs RTX 2080 Ti -> Ti is 26% faster
RTX 2080 vs RTX 3070 -> pretty much the same specs(SMs, TMUs, ROPs, Bandwidth) except the number of FP32 cores.
RTX 3070 is then only 14.4% faster than 2080, 8% faster than Super and 2080Ti is only 10% faster. Ok, this is possible, but Bandwidth is a bottleneck in my opinion, 16Ghz would bring extra performance.
Back to Navi2x with 40CU.
RX 5700XT has 1887Mhz clockspeed on average.
2.3GHz is actually 21.9% higher and with 10% IPC It's actually 100*1.1*1.219=134%, let's say 35% faster as a combination of clocks and IPC. This is not impossible, but the bigger problem for me is only 150W TBP and bandwidth If It's only 192bit. 192bit bus with 14-16GHz GDDR6 provides 336-384GB/s which is 14-25% less compared to 448GB/s of RX5700 XT while feeding a 35% faster chip. I would expect 256bit bus with 14-16GHz memory. TBP I won't comment, I did It more than once in the past.

I think 3070 vs 2080Ti will be like 5700XT vs R7. At 1080p and 1440p they might be very close but at 4k the 3070 will fall behind slightly in the comparison like 5700XT does.

PS5 is 2.23Ghz peak on the GPU. I can see a 40 CU RDNA2 gpu matching the 2080S. 12 Tflops of Series X is already getting a baseline of 2080 level performance in a really quick port.

The 5700XT has 225W tbp. When AMD compared navi to GCN they compared products not just the dies so on the basis they are doing that again it suggests they can hit 2080Ti performance at 225w tbp. It also suggests they could hit 5700XT perf with a 150W tbp. Given this it would mean that to hit 5700XT +32% the 40CU part would need to have a 200W tbp.

Looking at PS5 vs Series X power supplies it seems that it could go either way because the difference could be down the the PS5 SSD IO arrangement being more power hungry or the audio engine rather than SoC power. These power supplies are also undersized if the GPU in the SoC + Ram is consuming 200W on their own meaning if the performance is there AMD could have beaten their stated 50% perf/watt goal (I bet they had a higher internal target).

KompuKare · Sep 24, 2020

Isn't the PS5 design meant to be older than Series X one?
Closer to RDNA1 than RDNA2?
In which case, using PS5 for speculation isn't that helpful.
Plus, Microsoft have given out a lot more information than Sony.

Antey · Sep 24, 2020

just imagine a big vega (power consumption would be beyond insane though)... 80 Compute Units, 5120 stream processors, 2,3GHz... 5120*2*2,3= 23,55 TFLOPs. if radeon 7 is 13,44 TFLOPs then it would be 75% faster... that would be 26% faster than the 2080ti (based on that techpowerup relative perf data)... hey! not bad.

TESKATLIPOKA · Sep 24, 2020

Antey said:
just imagine a big vega (power consumption would be beyond insane though)... 80 Compute Units, 5120 stream processors, 2,3GHz... 5120*2*2,3= 23,55 TFLOPs. if radeon 7 is 13,44 TFLOPs then it would be 75% faster... that would be 26% faster than the 2080ti (based on that techpowerup relative perf data)... hey! not bad.

Wrong. Vega TFLOPs don''t provide the same performance as RDNA1 or RDNA2 TFlops. Just compare Radeon VII vs RX 5700 XT.

Antey · Sep 24, 2020

TESKATLIPOKA said:
Wrong. Vega TFLOPs doesn't provide the same performance as RDNA1 or RDNA2 TFlops.

What? I'm not talking about RDNA or RDNA2... Radeon 7 is GCN5 Vega.

TESKATLIPOKA · Sep 24, 2020

Antey said:
What? I'm not talking about RDNA or RDNA2... Radeon 7 is GCN5 Vega.

I see, my bad, didn't notice you mentioning big Vega at the beginning, but even so It wouldn't be 75% faster. BTW I don't see a reason for bringing up a hypothetical Big Vega when we have RDNA2 Big Navi.

Antey · Sep 24, 2020

Yes, RDNA has much better use of its resources and can provide a better gaming performance. you are saying that vega tflops and gaming performance doesnt scale linearly? (with pixel fillrate and bandwidth increased accordingly)

TESKATLIPOKA · Sep 24, 2020

First of all, the question is If you also doubled the ROPs and significantly increased the bandwidth or just doubled the CU number in your calculation. Even If you doubled everything, then the scaling is never linear or at least I didn't see It with any GPU.

Edit: actually it is almost linear in 4K at least with RDNA1, If there is enough ROPs and bandwidth.
You can compare RX5500XT 8GB vs RX5700XT 8GB(+87% in TFLOPs and ROPs, bandwidth is doubled)
difference:
FullHD: +79%
WQHD: +83%
4K: +86%
Review

Antey · Sep 24, 2020

mmmh ok, i will try it.

5500XT has 22CU / 1408 SM at 1,845 MHz (boost max) or 1717 MHz for gaming freq; thats 5195 / 4835 GFLOPS
5700XT has 40CU / 2560 SM at 1905 MHz (boost max) or 1755 MHz for gaming freq; thats 9753 / 8986 GFLOPS

5500XT boost 100% -> 5700XT boost 188%
5500XT gaming 100% -> 5700XT gaming 186%

ASUS Radeon RX 5500 XT STRIX 8 GB Review

The ASUS RX 5500 XT STRIX OC is their most premium entry-level Navi design. The large triple-slot, dual-fan cooler achieves outstanding temperatures and is whisper quiet at the same time. Fan stop is included too, and the VRM design is overbuilt, which could explain the good OC results we're seeing.

www.techpowerup.com

i'm going to use 4k data because is close to not CPU bottlenecked (isn't it?)

5500XT 4K gaming perf 100% -> 5700XT 4K gaming perf 186%

i dont know friend, that very linear to me.

Question Speculation: RDNA2 + CDNA Architectures thread

Platinum Member

Senior member

Diamond Member

Member

Diamond Member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Member

Golden Member

Platinum Member

Platinum Member

Golden Member

Golden Member

Member

Platinum Member

Member

Platinum Member

Member

Platinum Member

Member