We ultimately don't know what the perfect amount is (especially since we don't have Navi 22 or 23 products to test yet), but considering that some testing by users who are overclocking the memory shows good performance gains from doing so, I think it's reasonable to conclude that Navi 21 probably doesn't have enough memory bandwidth, or at least that it could use additional infinity cache so that the hit rate is high enough to better compensate for that shortfall. Some of the comparisons against the latest Nvidia cards, which do have much higher memory bandwidth, show that AMD falls off a bit at 4K relative to the other resolutions, and a lack of memory bandwidth could be part of that result.
If you look at the data AMD presented, it's pretty clear that at 128 MB, 1080p games have pretty much hit the point of diminishing returns, as the curve for the hit rate is essentially flat there. For 4K gaming, it looks more linear up to and including 128 MB, so unless it falls off just past that (which seems unlikely), additional infinity cache would improve results. I expect that AMD will include more infinity cache when they move to 5nm. Conceptually, infinity cache works better the higher the hit rate is, and the better it does at compensating for a lack of memory bandwidth. Even being able to go from a hit rate of 70% to 80% is massive. It might only seem like a 10% improvement when viewed that way, but it actually represents a 33% reduction in the number of times you need to go out to memory.
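To make that arithmetic concrete, here's a minimal sketch. The traffic-reduction part follows directly from the hit rates above; the bandwidth figures (512 GB/s of GDDR6, a notional 1600 GB/s from the on-die cache) are purely illustrative assumptions of mine, not AMD's numbers, and the blended-bandwidth model is a deliberate simplification.

```python
# Rough sketch of how cache hit rate translates into off-chip traffic
# and effective bandwidth. All bandwidth figures are illustrative.

def vram_traffic_fraction(hit_rate):
    """Fraction of requests that still have to go out to VRAM."""
    return 1.0 - hit_rate

def effective_bandwidth(vram_gbs, cache_gbs, hit_rate):
    """Naive blend: requests are served either by the cache or by VRAM."""
    return hit_rate * cache_gbs + (1.0 - hit_rate) * vram_gbs

# Going from a 70% to an 80% hit rate:
before = vram_traffic_fraction(0.70)   # 30% of requests go to VRAM
after = vram_traffic_fraction(0.80)    # 20% of requests go to VRAM
print(f"VRAM traffic cut by {(before - after) / before:.0%}")  # ~33%

# Assumed numbers: 512 GB/s VRAM, 1600 GB/s cache (hypothetical).
print(effective_bandwidth(512, 1600, 0.70))  # ~1274 GB/s
print(effective_bandwidth(512, 1600, 0.80))  # ~1382 GB/s
```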
Of course this leaves a question as to why AMD wouldn't include as much as they perhaps should have with Navi 21, but I think the reason is fairly obvious: Navi 21 is already quite large. It's the second largest die they've ever released as a consumer product, with only Fiji coming in larger. They may have also anticipated better results than they were actually able to achieve, so it's possible AMD undershot. Another possibility is that AMD was able to get far better clock speeds than they initially anticipated, which would also lead to an imbalance.
There's also the nature of graphics themselves. If I want to use a set of textures and other data to construct a 1080p image, I can do so in some fixed amount of time. If I want to use that same set of data to construct a 4K image, I have four times as many pixels to push, which will require more shaders to do that work in the same amount of time, but I haven't necessarily increased the amount of memory bandwidth I need by very much. Of course most games will use higher-resolution textures and so on when displaying at 4K as opposed to some other resolution, but the amount of data needed and the memory bandwidth required to feed the additional shaders isn't going to increase by four times. So I don't think there's a linear relationship between how many shaders (or other hardware resources) a GPU needs and the amount of memory bandwidth it requires to keep them all fed.
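A quick back-of-the-envelope sketch of that point, with all figures being my own illustrative assumptions (a 32-bit color target, 60 fps, identical assets at both resolutions) rather than measured data:

```python
# Pixel count quadruples from 1080p to 4K, but the asset data feeding
# the shaders largely does not.

def pixels(w, h):
    return w * h

p_1080 = pixels(1920, 1080)        # 2,073,600 pixels
p_2160 = pixels(3840, 2160)        # 8,294,400 pixels
print(p_2160 / p_1080)             # 4.0 -> four times the shading work

# Render-target writes do scale with pixel count, but they are a small
# slice of per-frame traffic (assuming a 32-bit color target at 60 fps):
bpp, fps = 4, 60
print(p_1080 * bpp * fps / 1e9)    # ~0.5 GB/s of color writes at 1080p
print(p_2160 * bpp * fps / 1e9)    # ~2.0 GB/s of color writes at 4K

# The textures and geometry are the same scene assets either way, so
# that portion of the bandwidth demand stays roughly flat (assumption),
# which is why total bandwidth need not grow 4x just because shader
# throughput does.
```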