Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38 · Apr 28, 2020

All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html

DJinPrime · Sep 18, 2020

eek2121 said:
3) The RX 5700 XT already had a high TDP and AMD is done playing that game.

So, not scalable at higher speed or core counts or both then. It was a small chip on 7nm from TSMC, and I don't remember hearing about yield issues.

Kenmitch said:
Navi was by design targeting the mid-range performance level. Simple economics as greater sales are to be had in the lower to mid-range vs the higher end market. Putting Big Navi on the back burner doesn't mean they couldn't have made it earlier if it made sense at the time to do so. It was in AMD's best interest to use their resources and wafer supply for their other offerings and commitments at that time.

Also not scalable then, since it was only designed for mid-range performance. If you believe that, then it means there's something about the design that doesn't work at high performance. But that's not how GPUs have historically been designed. All the lower tiers are basically the big chip with clusters removed; there's not a whole lot of difference in design. Whatever is their biggest chip is their best chip. Where the performance lands is what it is.
Let's do some simple math. Including all the cost of the card, the 5700 ended up selling for $400. So you think that basically having the same board, maybe adding another 4GB of memory, and having a 50% bigger chip that sells for $800-$1000 wouldn't have bigger margins? And it's not like you're wasting the chip: bin the bad ones down to 5700 and 5500. If that's really how it is, no wonder NV is on top; AMD is not even trying. I prefer to think that AMD tried but could not scale up for some technical reason. No shame there, sometimes things just don't work the way you expected and the next version will be better. Also, I hope you're wrong about the wafer availability, because there's tons more pressure now at 7nm: new Xbox, PS5, Zen 3, new iPhones. If they have the same mentality, how many Big Navi will even be made?
Why bother even, with the price of 3070 and 3080, might as well not even make any big Navi. /s

Hitman928 · Sep 18, 2020

DJinPrime said:
So, not scalable at higher speed or core counts or both then. It was a small chip on 7nm from TSMC, and I don't remember hearing about yield issues.

Also not scalable then, since it was only designed for mid-range performance. If you believe that, then it means there's something about the design that doesn't work at high performance. But that's not how GPUs have historically been designed. All the lower tiers are basically the big chip with clusters removed; there's not a whole lot of difference in design. Whatever is their biggest chip is their best chip. Where the performance lands is what it is.
Let's do some simple math. Including all the cost of the card, the 5700 ended up selling for $400. So you think that basically having the same board, maybe adding another 4GB of memory, and having a 50% bigger chip that sells for $800-$1000 wouldn't have bigger margins? And it's not like you're wasting the chip: bin the bad ones down to 5700 and 5500. If that's really how it is, no wonder NV is on top; AMD is not even trying. I prefer to think that AMD tried but could not scale up for some technical reason. No shame there, sometimes things just don't work the way you expected and the next version will be better. Also, I hope you're wrong about the wafer availability, because there's tons more pressure now at 7nm: new Xbox, PS5, Zen 3, new iPhones. If they have the same mentality, how many Big Navi will even be made?
Why bother even, with the price of 3070 and 3080, might as well not even make any big Navi. /s

GPUs are AMD's least profitable 7nm product. Priority was given to more profitable product lines both in terms of development funding and wafer supply. AMD's GPU group had been stripped down to a bare bones operation to give more funding to the Zen team and the graphics team are only recently seeing significant increases in funding. Without the required funding, AMD decided to focus on the higher volume segments to try and stay relevant in regards to market share. Now that they have more funding they are expanding to a full product lineup. That's it, it's not that complicated.

Ongoing wafer availability is a concern and no one knows how many wafers AMD has purchased/is purchasing for RDNA2. It might be tight for a bit with Zen3, RDNA2, and consoles all launching at the same time, but with mobile chips migrating to 5 nm, that should free up a lot of space for the medium to long term until AMD makes the move to 5 nm as well.

Olikan · Sep 18, 2020

Krteq said:
Hmm, is he referring to this patent?

ADAPTIVE CACHE RECONFIGURATION VIA CLUSTERING

That's quite a big change to the cache subsystem

There is crazy impressive research from AMD, that pretty much IS this patent....

https://t.co/nZopFRUt9V?amp=1

Just a quote:
"We extensively evaluate our proposal across 28 GPGPU applications. Our dynamic scheme boosts performance by 22% (up to 52%) and energy efficiency by 49% for the applications that exhibit high data replication and cache sensitivity without degrading the performance of the other applications. This is achieved at a modest area overhead of 0.09 mm^2/core."

Hitman928 · Sep 18, 2020

Olikan said:
There is crazy impressive research from AMD, that pretty much IS this patent....

https://t.co/nZopFRUt9V?amp=1

Just a quote:
"We extensively evaluate our proposal across 28 GPGPU applications. Our dynamic scheme boosts performance by 22% (up to 52%) and energy efficiency by 49% for the applications that exhibit high data replication and cache sensitivity without degrading the performance of the other applications. This is achieved at a modest area overhead of 0.09 mm^2/core."

I'm assuming by core they mean compute unit? If it's stream processor, that modest overhead is not so modest, lol.
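To put rough numbers on why the reading of "core" matters, here is a quick back-of-the-envelope sketch. Only the 0.09 mm^2/core figure comes from the quoted abstract; the 28-CU GPU size and the 64 stream processors per CU are assumptions picked for illustration.

```python
# Back-of-the-envelope check of the quoted per-core area overhead.
AREA_PER_CORE_MM2 = 0.09  # from the quoted abstract
NUM_CUS = 28              # assumed GPU size, for illustration only
SPS_PER_CU = 64           # assumed stream processors per compute unit


def total_overhead_mm2(num_cores, per_core_mm2=AREA_PER_CORE_MM2):
    """Total added area if every 'core' pays the per-core overhead."""
    return num_cores * per_core_mm2


# Reading 1: "core" means compute unit.
overhead_if_cu = total_overhead_mm2(NUM_CUS)
# Reading 2: "core" means stream processor.
overhead_if_sp = total_overhead_mm2(NUM_CUS * SPS_PER_CU)

print(f"core = CU: {overhead_if_cu:.2f} mm^2 total")
print(f"core = SP: {overhead_if_sp:.2f} mm^2 total")
```

Under these assumptions the gap is roughly 2.5 mm^2 versus 161 mm^2, which is why the second reading would not be modest at all.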

eek2121 · Sep 18, 2020

Hitman928 said:
GPUs are AMD's least profitable 7nm product. Priority was given to more profitable product lines both in terms of development funding and wafer supply. AMD's GPU group had been stripped down to a bare bones operation to give more funding to the Zen team and the graphics team are only recently seeing significant increases in funding. Without the required funding, AMD decided to focus on the higher volume segments to try and stay relevant in regards to market share. Now that they have more funding they are expanding to a full product lineup. That's it, it's not that complicated.

Ongoing wafer availability is a concern and no one knows how many wafers AMD has purchased/is purchasing for RDNA2. It might be tight for a bit with Zen3, RDNA2, and consoles all launching at the same time, but with mobile chips migrating to 5 nm, that should free up a lot of space for the medium to long term until AMD makes the move to 5 nm as well.

I would ignore him, he is just trolling people.

Krteq · Sep 18, 2020

Olikan said:
There is crazy impressive research from AMD, that pretty much IS this patent....

https://t.co/nZopFRUt9V?amp=1

Just a quote:
"We extensively evaluate our proposal across 28 GPGPU applications. Our dynamic scheme boosts performance by 22% (up to 52%) and energy efficiency by 49% for the applications that exhibit high data replication and cache sensitivity without degrading the performance of the other applications. This is achieved at a modest area overhead of 0.09 mm^2/core."

Nice

But after some research I assume both patents are more related to CDNA than RDNA

Olikan · Sep 18, 2020

Hitman928 said:
I'm assuming by core they mean compute unit? If it's stream processor, that modest overhead is not so modest, lol.

Yes, they used a 28CU GPU at 1.4GHz...
Read the paper, it's mind-blowing

Hitman928 · Sep 18, 2020

Olikan said:
Yes, they used a 28CU GPU at 1.4GHz...
Read the paper, it's mind-blowing

I'll try to over the weekend, don't have time to really digest a technical paper right now. Thanks for the link.

Olikan · Sep 18, 2020

Krteq said:
Nice

But after some research I assume both patents are more related to CDNA than RDNA

Maybe... The paper is all about how friendly the code is to data sharing with the L1 cache... dunno how games behave.

Worst case is a 2% performance increase...

blckgrffn · Sep 18, 2020

eek2121 said:
Do you really consider it a loss? I paid $800 for my 1080ti and I hope to retire it this year, but it won’t be a loss at all for me. I got my money’s worth out of it, and I can’t even sell it (it is going in another PC).

That's fair to ask - but if I could sell today and get nearly $400, and then in five weeks there is such a bountiful crop of AMD Navi cards that I lose ~$200 on resale, then that seems like lost money to me. Like deciding when to sell a stock...

And I've got a year's use out of this thing, so I could look at it as ~no cost per month of usage (sell now) or ~$20 per month (sell post launch) OR just pass it down to my son like I intended to and, like you, just get years of functional use out of it.

If used GPU prices hadn't been so crazy last year I probably would have tried to find a Vega 56 or something to nurse myself into RDNA2. I was so close to buying a Fury Nano on eBay for ~$105 shipped - I am kind of annoyed I didn't because of how niche that card was (I put in an offer for $100 and he countered at $105 and I let it expire)

A/// · Sep 18, 2020

Anyone else passing the time by reading the insane theories online?

Markfw · Sep 18, 2020

A/// said:
Anyone else passing the time by reading the insane theories online?

I am getting bored about
1) Hearing about what AMD is going to do with their next video cards...
2) hearing about what Intel is GOING to do with anything !
3) Hearing about what AMD is doing with Zen3....
4) hearing about when we might be able to buy a 3000 series Ampere.

Come on ! I want something to discuss !!! 3000 is faster, but you can't buy it, so that leaves.....

NOTHING !!

reb0rn · Sep 18, 2020

I must say there is so much misinformation, no one can even speculate on memory bandwidth to start from
like 16GB can only be 256/512bit or HBM2
and 12GB is 384bit

if it's just 256bit I can't see it being much faster, if at all, than the 3070

A/// · Sep 18, 2020

Markfw said:
I am getting bored about
1) Hearing about what AMD is going to do with their next video cards...
2) hearing about what Intel is GOING to do with anything !
3) Hearing about what AMD is doing with Zen3....
4) hearing about when we might be able to buy a 3000 series Ampere.

Come on ! I want something to discuss !!! 3000 is faster, but you can't buy it, so that leaves.....

NOTHING !!

I personally love, maybe even loathe, the threads on other sites about what AMD should do. How AMD can counter bots. How AMD has failed already based on leaks, namely the 256 bit bus engineering sample card, or that they should sell RTG off to NVIDIA.

Saylick · Sep 18, 2020

Found this on Reddit. 20% IPC gains incoming?

https://adwaitjog.github.io/docs/pdf/sharedl1-pact20.pdf

Abstract:

Graphics Processing Units (GPUs) concurrently execute thousands of threads, which makes them effective for achieving high throughput for a wide range of applications. However, the memory wall often limits peak throughput. GPUs use caches to address this limitation, and hence several prior works have focused on improving cache hit rates, which in turn can improve throughput for memory intensive applications. However, almost all of the prior works assume a conventional cache hierarchy where each GPU core has a private local L1 cache and all cores share the L2 cache. Our analysis shows that this canonical organization does not allow optimal utilization of caches because the private nature of L1 caches allows multiple copies of the same cache line to get replicated across cores.
We introduce a new shared L1 cache organization, where all cores collectively cache a single copy of the data at only one location (core), leading to zero data replication. We achieve this by allowing each core to cache only a non-overlapping slice of the entire address range. Such a design is useful for significantly improving the collective L1 hit rates but incurs latency overheads from additional communications when a core requests data not allowed to be present in its own cache. While many workloads can tolerate this additional latency, several workloads show performance sensitivities. Therefore, we develop lightweight communication optimization techniques and a run-time mechanism that considers the latency-tolerance characteristics of applications to decide which applications should execute in private versus shared L1 cache organization and reconfigures the caches accordingly. In effect, we achieve significant performance and energy efficiency improvements, at a modest hardware cost, for applications that prefer the shared organization, with little to no impact on other applications.
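For anyone who wants the gist without the paper, here is a toy sketch of the "non-overlapping slice of the address range" idea. The abstract does not say how addresses map to cores, so the simple line interleaving below, the 128-byte line size, the 4-core count, and the function names are all assumptions for illustration. It ignores capacity, eviction, and the extra latency the abstract mentions.

```python
from collections import defaultdict

LINE_SIZE = 128  # bytes per cache line (assumed)
NUM_CORES = 4    # number of GPU cores (assumed)


def home_core(address, num_cores=NUM_CORES, line_size=LINE_SIZE):
    """Core whose L1 may hold this address (line interleaving, an assumption)."""
    return (address // line_size) % num_cores


def l1_copies(accesses, shared):
    """Count cache-line copies held across all L1s, ignoring capacity limits."""
    lines_per_l1 = defaultdict(set)
    for core, address in accesses:
        line = address // LINE_SIZE
        # Private: the requesting core caches the line itself.
        # Shared: only the line's home core is allowed to cache it.
        holder = home_core(address) if shared else core
        lines_per_l1[holder].add(line)
    return sum(len(lines) for lines in lines_per_l1.values())


# Every core reads the same two cache lines.
accesses = [(core, addr) for core in range(NUM_CORES) for addr in (0, 128)]
private_copies = l1_copies(accesses, shared=False)
shared_copies = l1_copies(accesses, shared=True)
print(private_copies, shared_copies)
```

In this toy case the private organization stores the two lines once per core, while the shared organization stores each line exactly once, which is the replication the abstract is talking about.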

GodisanAtheist · Sep 18, 2020

Chugga chugga choo choo!!!

blckgrffn · Sep 19, 2020

A/// said:
Anyone else passing the time by reading the insane theories online?

And by writing needless replies to people that have differing opinions and 0% likelihood of changing said opinions? Yup.

blckgrffn · Sep 19, 2020

reb0rn said:
I must say there is so much misinformation, no one can even speculate on memory bandwidth to start from
like 16GB can only be 256/512bit or HBM2
and 12GB is 384bit

if it's just 256bit I can't see it being much faster, if at all, than the 3070

Fine. I promise that Big Navi is Hawaii reborn with a 512 bit bus. And infinity cache. Pinky swear.

Don’t ask for sources because I don’t have any. I just can’t let this train slow down.

If only I had a YouTube channel where I got paid per view 🤔

DiogoDX · Sep 19, 2020

reb0rn said:
I must say there is so much misinformation, no one can even speculate on memory bandwidth to start from
like 16GB can only be 256/512bit or HBM2
and 12GB is 384bit

if it's just 256bit I can't see it being much faster, if at all, than the 3070

12GB can be 192-bit too.

eek2121 · Sep 19, 2020

blckgrffn said:
That's fair to ask - but if I could sell today and get nearly $400, and then in five weeks there is such a bountiful crop of AMD Navi cards that I lose ~$200 on resale, then that seems like lost money to me. Like deciding when to sell a stock...

And I've got a year's use out of this thing, so I could look at it as ~no cost per month of usage (sell now) or ~$20 per month (sell post launch) OR just pass it down to my son like I intended to and, like you, just get years of functional use out of it.

If used GPU prices hadn't been so crazy last year I probably would have tried to find a Vega 56 or something to nurse myself into RDNA2. I was so close to buying a Fury Nano on eBay for ~$105 shipped - I am kind of annoyed I didn't because of how niche that card was (I put in an offer for $100 and he countered at $105 and I let it expire)

Maybe if you consider GPUs an investment? I usually end up giving them away. I routinely rebuild PCs and give them to family, friends, and those less fortunate (not necessarily in that order). To me it's a sunk cost. A part of my hobby.

reb0rn said:
I must say there is so much misinformation, no one can even speculate on memory bandwidth to start from
like 16GB can only be 256/512bit or HBM2
and 12GB is 384bit

if it's just 256bit I can't see it being much faster, if at all, than the 3070

What's funny is that the bus can actually be pretty much any width. Most people don't realize this, but yes, it's possible to have 16GB of GDDR6 on a 352 or 384 bit bus. There are a number of ways to do this (though to be fair they aren't used on PC cards as far as I'm aware). I'll leave it to your imagination to figure this out.
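One way to sanity-check the bus-width arithmetic, as a rough sketch. It assumes each GDDR6 package sits on its own 32-bit channel and comes in 1GB (8Gb) or 2GB (16Gb) densities; the helper names are made up for illustration, and clamshell setups (two packages per channel) are left out. Mixing densities is one possible route to 16GB on a 352 or 384 bit bus, with the catch that the extra capacity on the bigger chips is spread across fewer channels.

```python
CHANNEL_BITS = 32       # assumed: one GDDR6 package per 32-bit channel
CHIP_SIZES_GB = (1, 2)  # 8Gb and 16Gb densities


def uniform_capacities(bus_bits):
    """Capacities reachable when every channel carries the same chip size."""
    chips = bus_bits // CHANNEL_BITS
    return [chips * size for size in CHIP_SIZES_GB]


def mixed_configs(bus_bits, target_gb):
    """(1GB chips, 2GB chips) combinations that reach target_gb, one chip per channel."""
    chips = bus_bits // CHANNEL_BITS
    return [
        (chips - big, big)
        for big in range(chips + 1)
        if (chips - big) * 1 + big * 2 == target_gb
    ]


for bus in (192, 256, 384, 512):
    print(bus, "bit:", uniform_capacities(bus), "GB with uniform chips")

for bus in (352, 384):
    print(bus, "bit, 16GB:", mixed_configs(bus, 16))
```

With uniform chips this gives the usual pairings from the thread (12GB on 192 or 384 bit, 16GB on 256 or 512 bit), and the mixed case gives 16GB on 384 bit as eight 1GB chips plus four 2GB chips.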

Saylick said:
Found this on Reddit. 20% IPC gains incoming?

https://adwaitjog.github.io/docs/pdf/sharedl1-pact20.pdf

Abstract:

It is my understanding that the actual "IPC" (in quotes because can one really use the term 'IPC'?) of the architecture, including everything (rendering, shaders, etc.) is closer to 7%. We will see, however. My information is based mostly on console related stuff. I've seen numerous rumors and leaks that indicate that PC RDNA2 parts are at least somewhat different from console parts, but I'm not sure those changes will help "IPC". AMD is going to reach performance by scaling CU count upwards. An "IPC" increase isn't needed. It's just icing on the cake. Coincidentally, a 50% perf/watt increase would allow them to have a 72CU part run at the same TDP and same clocks as the RX 5700 XT. Food for thought. Assuming that they are able to scale up performance with CU count, well...

I know some people here may not understand the concept of AMD delivering solid execution, but they've been literally "executing" Intel. Anyone that claims they can't do the same thing to NVIDIA should stop posting here and short AMD stock.

EDIT: As an addendum to why "IPC" isn't really valid for GPUs, the "TFLOPs" measurement is the closest you'd get to IPC, which as you can see is wildly abused (NVIDIA claims double FP32 TFLOPs with the 3080 over the 2080ti, yet as we've witnessed, it performs 20-30% faster). Once you start factoring in geometry, textures, clocks, shaders, etc. all bets are off.

EDIT 2: As an example of why IPC can't really be measured. Vega64 has 12.66 TFLOPs of compute power, or nearly 30% more than the RX 5700 XT. However, you'll note that the RX 5700 XT beats the Vega 64 soundly in gaming. No, AMD isn't making up the TFLOPs number. Vega has really strong compute performance, but not so great gaming performance.
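For reference, the TFLOPs figures are just shader count times clock times two, which is part of why they say so little about gaming performance. A minimal sketch, assuming the published shader counts and boost clocks (4096 at 1546 MHz for Vega 64, 2560 at 1905 MHz for the RX 5700 XT); the function name is made up here.

```python
def fp32_tflops(shaders, clock_mhz):
    """Peak FP32 TFLOPs, counting one fused multiply-add (2 FLOPs) per shader per clock."""
    return 2 * shaders * clock_mhz * 1e6 / 1e12


vega64 = fp32_tflops(4096, 1546)    # assumed boost clock
rx5700xt = fp32_tflops(2560, 1905)  # assumed boost clock
ratio = vega64 / rx5700xt

print(f"Vega 64:    {vega64:.2f} TFLOPs")
print(f"RX 5700 XT: {rx5700xt:.2f} TFLOPs")
print(f"Vega 64 advantage on paper: {(ratio - 1) * 100:.0f}%")
```

Nothing in that formula knows about geometry, caches, or how well the shaders are kept busy, so the paper number and the frame rate can easily point in opposite directions.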

Tup3x · Sep 19, 2020

Well, the new Xbox does have this weird memory layout. It might work for it since the CPU will mainly use the slower pool, but I'm not so sure how well it would work for a top-end discrete GPU. The GTX 550 Ti also had a similar thing.
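A rough sketch of how that split comes about, assuming the publicly stated Series X setup of ten GDDR6 chips (six 2GB plus four 1GB) at 14 Gbps per pin, each on a 32-bit channel. The variable names are just for illustration.

```python
GBPS_PER_PIN = 14  # assumed GDDR6 data rate per pin
CHANNEL_BITS = 32  # assumed: one chip per 32-bit channel
chips_gb = [2] * 6 + [1] * 4  # six 2GB chips and four 1GB chips


def bandwidth_gb_s(num_chips):
    """Peak bandwidth when data is striped across num_chips channels."""
    return num_chips * CHANNEL_BITS * GBPS_PER_PIN / 8


smallest = min(chips_gb)
# Fast pool: the first 1GB of every chip, striped across all ten channels.
fast_pool_gb = len(chips_gb) * smallest
fast_bw = bandwidth_gb_s(len(chips_gb))
# Slow pool: the leftover capacity that exists only on the larger chips.
big_chips = [c for c in chips_gb if c > smallest]
slow_pool_gb = sum(c - smallest for c in big_chips)
slow_bw = bandwidth_gb_s(len(big_chips))

print(f"{fast_pool_gb}GB at {fast_bw:.0f} GB/s, {slow_pool_gb}GB at {slow_bw:.0f} GB/s")
```

Under those assumptions the result is 10GB at 560 GB/s and 6GB at 336 GB/s, so anything that spills into the second pool sees noticeably less bandwidth.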

Krteq · Sep 19, 2020

Saylick said:
Found this on Reddit. 20% IPC gains incoming?

https://adwaitjog.github.io/docs/pdf/sharedl1-pact20.pdf

Abstract:

That's exactly the same document that Olikan posted yesterday. Read previous posts pls
That document is describing perf. gains related to heavy compute scenarios rather than rendering

Olikan · Sep 19, 2020

Saylick said:
Found this on Reddit. 20% IPC gains incoming?

https://adwaitjog.github.io/docs/pdf/sharedl1-pact20.pdf

Abstract:

It's the same paper that I posted. I repeat, read it ppl, it's mind-blowing...

pandemonium · Sep 19, 2020

It really is.

If I'm understanding what they're laying out in theory, they want to basically AI the entire pipeline from the start, by task.

Given their wide range of compute tests they used, I can see this having an impact on real-time rendering. Like DLSS improving vastly over a generation, this could have broad ramifications for how efficiently GPGPUs handle their tasks.

GodisanAtheist · Sep 19, 2020

pandemonium said:
It really is.

If I'm understanding what they're laying out in theory, they want to basically AI the entire pipeline from the start, by task.

Given their wide range of compute tests they used, I can see this having an impact on real-time rendering. Like DLSS improving vastly over a generation, this could have broad ramifications for how efficiently GPGPUs handle their tasks.

- But is it going to be ready for a top to bottom RDNA2 stack? This type of radical technological shift looks like it would be a prime candidate for a pipe cleaner product or a mid gen refresh, not a top to bottom stack launch.

Wonder if this is the kind of thing that's being kept in the pipe for an RDNA3 launch or even further down the line.

After all the promises of the new pathways and discard accelerators etc in Vega and Polaris I would be less surprised if AMD managed to bork the physical design so the feature is useless than not.
