Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

CastleBravo

Member
Dec 6, 2019
119
271
96
I think AMD knew from the outset that RDNA 1 was just a stepping stone (which they have mentioned) and with limited 7nm capacity (at that time) they went with the mainstream market, which significantly outsells the high end market.

If your main limitation is fab capacity, you would be better off prioritizing high margin products over high volume products.
 
Reactions: beginner99

JPB

Diamond Member
Jul 4, 2005
4,064
89
91
AMD's next-gen RDNA 2 'major leap forward' up to 225% faster than RDNA

The new rumors for AMD Big Navi, or RDNA 2 aka NVIDIA Killer have it up to 225% faster than RDNA which powers Radeon RX 5700 XT.

A bunch of hot RDNA 2 information dropped today from Tom on Moore's Law is Dead, with his sources telling him that AMD's next-gen Big Navi will offer up to 225% more performance over RDNA... the GPU that powered the Radeon RX 5700 XT.

According to these sources, estimates are that RDNA 2 is a "major leap forward for them 195% to 225% of the current available cards. But their internal estimates are still projecting the performance per dollar cards up to the upper end of the range".

Another tease is that the "top RDNA 2" would feature 72 compute units, which is where the leaks on the flagship RDNA 2 graphics card being 40-50% faster than NVIDIA's current-gen flagship GeForce RTX 2080 Ti graphics card come from.

AMD uses 40 compute units in the Navi 10 GPU that powers the Radeon RX 5700 XT, so in raw compute units alone the upgrade to 72 CUs on the flagship RDNA 2 card is impressive. On top of that, we're looking at an estimated 7% improvement in IPC performance, with these numbers possibly being "much higher" than this.

Mix into that higher GPU clocks of somewhere in the 20-25% range and we could have a card that wouldn't just beat NVIDIA's best GeForce RTX graphics card right now -- that would be the GeForce RTX 2080 Ti -- but it could throw some serious blows at NVIDIA's next-gen Ampere GeForce RTX 3000 series graphics cards.
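As a rough sanity check, the rumored figures in the paragraphs above (72 CUs versus 40, a 7% IPC gain, 20-25% higher clocks) can be multiplied together. This is only a sketch of a naive upper bound: it assumes performance scales linearly with CU count, IPC and clock speed, which real workloads rarely do, and the function name is made up for illustration.

```python
# Naive upper bound: assumes perfectly linear scaling with CU count,
# IPC and clock speed. All inputs are the rumored figures from the article.
NAVI10_CUS = 40      # Radeon RX 5700 XT
BIG_NAVI_CUS = 72    # rumored top RDNA 2 part
IPC_GAIN = 1.07      # rumored 7% IPC improvement


def relative_performance(clock_gain: float) -> float:
    """Rumored Big Navi throughput as a multiple of the RX 5700 XT."""
    return (BIG_NAVI_CUS / NAVI10_CUS) * IPC_GAIN * clock_gain


low = relative_performance(1.20)   # 20% higher clocks
high = relative_performance(1.25)  # 25% higher clocks
print(f"{low:.2f}x to {high:.2f}x of an RX 5700 XT")
```

Under those assumptions the product comes out at roughly 2.3x to 2.4x, slightly above the quoted 195% to 225% range, which would be consistent with less-than-perfect scaling.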

Latest Big Navi / RDNA 2 highlights:

RDNA 2 = 40-50% faster than GeForce RTX 2080 Ti: We have rumored performance improvements inside of RDNA 2 that would make it 40-50% faster than the GeForce RTX 2080 Ti. This is impressive, as the flagship Radeon RX 5700 XT is nowhere near the RTX 2080 Ti in raw performance.

RDNA 2 = 195-225% faster than RDNA aka RX 5700 XT: RDNA 2 being a huge 195-225% faster than RDNA means that it will be a truly kick ass 4K gaming card, as well as being able to pump away at 1080p and 1440p resolutions at 120/144/165FPS much easier than the Radeon RX 5700 XT.

7% IPC improvement: AMD is bound to have at least 6-7% IPC improvement in RDNA 2 over RDNA, I would say 10% and above, but lowering estimates now will only improve what we think when AMD unleashes in November 2020.

NVIDIA Killer? Maybe: If the rumors are true, then like Jason Momoa's reply when he was asked about the Snyder Cut of Justice League (now known and confirmed as Zack Snyder's Justice League, releasing in 2021 as an HBO Max exclusive), I have the following answer to the question of whether Big Navi is an NVIDIA Killer: f*** yeah.

November 2020 launch: This is the key. November gives AMD a couple of months to see how Ampere does, and to launch when the next-gen consoles, the PlayStation 5 and Xbox Series X, are launching.

Big Navi / RDNA 2 launching in November key points

November 2020 launch: Firstly, AMD aiming for November means it puts the next-gen Radeon a couple of months behind NVIDIA if the rumors (and my sources) are true about an "August 2020 launch, September 2020 release" for the next-gen Ampere-based GeForce RTX 3000 series graphics cards.

How AMD benefits from a November launch: It will allow the company to have a couple of months to see where NVIDIA has priced its next-gen cards, and to gauge how performance is once it hits the hands of customers. AMD would then have a couple of months to be ready for a possible KO blow (at least for 2020) with Big Navi.

Thanksgiving launch: This is big for obvious reasons, but AMD being able to be inside of the next-gen Xbox Series X, the next-gen PlayStation 5, and have an NVIDIA Killer with Big Navi -- all for Thanksgiving? What better gift can AMD bring to the table for Thanksgiving than that?!

Launching alongside PlayStation 5 and Xbox Series X: The stars have truly aligned for AMD to make this happen, and if performance of Big Navi is truly double that of RDNA1 and a 40-50% boost over the GeForce RTX 2080 Ti? Well, we could see AMD take out the end of 2020 with Intel- and NVIDIA-destroying products. Amazing.

Lisa Su did say Big Navi was coming in 'late 2020': but November is pretty damn spot on with "late 2020", right?!
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
19 Tflops? That would be for example 72CU at 2.05GHz, that is doable.
I have to wonder whether RDNA2 is really so much better than RDNA1 that AMD didn't bother to make a bigger RDNA1 chip to combat the 2080 Ti, or whether it was because of some limitation in RDNA1.
The 19 TFLOPS figure was my estimate for non ML FP32 general compute on CDNA/Arcturus.
I think AMD knew from the outset that RDNA 1 was just a stepping stone (which they have mentioned) and with limited 7nm capacity (at that time) they went with the mainstream market, which significantly outsells the high end market.
I think it made both a convenient commercial stepping stone to relieve Vega in the markets and get the ball rolling on RDNA wave32 based drivers.

As I've theorised before, I think it also served as the major 'proof of concept' from which the 2 next gen consoles graphics uArch customisation phases diverged.

Obviously more features like RT were added since then, but the basic cut and thrust of the new uArch was in there - a rough hewn stone if you will.
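The "72CU at 2.05GHz" reading of the 19 TFLOPS figure quoted at the top of this post can be checked with the usual peak-throughput formula. A minimal sketch, assuming 64 FP32 lanes per CU and counting one fused multiply-add as two FLOPs; the constant and function names are illustrative only.

```python
LANES_PER_CU = 64    # assumed FP32 lanes per compute unit
FLOPS_PER_LANE = 2   # one fused multiply-add counted as two FLOPs


def peak_fp32_tflops(cus: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    gflops = cus * LANES_PER_CU * FLOPS_PER_LANE * clock_ghz
    return gflops / 1000.0


estimate = peak_fp32_tflops(72, 2.05)
print(f"72 CUs at 2.05 GHz: {estimate:.2f} TFLOPS")
```

That works out to a little under 19 TFLOPS, so the quoted configuration is at least arithmetically consistent with the figure.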
 
Reactions: FaaR and Tlh97

FaaR

Golden Member
Dec 28, 2007
1,056
412
136
It doesn't make any sense to me to clock FP64 cores so low. If they want 9.5 TFLOPs in FP64, then it's much better to clock FP64 cores the same as the rest of the chip and save a lot of space by having fewer FP64 cores in the GPU.
AMD has traditionally not had separate 64-bit (or 16-bit either, for that matter) cores. If they're doing it now in CDNA, it's a first time ever for them. You sure that's what's actually going on here?
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
AMD has traditionally not had separate 64-bit (or 16-bit either, for that matter) cores. If they're doing it now in CDNA, it's a first time ever for them. You sure that's what's actually going on here?
Definitely not, the FP16 and FP32 figures suggest ML specific ops getting a boost from new hardware (matrix FMA?) specially designed for that purpose.

It's not so much separate FP32/16 cores as separate ML focused logic to complement general compute logic across all types in the uArch.

Presumably it's embedded in each CU, but I wouldn't go taking anything for granted until we know exactly what CDNA is, as we have not had an RDNA style uArch rundown of it yet as we did a couple of months before Navi 10 launched*.

The FP64 figure lines up with a conservative server/workstation clockspeed for 120 CU's running at half rate as expected.

*I would expect a decent rundown of both CDNA and RDNA2 at more or less the same time if they are both launching this year.
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Now there are dedicated Ray Intersection Units in each CU
Weren't they in there already, or was that texture unit implementation separate from the CU?
//New "Primitive Shader" patent
Optimising an optimisation is good - moar polys, moar Stephanie!

Seriously though this combined with the UE5 tech could mean some incredible increases in game polycounts during the next decade.
 

FaaR

Golden Member
Dec 28, 2007
1,056
412
136
Optimising an optimisation is good - moar polys, moar Stephanie!

Seriously though this combined with the UE5 tech could mean some incredible increases in game polycounts during the next decade.
It's basically only worth a damn though if it can be utilized invisibly to developers. Vega had a couple enhancements that ended up completely abandoned by AMD because it would have required specific software support, increasing software development requirements.

Let's say AMD gets UE5 support for their Gee Whiz Thingamajig. OK, great, but how many games out of all the games released until forever will be on UE? A very small minority. Versus if the GWT can operate completely independently, it would have a chance to benefit basically everything, completely automatically. It would probably be more effective if you do things a certain way, but chances are most software would see at least some benefit.

So the less time AMD spends researching and developing stuff that would only work if you code specifically for it, the better for everyone in the end - AMD included I would say!
 

maddie

Diamond Member
Jul 18, 2010
4,879
4,951
136
It's basically only worth a damn though if it can be utilized invisibly to developers. Vega had a couple enhancements that ended up completely abandoned by AMD because it would have required specific software support, increasing software development requirements.

Let's say AMD gets UE5 support for their Gee Whiz Thingamajig. OK, great, but how many games out of all the games released until forever will be on UE? A very small minority. Versus if the GWT can operate completely independently, it would have a chance to benefit basically everything, completely automatically. It would probably be more effective if you do things a certain way, but chances are most software would see at least some benefit.

So the less time AMD spends researching and developing stuff that would only work if you code specifically for it, the better for everyone in the end - AMD included I would say!
Isn't every console game and game developer going to use these? Vega never was in the wider market.
 

FaaR

Golden Member
Dec 28, 2007
1,056
412
136
Isn't every console game and game developer going to use these?
Only on the new consoles, which have an installed user base of 0 so far. On PC, NV and Intel have a big numerical advantage and this will remain so for years to come even if AMD was to become market dominant tomorrow.
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Let's say AMD gets UE5 support for their Gee Whiz Thingamajig. OK, great, but how many games out of all the games released until forever will be on UE? A very small minority.
I don't think you realise just how many games are made on UE4 currently.

It's a lot, and I would expect the number of games released with UE5 for the coming console generation to be at least as high if not higher.
Vega had a couple enhancements that ended up completely abandoned by AMD because it would have required specific software support, increasing software development requirements.
I was under the impression that the initial 16nm Vega iteration was not exactly a well finished article where those new features were concerned.
 

FaaR

Golden Member
Dec 28, 2007
1,056
412
136
I was under the impression that the initial 16nm Vega iteration was not exactly a well finished article where those new features were concerned.
It's quite possible they had bugs, but as I recall AMD denied that at the time and said the features were functional and that driver support for them was being worked on. Well, that changed later, as we all know.

Sad, really. As a Vega owner myself it would have been fun to know what the GPU was really capable of if the full feature set was enabled and working...
 

Geranium

Member
Apr 22, 2020
83
101
61
Perhaps, but nothing more than a pinprick compared to the impact of their previous low PC market share and what it would look like if they lost their console deals.
Pinprick? Then why is AMD giving the 24CU Navi12 only to Apple? And we got the RX 5500 and RX 5500 XT with the same 22CU.
 

Geranium

Member
Apr 22, 2020
83
101
61
Those numbers are really weird.

9.5 FP64 TFLOPS, 42 FP32 TFLOPS and 150 FP16 TFLOPS.

The 150 FP16 TFLOPS number makes sense from tensor/matrix logic, but FP32 is insane at 42 TFLOPS.

I can only assume that ML focused HW augments FP32 numbers too for ML work.

The 9.5 FP64 TFLOPS makes perfect sense though - you only need 1.16 GHz to reach that number at half rate with 128 CUs in old GCN reckoning.
Something is off with the FP numbers. AMD usually has a 1/2, 1/4, 1/8 or 1/16 FP64 rate for their GPUs. 9.5T FP64 is 1/4.42 of the FP32 rate, which is very unlikely.
Another possibility is that MI100 will downclock when running FP64 operations to maintain TBP and thermals.
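Both sets of numbers in this exchange can be reproduced with a few lines of arithmetic. A sketch only, assuming 64 lanes per CU and two FLOPs per fused multiply-add; the helper name is hypothetical.

```python
FP32_TFLOPS = 42.0   # rumored figure
FP64_TFLOPS = 9.5    # rumored figure

# How far is FP32:FP64 from the usual 1/2, 1/4, 1/8, 1/16 rates?
ratio = FP32_TFLOPS / FP64_TFLOPS
usual_divisors = [2, 4, 8, 16]
nearest = min(usual_divisors, key=lambda d: abs(d - ratio))
print(f"FP64 is 1/{ratio:.2f} of FP32; nearest usual rate is 1/{nearest}")


def peak_fp64_tflops(cus: int, clock_ghz: float, rate: float = 0.5) -> float:
    """Peak FP64 TFLOPS, assuming 64 lanes per CU and 2 FLOPs per FMA."""
    return cus * 64 * 2 * rate * clock_ghz / 1000.0


half_rate = peak_fp64_tflops(128, 1.16)
print(f"128 CUs at 1.16 GHz, half rate: {half_rate:.2f} TFLOPS")
```

The ratio does not match any of the usual rates, while the half-rate reading at 1.16 GHz lands on 9.5 TFLOPS; the two observations point at the FP32 figure, rather than the FP64 one, as the odd number out.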
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Pinprick? Then why is AMD giving the 24CU Navi12 only to Apple? And we got the RX 5500 and RX 5500 XT with the same 22CU.
I thought Navi 12 was 40 CU? As in basically fixed errata Navi 10 + HBM.

It performs closer to 24 CU because the clocks are running much lower than Navi 10 for low power operation.

The answer to your question is that Apple paid for the thing presumably.

Just to note I meant a pinprick loss RELATIVE to other factors, like their previously low dGPU market share and any possible loss of their console deals which shift far more units.
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Something is off with the FP numbers. AMD usually has a 1/2, 1/4, 1/8 or 1/16 FP64 rate for their GPUs. 9.5T FP64 is 1/4.42 of the FP32 rate, which is very unlikely.
Another possibility is that MI100 will downclock when running FP64 operations to maintain TBP and thermals.
I've now said this multiple times!

The different figures relative to FP64 for FP32/16 are from new ML op focused hardware which performs better for ML tasks - I don't expect those high numbers to be reflected on general compute tasks that do not fit that use case.

The FP64 number fits comfortably for a 120 CU server/workstation part designed to be run as one of up to 8 in a rack, so lower clockspeeds as with EPYC are to be expected.

Otherwise you are implying that a 120 CU part is running at what, 2.734 GHz?

You must realise how ridiculous that sounds on any variant of the 7nm node.
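The 2.734 GHz figure falls out of solving the peak-throughput formula for clock speed. A minimal sketch, again assuming 64 lanes per CU and two FLOPs per fused multiply-add, with a made-up helper name:

```python
def required_clock_ghz(tflops: float, cus: int, rate: float = 1.0) -> float:
    """Clock needed to hit a peak TFLOPS target on general compute logic.

    Assumes 64 lanes per CU and 2 FLOPs per fused multiply-add;
    `rate` is the throughput relative to full-rate FP32 (0.5 = half rate).
    """
    flops_per_clock = cus * 64 * 2 * rate
    return tflops * 1000.0 / flops_per_clock


fp32_clock = required_clock_ghz(42.0, 120)           # FP32 at full rate
fp64_clock = required_clock_ghz(9.5, 120, rate=0.5)  # FP64 at half rate
print(f"42 TFLOPS FP32 on 120 CUs needs {fp32_clock:.3f} GHz")
print(f"9.5 TFLOPS FP64 on 120 CUs needs {fp64_clock:.3f} GHz")
```

Treating 42 TFLOPS as plain FP32 compute demands about 2.73 GHz, whereas the FP64 figure at half rate needs only about 1.24 GHz, which is in line with the conservative server clocks argued for above.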
 