Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
5529x4435 = 251 mm2
532x2948 = 16.054 mm2 <= 8 CUs
8 CUs * 5 = 80.268 mm2; 251 mm2 / 80.268 mm2 = 3.127x, whole chip relative to 40 CUs.
^-- Navi 10

580x434 = 197.05 mm2
60x359 = 16.861818687 mm2 <= 8 CUs
8 CUs * 10 = 168.618186874 mm2
^-- Van Gogh/Mero
w/ Navi10 rate => 527.2691114 mm2

1797x1323 = 360.4 mm2
1433x156 = 33.888133536 <= 16 CUs
16 CUs * 5 = 169.440667679 * 3.127x
^-- Arden
w/ Navi10 rate => 529.840967832 mm2

>505 mm2 probably if DUV.
476.856871052 mm2 for 7nm EUV.
7nm+ is inline with 505 with added components

Min bound: 190.742748421(Alchips) Upper bound: 317.904580701(Marvel-more likely) mm2 for 5nm EUV.
5nm is inline with a Tahiti/Tonga/Vega20 die size => 352 mm2/359 mm2/331 mm2

Tested it against the other Navi die => 160.9416868 mm2 which isn't far from its actual 158 mm2 die size. However, with Navi10 selections it is 150 mm2.

Which for 5nm is in line with Arcturus's 400 ~ 450 mm2 guess for 128 CUs.
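For anyone who wants to redo the arithmetic above, here is a rough sketch of it. The pixel measurements and die areas are the ones posted; the function and variable names are made up for illustration, and the 0.9 / 0.6 / 0.36 node scale factors are only inferred from the ratio of the posted figures (476.86, 317.90 and 190.74 against 529.84), not something stated outright.

```python
def block_area_mm2(block_px, die_px, die_mm2):
    """Scale a block measured in pixels on a die shot to mm^2 via the known die area."""
    bw, bh = block_px
    dw, dh = die_px
    return (bw * bh) / (dw * dh) * die_mm2

# Navi 10: one 8-CU block on the 251 mm^2 die, then 40 CUs in total.
navi10_8cu = block_area_mm2((532, 2948), (5529, 4435), 251.0)
navi10_40cu = navi10_8cu * 5
# Whole die relative to CU area, rounded to 3 decimals as in the post (3.127x).
navi10_rate = round(251.0 / navi10_40cu, 3)

# Van Gogh/Mero: 8-CU block, scaled to 80 CUs, then to a full die at the Navi 10 rate.
vangogh_8cu = block_area_mm2((60, 359), (580, 434), 197.05)
est_from_vangogh = vangogh_8cu * 10 * navi10_rate

# Arden: 16-CU block, scaled to 80 CUs, then to a full die at the Navi 10 rate.
arden_16cu = block_area_mm2((1433, 156), (1797, 1323), 360.4)
est_from_arden = arden_16cu * 5 * navi10_rate

# ASSUMPTION: scale factors inferred from the posted numbers, not given by the poster.
node_scale = {"7nm EUV": 0.9, "5nm upper": 0.6, "5nm min": 0.36}
scaled = {node: est_from_arden * f for node, f in node_scale.items()}

print(f"Navi 10 8 CUs: {navi10_8cu:.3f} mm2, rate {navi10_rate}x")
print(f"80 CU die from Van Gogh/Mero: {est_from_vangogh:.1f} mm2")
print(f"80 CU die from Arden: {est_from_arden:.1f} mm2")
for node, area in scaled.items():
    print(f"{node}: {area:.1f} mm2")
```

The whole estimate hinges on the Navi 10 ratio of total die to CU area carrying over to the bigger chip, which is the weakest assumption in it.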

Q2 2017 = 7nm risk production // ~90 masks <- 2nd fastest ramp
Q3 2018 = 7nm+ risk production // ~80 masks <- No ramp
Q1 2019 = 5nm risk production // ~70 masks <- Fastest ramp, highest yield.

Tue May 15 14:59:32 UTC 2018 => drm/amdgpu: Add vega20 pci ids
April 2017(v1.0 pdk) to May 2018 => 13 months <== N7
Mon Jun 17 19:26:04 UTC 2019
=> 26 months <== However, it is N7P which launched July 2019. With most tapeouts occurring: "alternatively there is N7P - an improved N7. TSMC had already announced the tape-out of an unknown chip in October 2018." - in regards to N7P and A13. - February 11, 2019, 11:00

Tue Sep 15 18:24:09 UTC 2020 => drm/amdgpu: add device ID for sienna_cichlid (v2)
June 2018(v1.0 pdk) to September 2020 => 27 months <== N7+
March 2019(v1.0 pdk) to September 2020 => 18 months // to June 2020 => 15 months <== N5
¯\_(ツ)_/¯

If 7nm/7nm+, they are off schedule for 5nm GPUs.
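The month counts above are easy to double-check. A minimal sketch, assuming whole calendar months are what is being counted (the day of the month is ignored, and the PDK dates get a placeholder day of 1 since only the month was given); the names are hypothetical.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# v1.0 PDK dates as posted (day of month is a placeholder).
n7_pdk = date(2017, 4, 1)
n7_plus_pdk = date(2018, 6, 1)
n5_pdk = date(2019, 3, 1)

# Driver commit dates as posted.
vega20_ids = date(2018, 5, 15)
june_2019_commit = date(2019, 6, 17)
sienna_cichlid_ids = date(2020, 9, 15)

gaps = {
    "N7 PDK -> vega20 pci ids": months_between(n7_pdk, vega20_ids),
    "N7 PDK -> June 2019 commit": months_between(n7_pdk, june_2019_commit),
    "N7+ PDK -> sienna_cichlid": months_between(n7_plus_pdk, sienna_cichlid_ids),
    "N5 PDK -> sienna_cichlid": months_between(n5_pdk, sienna_cichlid_ids),
    "N5 PDK -> June 2020": months_between(n5_pdk, date(2020, 6, 1)),
}
for label, months in gaps.items():
    print(f"{label}: {months} months")
```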
 

Mopetar

Diamond Member
Jan 31, 2011
8,011
6,454
136
Except this time none of what we know about the upcoming consoles would make sense if there weren't some significant improvement in the RDNA arch.

I was mainly talking about some of the very specific examples that get brought up in these threads.

AMD is still making little improvements all the time and a huge uplift isn't hard to expect for a second generation card. There's always a lot of low hanging fruit to pick on newer architectures.

NVidia got a lot of RT uplift from improving their design instead of just throwing more hardware at the problem. There's no reason to think AMD couldn't develop better implementations for existing tech they already use. Not everything has to be due to some brand new technologies we've only just uncovered patents for.
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
That's the thing. Ever since I spotted Andrei's post I've been mulling over it. Makes sense for static data or raytrace material. Isn't so useful for constant switching of live data. But even then, if you take that out, the then-rumored 256-bit bus makes even less sense. Why would you, if that was the big Navi, cut it off at the kneecaps? My only suggestive reasoning is because RTG was assigned a slew of Zen engineers to help or take over. This could all also be a smoke screen by AMD. Those were heat sensitive labels on that leak photo we saw. This whole charade is bizarre.

I find it a little bizarre that we've barely heard anything about either of AMD's upcoming products other than what they willingly have said, and yet we're supposed to believe that someone at AMD's overseas development centers leaked a photo of their flagship card that's coming out in a month?

Or those photos Jason from Jayz made a video on... If it's a mockup, it's done in-house. Even the renders officially released are a bit of a stretch.

I haven’t really read into this that much from the GPU side. My interest is almost entirely about how it could be used in Epyc. A lot of places have a vendor lock-in because their entire code base is in CUDA. CPUs can actually be switched much more easily; my work might be getting an Epyc test system soon, with an Nvidia GPU though. Nvidia using Epyc in their DGX A100 systems was, I think, a big wake-up call. Intel just does not have a suitable solution for Nvidia DGX without a lot of compromise. They don’t have PCI-e 4.0 and it probably would have taken 4 Intel Xeons to connect all of the GPUs. Going up to a 4 CPU board is generally avoided. The board size and expense just get out of hand. With Epyc, Nvidia gets very fast access to 16 DDR4 channels with dual processors.

For this cache rumor, it may be a lot more useful for rasterization than people expect. Modern GPUs have switched to essentially tile based architectures. Nvidia seems to have switched with Maxwell. For AMD it was Vega with their draw stream binning rasterizer:


If infinity cache allows them to cache a lot more of the data needed for a bin (tiles essentially) in addition to the probably large on die caches, then it could increase performance significantly. It may allow them to use larger bins (tiles) which would be more efficient also; less overhead. It would reduce the external memory bandwidth required, so 256-bit GDDR6 may be plenty.
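To make the bin size point concrete, here is a toy sketch of binning by bounding box. This is not how AMD's DSBR actually works internally (that isn't public in this detail); it just assumes screen-space triangles with non-negative coordinates and square tiles, and the function name is made up.

```python
from collections import defaultdict

def bin_triangles(triangles, tile_size):
    """Map each (tx, ty) tile to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Conservative test: every tile overlapped by the bounding box gets the triangle.
        tx0, tx1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
        ty0, ty1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(idx)
    return bins

triangles = [
    [(5, 5), (20, 8), (10, 25)],     # sits inside a single 32x32 tile
    [(30, 30), (70, 35), (40, 60)],  # straddles several 32x32 tiles
]

small = bin_triangles(triangles, 32)
large = bin_triangles(triangles, 64)

# Each (tile, triangle) pair is work that gets repeated per bin.
small_entries = sum(len(v) for v in small.values())
large_entries = sum(len(v) for v in large.values())
print(len(small), small_entries)
print(len(large), large_entries)
```

With the larger tile the second triangle is duplicated into fewer bins, which is the "less overhead" part. The catch is that a bigger bin needs more on-chip storage for its working set, which is where a large cache would come in.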

AMD has done a lot of work adding CPU style virtual memory management for GPUs. This allows much more efficient use of memory since allocations do not need to be contiguous. They can satisfy an allocation as long as they have enough pages available and those pages can be mapped anywhere in physical memory. The ability to swap pages out to cpu memory (GPU memory viewed as cache) may be useful if an application uses more gpu memory than available. Most applications are going to try to load everything they *might* need into gpu memory. A lot of that will not actually be used at any given time. In the link above, only about half the allocated memory was used. A page based system can just allocate a virtual address range and then only use pages when something is actually copied into that memory. If nothing is copied into it, then a real page is never created. If it isn’t accessed, then it can be pushed out to secondary storage if there is more demand for gpu memory than can be satisfied.
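A toy model of that idea, just to show the bookkeeping. Everything here (class name, methods, the 4 KB page size) is hypothetical and has nothing to do with the actual amdgpu driver interfaces: reserving a range costs nothing, and a physical page is only taken when a virtual page is first touched.

```python
class LazyPagedAllocator:
    """Toy demand-paged allocator: virtual ranges are free, physical pages are not."""

    def __init__(self, physical_pages, page_size=4096):
        self.page_size = page_size
        self.free_pages = list(range(physical_pages))
        self.page_table = {}  # virtual page number -> physical page number
        self.next_vpn = 0

    def reserve(self, nbytes):
        """Reserve a virtual range; no physical page is consumed yet."""
        npages = -(-nbytes // self.page_size)  # ceiling division
        base = self.next_vpn
        self.next_vpn += npages
        return base, npages

    def touch(self, vpn):
        """First access to a page: back it with any free physical page."""
        if not 0 <= vpn < self.next_vpn:
            raise IndexError("virtual page was never reserved")
        if vpn not in self.page_table:
            if not self.free_pages:
                # A real system would evict a page to CPU memory here instead.
                raise MemoryError("no free physical pages")
            # Any free page will do; the mapping need not be contiguous.
            self.page_table[vpn] = self.free_pages.pop()
        return self.page_table[vpn]

    def resident_pages(self):
        return len(self.page_table)


gpu = LazyPagedAllocator(physical_pages=8)
base, npages = gpu.reserve(16 * 4096)       # 16 virtual pages on 8 physical ones
for vpn in range(base, base + npages, 2):   # the app only ever uses half of them
    gpu.touch(vpn)
print(npages, gpu.resident_pages())
```

The reservation is twice the physical memory, yet it fits because only the touched half is ever backed. Touching a ninth page is where eviction to CPU memory would have to kick in.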

It will be interesting to see how this rumored cache would work. I would expect it will be some size cache line rather than caching whole pages, but I don’t know how big AMD gpu pages are. CPU memory is usually 32-byte cache lines and 4KB pages. I have seen cases where the 4KB pages hurt performance significantly, but the newer 2MB pages seem to still not be supported as well as they could be under Linux. I would think that a 32-byte cache line would be rather small for a gpu, although some types of things may have limited locality. Even for CPUs, a 32-byte line might be getting a bit small. With AVX256, 32 bytes is just one operand and FMA takes 3 operands. A larger size may be indicated. When these things move to being on an interposer, they may be transferring 1024-bits (128 bytes) per clock or more, so they may want to move to a larger than 32-byte line.

Given AMD’s work with optimizing the memory system, they may be able to make much better use of the cache hierarchy, both on chip and off. They may be using some of the same cache designs or ideas across cpu and gpu caches, so some of that knowledge may come from their cpu designs and cache coherent infinity architecture. Nvidia wants their own CPU architecture such that they can offer a complete solution like Intel and AMD are offering for some of the next generation super computers, hence the purchase of ARM. Now I am wondering if part of the ARM purchase is to get IP and engineers with virtual memory management experience. I know Nvidia has some features for letting the system automatically manage memory, but I don’t know how that compares to AMD’s virtual memory management. I thought the Nvidia solution was a driver level thing rather than hardware level virtual MMU.

We may not have seen much about this since the cache chip would probably be on the gpu package rather than on the board. You would, at a minimum, need to pull the cooler. With multiple chips in the package, they may have put a lid to protect the die or handle slightly different height chips. Removing the heat sink may not tell you anything.
 

Gideon

Golden Member
Nov 27, 2007
1,714
3,937
136
Navy Flounder has 40 CUs and 192 bit bus.

It's that 12 GB GPU from Rogame's tweet regarding the VRAM buffers.
Wait, that's Navi 22 (the GPU previously rumored to have 60CUs) and the chip that should presumably go against GA104?

If that's the case, I highly doubt Navi 21 doubles the CUs from that.

EDIT:
never mind
  • Sienna Cichlid: 4 SEs x 2 SHs x 10 CUs = 80 total CUs
  • Navy Flounder: 2 SEs x 2 SHs x 10 CUs = 40 total CUs
 
Reactions: Tlh97 and Elfear

Devilek

Junior Member
Sep 10, 2020
1
0
6
Seems legit -> this Navi 22 (Navy Flounder) will be released next year right? Because rumors told us something about Navi 21 = 2020, everything else 2021... On the other hand, 12GB card could be released in 2020 right? (I am confused now...)
 

Viking Warrior

Junior Member
Aug 25, 2020
4
3
41
I came across a diagram of Big Navi showing the die in great detail. I would like opinions on these numbers.

4 shader engines (0-3), 10 WGPs per 20 CUs
80 dual CUs, 160 TMUs, 64 ROPs, 5120 ALUs

Ray tracing accelerator built into the texture processor

L0 cache: 16 KB x 160 = 2560 KB
L1 cache: 128 KB x 8 = 1024 KB
L2 cache: 256 KB x 24 = 6144 KB

RAM: 2 x 16-bit x 12
24 GB GDDR6
384-bit bus, 768 GB/s
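A quick sanity check of the cache totals and the memory figures, as a sketch. The helper names are made up, and the 16 Gbps per-pin rate is not in the diagram; it is only what a 384-bit bus at 768 GB/s would imply.

```python
def total_kb(size_kb, count):
    """Total capacity of `count` cache slices of `size_kb` each."""
    return size_kb * count

def bus_width_bits(chips, channels_per_chip, bits_per_channel):
    """Aggregate memory bus width from the per-chip channel layout."""
    return chips * channels_per_chip * bits_per_channel

def bandwidth_gb_s(bus_bits, gbps_per_pin):
    """Peak bandwidth in GB/s for a bus of `bus_bits` pins at `gbps_per_pin` each."""
    return bus_bits * gbps_per_pin / 8

l0 = total_kb(16, 160)
l1 = total_kb(128, 8)
l2 = total_kb(256, 24)

bus = bus_width_bits(chips=12, channels_per_chip=2, bits_per_channel=16)
# ASSUMPTION: per-pin rate back-calculated from the listed 768 GB/s.
implied_gbps = 768 * 8 / bus
bandwidth = bandwidth_gb_s(bus, implied_gbps)
gb_per_chip = 24 / 12

print(l0, l1, l2)
print(bus, implied_gbps, bandwidth, gb_per_chip)
```

So the listed numbers are at least internally consistent: twelve 2 GB chips on a 384-bit bus at 16 Gbps. That says nothing about whether the diagram itself is real.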
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Seems legit -> this Navi 22 (Navy Flounder) will be released next year right? Because rumors told us something about Navi 21 = 2020, everything else 2021... On the other hand, 12GB card could be released in 2020 right? (I am confused now...)

All we know is that Lisa said RDNA2 would launch this year. One could assume it's at minimum Navi20, but we will find out next month.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
I came across a diagram of Big Navi showing the die in great detail. I would like opinions on these numbers.

4 shader engines (0-3), 10 WGPs per 20 CUs
80 dual CUs, 160 TMUs, 64 ROPs, 5120 ALUs

Ray tracing accelerator built into the texture processor

L0 cache: 16 KB x 160 = 2560 KB
L1 cache: 128 KB x 8 = 1024 KB
L2 cache: 256 KB x 24 = 6144 KB

RAM: 2 x 16-bit x 12
24 GB GDDR6
384-bit bus, 768 GB/s
The number of TMUs and ROPs is the same as the RX 5700 XT, so what you wrote is wrong. Maybe you could provide a link to that diagram so we can have a look.
 

Viking Warrior

Junior Member
Aug 25, 2020
4
3
41
Yeah, I would also like to see that diagram
Here's the crazy thing about this: there are 160 CUs on this die (80 x 2 in dual configuration), and each CU has 1 TMU, which means a ratio of 1:1. This chip should have at least 320 TMUs and 128 ROPs. I'm definitely missing something. I'm going to try to track down the origins of the diagram. If it is fake, someone went to a lot of trouble, because the work is as good as it gets.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
Here's the crazy thing about this: there are 160 CUs on this die (80 x 2 in dual configuration), and each CU has 1 TMU, which means a ratio of 1:1. This chip should have at least 320 TMUs and 128 ROPs. I'm definitely missing something. I'm going to try to track down the origins of the diagram. If it is fake, someone went to a lot of trouble, because the work is as good as it gets.
It's enough if you just provide that diagram.
 

Konan

Senior member
Jul 28, 2017
360
291
106
The 40 CU/192 bit GPU is this year.

So this is Navy Flounder, reported several times as Navi22, which has already been said to be 340 mm2, 40 CUs and 192-bit. Technically, then (with 12 GB of 14-16 Gbps GDDR6), it should reach a 2080S at most. Between 2070S and 2080S.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
So this is Navy Flounder, reported several times as Navi22, which has already been said to be 340 mm2, 40 CUs and 192-bit. Technically, then (with 12 GB of 14-16 Gbps GDDR6), it should reach a 2080S at most. Between 2070S and 2080S.
Nope.

The three biggest dies are 500, 340 and 240 mm2. If the 40 CU die is 340 mm2, then the largest die cannot be 500 mm2 with 80 CUs.

That 40 CU/192-bit die is the 240 mm2 one.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
I can see 505 mm2 and 340 mm2 for Arcturus and Sienna Cichlid if they were 5nm pitched.
240 mm2 for Navy Flounder is obviously 7nm.
Navi23 = 24 CUs, similar size to Navi14.
Navi24 = 10 CUs, reduced die size from Polaris 550.

Navi21 = 256-bit
Navi22 = 192-bit (-64-bit)
Navi23 = 128-bit (-64-bit) // Same as Van Gogh/Mero and Xbox Series S. 1032=1033=1040
Navi24 = 64-bit (-64-bit)

(240 * 2) x 0.7 [Average pitch(SerDes/SRAM/Logic)] => 336 mm2
 
Reactions: Konan and Summerfun

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
So this is Navy Flounder, reported several times as Navi22, which has already been said to be 340 mm2, 40 CUs and 192-bit. Technically, then (with 12 GB of 14-16 Gbps GDDR6), it should reach a 2080S at most. Between 2070S and 2080S.

The leaked specs are very likely from the RX 6600 XT.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,010
1,608
136
So this is Navy Flounder, reported several times as Navi22, which has already been said to be 340 mm2, 40 CUs and 192-bit. Technically, then (with 12 GB of 14-16 Gbps GDDR6), it should reach a 2080S at most. Between 2070S and 2080S.

It was only assumed that N22 is Navy Flounder. There is no official info about that.
 
Reactions: Konan

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
So, I am going to risk some speculation here based on available information. I am not all that confident, but we will see.
  1. RX 6500 XT - 36 CUs - small die - $199? - ~2070S perf
  2. RX 6600 XT - 40 CUs - small die - $299? - ~2080S perf
  3. RX 6700 XT - 52 CUs - med die - $399? - a bit faster than the 3070.
  4. RX 6800 XT - 56 CUs - med die - $499? Behind the 3080, well ahead of the 3070.
  5. RX 6900 XT - 72CUs - large die - $599? a bit ahead of the 3080. Around 40%-50% faster than the 2080ti. 275W TDP?
  6. RX 6950 XT - 80CUs - large die (launches later?) - 3090 equiv. Likely $799-$999 depending on yields. 24gb?
Non XT parts slot in between. The 80CU part will not launch with the rest due to the extra time needed for binning. It doesn’t matter, however, as perf gains won’t be big.

On most gaming workloads, AMD will win or at least be tied. On very specific workloads that can take advantage of NVIDIA’s new core configuration, NVIDIA will win.

Rasterization only. No idea about RT.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
So, I am going to risk some speculation here based on available information. I am not all that confident, but we will see.
  1. RX 6500 XT - 36 CUs - small die - $199? - ~2070S perf
  2. RX 6600 XT - 40 CUs - small die - $299? - ~2080S perf
  3. RX 6700 XT - 52 CUs - med die - $399? - a bit faster than the 3070.
  4. RX 6800 XT - 56 CUs - med die - $499? Behind the 3080, well ahead of the 3070.
  5. RX 6900 XT - 72CUs - large die - $599? a bit ahead of the 3080. Around 40%-50% faster than the 2080ti. 275W TDP?
  6. RX 6950 XT - 80CUs - large die (launches later?) - 3090 equiv. Likely $799-$999 depending on yields. 24gb?
Non XT parts slot in between. The 80CU part will not launch with the rest due to the extra time needed for binning. It doesn’t matter, however, as perf gains won’t be big.

On most gaming workloads, AMD will win or at least be tied. On very specific workloads that can take advantage of NVIDIA’s new core configuration, NVIDIA will win.

Rasterization only. No idea about RT.

How will 40 CUs, the same amount as the RX 5700 XT, reach 2080 Super performance, which is 20% faster today?
52 CUs will not get close to 2080 Ti/3070, perhaps close to 2080 Super/3060
72 CUs will not be 40-50% faster than the 2080 Ti

Hell, the entire stack is way off
 
Reactions: Konan

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
So, I am going to risk some speculation here based on available information. I am not all that confident, but we will see.
  1. RX 6500 XT - 36 CUs - small die - $199? - ~2070S perf
  2. RX 6600 XT - 40 CUs - small die - $299? - ~2080S perf
  3. RX 6700 XT - 52 CUs - med die - $399? - a bit faster than the 3070.
  4. RX 6800 XT - 56 CUs - med die - $499? Behind the 3080, well ahead of the 3070.
  5. RX 6900 XT - 72CUs - large die - $599? a bit ahead of the 3080. Around 40%-50% faster than the 2080ti. 275W TDP?
  6. RX 6950 XT - 80CUs - large die (launches later?) - 3090 equiv. Likely $799-$999 depending on yields. 24gb?
Non XT parts slot in between. The 80CU part will not launch with the rest due to the extra time needed for binning. It doesn’t matter, however, as perf gains won’t be big.

On most gaming workloads, AMD will win or at least be tied. On very specific workloads that can take advantage of NVIDIA’s new core configuration, NVIDIA will win.

Rasterization only. No idea about RT.
Nope.
 
Reactions: kurosaki