Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right about some things in the past AFAIK, and to his credit was the first to say 505mm^2 for Navi21, which other people have since backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
edited this post, because it is not so meaningful in light of recent Mesa updates.

It seems Sienna and Navy are both GFX1030. Sienna = HBM and Navy = GDDR6. Both Sienna and Navy are Big Navi.
At least it seems both are similar (i.e. Polaris 12/11/VegaM)


C:
    case CHIP_RENOIR:
        return "gfx909";
    case CHIP_ARCTURUS:
        return "gfx908";
    case CHIP_NAVI10:
        return "gfx1010";
    case CHIP_NAVI12:
        return "gfx1011";
    case CHIP_NAVI14:
        return "gfx1012";
    case CHIP_SIENNA_CICHLID:
    case CHIP_NAVY_FLOUNDER:
        return "gfx1030";

So it is untrue that Navy Flounder is Navi22/GFX1031 as reported here
 
Last edited:
Reactions: Mopetar

TESKATLIPOKA

Platinum Member
May 1, 2020
2,507
2,991
136
2 Renoir-sized CCXs are pretty small, and remember that a regular GPU has an x16 PCIe interface; the consoles, if they have a south bridge, would likely only need x12 and none of the other misc I/O.

edit: here is the PS4 Pro, not much "uncore" about it
2 Renoir CCXs are ~40-50mm^2, and a GDDR6 PHY plus memory controller is much bigger than an HBM2 PHY; just look at Navi10 vs Navi12. Here is also a die shot of Renoir.
So there is no reason why a 72 CU GPU should be much bigger than the Xbox SoC with 56 CUs. Keep in mind that the CUs themselves are very small, about 2.1mm^2 each for RDNA1.
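
A rough back-of-the-envelope sketch of that area argument, taking the ~2.1mm^2 per RDNA1 CU figure above as a stand-in for RDNA2 and ignoring everything outside the CU array:

C:
    #include <stdio.h>

    int main(void) {
        /* Assumed figures from the post above: ~2.1 mm^2 per CU (RDNA1),
           56 CUs in the Xbox Series X SoC, 72 CUs rumored for Big Navi. */
        const double mm2_per_cu = 2.1;
        const int xbox_cus = 56;
        const int big_navi_cus = 72;

        /* Extra silicon needed just for the additional CUs. */
        double extra_cu_area = (big_navi_cus - xbox_cus) * mm2_per_cu;
        printf("Extra CU area: ~%.1f mm^2\n", extra_cu_area); /* ~33.6 mm^2 */
        return 0;
    }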
 
Last edited:
Reactions: Tlh97

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
Hmmm... no wonder the code names, commit messages, and commit cherry-picking were so obscure and often misleading.

That's the goal - to get upstream support in place before launch without anyone knowing what happened

On the plus side, ROCm will land for Navi1x/2x.
 

DiogoDX

Senior member
Oct 11, 2012
747
279
136
Meaning? Are only the Ti models considered high end?
You can call the top of the performance stack whatever you like.

The terrible Vega was more than 15% faster than the 980 Ti. Being 0 to 15% faster than the 2080 Ti would be another disaster unless it is a relatively small chip, 400mm^2 tops. For the rumored 505mm^2 chip it would be a joke.
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
You can call the top of the performance stack whatever you like.

The terrible Vega was more than 15% faster than the 980 Ti. Being 0 to 15% faster than the 2080 Ti would be another disaster unless it is a relatively small chip, 400mm^2 tops. For the rumored 505mm^2 chip it would be a joke.

If it's cheaper, why complain? If you think people are going to continue to pay high GPU prices in the midst of C19 and other world factors, you have another thing coming. The console is finally pretty good in the graphics department.
 
Reactions: Tlh97 and maddie

GodisanAtheist

Diamond Member
Nov 16, 2006
7,146
7,638
136
If it's cheaper, why complain? If you think people are going to continue to pay high GPU prices in the midst of C19 and other world factors, you have another thing coming. The console is finally pretty good in the graphics department.

I think it would be a problem for a few reasons:

- It's not a sustainable business model in this market to pump out inferior products and sell them at narrower margins than the competition. AMD being perpetually behind NV means they either exit the discrete GPU space or become irrelevant in due time.

- Lower margins means less money to put into the intangibles, like additional software features that are more and more separating the wheat from the chaff.

- Not having the halo product in this space means that your competition will effectively always be able to match your performance and likely have more room to play around with price, which comes back to the first point.

- Most distressing, IMO, is that it means AMD doesn't have the mental muscle or will to catch NV even if NV makes a misstep (if all the rumors surrounding the node drama are true).
 
Reactions: Tlh97

Saylick

Diamond Member
Sep 10, 2012
3,509
7,766
136
Coreteks seems to think Big Navi isn't really that big. Supposedly only 15% faster than 2080 Ti in AMD-optimized titles:
I'm a little skeptical... Only 15% faster, and that's with AMD optimized titles?

According to ComputerBase, the 2080 Ti FE is on average 53% faster than the 5700XT at 4K across a wide variety of games. If we assume 72 CUs and similar clocks, Big Navi should be 80% faster than a 5700XT, which makes it ~18% faster than the 2080 Ti. If you add in IPC gains or the chance that there's actually 80 CUs, not 72, then it ought to be closer to 30% faster on average.

https://www.computerbase.de/thema/grafikkarte/rangliste/#diagramm-performancerating-3840-2160
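
For reference, the arithmetic behind those estimates as a rough sketch, assuming performance scales linearly with CU count at equal clocks (the 5700 XT has 40 CUs) and using the 53% ComputerBase delta:

C:
    #include <stdio.h>

    int main(void) {
        /* Assumptions: linear scaling with CU count at equal clocks,
           5700 XT = 40 CUs, 2080 Ti FE = 1.53x the 5700 XT at 4K. */
        const double ti_vs_5700xt = 1.53;
        const int cu_5700xt = 40;
        const int big_navi_cus[] = { 72, 80 };

        for (int i = 0; i < 2; i++) {
            double vs_5700xt = (double)big_navi_cus[i] / cu_5700xt; /* 1.80, 2.00 */
            double vs_2080ti = vs_5700xt / ti_vs_5700xt;            /* ~1.18, ~1.31 */
            printf("%d CUs: ~%.0f%% faster than the 2080 Ti\n",
                   big_navi_cus[i], (vs_2080ti - 1.0) * 100.0);
        }
        return 0;
    }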
 
Reactions: Tlh97

Geranium

Member
Apr 22, 2020
83
101
61
Does It?
Isn't the new Xbox SoC just 360mm^2? It has a 56 CU RDNA2 GPU and 8 CPU cores, plus a 320-bit GDDR6 memory bus, a south bridge and so on.
You can't estimate a dedicated GPU's die size from a custom APU.
The Xbox APU doesn't have many of the units that the dedicated GPU will have: it has a less sophisticated display engine, no PCIe root complex, probably no FP64 units, a less complex encode/decode engine, a memory controller without ECC support, and probably less cache as well.
 

Geranium

Member
Apr 22, 2020
83
101
61
I'm a little skeptical... Only 15% faster, and that's with AMD optimized titles?

According to ComputerBase, the 2080 Ti FE is on average 53% faster than the 5700XT at 4K across a wide variety of games. If we assume 72 CUs and similar clocks, Big Navi should be 80% faster than a 5700XT, which makes it ~18% faster than the 2080 Ti. If you add in IPC gains or the chance that there's actually 80 CUs, not 72, then it ought to be closer to 30% faster on average.

https://www.computerbase.de/thema/grafikkarte/rangliste/#diagramm-performancerating-3840-2160
53% is a best-case scenario for the RTX 2080 Ti. On TechPowerUp's list the 2080 Ti is only 34% faster at 1080p, 42% faster at 1440p and only 49% faster at 2160p. But at high resolutions the 2080 Ti benefits from having more shaders, ROPs and bandwidth than the RX 5700 XT.
 

DiogoDX

Senior member
Oct 11, 2012
747
279
136
People will get good prices on GPUs if people keep buying Nvidia.
I had a 5970 and a 7970. I only miss the 290X, which was the last good AMD high-end card, and that was freaking 2013.

Then the 980 Ti and 1080 Ti. If AMD launches a third turd in a row (after Fiji and Vega), those who buy high end will continue to have only Nvidia as an option.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
I'm a little skeptical... Only 15% faster, and that's with AMD optimized titles?

According to ComputerBase, the 2080 Ti FE is on average 53% faster than the 5700XT at 4K across a wide variety of games. If we assume 72 CUs and similar clocks, Big Navi should be 80% faster than a 5700XT, which makes it ~18% faster than the 2080 Ti. If you add in IPC gains or the chance that there's actually 80 CUs, not 72, then it ought to be closer to 30% faster on average.

https://www.computerbase.de/thema/grafikkarte/rangliste/#diagramm-performancerating-3840-2160

The 5700xt is clocked way past its efficient point. Scale it up sizewise and the TDP very quickly starts to limit your performance. You're much more likely to scale up at 5700 style clocks.

Even with the marketing-sourced, and so a priori dubious, 50% efficiency gain claim.

We'll see.
 

Geranium

Member
Apr 22, 2020
83
101
61
I had a 5970 and a 7970. I only miss the 290X, which was the last good AMD high-end card, and that was freaking 2013.

Then the 980 Ti and 1080 Ti. If AMD launches a third turd in a row (after Fiji and Vega), those who buy high end will continue to have only Nvidia as an option.
Are you saying that the GTX 980 Ti was also a turd, since it was only 2-5% faster than the Fury X?
Only the GTX 1080 Ti was 30% faster than Vega 64, but it was also more expensive than Vega 64. And Vega 64 was such a turd that Apple used it in their expensive iMacs.
 
Last edited:

Geranium

Member
Apr 22, 2020
83
101
61
edited this post, because it is not so meaningful in light of recent Mesa updates.

It seems Sienna and Navy are both GFX1030. Sienna = HBM and Navy = GDDR6. Both Sienna and Navy are Big Navi.
At least it seems both are similar (i.e. Polaris 12/11/VegaM)


C:
    case CHIP_RENOIR:
        return "gfx909";
    case CHIP_ARCTURUS:
        return "gfx908";
    case CHIP_NAVI10:
        return "gfx1010";
    case CHIP_NAVI12:
        return "gfx1011";
    case CHIP_NAVI14:
        return "gfx1012";
    case CHIP_SIENNA_CICHLID:
    case CHIP_NAVY_FLOUNDER:
        return "gfx1030";

So it is untrue that Navy Flounder is Navi22/GFX1031 as reported here
Not surprising. Apple will use those, so HBM is a must.

^ 2 modules in a 2048-bit configuration can give 921.6 GB/s of bandwidth, which would be enough for "Big Navi".
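
A minimal sketch of where that bandwidth number comes from, assuming two HBM2E stacks with 1024-bit interfaces running at 3.6 Gbps per pin (the per-pin rate implied by 921.6 GB/s over a 2048-bit bus):

C:
    #include <stdio.h>

    int main(void) {
        /* Assumption: 2 x 1024-bit HBM2E stacks at 3.6 Gbps per pin. */
        const int bus_width_bits = 2 * 1024;   /* 2048-bit total */
        const double gbps_per_pin = 3.6;

        double bandwidth_gbs = (bus_width_bits / 8.0) * gbps_per_pin;
        printf("Bandwidth: %.1f GB/s\n", bandwidth_gbs); /* 921.6 GB/s */
        return 0;
    }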
 

Kedas

Senior member
Dec 6, 2018
355
339
136
CDNA MI100 seems to be 46TFLOPS for FP16.
46 vs 30 about +50%

That is faster than Nvidia's A100 for SGEMM, and with a smaller die.
But I'm not sure if this score of the A100 is correct.
According to the specs the A100 should be faster. (also has more transistors)

MI60 https://www.techpowerup.com/gpu-specs/radeon-instinct-mi60.c3233
MI100 https://www.techpowerup.com/gpu-specs/amd-mi100.g927
A100 https://www.techpowerup.com/gpu-specs/a100-pcie.c3623
 
Last edited:
Reactions: Tlh97 and Olikan

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
I don't get this one
If the MI100 is 2.4x faster in FP32 and 2x slower in FP16, whereas in the A100 FP16 is 4x FP32, it would mean the MI100's FP32 is faster than its FP16? OK, the slide says "up to", but it would still mean FP32 = FP16 in TFLOPS?
I mean, the MI50/60 have an FP16:FP32 ratio of 2:1, so FP16 in the MI100 should be at least the same as in the A100. Similarly for FP64.

I mean, this is CDNA; this chip should be optimized for compute. It makes no sense for it to have lower FP64 and FP16 performance than the MI60.

Edit:
So I just found this one from a few days ago
Here they say it has 9.5 TFLOPS FP64 (same as the A100) and 150 TFLOPS FP16 (2x more than the A100)
So what am I missing?
 
Last edited:

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
150 TFLOPS FP16 (2x more than A100)
Eh? That's not right; that number is more like half of the A100's FP16 (312 TFLOPS). Link.

Though I'm not sure whether that A100 number comes from the sparse AI tensor ops peak perf:



If so, and the 150 TFLOPS FP16 Arcturus figure is accurate, then CDNA1 and Ampere may well be close on certain counts.

With all these numbers floating around (hahaha pun), we will probably go mad speculating on their meaning until the official AMD presentation and slides give us both accurate numbers and an explanation to fit them.

Given that they just reaffirmed Zen3, RDNA2 and CDNA1 are all coming out this year, it's likely a huge mega announcement is coming that covers it all at once with deep dives to follow
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Something to bear in mind: AMD made the very odd move of talking about CDNA2 before CDNA1 products had even been announced, and I don't think that this was simply because of the future HPC/supercomputer contract(s) that they had recently won with CDNA2.

It seems that, much as with RDNA1, we will get a not-so-dramatically-impressive move forward to the new compute accelerator (not GPU) paradigm with CDNA1, but that the xDNA2 generations will be the real intended demonstrators of AMD's newly diverged accelerator uArch strategy.
 

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
Eh? That's not right, that number is half the A100 FP16 (312 TFLOPs) more like. Link.
But that is from the Tensor cores and, IIUC, can be used only for DL/NN applications, not for general compute. Or even if it could, does it require additional code changes, or does Nvidia's hardware/software handle it automatically?
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
But that is from the Tensor cores and, IIUC, can be used only for DL/NN applications, not for general compute. Or even if it could, does it require additional code changes, or does Nvidia's hardware/software handle it automatically?

Tensor cores can only perform specific types of matrix math. They cannot be used for anything outside of that.
 
Reactions: Tlh97 and soresu

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
But that is from the Tensor cores and, IIUC, can be used only for DL/NN applications, not for general compute.
Given that, if you go off the A100 FP64 figure of 9.7 TFLOPS x4 (38.8 TFLOPS), then both GPUs should be within about 800 GFLOPS of each other for FP16 general compute use cases (Arcturus being 38 TFLOPS) - assuming those shader cores even do general FP16 compute in the A100, that is.
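
A quick sketch of that gap, taking the FP64 x 4 assumption above for the A100 and the rumored ~38 TFLOPS FP16 figure for Arcturus:

C:
    #include <stdio.h>

    int main(void) {
        /* Assumptions from the post above: A100 general FP16 = 4x its
           9.7 TFLOPS FP64 rate; Arcturus (MI100) rumored at ~38 TFLOPS FP16. */
        double a100_fp16 = 9.7 * 4.0;   /* 38.8 TFLOPS */
        double arcturus_fp16 = 38.0;

        double gap = a100_fp16 - arcturus_fp16;
        printf("Gap: %.1f TFLOPS (~%.0f GFLOPS)\n", gap, gap * 1000.0); /* ~800 GFLOPS */
        return 0;
    }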
 