Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38 · Apr 28, 2020

All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html

DisEnchantment · Jul 14, 2020

Quite an interesting (for me) patch by AMD

[PATCH] drm/amdgpu: load ta firmware for sienna cichlid

call psp_int_ta_microcode() to parse the ta firmware.

RAS firmware is added to Sienna. This likely means Sienna will end up being a Pro card as well. (So there could be a Radeon VII Pro-like card derived from Sienna; 5700 Pro cards had no RAS features that I know of, or at least none enabled.)

MrTeal · Jul 14, 2020

In addition to hot climates without AC, ambient is also local to the console. Even if your room is 25°C or 20°C, if the console is in an entertainment center pumping out 200W it's very possible its intake air is 35°C plus, especially if there's other gear in there.

Veradun · Jul 14, 2020

DisEnchantment said:
New patches to support Navy Flounder

[PATCH 00/42] Navy Flounder support

So far has all features of Sienna.
- Has fewer SDMA engines than Sienna (same as Navi10).
- Has only one VCN instance
- GFX 1032/~~N23~~ N22

Seems like Sienna will be the top part.

EDIT:
Seems like it is N22 according to @KOMACHI_ENSAKA

You lost me here, what's an SDMA?

soresu · Jul 14, 2020

Veradun said:
You lost me here, what's an SDMA?

According to Phoronix: "SDMA in this context is short for System DMA (Direct Memory Access) and is a new asynchronous DMA engine originally added to Radeon GPUs with GCN 1.1 Sea Islands."

What exactly that means in the context of performance I have no idea.

Geranium · Jul 20, 2020

soresu said:
I fail to see the point you are making.

Financing for RDNA iteration R&D need not have anything to do with CDNA development directions - quite the opposite in fact, which was my point, CDNA frees them from uArch design dependency on the likes of Sony and MS.

Besides which, the Apple part of their custom division only accounts for a small fraction of their Sony/MS business, and likely did not even exist in the time of RDNA's early development - the fact that Navi 12 is a more errata free iteration on the same RDNA1 uArch in Navi 10 and 14 supports this.

Given Apple are breaking away from contracting AMD it was and is clearly a good direction to be going in - doubly so if there is any lack of certainty over mid term console refreshes, let alone another full console generation after PS5/XSX.

Though returning to your "keeping afloat" point, to be brutally accurate, the entire Radeon/RTG division first kept the CPU division afloat during the Bulldozer mess - back then consoles were not the only driving earner for AMD during the early GCN era, before either intrinsic uArch shortfalls or iteration R&D mismanagement broke its scalability.

The pendulum then swung around during the late Volcanic Islands to Polaris/Vega timeframe, when the newly rebounding CPU group under Zen was keeping the GPU group afloat, in combination with semi custom deals from various sources*.

*Including mid term and next gen Sony/MS console SoC's, Subor, Apple, and the Hygon licensing deal.

AMD sells quite a good number of GPUs to Apple. If not, why has Apple always gotten the full die of AMD GPUs first since 2012/13?
#. The R7 260X had full Bonaire, while the R7 360 was a cut-down version and the full die went to Apple.
#. Full-die Polaris11 was only available to Apple for the first year.
#. Most of the first Vega shipment went to Apple as Pro Vega 64 and Pro Vega 48.
#. The full Vega20 die is only available to Apple and datacenter, while consumers only got the 60CU version. Even the Pro VII is only 60CU.
#. The full Navi14 die is only available in the Apple versions, the Pro RX 5500M and Pro 5500.
Companies don't do business like this with a small partner. They do this kind of business where good money is made.

And Navi12 is not a custom GPU. Vega12 was not either. Custom ones have their own names, as opposed to general names like Navi1x/Vega1x. Navi12 may not even be an Apple exclusive; it just happens that Apple is the only one who uses it, just like the Vega12 GPU.

Apple breaking away will surely hurt AMD's GPU division if AMD doesn't increase its sales on the Windows side of the market.

soresu · Jul 20, 2020

Geranium said:
Apple breaking away will surely hurt AMD's GPU division if AMD doesn't increase its sales on the Windows side of the market.

Perhaps, but nothing more than a pinprick compared to the impact of their previous low PC market share and what it would look like if they lost their console deals.

soresu · Jul 20, 2020

Geranium said:
And Navi12 is not a custom GPU. Vega12 was not either. Custom ones have their own names, as opposed to general names like Navi1x/Vega1x. Navi12 may not even be an Apple exclusive; it just happens that Apple is the only one who uses it, just like the Vega12 GPU.

It is custom in so much as it is a different stepping with more fixed uArch errata from Navi 10, and it also has HBM controllers instead of GDDR6 - which is not a trivial amount of silicon to change out.

Vega 12 however is completely custom made for Apple.

There is no equivalent chip like it in any other line up that I have seen - it is not half of any larger GPU, it is not an exact double of the Raven Ridge GPU.

It is just a chip entirely by itself in design, and the only other discrete Vega uArch GPU apart from Vega 10 on the 16/14/12nm processes.

We can speculate that perhaps V12 was originally intended to be a twin brother to Vega 11 (which AMD has stated is not RR GPU), but speculation is all we have on its origins as AMD have been pretty tight lipped about it as things go.

Konan · Jul 21, 2020

Sharing if not seen yet. Can google translate.
Chiphell post from leaker wjm47196 regarding RDNA2. Can search his name for trusted track record if needed too. (Was correct on Polaris 30 (Radeon RX 590) launching in Q4 2018, Radeon VII in Q1 2019, 7nm Navi mainstream cards arriving before the high-end enthusiast-grade variants in 2019, and lastly AMD unveiling RDNA 2 at CES earlier this year - so quite accurate.)

Summary -

All the current RDNA2 rumors are fake and this is because..
AMD has not even finished designing the PCB yet, the toolkits for manufacturing are also not ready. Rumours claiming X times of performance are apparently BS
Due to Covid, the AMD engineers from US/CAN are needed and can't travel
However, Q4 launch for RDNA2 is still on
Don't expect AIB RDNA2 boards for launch | AMD will be sending PCB design resources to AIB in the next 2 weeks.
GPU validation sample was sent to Shanghai (apparently different from final mass producing PCB) for driver development (AMD's GPU driver is coded in AMD Shanghai?)
wjm47196 also said the Flagship RDNA2 has 16GB VRAM (and with the famous leaker saying this - it could even mean HBM2 could be back on the table now) and that Ampere will launch first in September

DiogoDX · Jul 21, 2020

Konan said:
Sharing if not seen yet. Can google translate.
Chiphell post from leaker wjm47196 regarding RDNA2. Can search his name for trusted track record if needed too. (Was correct on Polaris 30 (Radeon RX 590) launching in Q4 2018, Radeon VII in Q1 2019, 7nm Navi mainstream cards arriving before the high-end enthusiast-grade variants in 2019, and lastly AMD unveiling RDNA 2 at CES earlier this year - so quite accurate.)

Summary -

All the current RDNA2 rumors are fake and this is because..

AMD has not even finished designing the PCB yet, the toolkits for manufacturing are also not ready. Rumours claiming X times of performance are apparently BS

Due to Covid, the AMD engineers from US/CAN are needed and can't travel

However, Q4 launch for RDNA2 is still on

Don't expect AIB RDNA2 boards for launch | AMD will be sending PCB design resources to AIB in the next 2 weeks.

GPU validation sample was sent to Shanghai (apparently different from final mass producing PCB) for driver development (AMD's GPU driver is coded in AMD Shanghai?)

wjm47196 also said the Flagship RDNA2 has 16GB VRAM (and with the famous leaker saying this - it could even mean HBM2 could be back on the table now) and that Ampere will launch first in September

This makes much more sense and mirrors RTG in the last few years: late, no custom cards near launch, and infant drivers. Let's see if at least the performance will be good and not another Vega disaster.

uzzi38 · Jul 21, 2020

Konan said:
Sharing if not seen yet. Can google translate.
Chiphell post from leaker wjm47196 regarding RDNA2. Can search his name for trusted track record if needed too. (Was correct on Polaris 30 (Radeon RX 590) launching in Q4 2018, Radeon VII in Q1 2019, 7nm Navi mainstream cards arriving before the high-end enthusiast-grade variants in 2019, and lastly AMD unveiling RDNA 2 at CES earlier this year - so quite accurate.)

Summary -

All the current RDNA2 rumors are fake and this is because..

AMD has not even finished designing the PCB yet, the toolkits for manufacturing are also not ready. Rumours claiming X times of performance are apparently BS

Due to Covid, the AMD engineers from US/CAN are needed and can't travel

However, Q4 launch for RDNA2 is still on

Don't expect AIB RDNA2 boards for launch | AMD will be sending PCB design resources to AIB in the next 2 weeks.

GPU validation sample was sent to Shanghai (apparently different from final mass producing PCB) for driver development (AMD's GPU driver is coded in AMD Shanghai?)

wjm47196 also said the Flagship RDNA2 has 16GB VRAM (and with the famous leaker saying this - it could even mean HBM2 could be back on the table now) and that Ampere will launch first in September

I'm 100% certain portions of this are either wrong or not referring to Navi21.

The die size, though, I am confident in.

uzzi38 · Jul 21, 2020

Speaking of which, this talks about one of them.

AMD Navi21 / Sienna Cichlid recent news: GDDR6, power saving, and server use [2020/07/21] | Coelacanth's Dream

Index: GDDR6 adopted for VRAM? AVFS modules and Graphics Power Optimizer. Nearly twice as many AVFS modules as Navi10. GPO. Support for RAS features and EEPROM. GDDR6 adopted for VRAM? Sienna Cichlid, for its GPU memory, Navi10/14

www.coelacanth-dream.com

GDDR6, not HBM2. For consumer dies anyway.

Konan · Jul 21, 2020

uzzi38 said:
Speaking of which, this talks about one of them.

AMD Navi21 / Sienna Cichlid recent news: GDDR6, power saving, and server use [2020/07/21] | Coelacanth's Dream

Index: GDDR6 adopted for VRAM? AVFS modules and Graphics Power Optimizer. Nearly twice as many AVFS modules as Navi10. GPO. Support for RAS features and EEPROM. GDDR6 adopted for VRAM? Sienna Cichlid, for its GPU memory, Navi10/14

www.coelacanth-dream.com

GDDR6, not HBM2. For consumer dies anyway.

I agree about GDDR6. I think that HBM2 is too expensive.

Die size seems reasonable too.

uzzi38 · Jul 22, 2020

Exclusive: AMD Radeon Instinct MI100 Specs, Performance, and Features

Technology Vision

adoredtv.com

Something actually worth discussing for once.

Also I said in another thread there's a possibility for RDNA2 on the 28th, well looks like there's not.

soresu · Jul 22, 2020

A quick Google translate from an associated webpage brings this text:

"Looking at the embedded patches (not found) without going through reviews on The amd-gfx Archives, Sienna Cichlid seems to allow connectivity with up to 4 GPUs."

4 GPU connectivity sounds like a heck of a lot for a company expecting very little from xfire / mgpu going forward.

uzzi38 · Jul 22, 2020

soresu said:
A quick Google translate from an associated webpage brings this text:

"Looking at the embedded patches (not found) without going through reviews on The amd-gfx Archives, Sienna Cichlid seems to allow connectivity with up to 4 GPUs."

4 GPU connectivity sounds like a heck of a lot for a company expecting very little from xfire / mgpu going forward.

It's worth remembering that Instinct from here on out will have 0 display capabilities.

AMD need RDNA to handle certain tasks, such as VDI. That's why you'll see mGPU support and IF bridges on RDNA2, even if AMD considers xFire dead.

DisEnchantment · Jul 22, 2020

soresu said:
A quick Google translate from an associated webpage brings this text:

"Looking at the embedded patches (not found) without going through reviews on The amd-gfx Archives, Sienna Cichlid seems to allow connectivity with up to 4 GPUs."

4 GPU connectivity sounds like a heck of a lot for a company expecting very little from xfire / mgpu going forward.

Sienna will be a Pro part as well, like the VII Pro. It has SRIOV, enhanced RAS, and self diagnostics to inform the user if a fault threshold is reached. This is also used for RMA. It has non-volatile storage to save faults detected by the self diagnosis, and when a threshold is reached it will not initialize anymore unless forced.
Even if not for games, other loads would benefit from XGMI.

Update:
I find it awkward that we are replying to a forum post in quick succession. Working from home really changed my browsing habits.
I went to the office on Monday to try working from there as part of a gradual return to normalcy, but I now find it weird to work from the office.

soresu · Jul 22, 2020

uzzi38 said:
Exclusive: AMD Radeon Instinct MI100 Specs, Performance, and Features

Technology Vision

adoredtv.com

Something actually worth discussing for once.

Also I said in another thread there's a possibility for RDNA2 on the 28th, well looks like there's not.

Those numbers are really weird.

9.5 FP64 TFLOPS, 42 FP32 TFLOPS and 150 FP16 TFLOPS.

The 150 FP16 TFLOPS number makes sense from tensor/matrix logic, but FP32 is insane at 42 TFLOPS.

I can only assume that ML focused HW augments FP32 numbers too for ML work.

The 9.5 FP64 TFLOPS makes perfect sense though - you only need 1.16 GHz to reach that number at half rate with 128 CUs in old GCN reckoning.
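
For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the usual GCN peak-rate reckoning of 64 FP32 lanes per CU and 2 FLOPs per lane per clock (one FMA); the function names are mine, purely for illustration.

```python
# Peak-throughput arithmetic under the "old GCN reckoning":
# 64 FP32 lanes per CU, and one FMA per lane per clock counted as 2 FLOPs.
LANES_PER_CU = 64
FLOPS_PER_FMA = 2


def peak_tflops(cus, clock_ghz, rate=1.0):
    """Peak TFLOPS for `cus` CUs at `clock_ghz`; `rate` is the fraction of FP32 rate."""
    return cus * LANES_PER_CU * FLOPS_PER_FMA * clock_ghz * rate / 1000.0


def clock_for_tflops(target_tflops, cus, rate=1.0):
    """Clock in GHz needed to hit `target_tflops` with `cus` CUs at the given rate."""
    return target_tflops * 1000.0 / (cus * LANES_PER_CU * FLOPS_PER_FMA * rate)


# 9.5 FP64 TFLOPS at half rate with 128 CUs
clock = clock_for_tflops(9.5, cus=128, rate=0.5)
print(f"{clock:.2f} GHz")  # 1.16 GHz
print(f"{peak_tflops(128, clock):.1f} FP32 TFLOPS at the same clock")  # 19.0
```

So the same clock that gives 9.5 FP64 TFLOPS at half rate gives about 19 FP32 TFLOPS with plain vector FMA, well short of 42.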

DisEnchantment · Jul 22, 2020

uzzi38 said:
Exclusive: AMD Radeon Instinct MI100 Specs, Performance, and Features

Technology Vision

adoredtv.com

Something actually worth discussing for once.

Also I said in another thread there's a possibility for RDNA2 on the 28th, well looks like there's not.

Doubt.

def FeatureISAVersion9_0_8 : FeatureSet<
[FeatureGFX9,
HalfRate64Ops,

This is one of those exclusives compiled from open info. They just can't help themselves can they?
Ironic that they feel they need to confirm AMD's publicly shared information.

TESKATLIPOKA · Jul 22, 2020

soresu said:
Those numbers are really weird.
....
The 9.5 FP64 TFLOPS makes perfect sense though - you only need 1.16 GHz to reach that number at half rate with 128 CUs in old GCN reckoning.

It doesn't make any sense to me to clock FP64 cores so low. If they want 9.5 TFLOPS in FP64, then it's much better to clock the FP64 cores the same as the rest of the chip and save a lot of space by having fewer FP64 cores in the GPU.

soresu · Jul 22, 2020

TESKATLIPOKA said:
It doesn't make any sense to me to clock FP64 cores so low. If they want 9.5 TFLOPS in FP64, then it's much better to clock the FP64 cores the same as the rest of the chip and save a lot of space by having fewer FP64 cores in the GPU.

It does if you are packing in 8 of them in a single rack.

Remember this is CDNA meant for servers, datacenters and some workstations.

Best not to think of it as a GPU at all anymore - absolute perf per card matters less than perf/watt per system or rack when you go to server level.

DisEnchantment · Jul 22, 2020

soresu said:
It does if you are packing in 8 of them in a single rack.

Remember this is CDNA meant for servers, datacenters and some workstations.

Best not to think of it as a GPU at all anymore - absolute perf per card matters less than perf/watt per system or rack when you go to server level.

I think he is saying that half-rate DPFP does not make sense with the exclusive info. LLVM considers CDNA/Arcturus to support HalfRate64Ops.
If a card can do 40+ FP32 TFLOPS, it will do half that in DPFP, i.e. 20+ TF, at the same clocks. Conversely, if it does 10 TF DPFP it will do 20 TF FP32.
That is how MI50/60 work as well.
It would be strange to run DPFP at half the FP32 clocks to get those values.
Either LLVM is wrong or the exclusive info is made up.

TESKATLIPOKA · Jul 22, 2020

soresu said:
It does if you are packing in 8 of them in a single rack.

Remember this is CDNA meant for servers, datacenters and some workstations.

Best not to think of it as a GPU at all anymore - absolute perf per card matters less than perf/watt per system or rack when you go to server level.

As DisEnchantment mentioned, I wasn't talking about perf/W.
If FP32 is 42 TFLOPS then FP64 shouldn't be 9.5 TFLOPS, but 21 TFLOPS if it's 1/2 rate or 10.5 TFLOPS if it's 1/4 rate. To get 9.5 TFLOPS you would need a different clock speed, and that doesn't make a lot of sense.
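
A quick sketch of that consistency check, assuming FP32 and FP64 run at the same clock so that FP64 is simply FP32 times a fixed rate ratio. The 42 and 9.5 figures are the rumoured ones from the article; the names in the code are made up for illustration.

```python
# Sanity check of the rumoured MI100 figures against fixed FP64:FP32 rate ratios.
# Assumption: both precisions run at the same clock, so FP64 = FP32 * rate.

def expected_fp64(fp32_tflops, rate):
    """FP64 TFLOPS implied by an FP32 figure and an FP64:FP32 rate ratio."""
    return fp32_tflops * rate


RUMOURED_FP32 = 42.0  # TFLOPS, from the article under discussion
RUMOURED_FP64 = 9.5   # TFLOPS, from the same article

half_rate = expected_fp64(RUMOURED_FP32, 0.5)      # 21.0
quarter_rate = expected_fp64(RUMOURED_FP32, 0.25)  # 10.5
implied_ratio = RUMOURED_FP64 / RUMOURED_FP32      # roughly 0.226

print(f"half rate:     {half_rate:.1f} TFLOPS")
print(f"quarter rate:  {quarter_rate:.1f} TFLOPS")
print(f"implied ratio: {implied_ratio:.3f}")
```

The implied ratio lands at neither 1/2 nor 1/4, which is the whole problem with reading 42 as a plain vector FP32 number.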

soresu · Jul 22, 2020

TESKATLIPOKA said:
As DisEnchantment mentioned, I wasn't talking about perf/W.
If FP32 is 42 TFLOPS then FP64 shouldn't be 9.5 TFLOPS, but 21 TFLOPS if it's 1/2 rate or 10.5 TFLOPS if it's 1/4 rate. To get 9.5 TFLOPS you would need a different clock speed, and that doesn't make a lot of sense.

They may be quoting figures for FP32 operations that favour ML workloads specifically - likely operations accelerated by the new matrix/tensor logic.

I would expect 19 TFLOPS for truly general purpose workloads - unless they have doubled the per CU FP32 compute capacity somehow.

Edit: nVidia's recent Ampere announcement was being creative on their figures apparently*, AMD may have just decided fair is fair if nVidia are going to play games.

*something to do with sparse numbers or the like - I think it's similar to geometry culling, except with unnecessary tensor computation and then quoting a performance figure that acts as if those operations were computed anyway.

I have heard speculation that nVidia's RT gigaray figures are similarly inflated based on how Turing performs with denoising and DLSS as if it is handling far more rays per second than it actually is.

TESKATLIPOKA · Jul 22, 2020

Everything is possible.
19 TFLOPS? That would be, for example, 72 CUs at 2.05 GHz, which is doable.
I have to wonder whether RDNA2 is really so much better than RDNA1 that AMD didn't bother to make a bigger RDNA1 chip to combat the 2080 Ti, or whether it was because of some limitation in RDNA1.
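
As a rough illustration of the CU count versus clock trade-off for a 19 TFLOPS target: a small sketch, assuming 64 FP32 lanes per CU and 2 FLOPs per lane per clock. The CU counts other than 72 are arbitrary examples, not leaked configurations.

```python
# CU count vs clock for a given FP32 target.
# Assumptions: 64 FP32 lanes per CU, one FMA per lane per clock = 2 FLOPs.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2


def fp32_tflops(cus, clock_ghz):
    """Peak FP32 TFLOPS for `cus` CUs at `clock_ghz`."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK * clock_ghz / 1000.0


def clock_needed_ghz(target_tflops, cus):
    """Clock in GHz needed to reach `target_tflops` with `cus` CUs."""
    return target_tflops * 1000.0 / (cus * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK)


print(f"72 CUs @ 2.05 GHz: {fp32_tflops(72, 2.05):.2f} TFLOPS")  # 18.89

# Hypothetical CU counts, only to show how the required clock moves.
for cus in (64, 72, 80):
    print(f"{cus} CUs need {clock_needed_ghz(19.0, cus):.2f} GHz for 19 TFLOPS")
```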

Stuka87 · Jul 22, 2020

TESKATLIPOKA said:
Everything is possible.
19 TFLOPS? That would be, for example, 72 CUs at 2.05 GHz, which is doable.
I have to wonder whether RDNA2 is really so much better than RDNA1 that AMD didn't bother to make a bigger RDNA1 chip to combat the 2080 Ti, or whether it was because of some limitation in RDNA1.

I think AMD knew from the outset that RDNA 1 was just a stepping stone (which they have mentioned) and with limited 7nm capacity (at that time) they went with the mainstream market, which significantly outsells the high end market.
