Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38 · Apr 28, 2020

All die sizes are accurate to within 5mm^2. The poster here has been right on some things in the past, afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html

DDH · Sep 3, 2020

maddie said:
This is an even stronger case for more than 80CU in a ~500mm^2 GPU with HBM controllers. 1 CU = 3.2mm^2.

Either we get more than 80 CU (96 or even more) OR Big Navi is much less than 500mm^2.

Yeah I was wondering this myself. Maybe there is big Navi, and then bigger Navi

Because remember, the MI100 has 128 CUs with 120 active.

Also, 1 CU is smaller than 3.2mm2. I calculated through pixel measurement that 1 DCU (dual compute unit) was ~3.6mm2; times 28 DCUs, that is ~100mm2 for the CUs alone. That seems accurate, as a lot of the GPU area is dedicated to things besides the CUs, probably cache.

The XBSX die shot is online so please verify my estimations if you feel like it
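A quick check of the arithmetic above, as a minimal Python sketch. It assumes a DCU holds exactly two CUs and that the Series X GPU has 28 DCUs (56 CUs physically on the die); the 3.6 mm² input is the pixel-measured estimate from this post, not an official figure.

```python
# Back-of-the-envelope CU area from a pixel-measured DCU size.
# Assumption: 1 DCU = 2 CUs; the Series X GPU has 28 DCUs (56 CUs on die).
DCU_AREA_MM2 = 3.6   # forum estimate from the die shot
DCU_COUNT = 28
CUS_PER_DCU = 2

def cu_array_area(dcu_area: float, dcu_count: int) -> float:
    """Total area of the CU array alone, in mm^2."""
    return dcu_area * dcu_count

def area_per_cu(dcu_area: float, cus_per_dcu: int) -> float:
    """Area of one bare CU, in mm^2, with no shared logic attributed to it."""
    return dcu_area / cus_per_dcu

total = cu_array_area(DCU_AREA_MM2, DCU_COUNT)
per_cu = area_per_cu(DCU_AREA_MM2, CUS_PER_DCU)
print(f"CU array: {total:.1f} mm^2, per CU: {per_cu:.1f} mm^2")
```

The gap between ~1.8 mm² for a bare CU and the ~3.2 mm² per-CU figure quoted above is the point being made: the larger number spreads caches, the front end and other shared logic across the CUs.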

jpiniero · Sep 3, 2020

DDH said:
Yeah I was wondering this myself. Maybe there is big Navi, and then bigger Navi

Because remember, the MI100 has 128 CUs with 120 active.

That's CDNA though. No rendering stuff.

DDH · Sep 3, 2020

jpiniero said:
That's CDNA though. No rendering stuff.

Yes, that's right, but no one is expecting a 120CU Big Navi. It's just an example of what is possible.

A/// · Sep 3, 2020

Read something interesting in the comments section of some site. I forget which. But apparently drivers were hampered on the RX5000 lineup because they had to split the code between GCN and RDNA. And that RDNA2 drops all legacy code. IIRC GCN was a bastard child of CDNA, right?

maddie · Sep 3, 2020

kurosaki said:
I would be equally satisfied if "Big" Navi at ~80CU turned out to be ~300mm^2. That would possibly mean very good performance, without costs as high as those of, e.g., the 3070.

Or at least a nicely priced mid Navi with 80CU. Won't prevent a larger die >80CU from existing for the glory seekers.

DisEnchantment · Sep 3, 2020

No speculation here; die shots of both XSX and N10 are public. You can use ImageJ to calculate.

For reference IO/MM/PHY/MC on Navi 10 is around 113 mm2.
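To show how that reference figure gets used, here is a minimal Python sketch. It assumes the commonly cited 251 mm² die size and 40 CUs for Navi 10, plus the ~113 mm² uncore estimate above; the result is only an upper bound per CU, since the remainder still contains caches and other shared GPU logic.

```python
# Upper bound on per-CU area for Navi 10 after removing uncore blocks.
# Assumptions: Navi 10 die = 251 mm^2 with 40 CUs; the ~113 mm^2
# IO/multimedia/PHY/memory-controller figure is the forum estimate above.
NAVI10_DIE_MM2 = 251.0
NAVI10_CUS = 40
UNCORE_MM2 = 113.0  # IO / MM / PHY / MC, estimated from the die shot

def per_cu_upper_bound(die_mm2: float, uncore_mm2: float, cus: int) -> float:
    """Remaining area divided by CU count. This still includes the L2,
    the geometry front end and the ROPs, so it overstates a bare CU."""
    return (die_mm2 - uncore_mm2) / cus

bound = per_cu_upper_bound(NAVI10_DIE_MM2, UNCORE_MM2, NAVI10_CUS)
print(f"{bound:.2f} mm^2 per CU (upper bound)")
```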

maddie · Sep 3, 2020

DDH said:
Yeah I was wondering this myself. Maybe there is big Navi, and then bigger Navi

Because remember, the MI100 has 128 CUs with 120 active.

Also, 1 CU is smaller than 3.2mm2. I calculated through pixel measurement that 1 DCU (dual compute unit) was ~3.6mm2; times 28 DCUs, that is ~100mm2 for the CUs alone. That seems accurate, as a lot of the GPU area is dedicated to things besides the CUs, probably cache.

The XBSX die shot is online so please verify my estimations if you feel like it

Just did it and got the 3.2 number again by only using the GPU portion from the image here. This image still has the CPU cores, and if these are removed, it leaves enough room for two instances of the XBox Series X GPU with HBM2 memory controllers in a 500mm^2 die.
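The "two instances" claim can be sanity-checked with the 3.2 mm² per CU figure. A minimal Python sketch, assuming 56 CUs per Series X GPU instance and that the 3.2 mm² figure already folds in each CU's share of the GPU-only logic; whatever is left over is the budget for HBM2 controllers, IO and everything else.

```python
# Does a doubled Series X GPU fit in ~500 mm^2? Uses the 3.2 mm^2/CU
# figure from the die-shot measurement (assumed to include shared GPU logic).
MM2_PER_CU = 3.2
CUS_PER_INSTANCE = 56
TARGET_DIE_MM2 = 500.0

def used_area(instances: int) -> float:
    """Area taken by `instances` copies of the 56-CU GPU block, in mm^2."""
    return instances * CUS_PER_INSTANCE * MM2_PER_CU

def leftover_area(instances: int) -> float:
    """Die area left for memory controllers, IO, etc., in mm^2."""
    return TARGET_DIE_MM2 - used_area(instances)

print(f"Two instances: {used_area(2):.1f} mm^2 used, "
      f"{leftover_area(2):.1f} mm^2 left for controllers and IO")
```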

Stuka87 · Sep 3, 2020

A/// said:
Read something interesting in the comments section of some site. I forget which. But apparently drivers were hampered on the RX5000 lineup because they had to split the code between GCN and RDNA. And that RDNA2 drops all legacy code. IIRC GCN was a bastard child of CDNA, right?

CDNA is more of an offspring of GCN. GCN came long long before CDNA. But GCN is compute heavy, and its very good at compute. Which is why it was the basis for CDNA.

Not sure that comment you mention makes sense though. RDNA is a new ISA, so the drivers are brand new for it (Which is why they had some growing pains). This didn't impact old GCN drivers, and the GCN drivers would not have any direct impact on RDNA drivers. What would have was potential resource limitations of the driver team being split between new and legacy.

DDH · Sep 3, 2020

maddie said:
Just did it and got the 3.2 number again by only using the GPU portion from the image here. This image still has the CPU cores, and if these are removed, it leaves enough room for two instances of the XBox Series X GPU with HBM2 memory controllers in a 500mm^2 die.


I think they could fit more than 80 CUs in 500mm2. 172mm2 for 56 CUs, times 2, is 344, + 100mm2 for a 512-bit GDDR6 bus = 444. Of course this leaves out the controllers and probably lots of other things, but this would have been 112 CUs and sixteen 32-bit GDDR6 interfaces. Just a fun speculative thought.
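The same budget written out as a minimal Python sketch. The 172 mm² per 56-CU block and ~100 mm² for a 512-bit bus are the forum estimates above; scaling the PHY area linearly with bus width is a hypothetical simplification added here, and controllers and other logic are left out, as the post notes.

```python
# Area budget: N copies of a 56-CU GPU block plus a GDDR6 interface.
# Assumptions: forum estimates for block/PHY area; PHY area scales
# linearly with bus width (hypothetical); each GDDR6 chip is 32 bits wide.
GPU_BLOCK_MM2 = 172.0             # one 56-CU GPU block
GPU_BLOCK_CUS = 56
PHY_MM2_PER_BIT = 100.0 / 512     # ~100 mm^2 for a 512-bit bus
BITS_PER_GDDR6_CHIP = 32
TARGET_DIE_MM2 = 500.0

def budget(blocks: int, bus_bits: int) -> dict:
    """Summarize area, headroom, CU count and GDDR6 chip count."""
    area = blocks * GPU_BLOCK_MM2 + bus_bits * PHY_MM2_PER_BIT
    return {
        "area_mm2": area,
        "headroom_mm2": TARGET_DIE_MM2 - area,
        "cus": blocks * GPU_BLOCK_CUS,
        "gddr6_chips": bus_bits // BITS_PER_GDDR6_CHIP,
    }

print(budget(blocks=2, bus_bits=512))
```

For two blocks and a 512-bit bus this reports 444 mm² used, 56 mm² of headroom, 112 CUs (2 × 56) and 16 GDDR6 chips.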

DDH · Sep 3, 2020

Stuka87 said:
CDNA is more of an offspring of GCN. GCN came long long before CDNA. But GCN is compute heavy, and its very good at compute. Which is why it was the basis for CDNA.

Not sure that comment you mention makes sense though. RDNA is a new ISA, so the drivers are brand new for it (Which is why they had some growing pains). This didn't impact old GCN drivers, and the GCN drivers would not have any direct impact on RDNA drivers. What would have was potential resource limitations of the driver team being split between new and legacy.

The post was on Reddit, linked on OCUk. I'll see if I can dig it up.

uzzi38 · Sep 3, 2020

RDNA is still using the GCN ISA. RDNA2 is almost certainly going to be the same. I think AMD have clarified in the past the ISA is here to stay.

Just clearing that up.

blckgrffn · Sep 3, 2020

Veradun said:
Ah.

Marvelous. We should not discuss those "numbers" then :>

I hope you "heard" my comment in jest and not as me trying to be a jerk. This isn't FB or, heaven forbid, Twitter, and I just want to be clear that I thought it was a chance to be chuckle-worthy. Apologies if it was taken any other way. We are all here to escape the rest of the world.

A/// · Sep 3, 2020

DDH said:
The post was on Reddit, linked on OCUk. I'll see if I can dig it up.

Don't use Reddit myself. It may have been copy pasted now that I think about it. Can you karma glam on Disqus?

DisEnchantment · Sep 3, 2020

Stuka87 said:
CDNA is more of an offspring of GCN. GCN came long long before CDNA. But GCN is compute heavy, and its very good at compute. Which is why it was the basis for CDNA.

Not sure that comment you mention makes sense though. RDNA is a new ISA, so the drivers are brand new for it (Which is why they had some growing pains). This didn't impact old GCN drivers, and the GCN drivers would not have any direct impact on RDNA drivers. What would have was potential resource limitations of the driver team being split between new and legacy.

GCN 1.x architecture is not actually compute heavy. RDNA 1 has the same compute throughput as the Radeon VII clock for clock, if not more. Of course, RDNA can do additional scalar operations and is better at branching code.
The issue is that in GCN a full wave64 needs 4 cycles to complete, because it runs on a SIMD16 (each CU has 4x SIMD16).

With compute loads it is always possible to keep the pipeline busy, because each SIMD16 can engage every cycle, executing something from consecutive wavefronts.
So compute loads are better suited to GCN.
For graphics, it could be that the whole wave has to complete before the next wave is scheduled (so there is a 4-cycle latency), or it could be that the wavefront is not fully occupied.
Thus GCN/Vega struggles to keep the SIMDs always engaged, which results in lower performance even though the theoretical TFLOPS figure is fairly high.
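The SIMD16 argument can be illustrated with a toy Python model. It assumes one vector instruction per wave, no memory stalls, and ignores that a GCN CU rotates across its four SIMD16s; the RDNA side assumes a wave32 issuing on a SIMD32 in one cycle. It is a sketch of the occupancy argument, not a cycle-accurate simulator.

```python
import math

# Toy model (assumption): one vector ALU instruction per wave, no memory
# stalls, ignoring GCN's round-robin across the 4 SIMD16s in a CU.
# GCN:  wave64 on a SIMD16 -> 64/16 = 4 cycles per instruction.
# RDNA: wave32 on a SIMD32 -> 1 cycle per instruction.

def cost(threads: int, wave_size: int, simd_width: int) -> tuple[int, float]:
    """Return (cycles, lane utilization) to run one instruction for `threads` threads."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    waves = math.ceil(threads / wave_size)              # waves needed to hold the threads
    cycles = waves * math.ceil(wave_size / simd_width)  # each wave issues over this many cycles
    utilization = threads / (cycles * simd_width)       # useful lane-cycles / total lane-cycles
    return cycles, utilization

CONFIGS = (("GCN  wave64 on SIMD16", 64, 16), ("RDNA wave32 on SIMD32", 32, 32))

for threads in (64, 20):  # a full wave vs. a sparsely populated one
    for name, wave_size, simd_width in CONFIGS:
        cycles, util = cost(threads, wave_size, simd_width)
        print(f"{threads:2d} threads, {name}: {cycles} cycle(s), {util:.0%} of lanes busy")
```

With a full 64-thread workload both designs keep every lane busy, which matches the point about compute loads. The difference appears when a wave is only partly filled (20 threads use about 31% of the GCN lane-cycles versus about 62% on RDNA) or when a dependent instruction has to wait out the 4-cycle issue of a GCN wave64.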

Instruction-wise, RDNA hardware can run all the GCN instructions.
Besides, on PC the shader compiler JIT-compiles the shader code anyway, unlike consoles, where shader binaries are shipped precompiled.
That said, LLVM introduces a new set of instructions and extensions for RDNA2 which older GCN hardware will not be able to run.

Kenmitch · Sep 3, 2020

Let's not get the hype train going too fast. After all, there isn't even a conductor... Raja left for Intel and so far there haven't been any volunteers.

senseamp · Sep 3, 2020

A/// said:
Dang, I guess TSMC and AMD didn't realize they had to start and work out a production schedule so they could deliver 10M processors and graphics chips to Sony by the end of March 2021.

Guess we won't see Zen3 or RDNA until next fall. See ya later, alligators!

You might see them in small quantities, pricier, or delayed until more mobile players move on to 5nm. If Ryzen is selling like hotcakes, those dies are far more profitable for AMD than GPUs, unless the GPU can be in the $700+ range.

blckgrffn · Sep 3, 2020

Kenmitch said:
Let's not get the hype train going too fast. After all, there isn't even a conductor... Raja left for Intel and so far there haven't been any volunteers.

That sounds boring. I think we are doing a good job of driving the train by committee?

I shudder to think of what might be happening over in r/amd.

DDH · Sep 3, 2020

Kenmitch said:
Let's not get the hype train going too fast. After all, there isn't even a conductor... Raja left for Intel and so far there haven't been any volunteers.

No brakes

Profanity (and memes/images that serve no tech purpose) are not allowed in the tech forums.

AT Mod Usandthem

eek2121 · Sep 3, 2020

DisEnchantment said:
GCN 1.x architecture is not actually compute heavy. RDNA 1 has the same compute throughput as the Radeon VII clock for clock, if not more. Of course, RDNA can do additional scalar operations and is better at branching code.
The issue is that in GCN a full wave64 needs 4 cycles to complete, because it runs on a SIMD16 (each CU has 4x SIMD16).

With compute loads it is always possible to keep the pipeline busy, because each SIMD16 can engage every cycle, executing something from consecutive wavefronts.
So compute loads are better suited to GCN.
For graphics, it could be that the whole wave has to complete before the next wave is scheduled (so there is a 4-cycle latency), or it could be that the wavefront is not fully occupied.
Thus GCN/Vega struggles to keep the SIMDs always engaged, which results in lower performance even though the theoretical TFLOPS figure is fairly high.

Instruction-wise, RDNA hardware can run all the GCN instructions.
Besides, on PC the shader compiler JIT-compiles the shader code anyway, unlike consoles, where shader binaries are shipped precompiled.
That said, LLVM introduces a new set of instructions and extensions for RDNA2 which older GCN hardware will not be able to run.


That applies to Navi, not Navi2X. We already know that AMD has made changes in this area based on commits to the Mesa source code.

maddie · Sep 3, 2020

Kenmitch said:
Let's not get the hype train going too fast. After all, there isn't even a conductor... Raja left for Intel and so far there haven't been any volunteers.

Hype is one thing, but we have the XBox die shot to work with. If you just remove the CPU clusters, you are left with ~317mm^2 for 56 CUs + a 320-bit GDDR6 interface + all the IO and multimedia circuitry.

Even if we do a simple ratio analysis (the worst case, as it expands every part equally), we get 158% for a 500mm^2 die, or ~88 CUs.

It strongly suggests that, even with a 512-bit GDDR6 memory bus, the 80CU @ ~500mm^2 die is wrong. We either have more than 80 CUs or a smaller die, and HBM2 controllers would allow for even more CUs.

Where am I so wrong in this?
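The ratio analysis above, as a minimal Python sketch. It assumes, as the post does, ~317 mm² for the non-CPU part of the Series X die, 56 CUs, and that every block grows by the same factor.

```python
# Simple ratio scaling (assumption: every block in the non-CPU part of
# the Series X die, CUs, GDDR6 PHY, IO and multimedia, grows by one factor).
XSX_GPU_REGION_MM2 = 317.0  # XSX die minus CPU clusters, per the post
XSX_CUS = 56                # CUs physically present on the XSX die

def scaled_cus(target_die_mm2: float) -> tuple[float, float]:
    """Return (scale factor, implied CU count) under uniform scaling."""
    scale = target_die_mm2 / XSX_GPU_REGION_MM2
    return scale, XSX_CUS * scale

scale, cus = scaled_cus(500.0)
print(f"scale = {scale:.0%}, implied CUs = {cus:.0f}")
```

For a 500 mm² die this gives a factor of about 158% and roughly 88 CUs. Because IO, multimedia and much of the memory PHY would not need to grow with the CU count, uniform scaling undercounts CUs, which is why it is the worst case.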

blckgrffn · Sep 3, 2020

maddie said:
Hype is one thing, but we have the XBox die shot to work with. If you just remove the CPU clusters, you are left with ~317mm^2 for 56 CUs + a 320-bit GDDR6 interface + all the IO and multimedia circuitry.

Even if we do a simple ratio analysis (the worst case, as it expands every part equally), we get 158% for a 500mm^2 die, or ~88 CUs.

It strongly suggests that, even with a 512-bit GDDR6 memory bus, the 80CU @ ~500mm^2 die is wrong. We either have more than 80 CUs or a smaller die, and HBM2 controllers would allow for even more CUs.

Where am I so wrong in this?

Could be greater hardware allocation to RT type compute? More dead space to allow for effective cooling?

eek2121 · Sep 3, 2020

Assuming Navi2X has a similar density to what is speculated in the Xbox die shot, AMD could have taken a couple of different routes:

Beef up GPU performance by fixing bottlenecks, widening things, etc. I suspect this really isn’t needed.
Sell Big Navi for slightly cheaper than the 3080. Everyone's collective jaws would drop if AMD pushed out a part that was competitive with the 3080 but only cost $399-$499.

maddie · Sep 3, 2020

blckgrffn said:
Could be greater hardware allocation to RT type compute? More dead space to allow for effective cooling?

Fair enough. Although these factors should already have been accounted for in the XBox die, seeing as it has a good frequency and RT hardware (CU based).

eek2121 · Sep 3, 2020

maddie said:
Hype is one thing, but we have the XBox die shot to work with. If you just remove the CPU clusters, you are left with ~317mm^2 for 56 CUs + a 320-bit GDDR6 interface + all the IO and multimedia circuitry.

Even if we do a simple ratio analysis (the worst case, as it expands every part equally), we get 158% for a 500mm^2 die, or ~88 CUs.

It strongly suggests that, even with a 512-bit GDDR6 memory bus, the 80CU @ ~500mm^2 die is wrong. We either have more than 80 CUs or a smaller die, and HBM2 controllers would allow for even more CUs.

Where am I so wrong in this?

A third option just occurred to me: tensor cores.

maddie · Sep 3, 2020

eek2121 said:
A third option just occurred to me: tensor cores.

Nah. Even Nvidia is deprecating the use of tensor cores on the sly.

Ray tracing denoising shaders are good examples that might benefit greatly from doubling FP32 throughput.
