Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have since backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Rumor from reddit: the RBE can do more than 4pix/clock...

That's the only way to explain the same Render Back End (RBE) count of 16 in Navi 21. If the RBE in Navi 2x can do 8 pixels/clock then it would be able to rasterize 2x the pixels vs Navi 10 per clock cycle.

Could you link where you found it?
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
That's the only way to explain the same Render Back End (RBE) count of 16 in Navi 21. If the RBE in Navi 2x can do 8 pixels/clock then it would be able to rasterize 2x the pixels vs Navi 10 per clock cycle.

Could you link where you found it?
If you assume 100% efficiency, then 16 RBE can do (16*4*2,000,000,000) pixels/sec @ 2 GHz. Even at 4K (3840*2160) you still get a huge potential pixel output: potentially 15,432 operations/pixel/second. At 120 Hz that gives you about 128 ops/pixel/frame.

Is it possible that they have simply (Ha) increased the output efficiency of a RBE?
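
For anyone who wants to replay the arithmetic above, here is a minimal Python sketch. It only computes the theoretical peak (100% efficiency), and the function names are made up for illustration. The 8 pixels/clock case is the rumored doubling discussed in this thread, not a confirmed spec.

```python
# Theoretical peak pixel output of the render back ends (RBEs).
# All inputs are the thread's assumptions, not measured figures.

def peak_pixels_per_second(rbe_count, pixels_per_rbe_per_clock, clock_hz):
    """Upper bound on pixels written per second at 100% efficiency."""
    return rbe_count * pixels_per_rbe_per_clock * clock_hz


def ops_per_pixel_per_frame(pixels_per_second, width, height, refresh_hz):
    """How many times each screen pixel could be written in one frame."""
    return pixels_per_second / (width * height) / refresh_hz


CLOCK_HZ = 2_000_000_000  # 2 GHz, as assumed in the post

for pixels_per_clock in (4, 8):  # 4 = Navi 10 style, 8 = rumored doubling
    rate = peak_pixels_per_second(16, pixels_per_clock, CLOCK_HZ)
    per_second = rate / (3840 * 2160)
    per_frame = ops_per_pixel_per_frame(rate, 3840, 2160, 120)
    print(f"{pixels_per_clock} px/clk/RBE: {rate:.3g} px/s, "
          f"{per_second:,.0f} ops/pixel/s, {per_frame:.1f} ops/pixel/frame @ 120 Hz")
```

With 4 pixels/clock per RBE this reproduces the 1.28 x 10^11 pixels/sec and roughly 15,432 operations/pixel/second quoted above.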
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,923
3,550
136
If you assume 100% efficiency, then 16 RBE can do (16*4*2,000,000,000) pixels/sec @ 2 GHz. Even at 4K (3840*2160) you still get a huge potential pixel output: potentially 15,432 operations/pixel/second. At 120 Hz that gives you about 128 ops/pixel/frame.

Is it possible that they have simply (Ha) increased the output efficiency of a RBE?
Efficiency is a product of triangle size, so unless it's a completely new way to rasterize where an RBE can process more than 1 triangle per clock, I wouldn't expect efficiency gains.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
If you assume 100% efficiency, then 16 RBE can do (16*4*2,000,000,000) pixels/sec @ 2 GHz. Even at 4K (3840*2160) you still get a huge potential pixel output: potentially 15,432 operations/pixel/second. At 120 Hz that gives you about 128 ops/pixel/frame.

Is it possible that they have simply (Ha) increased the output efficiency of a RBE?

100% efficiency is unlikely. I would expect 8 pixels/clock per RBE in Navi 2x. Navi 23 at 240 mm^2 is most likely going to be 40 CU, which is the same CU count as Navi 10. Navi 10 had 64 ROPs (16 RBE x 4 = 64 ROPs). I doubt AMD improved their ROP efficiency by a factor of 2x in a single generation. The way I see it, ROPs per RBE have been doubled from 4 to 8 in Navi 2x.
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
100% efficiency is unlikely. I would expect 8 pixels/clock per RBE in Navi 2x. Navi 23 at 240 mm^2 is most likely going to be 40 CU, which is the same CU count as Navi 10. Navi 10 had 64 ROPs (16 RBE x 4 = 64 ROPs). I doubt AMD improved their ROP efficiency by a factor of 2x in a single generation. The way I see it, ROPs per RBE have been doubled from 4 to 8 in Navi 2x.
It's obvious that a ROP will not output 1 pixel/clk, except in the most trivial of cases. As I stated before, 16 RBE would give 1.28 x 10^11 pixels/sec @ 2 GHz if that were the case, so each ROP must be taking many cycles for each pixel. If it weren't, 16 RBE @ 2 GHz would be enough for 15,432 fps at 4K resolution (1.28 x 10^11 / (3840 x 2160)). That's not a limit at all if we expect a pixel from each ROP every clock.

That's why I mentioned that they might have improved the efficiency (execution rate) of the RBE.

16 RBE max output (100%) @ 2 GHz:
(16 x 4 x 2,000,000,000) = 1.28 x 10^11 pixels/sec
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146

jpiniero

Lifer
Oct 1, 2010
15,177
5,717
136
There is more to that than just shader count though. Sienna has XGMI, dual VCN, besides others which Navy Flounder does not have.

Could still be there, just physically disabled on Navy Flounder.

If they were two physically different chips it would make more sense if NF was simply the second tier chip with less than 80 CUs and the stuff not needed taken out. That doesn't appear to be the case.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
If you're thinking of who I think you're thinking of (not sure if Bondrewd is the same guy), then he used to be here on these forums before getting banned for profanity.

For those that are on Twitter, it's probably Spec/TB-03 Devilfish.
Is he lordofdawn on reddit? His crazy hyperbolic answers suggest so.

Anyway, is he legit? His rumour makes a lot of sense.
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
Is he lordofdawn on reddit? His crazy hyperbolic answers suggest so.

Anyway, is he legit? His rumour makes a lot of sense.
At the very least on the consumer-focused GPUs for RDNA2 and probably even RDNA3, yes.

Also for Zen 3, and for Zen 4 too if he says something. Past those I'd advise caution; occasionally he mixes in opinions as fact, so it's always worth being a little hesitant.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
Meanwhile ... at RTG

NGG culling is crashing on Navi21...


NGG was not working in Vega10. It was not working properly in RDNA1. These bugs are not really inspiring confidence in RDNA2 NGG.
Lots of bugs from RDNA1 were solved in RDNA2, but still...
Also, CDNA1 has a hardware bug that requires some other instruction to be inserted after each MFMA operation.
 

soresu

Diamond Member
Dec 19, 2014
3,214
2,490
136
NGG was not working in Vega10. It was not working properly in RDNA1. These bugs are not really inspiring confidence in RDNA2 NGG
It's early days yet, they might have 'working silicon' in the labs - but that does not necessarily mean that this represents the final errata fix stepping prior to shipping production chips.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
Just posting my thoughts on the memory topic for Sienna again.

Looking at this chart from Micron below, G6X offers a very slight decrease in energy per bit transferred over G6, but considering the very significant increase in BW (from 14 to 21 Gbps) and capacity (from 8 to 16 GiB), total energy expended on the memory subsystem will be significantly increased (30-35%).

Now consider HBM2E: its power savings are more pronounced than for G6X, assuming it is run within spec (i.e. 3.2 Gbps, or about 820 GB/s for a 2048-bit bus).
This seems like a good design opportunity that AMD capitalized on with Sienna.

The memory subsystem should run at lower power than on the VII when running within spec (3.2 Gbps).
Due to the increased capacity per stack, Sienna could make do with two stacks instead of the four on the VII. This reduces the complexity of the RDL, die bonding and the interposer. In the end, costs could be shaved a bit.
Sienna could save premium die space. Recall that bigger dies are not only costlier, they are also more likely to be hit by defects at the same defect density, which hurts yield. Versus a 384-bit G6X bus, 2048-bit HBM2E could save around 75 mm^2 of die space.

Regarding costs, the Radeon VII was available for 699 USD, and that was using four stacks. Considering that these top-end RDNA2 cards are going to be selling around 1K or probably even higher (if they perform well), HBM2E is much more justified than it was for the VII/V64/56.
For Sienna: less die area spent on a G6 PHY, less energy expended, memory that is cheaper than the VII's or at worst the same, a less complex PCB... sounds like a win on all counts imo. Should help keep that last-mile TBP in control.
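
To put a rough shape on the die-area point, here is a small sketch using the simple Poisson yield model, Y = exp(-A * D0). Everything numeric here is an assumption: the defect density is a made-up placeholder, 505 mm^2 is only the rumored Navi21 figure from earlier in the thread, and adding 75 mm^2 to it is just a way to show the direction of the effect, since nobody here knows which memory interface that figure includes.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.1  # hypothetical defect density in defects/cm^2, illustrative only

for area_mm2 in (505, 505 + 75):  # rumored die size, and the same die 75 mm^2 larger
    print(f"{area_mm2} mm^2 -> estimated yield {poisson_yield(area_mm2, D0):.1%}")
```

Real foundry yield models are more involved than this, so treat the output as a direction of travel rather than a prediction.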

Aug 2019

“When we built HBM2, we wanted to expand the market breadth the device could attack, but also add in two dimensions—capacity and more bandwidth,” said Joe Macri, corporate vice president and chief tech officer of the compute and graphics division at AMD. AMD is a major partner with Samsung in the development of HBM. “It’s still 1,024 bits wide, but doubled the frequency to two gigachannels and added Error Correction Code (ECC) to get into data center and AI and machine language, since the entire data center market is built on a trusted data model.”

With HBM2E, AMD, one of the co-developers of HBM, is turning the same levers again. “The only bits added to the interface were to increase addressability, but it’s the same interface, it just runs at a higher interface of 3.2 gigatransfers per second,” Macri said.

Dec-2019

Three years ago, HBM cost about $120/GB. Today, the unit prices for HBM2 (16GB with 4 stack DRAM dies) is roughly $120, according to TechInsights. That doesn’t even include the cost of the package.

Both Hynix and Samsung can run HBM2e beyond JEDEC standard 3.2Gbps. Samsung can run at a mind boggling 4.1-4.2 Gbps. And Hynix at 3.6Gbps. Besides throughput increase, latency is really low at such speeds.
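
The bandwidth figures in this post follow from bus width times per-pin data rate. A minimal sketch, assuming the pairings below: the 2048-bit HBM2E and 384-bit G6X configurations are the ones discussed above, while putting 14 Gbps G6 on the same 384-bit bus is my own assumption for comparison.

```python
def bandwidth_gb_per_s(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


configs = [
    ("G6    384-bit @ 14.0 Gbps", 384, 14.0),   # assumed bus width for comparison
    ("G6X   384-bit @ 21.0 Gbps", 384, 21.0),
    ("HBM2E 2048-bit @ 3.2 Gbps", 2048, 3.2),   # JEDEC spec rate per the post
    ("HBM2E 2048-bit @ 3.6 Gbps", 2048, 3.6),   # Hynix figure per the post
    ("HBM2E 2048-bit @ 4.2 Gbps", 2048, 4.2),   # Samsung figure per the post
]

for label, width, rate in configs:
    print(f"{label}: {bandwidth_gb_per_s(width, rate):.1f} GB/s")
```

At 3.2 Gbps on a 2048-bit bus this gives 819.2 GB/s, which is the "820GB/s" figure rounded.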
 

DiogoDX

Senior member
Oct 11, 2012
747
279
136
Some slides of the XBOX presentation on HotChips: https://www.tomshardware.com/news/microsoft-xbox-series-x-architecture-deep-dive

RT accelerator looks to be inside the TMUs, like in the patent. 4 texture OR 4 ray ops: does that mean RDNA2 will trade texture fillrate for raytracing performance?



BVH runs in parallel with the shader.




Pixel fillrate gain is massive. There was a rumor of some changes on the ROPs.



DirectML can be used for resolution scaling. AMD's response to DLSS?

 

Saylick

Diamond Member
Sep 10, 2012
3,513
7,776
136
More like 250-275W.
Power is a cubic relation to clocks, if I'm not mistaken.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 263W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 283W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 303W

Also, assuming the PS5 is similar in architecture:

130W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 164W
140W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 177W
150W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 189W

EDIT: Adding in some more power ranges.
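
The same estimate as a small Python sketch, in case anyone wants to plug in other numbers. The cubic exponent is the rule of thumb used above (dynamic power goes roughly as C * V^2 * f, with voltage rising roughly in step with frequency); it and the linear scaling with CU count are assumptions, so the exponent is left as a parameter. The function name is made up for illustration.

```python
def scaled_power_w(base_power_w, base_cus, base_clock_ghz,
                   cus, clock_ghz, clock_exponent=3.0):
    """Scale a baseline GPU power figure linearly with CU count and
    with clock^clock_exponent (3.0 = the cubic rule of thumb)."""
    cu_ratio = cus / base_cus
    clock_ratio = clock_ghz / base_clock_ghz
    return base_power_w * cu_ratio * clock_ratio ** clock_exponent


# Baseline: 52 CUs at 1.825 GHz, with a guessed GPU power of 130-150 W.
for base_w in (130, 140, 150):
    big = scaled_power_w(base_w, 52, 1.825, cus=80, clock_ghz=2.0)
    ps5 = scaled_power_w(base_w, 52, 1.825, cus=36, clock_ghz=2.23)
    print(f"{base_w} W baseline -> 80 CU @ 2.0 GHz: {big:.0f} W, "
          f"36 CU @ 2.23 GHz: {ps5:.0f} W")
```

Dropping clock_exponent to 2.0 or 2.5 shows how sensitive the result is to that one assumption.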
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
Power is a cubic relation to clocks, if I'm not mistaken.

150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 303W

Also, assuming the PS5 is similar in architecture:

150W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 189W
And how do you know how CU power draw scales?

It's not linear, I can tell you that.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
Pretty much what is in the patent as I wrote before. Also the Ray Intersection unit is outside the Texture units directly coupled to L0 and LDS. So output stage from SIMD shaders goes into Ray unit via LDS and vice versa.



TL;DR;
AMD's Ray Intersection Unit, containing HW for accelerating ray-box and ray-triangle intersection tests, is present in the CU alongside the SIMDs and not within the Texture Processor.
The previously circulated assumption that the Ray Intersection Unit sits within the TMU is probably incorrect.
The Engine can perform 4 ray-box tests or 1 ray-triangle test per cycle.
Therefore the publicized number of 380 billion ray intersection tests per second for XSX is actually for ray-box intersection.
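
A quick sanity check of that last point, as a sketch. It assumes one intersection engine per CU and uses the 52 CU / 1.825 GHz XSX figures that appear in the power estimates elsewhere in this thread; the function name is made up for illustration.

```python
def intersection_tests_per_second(cu_count, tests_per_cu_per_clock, clock_ghz):
    """Peak intersection tests per second, assuming one engine per CU."""
    return cu_count * tests_per_cu_per_clock * clock_ghz * 1e9


XSX_CUS = 52           # assumed active CU count
XSX_CLOCK_GHZ = 1.825  # assumed GPU clock

ray_box = intersection_tests_per_second(XSX_CUS, 4, XSX_CLOCK_GHZ)
ray_tri = intersection_tests_per_second(XSX_CUS, 1, XSX_CLOCK_GHZ)

print(f"ray-box:      {ray_box / 1e9:.1f} billion tests/s")
print(f"ray-triangle: {ray_tri / 1e9:.1f} billion tests/s")
```

The ray-box rate comes out at about 379.6 billion per second, which lines up with the publicized 380 billion, while the ray-triangle rate is a quarter of that.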
 

Saylick

Diamond Member
Sep 10, 2012
3,513
7,776
136
And how do you know how CU power draw scales?

It's not linear, I can tell you that.
Yeah, so if you want to subtract out the power consumption outside of the CUs then yeah, the CU portion will be less than the 130-140W estimated. My numbers are conservative at the end of the day.
 