Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even still though, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
You don't need more than a 256-bit, 16gbit/s configuration to provide such performance. An RTX 3070 is on par with a 2080 Ti and has 28% less bandwidth.
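For a rough sanity check on that bandwidth gap, here is the arithmetic using the published bus widths and the 14gbit/s GDDR6 both cards ship with (a quick back-of-the-envelope sketch, nothing more):

```python
# Peak memory bandwidth from bus width and per-pin data rate (published specs assumed)
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """GB/s = bus width in bits / 8 * per-pin data rate in Gbit/s."""
    return bus_width_bits / 8 * data_rate_gbps

bw_2080ti = bandwidth_gb_s(352, 14.0)  # RTX 2080 Ti: ~616 GB/s
bw_3070 = bandwidth_gb_s(256, 14.0)    # RTX 3070:    ~448 GB/s
print(f"2080 Ti: {bw_2080ti:.0f} GB/s, 3070: {bw_3070:.0f} GB/s")
print(f"3070 deficit: {(1 - bw_3070 / bw_2080ti) * 100:.0f}%")  # ~27-28% less
```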
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
RTX3070 has 20TFLOPs with the advertised boost clock and will be around 21TFLOPs with "gaming" clock. 16gbit/s is enough for 20TFLOPs to provide ~20% more performance over a RTX2080TI.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
RTX3070 has 20TFLOPs with the advertised boost clock and will be around 21TFLOPs with "gaming" clock. 16gbit/s is enough for 20TFLOPs to provide ~20% more performance over a RTX2080TI.
A 20-21 TFLOPs RTX 3070 performs like an RTX 2080 Ti, right?
The RTX 2080 Ti FE officially has 14.2 TFLOPs and in reality 15.9 TFLOPs (1824MHz is the average clock speed), which is a lot less than the RTX 3070.
Not to mention you are comparing the bandwidth needs of two different architectures, Ampere vs RDNA2 (which doesn't have 2x FP32 per CU the way Ampere does per SM), and you don't even know what the real performance of the RTX 3070 is compared to the RTX 2080 Ti at 4K.
BTW 16Gbps is only 14% more than 14Gbps.
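Working through those TFLOPs figures (shader counts and clocks are the published or observed numbers quoted above; treat this as a sketch):

```python
# FP32 TFLOPs = shaders * 2 ops per clock (FMA) * clock in GHz / 1000
def tflops(shaders: int, clock_ghz: float) -> float:
    return shaders * 2 * clock_ghz / 1000

print(f"2080 Ti FE @ 1.635 GHz boost:    {tflops(4352, 1.635):.1f} TFLOPs")  # ~14.2 (official)
print(f"2080 Ti FE @ ~1.824 GHz average: {tflops(4352, 1.824):.1f} TFLOPs")  # ~15.9 (observed)
print(f"RTX 3070  @ 1.725 GHz boost:     {tflops(5888, 1.725):.1f} TFLOPs")  # ~20.3
print(f"16Gbps vs 14Gbps: +{(16 / 14 - 1) * 100:.0f}%")                      # ~14%
```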
 
Last edited:
Reactions: Tlh97

insertcarehere

Senior member
Jan 17, 2013
639
607
136
The MS team clearly thought 256-bit GDDR6 bandwidth wasn't enough to feed 12 TFLOPs of RDNA2 + Zen 2 (which shouldn't take much in itself), hence the awkward dual-bandwidth memory solution. Curious to see how AMD plans to feed 20+ TFLOPs RDNA2 SKUs on 256-bit GDDR6 if that's true.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
A 20-21 TFLOPs RTX 3070 performs like an RTX 2080 Ti, right?
The RTX 2080 Ti FE officially has 14.2 TFLOPs and in reality 15.9 TFLOPs (1824MHz is the average clock speed)!
Not to mention you are comparing the bandwidth needs of two different architectures, Ampere vs RDNA2, and you don't even know what the real performance of the RTX 3070 is compared to the RTX 2080 Ti at 4K.
BTW 16Gbps is only 14% more than 14Gbps.

The 3080 has 70% more bandwidth than the 3070 but it won't perform 70% better. Scaling architectures up for gaming will keep getting harder without increasing the compute workload. It doesn't make sense to go overboard with bandwidth when the benefit isn't there. Microsoft and Sony have it easier with their 10/12 TFLOPs consoles, and a bandwidth figure like Sony's is standard today.

I don't think there is an actual problem with AMD going with 256-bit and 16gbit/s. This should be enough for ~90% of 3080 performance on average.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,634
180
106
The 5700XT wasn't particularly efficient because it was clocked outside its efficiency range so that AMD could compete with the 2070 in the market, while producing almost two 5700XT chips for every 2070 NVIDIA made (die area comparison). That's not to say AMD hasn't been improving in that area with RDNA2, but the difference might be a bit misleading.
Let me put it this way - if Ampere were on TSMC 7nm instead of Samsung, it wouldn't be such a power hog and would probably have a rather different spec with similar or higher performance.

If AMD gets a shot at getting close to or beating the 3080, it can thank NVIDIA's choice of fab.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
The MS team clearly thought 256-bit GDDR6 bandwidth wasn't enough to feed 12 TFLOPs of RDNA2 + Zen 2 (which shouldn't take much in itself), hence the awkward dual-bandwidth memory solution. Curious to see how AMD plans to feed 20+ TFLOPs RDNA2 SKUs on 256-bit GDDR6 if that's true.
I kinda don't understand how it's wired. The Xbox Series X has 16 chips, each a 14Gbps chip. 10 chips sit on a 320-bit bus, i.e. 32-bit per chip, and the last 6 are on 192-bit, also 32-bit per chip.
How many PHYs does the Xbox SoC actually have, and how is it wired?

edit: now I know. It has only 320-bit GDDR6 (10x 32-bit chips), and if the four 1GB chips are full you can't use them any further, leaving the six 32-bit 2GB chips; that's why the bandwidth is 336GB/s (6x32 = 192-bit at 14Gbps).
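Sketching that out, assuming the layout described above (ten 32-bit chips at 14gbit/s):

```python
# Xbox Series X memory pools: bandwidth per pool = chips * 32-bit / 8 * 14 Gbit/s
def pool_bandwidth_gb_s(chips: int, bits_per_chip: int = 32, gbps: float = 14.0) -> float:
    return chips * bits_per_chip / 8 * gbps

print(f"Fast 10GB pool (all 10 chips):        {pool_bandwidth_gb_s(10):.0f} GB/s")  # 560 GB/s
print(f"Slow 6GB pool (only the 6 2GB chips): {pool_bandwidth_gb_s(6):.0f} GB/s")   # 336 GB/s
```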
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
The 3080 has 70% more bandwidth than the 3070 but it won't perform 70% better. Scaling architectures up for gaming will keep getting harder without increasing the compute workload. It doesn't make sense to go overboard with bandwidth when the benefit isn't there. Microsoft and Sony have it easier with their 10/12 TFLOPs consoles, and a bandwidth figure like Sony's is standard today.

I don't think there is an actual problem with AMD going with 256-bit and 16gbit/s. This should be enough for ~90% of 3080 performance on average.
1. The increase in Ampere TFLOPs doesn't reflect the actual increase in gaming performance compared to Turing and RDNA1-2, because each Ampere SM has 2x the FP32 of a Turing SM or an RDNA1-2 CU. Why do you think I mentioned the TFLOPs for the RTX 2080 Ti?
2. If you compare the TFLOPs of the RTX 3070 vs the RTX 3080, then you will find out why it's not 70% better, or even close to that number, even with 70% higher bandwidth.
3. Please check what specs the RX 5700 XT has and ask yourself again if 14% higher bandwidth, i.e. 512GB/s (256-bit at 16Gbps), is enough to feed twice as many CUs at possibly higher clock speeds (quick numbers sketched below).
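A rough per-CU comparison, assuming the rumored 80 CU part and the 256-bit/16gbit/s configuration under discussion (both are speculation at this point):

```python
# Bandwidth per CU: RX 5700 XT (known spec) vs a rumored 80 CU Navi 21 (assumed spec)
def gb_per_s(bus_bits: int, gbps: float) -> float:
    return bus_bits / 8 * gbps

bw_5700xt = gb_per_s(256, 14.0)   # 448 GB/s feeding 40 CUs
bw_navi21 = gb_per_s(256, 16.0)   # 512 GB/s feeding a rumored 80 CUs
print(f"5700 XT: {bw_5700xt / 40:.1f} GB/s per CU")   # ~11.2
print(f"Navi 21: {bw_navi21 / 80:.1f} GB/s per CU")   # ~6.4, i.e. roughly 43% less per CU
```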
 
Last edited:
Reactions: Tlh97 and Elfear

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
The 3080 has 70% more bandwidth than the 3070 but it won't perform 70% better. Scaling architectures up for gaming will keep getting harder without increasing the compute workload. It doesn't make sense to go overboard with bandwidth when the benefit isn't there. Microsoft and Sony have it easier with their 10/12 TFLOPs consoles, and a bandwidth figure like Sony's is standard today.

I don't think there is an actual problem with AMD going with 256-bit and 16gbit/s. This should be enough for ~90% of 3080 performance on average.
Stop spreading BS about RTX 3070.

The RTX 3080 has ~48% more SMs than the RTX 3070, and has 70% more memory bandwidth. At best, the RTX 3080 is 30% faster than the RTX 2080 Ti. Suddenly, you come here, to the AMD thread, to talk about how the RTX 3070 will magically be 20% faster than the RTX 2080 Ti.

That GPU will not be faster than RTX 2080 Ti EVEN IN 1440P. It does not have enough bandwidth, nor enough horsepower in the SMs to beat 2080 Ti.

Nobody in this thread is stupid enough to buy what you are selling about RTX 3070.
 

Veradun

Senior member
Jul 29, 2016
564
780
136
The MS team clearly thought 256-bit GDDR6 bandwidth wasn't enough to feed 12 TFLOPs of RDNA2 + Zen 2 (which shouldn't take much in itself), hence the awkward dual-bandwidth memory solution. Curious to see how AMD plans to feed 20+ TFLOPs RDNA2 SKUs on 256-bit GDDR6 if that's true.
Speculation here: maybe the desktop variant has a crapload of cache that the APU version doesn't?
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
The MS team clearly thought 256-bit GDDR6 bandwidth wasn't enough to feed 12 TFLOPs of RDNA2 + Zen 2 (which shouldn't take much in itself), hence the awkward dual-bandwidth memory solution. Curious to see how AMD plans to feed 20+ TFLOPs RDNA2 SKUs on 256-bit GDDR6 if that's true.
The patent on the elimination of cache data replication could explain the lowered bandwidth needed. If you don't need to load multiple instances of either shader code or data, then that would free up a lot of data movement.

Let's assume 60 fps; then you have 16.66ms per frame to store all of the necessary code and data for that frame. If you can now drop that data volume by 50% due to the elimination of replicated data in the local caches, you can get by with a half-sized bus.

Maybe this is why AMD is concentrating on 4K in their messaging: at higher framerates (lower resolutions), this sort of optimization will start to be overwhelmed as the time allowed for updating new data shrinks. Bandwidth x frametime (total data) falls below the amount needed for that new frame, even with no replication of data.
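A rough sketch of that per-frame budget; the 448GB/s figure below is just an illustrative 256-bit/14gbit/s example, not a known Navi 21 spec:

```python
# Per-frame data budget = bandwidth * frametime; it shrinks as framerate goes up
def frame_budget_gb(bandwidth_gb_s: float, fps: float) -> float:
    return bandwidth_gb_s / fps

bw = 256 / 8 * 14.0  # 448 GB/s on a hypothetical 256-bit, 14 Gbit/s bus
for fps in (60, 120, 240):
    print(f"{fps:>3} fps -> {frame_budget_gb(bw, fps):.2f} GB per frame")
# 60 fps leaves ~7.5 GB per frame to play with; at 240 fps only ~1.9 GB, so the saving buys less headroom
```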

Nail waiting to be hammered prediction:
If this is close to being correct, we should see the RX 6xxx cards having less of a % reduction in framerate as resolution increases. This is opposite to what we've become accustomed to.


Wanted to add:
This caching scheme appears to be the intermediate step towards fully chiplet-based GPUs for gaming: accessing cache data from far regions at an acceptable energy cost. Probably still too expensive on 7nm, but doable at 5nm, especially if combined with advanced packaging to lower pJ/bit.
 
Last edited:

reb0rn

Senior member
Dec 31, 2009
222
58
101
And the point would be? I don't see a good reason to use up more die space for 128MB L2 or L3 instead of adding another 128bit GDDR6 memory controller + PHY, when the effect would be the same at best.
Where did you see 128MB? It's FUD... they just redesigned it so each CU can read the others' L2/L1 data and save on bandwidth. For me it's a very smart move; I bet NV will do something similar with their next GPU.
 

Kuiva maa

Member
May 1, 2014
181
232
116
Let me put it this way - if Ampere were on TSMC 7nm instead of Samsung, it wouldn't be such a power hog and would probably have a rather different spec with similar or higher performance.

If AMD gets a shot at getting close to or beating the 3080, it can thank NVIDIA's choice of fab.

I specifically spoke about RDNA1 vs Turing. AMD had the node advantage and used it for market reasons. A theoretical Navi with 60 CUs on a roughly 400mm2 die could match the 5700XT/2070 in performance by running low clocks while consuming way less than either. But AMD is not in the business of maximizing perf/watt, it is in the business of making profits, so they chose to manufacture more of the smaller, more power-hungry GPUs of equal performance rather than that theoretical bigger part. Makes sense? As for what would have happened if Ampere vs RDNA2 had been fought on the same node, we don't have enough data yet to guesstimate. NVIDIA does have a big disadvantage, however, at least in traditional graphics workloads - their chips now carry tensor and RT cores, and those demand die area and consume power. That partly explains why NVIDIA is pushing DLSS so hard; otherwise a considerable chunk of the die would sit unused. Maybe AMD would have had a chance either way, by fielding a chip as large as GA102 that could be fully used for rasterization, no? We will see.
 
Reactions: Tlh97

Martimus

Diamond Member
Apr 24, 2007
4,488
153
106
There was an ES 3080 Ti, and it was supposed to sit between the 3080 and 3090... but because we now know there's hardly any worthwhile gap - about 10% between the two - it's a pointless SKU.

The 3090 is this gen's 3080 Ti; it's the fastest GAMING GPU from NV on this architecture. It's no Titan, don't let that BS marketing attempt slide.
I'm pretty sure everything pointed to a SKU between the 3070 and 3080, not between the 3080 and the 3090. There were lots of rumors of a "3070 Ti", and a picture of a partner slide showing the unnamed SKU between the 3070 and 3080.

Of course we will only know for sure if they actually launch. So I'm not too worried right now as I can wait.
 
Mar 11, 2004
23,179
5,641
146
The MS team clearly thought 256-bit GDDR6 bandwidth wasn't enough to feed 12 TFLOPs of RDNA2 + Zen 2 (which shouldn't take much in itself), hence the awkward dual-bandwidth memory solution. Curious to see how AMD plans to feed 20+ TFLOPs RDNA2 SKUs on 256-bit GDDR6 if that's true.

I'd guess that's because of the new overall data system, tied to implementing NAND. They probably need some dedicated channels to manage the data just to/from that.

And the point would be? I don't see a good reason to use up more die space for 128MB L2 or L3 instead of adding another 128bit GDDR6 memory controller + PHY, when the effect would be the same at best.

And where is your evidence that the effect would be the same? I'd guess the weird setup in the consoles is due to the NAND, where they probably need some channels linked to it (serving as a buffer). Plus the consoles have specialized compression/decompression blocks, so it's possible that Microsoft wants all the bandwidth they can get, since they're working with a total memory pool comparable to just a dGPU's VRAM (whereas with a dGPU they'll use that extra memory to keep data in VRAM instead of juggling it to and from the SSD like the new consoles will apparently be doing so you can switch between games quickly).

I'm catching up now, so has it been confirmed that's what the cache is? If not, then it seems odd to be making arguments without knowing. If it is, I think you also have to consider that RDNA2 isn't meant to be the final implementation (people talk about it as though it is revolutionary, but while it is obviously very significant, keep in mind it's an iteration on their GPU development path). So it's possible that this cache implementation is the start and we'll see it change in the future, where it later becomes a separate chip altogether, or gets integrated into the I/O die or some other chiplet. I think that was the idea/plan for HBM (where it would function both as cache and memory), but for whatever reason it didn't work out that way.

As for why the consoles wouldn't have it, cost would be a big reason. It's why console versions of things tend to have smaller caches and the like; it's an easy cost-cutting part, and the highly leveraged programming of the consoles, mixed with other limitations, mitigates that somewhat.

One last bit. I personally have a hunch that Microsoft actually had a stronger chip planned (I think they were looking at 15TF, possibly with room to push higher), but reined it in when they saw the PS5 was going to be substantially below them. Perhaps they kept the CU count low (maybe they were looking at 60-64 CUs or something) for yields, or maybe they plan on iterating more quickly (since the new consoles apparently still won't be quite enough for flawless 4K, and then add in ray tracing), where perhaps they can add CUs without messing with memory configs or anything.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
Where did you see 128MB? It's FUD... they just redesigned it so each CU can read the others' L2/L1 data and save on bandwidth. For me it's a very smart move; I bet NV will do something similar with their next GPU.
It was mentioned even here many times, and I also think it's BS. Can this redesign save at least a third of the required bandwidth? If not, then I say we will see 384-bit GDDR6.
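For reference, the break-even arithmetic behind that "one third" figure, assuming 16gbit/s memory on either bus width:

```python
# How much effective-bandwidth saving a 256-bit bus needs to match a 384-bit one (16 Gbit/s assumed)
bw_256 = 256 / 8 * 16.0   # 512 GB/s
bw_384 = 384 / 8 * 16.0   # 768 GB/s
saving_needed = 1 - bw_256 / bw_384
print(f"256-bit: {bw_256:.0f} GB/s, 384-bit: {bw_384:.0f} GB/s")
print(f"Saving needed to break even: {saving_needed:.0%}")  # ~33%, i.e. at least a third
```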
 

reb0rn

Senior member
Dec 31, 2009
222
58
101
If you see RDNA2 with a 256-bit bus being faster than the 3070 with the same bus, then it can; we just don't know in which games it saves enough, and whether the limit is the memory bus or the GPU.

For sure miners are DEAD and will not buy AMD anymore, as mining is strictly tied to raw memory bandwidth and bus width.

Also don't forget that a wider bus means a much more expensive PCB design and GPU-side memory controller.
 
Reactions: Tlh97 and maddie

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
I'm waiting for the official announcement on the 28th to see how the cards perform. I don't have any intentions of upgrading my GPU anytime soon, but would like to at least see how Big Navi performs.

It's nice reading some of the speculation, but the bickering back and forth seems pointless at this time.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
If you see RDNA2 with a 256-bit bus being faster than the 3070 with the same bus, then it can; we just don't know in which games it saves enough, and whether the limit is the memory bus or the GPU.

For sure miners are DEAD and will not buy AMD anymore, as mining is strictly tied to raw memory bandwidth and bus width.

Also don't forget that a wider bus means a much more expensive PCB design and GPU-side memory controller.
But I was talking about the 80 CU Big Navi; you don't need that to compete with an RTX 3070.
 
Last edited: