Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,754
6,631
136





With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around 3 quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe it is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Hans Gruber

Platinum Member
Dec 23, 2006
2,448
1,316
136
I don't get it. They say RDNA 4 is not coming until March. For more than a year RDNA 4 was considered a skip-it GPU generation. Now people are optimistic. Why has AMD not released the entire RDNA 4 lineup by now? People can say what they want about Intel GPUs, but at least they pushed them out first. Nvidia has 90% of the market and AMD is waiting until March 2025?
 

ToTTenTranz

Senior member
Feb 4, 2021
276
522
136
Still???

What are they going to be running FSR4 on then - shaders again?

Tensor ops / matrix multiplication run at a higher rate in RDNA3 due to the nature of dependency when doing matrix algebra, and the hardware takes advantage of both that and rapid packed math with smaller variable sizes to achieve a much higher throughput.

RDNA4 increases throughput further by taking advantage of sparsity. Most of the matrices contain a lot of zeros, and any op on those zeros will always result in zero, so detecting them early saves a lot of unneeded computation cycles.


At least that's my understanding of this, but I'm not an expert.
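
To make the zero-skipping idea concrete, here is a minimal Python sketch. It only illustrates why zeros save work in a matrix multiply; it is not a description of how RDNA4 hardware does it, and the function name and the multiply counter are made up for illustration.

```python
def matmul_skip_zeros(a, b):
    """Multiply a (n x k) by b (k x m), skipping zero entries of a.

    Returns (result, multiplies_done) so the saved work is visible.
    """
    n, k, m = len(a), len(b), len(b[0])
    result = [[0] * m for _ in range(n)]
    multiplies = 0
    for i in range(n):
        for p in range(k):
            if a[i][p] == 0:
                # Zero operand: every product with row p of b would be 0,
                # so the whole inner loop can be skipped.
                continue
            for j in range(m):
                result[i][j] += a[i][p] * b[p][j]
                multiplies += 1
    return result, multiplies


# Toy input where most of a is zero.
a = [[1, 0, 0, 2],
     [0, 0, 3, 0]]
b = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]

dense_multiplies = len(a) * len(b) * len(b[0])  # work a dense multiply would do
result, done = matmul_skip_zeros(a, b)
print(result, done, dense_multiplies)
```

The result is the same as a dense multiply; only the number of multiplies actually performed goes down.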
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
That's great, but if they still have to use shaders for that then it's not as good a solution as what Nvidia has been doing since, what, 2018? It's 2025 already.
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,448
1,316
136
Their driver team work tirelessly adding x4 fake frames fancy slider in Adrenalin.
I actually like the Adrenalin concept. Nvidia doesn't have a dashboard. AMD admitted they were not competing with Nvidia for the high end with RDNA 4. So what is with the GPU delay?
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,652
6,107
136
I don't see how 390mm can possibly be true for 64CUs and 256bit bus.

That is bigger than N32 + the MCDs, and besides the MCDs being on N6, there is also the fact that both the GCD and MCDs contain the links to talk to each other, which is die area that is not needed in a monolithic design.

To me the realistic upper bound is probably in the 320mm region assuming the new RT features and other changes make the CUs larger than in RDNA3.

So the belief that AMD has caught up on PPA, is based on disbelief of the current information on die size?

That sounds like it's mostly a pick and choose fantasy.
 

BurnItDwn

Lifer
Oct 10, 1999
26,233
1,717
126
It was a cool display core gimmick I guess.
But AMD is the only company not Apple still making good display engines so nothing changed.
Eyefinity was great.
I used to run a pair of 6870s back around 2010-2013 with 3x 1080p displays. The games that ran well with crossfire handled it great. For the games that didn't do crossfire well, I would have to turn graphics settings down or run on a single display.
When the Radeon 290 came out, it more or less ran my eyefinity setup great on its own until I eventually replaced it with a single 21:9 1440p display. (Went down from ~6M pixels to ~5M pixels.)

It may have been a bit gimmicky, but it was MUCH cheaper at the time to run 3 cheap crappy 21.5 inch 1080p displays than to try to buy a single display with comparable resolution.
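
For anyone checking the ~6M vs ~5M pixel figures, a quick sketch of the arithmetic. The 3440x1440 resolution is an assumption for the "21:9 1440p" display; the post does not state it.

```python
def pixels(width, height, count=1):
    """Total pixel count for `count` identical displays."""
    return width * height * count


triple_1080p = pixels(1920, 1080, count=3)
ultrawide_1440p = pixels(3440, 1440)  # assumed resolution for a 21:9 1440p panel

print(f"3x 1080p:   {triple_1080p / 1e6:.2f}M pixels")
print(f"21:9 1440p: {ultrawide_1440p / 1e6:.2f}M pixels")
```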
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
Why not good? You think the math is less right when done "via" a shader?
It's not good because if it's in the shaders, then upscaling of an already finished frame can't be done in parallel while the next frame is being rendered in those same shaders. Specialised tensor units like Nvidia's are clearly the way to go, but it's probably 5 cents more expensive than AMD's solution so we can't have that.
 

beginner99

Diamond Member
Jun 2, 2009
5,281
1,695
136
Cancelling Navi 41 could end up being Lisa Su's biggest blunder as AMD CEO.
From what we know about the expected performance of the RTX 5000 lineup and the 9070 XT, along with die size info, it seems RDNA4 has similar or close enough PPA on the same node against Nvidia, while using considerably worse memory. This has to be the first time this has happened since god knows when. Also, finally good enough raytracing and a good upscaler in FSR4.
RDNA4 was the Zen moment, it is a HUGE wasted opportunity.

Even if true it ignores the huge cost for such an additional die that AMD would have to sell likely for $500 less than a 5090 simply because AMD. Then there is the fact that 5090 is a cut-down chip and NV actually makes the most profit from this chip selling it as "Quadro" cards for a much higher price. (I think they aren't named Quadro anymore; the RTX 6000 using AD102 costs like >$6000.)
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
Even if true it ignores the huge cost for such an additional die that AMD would have to sell
What huge cost?

N4 wafers are $20k max, if that's too much for a monolithic chip then go back to RDNA3 chiplets - with fixed bugs and much larger GCD and GDDR7.
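
As a rough sanity check on that wafer price, here is a sketch using the common gross-dies-per-wafer approximation on a 300mm wafer. It ignores yield, scribe lines, packaging, memory and R&D, so it is a lower bound on silicon cost only. The 390 mm^2 input is the rumoured figure discussed in this thread, and the 600 mm^2 halo die is purely hypothetical.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def silicon_cost_per_die(wafer_price_usd, die_area_mm2):
    """Wafer price spread over gross dies; assumes every die is good."""
    return wafer_price_usd / gross_dies_per_wafer(die_area_mm2)


WAFER_PRICE_USD = 20_000  # the "$20k max" figure from the post

dies_390 = gross_dies_per_wafer(390)   # rumoured die size from this thread
dies_600 = gross_dies_per_wafer(600)   # hypothetical larger halo die
cost_390 = silicon_cost_per_die(WAFER_PRICE_USD, 390)
cost_600 = silicon_cost_per_die(WAFER_PRICE_USD, 600)

print(f"390 mm^2: {dies_390} gross dies, ~${cost_390:.0f} each before yield")
print(f"600 mm^2: {dies_600} gross dies, ~${cost_600:.0f} each before yield")
```

Real cost per good die will be higher once defect yield is factored in, and more so for the larger die.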

And maybe take some risks - get CUDA working on AMD via compatibility layer of sorts, IANAL but software APIs were cleared to be reimplemented.
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
Violates CUDA tos.
ToS can say they will have your first born child, does not mean it's enforceable - Google vs Oracle settled the issue on the API front, and in jurisdictions other than the US that crazy notion wasn't even legally enforceable in the first place. AMD should have had support for CUDA via their own effective translation layer, like, a few years ago. They are big enough to deal with lawsuits that they will win if they have to.

Clean room implement AMD DLL to support it, most of the job done, then just keep it updated.
 

MrTeal

Diamond Member
Dec 7, 2003
3,747
2,136
136
Even if true it ignores the huge cost for such an additional die that AMD would have to sell likely for $500 less than a 5090 simply because AMD. Then there is the fact that 5090 is a cut-down chip and NV actually makes the most profit from this chip selling it as "Quadro" cards for a much higher price. (I think they aren't named Quadro anymore; the RTX 6000 using AD102 costs like >$6000.)
I'm not sure even $500 off would be enough to move volume on an AMD halo part that matches a 5090. 5080 performance (inc. RT) at $750 would move a good number of units competing against Nvidia.
Unless it is clearly faster than the top Nvidia card, the difference between $2k and $1500 is pretty meaningless to 99% of the people in the market for a $2k GPU.
 

adroc_thurston

Diamond Member
Jul 2, 2023
4,714
6,501
96
ToS can say they will have your first born child, does not mean it's enforceable - Google vs Oracle settled the issue on the API front, and in jurisdictions other than the US that crazy notion wasn't even legally enforceable in the first place. AMD should have had support for CUDA via their own effective translation layer, like, a few years ago. They are big enough to deal with lawsuits that they will win if they have to.

Clean room implement AMD DLL to support it, most of the job done, then just keep it updated.
Again, all not relevant stuff.
I'm not sure even $500 off would be enough to move volume on an AMD halo part that matches a 5090.
Oh you don't match it. You smash it, it's a turd.
If they actually shipped Navi40, it would've not even been close. Alas.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,478
8,250
136
AMD has always gone with the "juiced SP" method with their RDNA Cards. Take one of the SPs in each CU and bulk it out with the ability to better execute RT instructions, Matrix instructions (i.e. RDNA3 "AI" Cores), etc.

Alton Brown GPU design philosophy.

Not surprising they are doing the same thing with RDNA4. If the juiced SP can execute the specialized instructions in parallel and almost as well as a tensor core... then who cares?

 

Timorous

Golden Member
Oct 27, 2008
1,795
3,383
136
So the belief that AMD has caught up on PPA, is based on disbelief of the current information on die size?

That sounds like it's mostly a pick and choose fantasy.

A theoretical N31 as a monolithic die would be about 400mm. 300 for the GCD less 20mm for the PHYs to the MCDs + 120mm for the cache and GDDR6 PHYs and that does not factor in shrinking the cache or the GDDR6 PHYs from the physical size they are on the N6 MCD. N32 would be in the region of 270mm as a monolithic part using the same sort of napkin maths.

So are we saying that AMD have gone from 400mm worth of N5 silicon to 390mm of N4 silicon and actually regressed slightly in raster performance? Also the 7900XTX and 4080Super are pretty similar in raster performance and the AD103 die is 380mm so if you normalise it the PPA difference was not actually that huge with Ada Vs RDNA3 in raster.

The other factor is that the 400mm RDNA3 part has 6SEs with 16CUs each, a 384bit bus and 96MB cache. N48 seems to be 4SEs with 16 CUs each, a 256bit bus and 64 MB L3 cache. For N48 to be 390mm would indicate an absolutely massive increase in the size of each CU, which is not impossible, but to me just seems unlikely even with the added RT. We are talking about 50% more area than a monolithic N32; it just does not seem credible that AMD would spend that much real estate on RT and CU bloat to make it that large.

So yes, to me an extra 20% or so die area over a theoretical monolithic N32 seems far more in the realm of reasonable.
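
The napkin maths above can be written out as a short sketch. Every input is a rough figure from this post (areas in mm^2), not a measured die size, and the variable names are just for illustration.

```python
# Rough figures from the post, all in mm^2.
n31_gcd = 300          # N31 GCD
gcd_link_phys = 20     # GCD-to-MCD PHYs, not needed on a monolithic die
n31_mcd_total = 120    # cache + GDDR6 PHYs across the MCDs (no N6 -> N5 shrink assumed)

n31_monolithic = n31_gcd - gcd_link_phys + n31_mcd_total

n32_monolithic = 270   # "in the region of 270mm" per the same napkin maths
rumoured_n48 = 390     # the die size being disputed

# How much bigger the rumoured N48 is than a theoretical monolithic N32.
growth_vs_n32 = rumoured_n48 / n32_monolithic

# What "an extra 20% or so" over a monolithic N32 would come to instead.
n48_plus_20_percent = n32_monolithic * 1.20

print(f"Monolithic N31 estimate: {n31_monolithic} mm^2")
print(f"390 vs monolithic N32:   {growth_vs_n32:.2f}x")
print(f"N32 + 20%:               {n48_plus_20_percent:.0f} mm^2")
```

With these inputs the 390 figure is roughly 1.44x a monolithic N32, a bit under the "50% more" above, and N32 + 20% lands in the same 320mm region quoted earlier in the thread.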

Now this is Radeon division so watch me get proven incorrect.
 