Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,754
6,631
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3 quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up situation like Frontier, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Hans Gruber

Platinum Member
Dec 23, 2006
2,448
1,316
136

If you ain't first, you're last! There are no trophies for 2nd place. AMD's GPU team didn't get the memo. Nvidia has 90% of the GPU market. So please, take your time AMD. Don't rush out RDNA 4.
 

Timorous

Golden Member
Oct 27, 2008
1,795
3,383
136
The L3 including fabric was 90mm^2, 430mm^2 is about as low as N31 mono could go with Zen5 densified L3.
And N48 is ~350mm^2 based on more thorough measurements, smaller than GB203 and uses far cheaper and plentiful GDDR6.

Cache.

350mm^2 is far more in line with what AMD have built for that middle of the road x7 part. It is a bit larger than what a monolithic N32 would be and what N22 was, but not ridiculously so.

N32 by the same method is in the sub-300mm^2 region as a monolith. Take 200 for the GCD, subtract 13 for the no longer needed MCD PHYs, add 100 for 64MB cache and a 256-bit bus, and you are in the 290mm^2 region. Add 20% for going to what seems like a 4SE of 16CU design plus extra RT stuff and CU improvements, and 350mm^2 is far more in line with what one would expect.
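The arithmetic above can be written out as a quick back-of-the-envelope sketch. The inputs are only the rough figures quoted in the post (in mm^2), and the constant and function names are made up for illustration, so treat the result as a ballpark and not a measurement.

```python
# Back-of-the-envelope die-area estimate using the rough figures from the post.
# All values are approximations in mm^2; the names are purely illustrative.
GCD_AREA = 200.0        # N32 graphics die
MCD_PHY_SAVING = 13.0   # MCD link PHYs that a monolithic die would not need
CACHE_AND_BUS = 100.0   # 64MB cache plus 256-bit memory interface
UPLIFT = 0.20           # assumed growth: 4SE x 16CU layout, RT and CU changes


def monolithic_n32_area() -> float:
    """Estimated area of a hypothetical monolithic N32."""
    return GCD_AREA - MCD_PHY_SAVING + CACHE_AND_BUS


def n48_estimate() -> float:
    """Monolithic N32 estimate scaled by the assumed 20% uplift."""
    return monolithic_n32_area() * (1.0 + UPLIFT)


print(f"monolithic N32: {monolithic_n32_area():.0f} mm^2")
print(f"N48 estimate:   {n48_estimate():.0f} mm^2")
```

That lands at about 287 for the monolithic N32 and about 344 after the uplift, which is where the "290" and "350" round numbers come from.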
 

dacostafilipe

Senior member
Oct 10, 2013
779
253
136
It's not good because if it's in the shaders then upscaling of already done frame can't be done in parallel whilst new frame is being rendered in those shaders, specialised tensors like Nvidia did is clearly the way to go, but it's probably 5 cents more expensive than AMD's solution so we can't have that.

Yeah, no, that's not how it works. We have been able to do shading and compute "at the same time" (!) without a problem for years now.
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
Yeah, no, that's not how it works. We have been able to do shading and compute "at the same time" (!) without a problem for years now.
It might be cheaper, but not better. The undisputed market leader uses dedicated hardware for it, and in 2025 neural-based upscaling is clearly a must-have feature that will be used by a lot of games going forward. This isn't 2018, when AMD could have gotten away with it (if they even had hardware capable enough).
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,994
3,696
136
It's not good because if it's in the shaders then upscaling of already done frame can't be done in parallel whilst new frame is being rendered in those shaders, specialised tensors like Nvidia did is clearly the way to go, but it's probably 5 cents more expensive than AMD's solution so we can't have that.
Wrong, memory bandwidth will still limit you. If you have enough memory bandwidth to feed both the alu and the gemm engine at the same time, you can probably make it all work just the same on shaders.

Assuming the same format, a gemm engine is mostly a register read/write optimisation over a regular alu, which is really a saving in power at the cost of area.
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
No, they just made a choice, implemented it and are now updating it gen after gen.

Power/clockgating could be a reason.
But is it a better choice performance-wise or not? It seems to me the answer is an obvious yes.

Originally, in gaming cards, as distinct from compute, tensor complexes were a feature looking for a usage scenario.
And they found it! And they still keep them separate, it seems, so it must be the better-performing option.
 

linkgoron

Platinum Member
Mar 9, 2005
2,493
1,131
136

blckgrffn

Diamond Member
May 1, 2003
9,501
3,816
136
www.teamjuchems.com
They saw @blckgrffn 's plans and decided to sabotage him in particular.

There are many not-safe-for-this-forum memes I'd love to post. (f that guy in particular ones)

I'm humbled that they have done so much to thwart my "safe" purchase!

I guess it underlines the fact that, increasingly, you should just buy a good deal when you see it. Yeah, timing it might still be good, but realistically future progress is going to be slowed by process tech, and most software advancements can/should be backwards compatible because to do otherwise alienates part of your customer base. The fact that even first gen RTX is getting the new Transformer model is cool.

All of the "bad timing" purchases I made last fall, largely for nice builds ($370 6800, $699 7900 XT, $720 MSI Fancy 4070 Ti Super, $950 Zotac Amp+ 4080 Super and this fancy XFX XTX for $900, all prices pre tax), are all.... fine. It's worth noting that the 7900 XT was part of a bundle that featured a $225 7800X3D.

Point is those builds are done and have been running as intended and been enjoyed for months now. If I was going to feel regret, I am over it now.

Now I'm figuring out how to set frame cap/use frame gen 2/low lag/power tuning to get an effective, smooth and low(er) power "160" fps in my games with this beastly card, since all these features are just sitting there. This seems to be the new GPU tweaking frontier instead of searching for a few more MHz.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,994
3,696
136
Ok, so Nvidia is dumb to have dedicated tensors, why do they do it?
No, but you should learn to comprehend what I said. I gave all the key words needed to understand the advantage of a gemm unit.

You need to look at things holistically. NV like big gemm units because they sell cards whose workload is gemm only; in that situation the gemm unit's performance advantage is the big reduction in register reads and writes.

But in a mixed workload there is more total computation power across alu and gemm than there is bandwidth. So bandwidth limits you, because you still need your inputs and outputs.

There is still an advantage to the gemm unit in this case, but it's just not as big of a win as marketing makes out, because you could have also just had more ALU.
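A minimal roofline-style sketch of that bandwidth argument, for anyone who wants to play with numbers. Every figure below is hypothetical and picked only to make the effect visible; none of it describes a real GPU. It also models only memory bandwidth, not the register read/write savings mentioned above.

```python
# Roofline-style sketch: throughput is capped either by peak compute or by how
# fast operands can be delivered. All numbers are hypothetical placeholders.

def attainable_tflops(peak_tflops: float, bandwidth_tbs: float,
                      flops_per_byte: float) -> float:
    """Return the lower of the compute roof and the bandwidth roof."""
    return min(peak_tflops, bandwidth_tbs * flops_per_byte)


ALU_PEAK = 60.0      # TFLOP/s from the regular shader ALUs (assumed)
GEMM_PEAK = 240.0    # TFLOP/s from a dedicated gemm unit (assumed)
BANDWIDTH = 1.0      # TB/s of memory bandwidth (assumed)

MIXED_INTENSITY = 50.0   # FLOPs per byte for a mixed workload (assumed)
GEMM_INTENSITY = 400.0   # FLOPs per byte for a pure gemm workload (assumed)

# Mixed workload: both configurations hit the same bandwidth roof.
mixed_alu_only = attainable_tflops(ALU_PEAK, BANDWIDTH, MIXED_INTENSITY)
mixed_with_gemm = attainable_tflops(ALU_PEAK + GEMM_PEAK, BANDWIDTH,
                                    MIXED_INTENSITY)

# Pure gemm workload: enough data reuse that the extra compute pays off.
gemm_alu_only = attainable_tflops(ALU_PEAK, BANDWIDTH, GEMM_INTENSITY)
gemm_with_gemm = attainable_tflops(ALU_PEAK + GEMM_PEAK, BANDWIDTH,
                                   GEMM_INTENSITY)

print("mixed workload:", mixed_alu_only, "vs", mixed_with_gemm)
print("gemm only:     ", gemm_alu_only, "vs", gemm_with_gemm)
```

With these made-up inputs the mixed case is stuck at the bandwidth roof whether or not the gemm unit is there, while the gemm-only case gets the full benefit of the extra unit.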
 

gaav87

Senior member
Apr 27, 2024
452
794
96
Well, the 5090 is CPU limited even at 4K...
Just give us a freakin 384bit 24GB glued $899 card, no need for more, it will be CPU limited anyway.
Or even a 9080 XT with 24GB of GDDR7 (3GB modules)...
 