Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,719
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before the changes get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of there being no host CPU capable of PCIe 5.0 in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,719
136
Man... if only AMD had a 96CU / 384-bit part... my 7900XTX is 6 months old, and I'll still probably buy a 9090XTX. But right now it's like RDNA4 can do lower precision better, but has less memory (24 GB vs 16 GB matters for local LLMs) and doesn't perform any better, until FSR4 makes XeSS on RDNA3 look like crap.

The distilled DS models would love RDNA4; RDNA3 / the 7900XTX was already at 4090 levels of performance with them.
I am totally with you on this. I am also a bit disappointed with only 64 CUs and 16 GB of VRAM.

I could be in the minority of folks who love AI/ML, but last year I wrote around 120K+ lines of C++ and Rust with coding assistants doing 60% of the work, and I was really impressed.
I found a good way to use coding assistants: basically have them do lots of repetitive jobs and fill in things I already know how they should look, and just let them automate it.
There's no going back; now it feels odd not to have a coding assistant whenever I am coding something.

I can also use it to analyze startup logs, compilation problems, possible security vulnerabilities, and communication patterns, and it has really amplified my productivity at work.

With 96/128 CUs and 32 GB we would have been able to run all these distilled models locally, especially the bigger ones rather than just the absolute bottom ~1B models, and ML/AI could become a lot more pervasive in daily work.
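Just to put rough numbers on why the VRAM ceiling matters, here's the napkin math I use (purely illustrative: the bytes-per-parameter figures are approximations, and real runtimes add KV-cache and framework overhead on top):

[CODE=python]
# Back-of-the-envelope VRAM estimate for running a model locally.
# Assumptions (mine): weights dominate, a flat ~1.5 GB covers runtime plus
# a modest KV cache, and the bytes-per-parameter values are approximate.

def vram_gb(params_billion: float, bytes_per_param: float, overhead_gb: float = 1.5) -> float:
    return params_billion * bytes_per_param + overhead_gb

QUANTS = {"fp16": 2.0, "q8": 1.0, "q4": 0.56}  # approx bytes per parameter

for size_b in (8, 14, 32, 70):  # typical distilled-model sizes, in billions of params
    for name, bpp in QUANTS.items():
        need = vram_gb(size_b, bpp)
        print(f"{size_b:>3}B @ {name:<4}: ~{need:5.1f} GB"
              f"  (16 GB card: {'ok' if need <= 16 else 'no'},"
              f" 32 GB card: {'ok' if need <= 32 else 'no'})")
[/CODE]

By that napkin math a 16 GB card tops out around the 14B class, while 32 GB would comfortably hold the 32B distills at Q4.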

Personally, I got Topaz Photo AI and have been able to restore really old photos of my pop, who is gone, and it brought back so many strong memories. I love this app, especially the denoise, upscale, and face restore.
It works well with AMD cards, and I hope in the future it can leverage FP8 support.

Have they improved dual issue to be at least on par with Nvidia's?
Judging from the LLVM code, they currently cannot feed more than 4 vector register operands into a VOPD pair; the other 2 operands can only be scalars or constants. So not much can change here. VOPD is as limited as usual.
I had hoped for more than 4 register banks, but apparently that isn't the case with either RDNA3 or RDNA4.
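To make that concrete, here's a toy model of how I picture the bank restriction (my own simplification of the 4-bank rule, not the actual LLVM check or the full encoding rules):

[CODE=python]
# Toy model of the VOPD pairing limit as I understand it: the VGPR file is
# split into 4 banks (register number mod 4), and a fused X/Y pair can read
# at most one VGPR from each bank in a cycle -- hence "at most 4 vector
# operands" for the pair. Simplified illustration only, not the real rules.

def can_dual_issue(x_vgpr_srcs, y_vgpr_srcs):
    """True if the combined VGPR sources hit each of the 4 banks at most once."""
    banks = [reg % 4 for reg in x_vgpr_srcs + y_vgpr_srcs]
    return len(banks) == len(set(banks))

# X half reads v0, v1; Y half reads v2, v3 -> four different banks, pairs fine
print(can_dual_issue([0, 1], [2, 3]))   # True
# X half reads v0 and v4 -> both land in bank 0, so the pair can't be formed
print(can_dual_issue([0, 4], [1, 2]))   # False
[/CODE]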
 

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
If they're really building an OoO shader core, pretty much anything is possible.
OoO is not without its notable downsides though.

Power and area/complexity also increase significantly as a result, and it's not like AMD's CUs haven't been getting progressively chonkier each generation already.

Area especially would be problematic considering how close we are to diminishing returns in area scaling these days.

That's why the ongoing research into alternative architectural approaches like the Forward Slice Core interests me so much (thanks to Nosta for providing some interesting practical information).

It's not all the way to OoO perf, but it's a big increase over in-order at a minimal increase in power and area.

That being said, if they have already been laying the groundwork for a shift to OoO in the RDNA3 and RDNA4 µarchs, then the shift may not be as detrimental as I fear.
 

eek2121

Diamond Member
Aug 2, 2005
3,270
4,795
136
Other than an enforced memory OC on the XT, not much can be done.
For the vanilla 9070, I could certainly see a driver update to a 260W TBP to counter a 5070 Super or whatever, in tandem with a price cut.
The current config is built to win the efficiency graphs, nothing more.
It is possible we may get an XTX variant down the road with higher clocks and more (possibly faster?) memory. There have been a few rumors suggesting that AMD is playing with something in house.
Raytracing as a workload is a bespoke kind of horror for GPUs: it's very divergent and very very latency-sensitive, and GPUs naturally suck at that.

Man, I still remember when it was “impossible” to do hardware RT. We have come a long way.
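To put the divergence point in that quote into numbers, here's a toy model (made-up traversal lengths, nothing measured): in lock-step SIMD every lane waits for the slowest ray, so utilization drops fast when rays take very different paths through a BVH.

[CODE=python]
# Toy illustration of why divergent ray traversal hurts a SIMD machine.
# Each lane (ray) needs a different number of BVH traversal steps; the wave
# runs until the slowest lane finishes while the rest idle. Numbers are
# random and purely illustrative.
import random

random.seed(1)
WAVE_SIZE = 32
steps = [random.randint(4, 64) for _ in range(WAVE_SIZE)]  # per-ray traversal lengths

wave_time = max(steps)                   # lock-step: the wave pays for the worst ray
useful_work = sum(steps)                 # work the lanes actually needed
utilization = useful_work / (wave_time * WAVE_SIZE)

print(f"wave runs {wave_time} steps, SIMD utilization ~{utilization:.0%}")
[/CODE]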

Maybe?
If they're really building an OoO shader core, pretty much anything is possible.
A P6 moment for GPUs would be insane.

Agreed!
 
Reactions: lightmanek

adroc_thurston

Diamond Member
Jul 2, 2023
5,237
7,317
96
Area especially would be problematic considering how close we are to diminishing returns in area scaling these days.
Oh hell no, logic scaling is the only thing we have going on these days.
That's why the ongoing research into alternative architectural approaches like the Forward Slice Core interests me so much (thanks to Nosta for providing some interesting practical information).
Meme academia stuff is meme.
That being said, if they have already been laying the groundwork for a shift to OoO in the RDNA3 and RDNA4 µarchs, then the shift may not be as detrimental as I fear.
Thanks AMD for making GPUs not boring again.
Man, I still remember when it was “impossible” to do hardware RT. We have come a long way.
Well, for most implementations they've sidestepped the problem of doing RT on GPUs by not doing it on the GPU actual.
 
Reactions: Tlh97 and marees

Panino Manino

Senior member
Jan 28, 2017
973
1,226
136
Kingdom Come 2 uses voxel cone tracing, not RT. NV and Intel cards play well with this game.

Oh well, at least AMD has CoD and Spider-Man.
It's a CryEngine thing. It hates AMD GPUs for idk why reasons.

I don't understand.
When Crytek demonstrated their "Software RT", it was running on an AMD Vega, wasn't it? To prove that you didn't need Nvidia's dedicated hardware.
Shouldn't it run better?
 
Reactions: lightmanek

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
Oh hell no, logic scaling is the only thing we have going on these days.
It is still scaling, but the end of that scaling is now visible on the horizon.
Meme academia stuff is meme.
Yesterday's meme academia stuff is tomorrow's slightly altered corporate patent meme stuff.

Nearly everything the tech corps do started in academia, and it is often co-created in lockstep with academic collaboration.

Just look at how many of Nvidia's real-time RT papers over the last decade have academic co-authors.
 
Reactions: lightmanek

adroc_thurston

Diamond Member
Jul 2, 2023
5,237
7,317
96
It is still scaling, but the end of that scaling is now visible on the horizon.
uh. nope. NOPE.
Yesterday's meme academia stuff is tomorrow's slightly altered corporate patent meme stuff.
Nope.
Academia never leaves academia.
Just look at how many of Nvidia's real-time RT papers over the last decade have academic co-authors.
That's normal but the ideas are IHV-driven, not from ivory towers at all.
 

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
uh. nope. NOPE.
I was talking about area scaling, and on that front at least my point is valid.

If we are talking about changing transistor device types, materials, vertical scaling and/or wholesale changes of compute paradigm (a la reversible computing, Blueshift Memory's Cambridge architecture, optical computing, etc.), that's a whole other range of matters entirely - there's definitely plenty of room on that front, no arguments at all there.

But purely in terms of area scaling, the horizon is most definitely visible within the next two decades, simply because of the fundamental pitch limits at which a transistor could work at all, let alone be performant and energy efficient.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,601
3,160
136
And with 300W TBP.
In a way we can say that RTX 5080 is pretty bad compared to 4080 Super. Link
TBP: 360W vs 320W (+12.5%)
Real power consumption: 332W vs 290W (+14.5%)
Bandwidth: 960GB/s vs 736GB/s (+30.4%)
Cuda cores: 10,752 vs 10,240 (+5%)
Clockspeed (median): 2662MHz vs 2730MHz (-2.5%)
Compute performance: 57,244 vs 55,910 GFLOPS (+2.4%)
Average performance: 100 vs 88 (+13.6%)
It looks like ~11% comes from just the faster memory. And I find it surprising that despite the higher power consumption it has lower GPU clocks. Is GDDR7 that power hungry, or what?
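For anyone who wants to sanity-check the percentages and the ~11% attribution, here's the back-of-the-envelope version (TPU's figures from the link above; the "residual" split is my rough accounting, nothing more):

[CODE=python]
# Recomputing the deltas above: RTX 5080 vs RTX 4080 Super (TPU figures).
def delta(new, old):
    return (new / old - 1) * 100

metrics = {
    "TBP (W)":           (360, 320),
    "Power (W)":         (332, 290),
    "Bandwidth (GB/s)":  (960, 736),
    "CUDA cores":        (10752, 10240),
    "Clock (MHz)":       (2662, 2730),
    "Compute (GFLOPS)":  (57244, 55910),
    "Avg performance":   (100, 88),
}
for name, (new, old) in metrics.items():
    print(f"{name:<18} {delta(new, old):+6.1f} %")

# Perf is up ~13.6% while raw compute is only up ~2.4%, so roughly 11 points
# of the gain are left to the memory subsystem (bandwidth is +30.4%).
print(f"residual not explained by compute: ~{delta(100, 88) - delta(57244, 55910):.1f} points")
[/CODE]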
On the other hand, OC headroom is very good on the RTX 5080 -> 13-17% extra performance according to TPU's findings.

I don't think RX 9070(XT) will have as good OC as Blackwell, but maybe we will receive another pleasant surprise.

@SolidQ True for the RX 9070, but that one has low clocks to begin with, and it will need a 300W board power limit to really shine.
 
Last edited:
Reactions: lightmanek

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
Just $$$$$$$$$
So. Much. $$$$$$$.

Across all aspects - design, fabbing and litho.

And for diminishing returns at that.

Eventually the world's population and all the assorted industries will not offer enough potential revenue to justify the cost of process node development.

Especially as population growth in developed nations is showing serious signs of faltering.
 

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
Didn't expect this.

Sapphire does the unthinkable and puts 16-pin power connector on an RX 9070 XT

https://www.tomshardware.com/pc-com...-power-connector-inside-offers-cableless-look
Not only that, but given the recommended distance from the connector to a cable bend (35 mm), it seems to run extremely close, to the point that I would not trust it.

They could have fixed that simply by leaving a groove in the heatsink fins, which makes their not doing so somewhat stupid.

It's enough of a gamble to chance a 16-pin connector/cable against the cost of a modern gfx card already, without adding that further uncertainty.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,683
2,906
136
Not only that, but given the recommended distance from the connector to a cable bend (35 mm), it seems to run extremely close, to the point that I would not trust it.

They could have fixed that simply by leaving a groove in the heatsink fins, which makes their not doing so somewhat stupid.

It's enough of a gamble to chance a 16-pin connector/cable against the cost of a modern gfx card already, without adding that further uncertainty.
Some cables are physically incapable of bending the way the Nitro+ will require. I checked my FSP 12V-2x6 cable and it's been designed to not allow bends anywhere near the connector end. I quite literally could not use the Nitro+ if I wanted to with my native 12V-2x6.

I really wanted the Nitro+, but I think I'll have to take my second choice in the TUF.
 