Article State of RDNA era AMD

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96
So the recent news/rumours have got me thinking a lot about what AMD's been doing, and particularly the announcement of UDNA (which I still insist they should call URA-NUS) has changed my opinion of their strategy a lot. Or rather, it has opened my eyes to it.

I think it's worth recapping what they've been doing to understand what is really going on over at the Red Factory.


Or rather, let's rewind 6 years. It's 2018 all over again. Nobody outside of medical labs has ever heard of a "chronovirus". Maybe only Red Alert 2 Yuri's Revenge modders.

Look on my GeForce, ye chipmakers, and despair​

Nvidia has been dominating the GPU space for a long time. In 2018, they have the hardware performance, obviously. But they also have the software: the high quality encoders for streamers and live video recording, the support for 3rd party software, both libraries and documentation, and experienced developers. Photoshop acceleration runs fully on Nvidia, only partly on OpenCL. They have CUDA, and it keeps getting more entrenched. Almost all server side compute uses CUDA. A machine learning dev, compute dev, or maths/physics simulation dev is basically a CUDA dev. Nvidia has the best documentation, the largest userbase by far, and an extremely large array of 3rd party libraries that sit on CUDA. When Adobe, DaVinci Resolve or any GPU accelerated software gets a new feature out, it's accelerated on CUDA first.

Nvidia has the public support, streamer support, dev support, docs support, 3rd party support, their dies are bigger, their street cred is through the roof. Enterprise and consumer all agree that there are two choices: CUDA, and more CUDA that costs more. Worse, Nvidia is about to break open a new Money River with Volta, and soon enough with Turing. Their new Tensor cores open a lot of ways towards accelerated compute and particularly AI. They know the money's about to burst open, it's just about pushing the new generation out, and getting it to work from top enterprise all the way to the lowest tier compute. Get the software stack identical all across the spectrum.

What does AMD have to counter that?
Vega.
Vega...an uninspired, far from revolutionary design that was touted as highly impressive and Very Very Powerful, but really fell short almost everywhere. It was pretty great at compute. It was pretty decent at gaming.
But you don't bust a fortress with a "pretty good" battering ram. You especially don't when you are not only not winning on raw power, but you're losing everywhere else too.

Around the same time, Intel is starting to sink in the quicksand called Zen, but it hasn't had any dramatic effects yet. AMD's starting to rake in serious money thanks to it though.
From AMD's seat, a lot of doors are opening all at the same time thanks to Zen 2. But they can't just take over Nvidia's compute empire or gaming empire yet. They are holding on very well: all the big consoles are theirs, and Valve is interested in making its own with AMD hardware.
But holding on isn't winning. They need a strategy to break all the aforementioned points:
- They need a stronger GPU than Nvidia
- They need far better software support than they have
- They need an enterprise/3rd party oriented compute stack they can support
- They need all the little side things that matter (encoders, Matrix cores, soon enough upscaling, raytracing, etc etc, where Nvidia has been extending their advantage over them)

The work is large, but they have already spent roughly 6 years at a slow pace since the first GPU generation after Bulldozer came out. 2012-2018 was the long diet; now it's about growing again.

But How?​

How do you catch up on 6 years of delay? Even if you effectively kept yourself somewhat in the race, and it's more like 3 years here, 2 years there, 4 years elsewhere, the general quality of everything is clearly below. Catching up on a budget that will always be tighter than the very well established competition's is not going to be any kind of easy. Maybe not even possible at all. As for pilfering money from the now successful CPU division, after you were basically sacrificed for years (Polaris, anyone?) to fund them out of Bulldozer, sure, you can. But just a little bit. The brass isn't going to sacrifice a highly successful EPYC server lineup to fund R&D on a Vega 72. You can expect some support from the CPU division, but not any more than the bare minimum.
Your most recognisable product is the RX 580, theirs is the 1080 Ti.
Your most meme product is the Radeon VII, theirs is the 2080 Ti.
It's RTG, a bit of extra money, Vega, GCN, and your big old courage versus Emperor Huang's Computic Empire.

Your odds aren't too great. So how do you improve them?

You have two choices, after all, in this kind of race: go big, or go home. Just holding on means you get gapped further and further while the revenue on the other side stays the same or grows.
So how do you gain more money than the other side while having a worse product?
Think about it and I'll give the solution at the end. Don't cheat now.
 
Reactions: Tlh97 and Saylick

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96

A Dwarf taking on a Giant​

So Vega may not be flying too high, but it is functional, and more importantly, broadly cheap. That means it can serve in all sorts of APUs and provide a cheap way to get some CUs out wherever no money should be wasted. It's not much, but it helps. Get ROI where you can.

Outside of getting easy graphical support for CPUs, how do you respond to the 4 points above, while aiming to lose less money than the competition?

Stronger GPU than Nvidia: Spam more CUs, and do so in a product that you're pretty confident people will buy. This costs a lot.
Far better software support: There are no secrets; it takes time. Software can get an infinity of new versions and still need more beyond that. This costs a lot.
Enterprise/3rd party oriented compute stack they can support: OpenCL is out. They need their own CUDA. Call it ROCm. This costs a lot.
All the little side things that matter: same as software, there are no secrets, everything takes time, deals, or slow progress. This costs a lot of littles. Which still add up to a lot.

Do all of the above without spending as much money as Nvidia: Split the architectures.
Now that one warrants an explanation. Splitting architectures means redoing the work, writing a lot of software twice, having a more complex lineup, annoying your clients, disappointing your daughter, and having the dog piss on your slippers. It obviously costs more than having one unified architecture and one unified SW stack, which is what Nvidia does.

So why? Because saving area on a die is how you save production costs. You're also spending more development effort on making something more optimised and compact, effort that won't go into better software or hardware performance. You're more or less penny pinching. You're saving money for yourself, perhaps getting some wattage improvements and the like, but ultimately not gaining ground on the competition ahead of you, merely focusing on spending less rather than gaining more. In a field where it's about being ahead to win, it makes no sense; you're holding yourself back to save some extra money.
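To put a rough number on the area-equals-cost logic, here's a back-of-envelope sketch using the textbook dies-per-wafer approximation and a simple Poisson yield model (a minimal illustration; the wafer size, defect density and die areas are assumptions of mine, not AMD figures):

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Classic gross dies-per-wafer approximation: wafer area term minus edge loss."""
    r = wafer_diameter_mm / 2
    return math.floor(math.pi * r**2 / die_area_mm2
                      - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_cm2=0.1):
    """Bigger dies catch more defects, so the fraction of good dies drops with area."""
    return math.exp(-(die_area_mm2 / 100) * defects_per_cm2)

for area in (250, 350, 500, 600):  # hypothetical die sizes in mm^2
    candidates = dies_per_wafer(area)
    good = candidates * poisson_yield(area)
    print(f"{area:>3} mm^2: ~{candidates:>3} candidates/wafer, ~{good:5.1f} good dies")
```

Roughly speaking, every mm² shaved off buys both more candidate dies per wafer and a better yield on each of them, which is exactly the saving a cheap, gaming-only architecture chases.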
For a long time, I thought this was a silly strategy, that it made basically no sense for a company trying to catch up. So how did the silly strategy work for them?

CDNA and RDNA 1​

2019 is where the RDNA era begins. CDNA 1 essentially spins off from GCN, drops the raster hardware/ROPs, and focuses on async compute.
RDNA gets some GCN and Vega DNA, plus its own new features.

The only chip born out of CDNA 1 is a massive 750mm² die, but it isn't a huge success. It does open the way for CDNA to be the fat, powerful thing that heavy compute demands.

(MI100 "Arcturus")

RDNA 1, on the other hand, tops out at a ridiculous 251 mm² die size. The size isn't the only problem either: while RDNA 1 is doing what Polaris tried to do, but better (the top card is no longer merely at the level of Nvidia's 6-class), it's immediately obsoleted by Nvidia's new RTX paradigm. Sure, first generation raytracing is just awful, but it is accompanied by first generation DLSS. Wait, that's awful too. So is RDNA 1 actually doing...good?

(Navi 10 die shot, it was so small the cameraman missed)

Well, no, RDNA 1 isn't very good. But it can afford to be. It can afford to be tiny, and it can afford to be obsolete. Because its job is to act as a test bench for RDNA after the relative failures of the late GCN and Vega eras. It's not about rocking the market with a monster product, it's about getting something out that's sound, modern, and functional. It's meant to announce that Radeon is back on track to build something competitive, and it is competitive (with Nvidia's 2070 anyway, and 20% cheaper), so it's fine for what it's meant to be.

CDNA and RDNA 2​

2020 (and 2021 on the compute side) is where the first real serious punch is thrown since all the way back in the 2015 Islands era. On the CDNA side, we have an actual monster like only AMD knows how to make them: the MI250X. On the RDNA side, it looks like AMD has finally decided to stop playing and fires out two real monsters: the first real big die in an eternity, and the little obsession from the CPU side that spills over into GPU.

Allow me to introduce the Hammer:

(AMD Instinct MI250X, "Aldebaran")

So in case your glasses are really dirty: that's two dies, not one. Technically one package, but two chips. It's chiplets. Zen has a nephew.
MI250X, despite raw performance capable of obliterating any benchmark, will not see massive success either. Oh sure, it's only been put in the world's biggest supercomputer and sat on top of the performance charts for years. It's pretty great. But it won't even really start denting Nvidia, simply because that dual chip design is going to make things difficult for developers. And they already have enough things wasting their time.
But although MI250X won't seriously dent Nvidia's Fortress of Compute, it does indicate what's to come. There are forces at work to really go hard, very hard, against them. And they're not playing around, even if their tricks haven't worked yet. CDNA 2 is the moment where the message gets clear: AMD aims high, and won't stop.

On the consumer side, though...

Actually it's just as huge.

If CDNA 2 is the giant hammer trying to dent the fortress walls, RDNA 2 is when AMD takes RDNA 1 and converts the try. Essentially a broad tweak/refresh of RDNA, it introduces Infinity Cache and greatly modernises everything RDNA 1 didn't have time to do: mesh shader support, a first generation raytracing implementation, and more importantly than anything, BIG FREAKING DIES. Real raster monsters that, for the first time since maybe 2013, genuinely compete with Nvidia's top end.

Sure, they only ended up being N°2. But for the second gen since the rebirth of Radeon, it was a near perfect back-and-forth competition. Nvidia puts out the 3090; the 6900 XT sits right below it, very close, and a lot cheaper. AMD pushes out the 6950 XT and beats the 3090, so Nvidia responds by pushing the 3090's power budget to 450W and creating the 3090 Ti.
For the first time in forever, the two sworn enemies aren't just playing in the GPU playground, they're actively slapping each other in the face, blow for blow. AMD may lose, but Nvidia walks away with red cheeks and a severe warning that the easy days of their foe being about "value proposition at the midrange/entry level" are gone.

RDNA 2 rekindled a long dormant hope in the GPU world that AMD would actually fight Nvidia in earnest and perform the miracle of felling the giant again, after doing it to Intel. It was a time of hope and a time of glory, and even if it was tempered by still being number 2, they went from competing at the lower midrange to competing at the absolute top.
But it was only step two in a very long plan.
 

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96

A Good Rasterizer, In a time of Bad Raytracers​

While RDNA 2 is unquestionably a surge in the GPU war after so many years of total Nvidia domination, the success is mixed.
RDNA 2 does succeed at everything it set out to do in the original goals. But as technology tends to do, the goalposts have moved.

Raytracing, which was frankly pants during the Turing era, has now evolved. It's still pants: Nvidia's cards severely lack VRAM, and below the 8-class cards it's mostly an unusable or barely usable gimmick. RT was a meme in 2018 and may become the standard for all gaming GPUs in 2028, but in 2020 it's still 85% meme.
Yet, even if Ampere's raytracing is essentially a joke below the 8-class card, it's still a marketing argument. A STRONG marketing argument, in an era where, objectively, raster is a done deal. It's not as if there isn't more raster performance to be made, there will be even in 15 years, but there is the raster you need and the raster you want. Want? RDNA 2 has it aplenty. Need? Not much is needed. I'd argue a card as low as the 6700 XT, the successor to RDNA 1's top end, is already enough to satisfy essentially all rasterization needs.

So if RDNA 2's raster power isn't that important, how is its RT?
Abysmal.
Truly, abysmal. Unusable. Next to barely worth mentioning. The top end factory overclocked RDNA 2 card, the 6950 XT, can somehow do some token raytracing if you're ok with dropping 60% of your framerate and burning 500W to get a 40 FPS experience. It is not even worth turning on.
Despite being a strong value proposition through its raw power, RDNA 2 fails to really dent Nvidia's position. If it were just raytracing, it'd be one thing, but RDNA 2 suffers from the general problems mentioned at the top. The 3rd party software support isn't that good. It lacks strong Matrix/Tensor capabilities. A lot of video editing, encoding and 3D modelling software just doesn't run too well on RDNA 2. Nowhere near as well as on Ampere, anyway.

So while RDNA 2 has busted the Castle of Raster's front gate with smashing success, Princess Jensen is in another castle. There was a lot of arguing back in those days about how "raster is the real performance, RT is a gimmick, Nvidia is on the back foot", and I'd be inclined to agree...except one gimmick is one thing. 15 gimmicks are another. Nvidia had years of lead over AMD in the 2012-2018 era, and they could afford to make a lot of strides comfortably. When you pay $800 for a GPU, you expect a high end experience. Not a "it does the main job great but most of the side stuff won't be as good as the competition".

Have you ever heard of someone sitting in a Lamborghini and being told "no air conditioner or drive assist, you gotta buy Ferrari for those gimmicks"?

The FSR Paradox​

Nowhere does AMD's paradoxical situation with RDNA 2 show better than with FSR.
FSR 1 was a fairly simple first upscaler implementation. If DLSS 1 was bad, FSR 1 was pretty bad too. They were both uninteresting features that felt more or less gimmicky. But DLSS 1 came out 2 whole years before FSR 1. When DLSS 2 came out, it had a long head start on FSR 2, which did something really odd: it didn't use AI for image correction.

DLSS and later DLSS 2 were coherent with the GPUs they were developed for. Turing had Tensor cores, Ampere had improved versions of them. DLSS was designed to use those AI accelerators to correct the image, even if the core upscaling technology was just a series of algorithms.
FSR, on the contrary, wanted to be universal and open source, and to run on every piece of hardware possible. While laudable, this created a rather insane paradox that AMD still hasn't really found a solution to in 2024: it was both capable of working on weaker cards, and terrible at it.

For upscaling, you create a sort of fake image based off a real, lower resolution image. Logically, the more information you have, i.e. the greater the base resolution, the easier it is to upscale to a credible image without any visible flaws.
FSR, which doesn't use AI correction for flaws, thus benefits a lot more than DLSS from higher base resolutions. DLSS can make a fairly decent image from 1080p or 720p, arguably even from 540p. FSR cannot. The flaws are visible at every resolution (shimmering especially), but they are clearly less visible at 4K than at 1440p, and at 1440p than at 1080p.
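To see why the base resolution matters so much, here's a quick sketch of how many real pixels the upscaler actually gets to work with at each target resolution, using the per-axis scale factors commonly quoted for FSR 2's quality modes (treat the exact factors and mode names as illustrative assumptions):

```python
# Per-axis render scale factors as commonly documented for FSR 2 quality modes
# (DLSS 2 exposes similar nominal modes): render resolution = target / scale.
MODES = {"Quality": 1.5, "Balanced": 1.7, "Performance": 2.0}
TARGETS = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}

for target, (w, h) in TARGETS.items():
    for mode, scale in MODES.items():
        rw, rh = round(w / scale), round(h / scale)
        print(f"{target:>5} {mode:<11}: renders at {rw}x{rh} "
              f"(~{rw * rh / 1e6:.2f} MP of real information)")
```

A 4K Performance target still starts from a full 1080p frame (about 2 MP of real pixels), while a 1080p Performance target starts from roughly 540p (about 0.5 MP), which is exactly where an upscaler with no AI correction has the least to work with.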

So the paradox isn't that FSR is worse when targeting 1080p than when targeting 4K. The paradox is that it's the only one of the two that even tries down there at all.
What is the point of FSR running on Polaris, Vega, Pascal, possibly even Maxwell, if those cards will never get anything better than a 1080p target out of it, and yet that's exactly where it is at its worst?
The Nvidia tech locks you in and produces a good image, while AMD went to the trouble of running on every GPU possible and yet produces a poor image, especially on the very GPUs Nvidia didn't bother trying with.
It's arguable that this still makes sense commercially, to create partnerships with other companies or port FSR over a lot of hardware.

But from a technology standpoint, it's a perfect example of AMD's paradox of the RDNA era: they have the raw power, they have the capacity, but all the side things, all the nice extras, they all feel awkward, unprepared, not thought out, dysfunctional.
Nvidia in the Ampere/Lovelace era (arguably Turing too, but that's stretching it) could say: "yes, our RT is just way too demanding and your GPU won't be able to play at a decent framerate at your usual resolution, but just turn on DLSS Performance and it'll work *wink wink*". It was a coherent SW/HW stack: raytracing capability with specialised hardware, upscaling with AI correction, and the specialised hardware to run that AI fast.
Meanwhile AMD had abysmal raytracing that largely ran off software compute (and not all of it could run on the GPU alone), upscaling that used no AI and looked poor/buggy, and no AI accelerators even if they had wanted to use them.

RDNA 2 may have won the war in terms of perf/price, but the war it won in 2020 was the war Nvidia had considered done and over with by 2016 with Pascal.
 
Reactions: Tlh97 and Saylick

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96

The Legend of the Golden Child​

If RDNA and RDNA 2 were the try and the conversion that raised the AMD standard on the battlefield, they were merely the engine of a steamroller coughing to life before rolling over Nvidia.
For months and months, almost years, the forums and Twitter echoed with the legend being forged in the Red Foundries of Austin: RDNA 3.
This wasn't trying anymore. It wasn't fighting Nvidia for a very close N°2 spot. It wasn't going for a hit, or slaps, or a good punch. It was E. Honda falling flat on Jensen's face. It was the Meteor crashing into Gaming City. It was a demon.
Isn't that right Mr Leaker?

Alas, Poor ROPick, I thought I knew ye​

But sometimes good stories don't end well (American movies lie to you all the time). Sometimes something stupid happens. Something so absurd nobody can even think of it. Like a stupid-as-hell 10 year old part that just happens to have never been questioned, because...because why would you? It's 10 years old and has worked since GCN 2.0.
But sometimes, you're betrayed.



Out of all the stupid things, RDNA 3's glorious performance was tested with compute. And apparently nothing else, until very late in its development cycle.
What happens after compute? Rasterization, of course, at least for videogames. It so happened that a late stage rasterization pipeline element, which I believe was the ROP but memory fails me, was not capable of taking RDNA 3's massive 3 GHz clocks. Not without consuming an absurd amount of wattage.

RDNA 3, the Golden Child promising to finally go from N°2 to putting Nvidia squarely on its butt, claiming a 50% performance improvement alongside a 50% efficiency improvement, ended up offering a 50% performance improvement...with a 0% efficiency improvement.
If you took Navi 31's full die and set it against Navi 21's full die at 500W, it could reach 30% better performance, maybe...but to reach the promised clocks, to reach the impossible super goals of RDNA 3, to get that monster 50% performance leap...it took 750W.
The entire generation was a total bust.
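Using the figures above (marketing targets and rough outcomes, not measured data), the perf-per-watt arithmetic looks like this:

```python
# Illustrative perf/W arithmetic based purely on the figures quoted above.
baseline = {"perf": 1.00, "watts": 500}  # Navi 21 full die as the reference point

scenarios = {
    "Promised: +50% perf at the same power": {"perf": 1.50, "watts": 500},
    "Actual, capped at 500 W":               {"perf": 1.30, "watts": 500},
    "Actual, chasing the +50% target":       {"perf": 1.50, "watts": 750},
}

base_eff = baseline["perf"] / baseline["watts"]
for name, s in scenarios.items():
    gain = (s["perf"] / s["watts"]) / base_eff - 1
    print(f"{name}: perf/W change = {gain:+.0%}")
```

The last line is the whole tragedy in one number: hitting the promised performance meant giving up the entire promised efficiency gain.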

All of the preparation, all the hard work, years of planning to get there...undone. Even the rumored top RDNA 3 die, something way bigger than even the 7900 XTX, was washed away.
Jensen had won. Without even trying. All his advantages, all the buttresses he was sitting behind, the walls of the Compute Capital had held, without even feeling so much as a tremor. AMD had failed, and Nvidia won without a real battle.

But although the battle for gaming GPUs was a complete and swift slaughter, the Red Generals fleeing before Lovelace's easy victory...it wasn't the only battle fought that generation.
 
Reactions: Tlh97 and Saylick

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96

Aqua Vanjaram​


Hey Jack you think that's enough CUs or should we put in some more?


While in the consumer space RDNA 3 died, quite literally, from an arrow in the ROP, over on the expensive Big Compute side, slowly and I imagine quite painfully, AMD birthed the biggest baby ever born to any of its engineering teams. A chip so fat it looks like a city of silicon, so demanding that it takes the production chain 7 months to finalise just one.

Behold, MI300. A cool 192 GB of HBM3 and faster, way faster compute than anything ever before, on a multi-die monster. Way faster than anything Nvidia ever built. Faster than anything ever made. With interchangeable CPU and GPU parts depending on needs. Advanced Packaging? No, this is more like Advanced Packaging 2.
While the consumer space stumbled and knelt before Jensen's might, the Hammer of MI250X was finally succeeded by something far and away greater than anything built before. If MI250X demanded that developers get freaky with its dual dies, MI300X just flexes with its 8 XCDs. MI300A even has the luxury of being one of the most amazing APUs ever made, trumping anything Nvidia has ever made. Forget Grace, which mostly passes plates to Hopper: this is another level of engineering entirely.

Sure, Nvidia still held on to its position despite that. It's not as if people were going to leap out of windows Black Monday style because AMD had created a monster. NV still had a far more advanced CUDA stack than what ROCm offered. It still had NVLink. No matter that 8 MI300Xs could be glued together on one board, Nvidia could scale its somewhat meh H100s out to hundreds and hundreds. 1.5 TB of HBM3 is nothing versus 400 × 80 GB of the same, all unified into one giant GPU.

But if RDNA 2 only grazed the N°1 spot, MI300X didn't graze it, it shoved Hopper off like a super heavyweight swatting away an angry featherweight. Where RDNA 3 would console itself with being back to "price competitive" against the second card down from Nvidia's top, MI300X was so damaging to Nvidia's position that they tried to publish a ridiculous assassination piece on it on Dec 14 2023, which AMD answered the very next day. In case you had any doubts about the response, Nvidia went silent as a tomb afterwards and has never mentioned MI300X since.

And so, Here we are​

I'm writing this as of Sept 2024. RDNA 4 isn't out yet, and even though guesses abound, we only know that it'll be very small. There are no ambitions for the next generation of consumer GPUs. As for MI325X or MI350X, they'll probably carry on being huge and amazing. But as of now, the story has ended.
The article, though, has not. Actually, while the bulk of it is done, there is one important aspect that I've been keeping up my sleeve: I've actually been lying to you this entire article.
Or more specifically, AMD has.
I tend to write these things when I have been thinking about something for a long time and something clicked in my head, and I need to empty it all before I forget where my logic came from.
So let's get to what sparked this entire piece.
How has AMD been lying to us all these years?
 

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96

Breaking into two...​

So I've said that AMD reinvigorated Radeon with a plan that had to comprise 4 goals:
A stronger GPU than Nvidia
Far better software support
An enterprise/3rd party oriented compute stack they can support
All the little side things that matter (encoders, Matrix cores, upscaling, raytracing, etc)


So where is the lie? Isn't that exactly what they've been doing?
They did, throughout all these years, improve on their software support.
Just look at Technotice's opinions on RDNA 3 at the start:
And now:

ROCm is not at CUDA's level (ZLUDA certainly proved that the gap is still large) but it keeps advancing.
All the side things are effectively getting there; RDNA 3 made huge leaps with mostly solid AV1 encoding, WMMA, RT fixes, upscaling, etc...

See it yet? The one that's missing?
A stronger GPU than Nvidia...

...then faking it for 8 years...​

For the longest time, I followed the general narrative that AMD was trying its hardest to compete with Nvidia in the GPU space, and that NV was just so far ahead that they couldn't really put in enough resources to catch up. For years I dismissed the alternative, because the notion that AMD was faking it made no sense: they were clearly trying to compete on software, to gain high performance, to fight Nvidia with a better overall perf/price, and to do everything Nvidia did, FSR, raytracing and all. I was annoyed at all the penny pinching methods and principles, but I assumed it was merely because they felt they couldn't compete with the level of investment Nvidia could put in. After all, if Nvidia owns 80% of the GPU market, they can amortize what they build immensely better than AMD, can't they? It makes sense to remain as reasonable as possible with expenditures...

But no, that was wrong. Insanely, it's the Nvidia redditors who were correct. AMD was never trying to compete with Nvidia. Those types could never give any reason other than "they don't compete because they're lazy", so obviously they get no points for their non-existent brilliance...but they were right, the way the left hand side of the midwit meme tends to be:

But hearing about UDNA and the reunification of the compute and gaming archs, I've realised that no, they never really tried.

Let's recap the few clues I've left around the article:
Do all of the above without spending as much money as Nvidia: Split the architectures. Because saving area on a die is how you save production costs. You're also spending more development effort on making something more optimised and compact, effort that won't go into better software or hardware performance.
So if RDNA 2's raster power isn't that important, how is its RT?
Abysmal. Truly, abysmal. Unusable. Next to barely worth mentioning.
FSR 2, which did something really odd: it didn't use AI for image correction.
Jensen had won. Without even trying. All his advantages, all the buttresses he was sitting behind, the walls of the Compute Capital had held, without even feeling so much as a tremor. AMD had failed, and Nvidia won without a real battle.
Now imagine the other way around on all these elements.
1) The archs do not get split.
2) RDNA 2's RT, or at the very least RDNA 3's RT, is not a 90% software solution with as little silicon as possible spent on it.
3) FSR 2 does use AI from the start.
4) RDNA 3 isn't built as a relatively cheap solution with smaller CUs than RDNA 2 and instead grows in size and power.

If nothing else, imagine a top RDNA 3 card that, instead of being a ~300mm² GCD with six ~37mm² MCDs, was a full 400mm² GCD with eight MCDs. It's far from excessively hard to do. The point of chipletization is that you can scale much more easily. If the 7900 XTX could be a ~520mm² package, why couldn't it be 600mm²? Or even 700mm²? Why did AMD, who had the competence to make a truly huge die, choose not to? Nvidia pumps out 600mm²+ dies nearly every single generation, and they're all monolithic. If AMD can make a 300mm² GCD and plug memory controllers onto it, a 400mm² one is definitely doable. So why not?
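A quick bit of area arithmetic makes the point (the Navi 31 numbers are publicly reported ballpark die sizes; the scaled-up row is purely the hypothetical configuration above):

```python
# Back-of-envelope silicon totals: Navi 31 as shipped vs the hypothetical
# bigger GCD discussed above. Sizes are approximate public figures, in mm^2.
MCD_AREA = 37  # each memory/cache die

configs = {
    "Navi 31 as shipped (304 mm^2 GCD, 6 MCDs)": (304, 6),
    "Hypothetical 400 mm^2 GCD with 8 MCDs":     (400, 8),
}

for name, (gcd_area, mcd_count) in configs.items():
    total = gcd_area + mcd_count * MCD_AREA
    print(f"{name}: {gcd_area} + {mcd_count}*{MCD_AREA} = ~{total} mm^2 of silicon")
```

Even the scaled-up version lands around 700 mm² of total silicon, split into small pieces that each yield far better than a monolithic 700 mm² die would, which is precisely the advantage chipletization is supposed to buy.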

Because they were never intending to try. This was a deliberate strategy to not compete and to stay behind Nvidia at all times.
Why would they do this? Why would they refuse to fight when they clearly had the weapons to? Yes, RDNA 3 was borked, but it wasn't unfixable. RDNA 3.5 actually fixed it.
RDNA 2 did not have any real RT hardware, but nothing prevented it from being built for RDNA 3. Yet again, they cheaped out. They did not try to get a competitive piece of silicon out.
 
Reactions: Tlh97 and Saylick

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96

...and getting themselves ready...​

So why effectively give up? All these years of faking it while not actually competing, what were they for?
It was about getting ready. Getting the support, the 3rd parties and all the side things, as well as developing their hardware stack enough to finally hit hard all at once.
It's not as if AMD never intended to compete again. But effectively, since RDNA 1, they haven't. They have LOOKED the part, but didn't really try. It's not just Nvidia that abandoned gamers for the AI/compute market, it's both of them. Neither was putting in real effort; AMD was just better at looking the part.

This explains all the oddities that we've been observing for years:
- software support for RDNA is legendarily slow
- after RDNA 3 came out, it took 7 months to get VR to work without stutters
- crashes are common for months after release with tons of video editing/3D modelling software
- meanwhile the ROCm enterprise stack/Radeon Pro drivers have the reputation of being "solid"
- AI support is very slow to come out despite WMMA, while MI300 gets quick upgrades
- despite RDNA 3 chiplets, AMD somehow has way smaller dies than Nvidia
- despite hardware raytracing being possible, they only produced a token software solution
- FSR doesn't use any AI acceleration even with RDNA 3
- FSR development is very slow compared to DLSS or even XeSS
- every part of the Radeon stack is designed for penny pinching and area saving
- RDNA 3 is borked, and yet doesn't get fixed. No 7950 XTX, 7850 XT or any other replacements are made, despite RDNA 2 having had factory overclocked refreshes of the 6700 XT, 6900 XT and 6600 XT
- RDNA 3.5 is made, thus fixing RDNA 3, yet the replacement cards still don't come out
- RDNA 3 is borked, and yet RDNA 4 promises only very modest performance as well, and is immensely less ambitious than RDNA 3
- RDNA 4's midrange to high end is canceled, and yet we hear RDNA 5 will reunite with CDNA 4 into UDNA
- RDNA 4 is probably not canceled because it had real problems, but because there's no point in investing in RDNA anymore

...to complete the circle.​

The UDNA announcement is what sparked this entire article. This is not a mere administrative change. It's a telltale sign of two things: one, that AMD effectively wasn't trying all these years. RDNA was designed to be cheap from the start, and the reason it always got the short end of the stick isn't some intelligent strategy. There was no intelligent strategy. AMD didn't want to seriously compete, and that's it.

Two, that they feel ready to actually try again.
AMD hasn't been seriously trying to compete since the start. Arguably RDNA 3's failure forced their hand, but it's no accident that they didn't push for a fix, nor for a more ambitious product from the start.
Yet now, as they announce UDNA as the next target, I think everything makes sense.
They are putting the ambitious compute stack and the cheap gaming stack into one. They're breaking away from the intention to keep the gaming segment dirt cheap to make, giving up on their sacrosanct margins and penny pinching. Considering all they've been showing since RDNA 1, I can only think of one reason.

They want to go at it again. RDNA 4 is the last of the fake tries. Whether RDNA 5/UDNA will be successful or not, I obviously have no clue. But if you read their actions all these years as crouching low to leap higher, then their strange strategies and constant "blunders" make perfect sense. They weren't blunders, it was literally not trying to compete. FSR, RT, die sizes, low support, no fixes, everything. AMD has not been trying for years.

And with UDNA they will.

So what about RDNA 4?​

Nothing. I don't think they intend to do anything but fire and forget RDNA 4. It's going to be an interesting experiment, with RT finally getting some improvement at Sony's request and probably a few cool new things, but while the SW and design will be interesting, there are no ambitions to expect. 3rd party support and software improvements will carry on slowly, much as they have throughout RDNA 3's lifespan. Perhaps at best we can time how long it takes to correct egregious bugs or respond to cries about crashes in popular software.

RDNA 4 will be a forgettable generation, much like RDNA 1 was. The RDNA circle will close on a whimper, not a roar.
Yet, for the first time in years of paying attention to them, I feel like I finally understand what they've been doing and why. Why all the weird choices and decisions, the outlandish goals and the obsession with saving every penny, denying every transistor, holding on to area like Lisa was going to visit with a ruler and an itch for slapping bad engineers who had wasted a nm².

AMD wasn't competing. RDNA 4 is the last chapter of that 2019-2026 cycle, nearly 7 years or more of pretending to fight Nvidia at the consumer level. Yet what comes after should be seen as the first genuine strike in a war that was faked for too long. Nvidia's reckoning may totally fail to happen - it wouldn't be the first time. Or it could be that MI300X's far away cousin comes and tramples over all the Empire that Jensen spent so long building.


We will see the Red Devil rise again.
 

poke01

Platinum Member
Mar 8, 2022
2,581
3,409
106
Nvidia's reckoning may totally fail to happen - it wouldn't be the first time. Or it could be that MI300X's far away cousin comes and tramples over all the Empire that Jensen spent so long building.
First of all, great write up! I do hope AMD fights Nvidia in all aspects, I mean ALL aspects. Nvidia has failed many times too, but Jensen picks it up again.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,817
2,782
146
So basically, are you saying that AMD graphics will improve dramatically over RDNA 2 and 3, now with UDNA and a new series of architectures? Interesting take, and exciting if true, as I already think some aspects of RDNA 3 and RDNA 2 did quite well.
 

marees

Senior member
Apr 28, 2024
578
639
96
Imo, the Radeon team pulled off a miracle with RDNA 2
I blame consumers (& reviewers) for it not being more successful

RDNA 3 seems to have been an unexpected (& probably last minute) failure at the hardware Russian roulette

UDNA, if it releases in 2026, will be 20 years behind CUDA
Coincidentally, AMD acquired ATI the same year that CUDA came out. That was the age of "real men have fabs". How things have changed. TSMC is the only game in town for GPUs now.
Wishing my best for UDNA. There is a lot of catching up to do

Things have changed a lot from 7 years ago. The PS5 Pro costs between $800 and $1,000. Moore's Law is dead. You need software tricks like PSSR to sell hardware. The margins of Radeon are abysmal. I don't think it makes sense to continue manufacturing gaming cards unless they're built on UDNA.

I believe Radeon still has the ambition to produce cards such as 3990 / 4990 / 5990 / 7990 etc. But nothing assures that their moonshots will work.
 

Arctic Islands

Junior Member
Apr 28, 2024
5
4
41
What happens after compute? Rasterization, of course, at least for videogames. It so happened that a late stage rasterization pipeline element, which I believe was the ROP but memory fails me, was not capable of taking RDNA 3's massive 3 GHz clocks. Not without consuming an absurd amount of wattage.
Do any leaks support that? I only know the 7900 XTX underperforms by >15% for some reason, but I don't know why.
 

coercitiv

Diamond Member
Jan 24, 2014
6,759
14,682
136
Tried reading last night, but when I realized there's no end in sight on my phone I just went to sleep instead. Maybe I'll try again tonight, the OP really put some effort into it.

We can obviously think of many 3D chess moves in this race, but sometimes it's also a good idea to start small in terms of observations:
  • Nvidia moved away from unified arch
  • AMD followed
  • Nvidia moved back towards unified arch
  • AMD follows
IMHO AMD is following the money: first they moved away while trying to be more competitive in gaming, now they're coming back around because they want gaming to piggyback on their huge bet in the data center space. To me this isn't necessarily good news, except for the fact that they finally seem to take their software stack seriously. At this point maybe that's the more important aspect, as it transcends multiple hardware generations when done right.
 

marees

Senior member
Apr 28, 2024
578
639
96
You are comparing a graphic architecture with a programming environment. It makes no sense.
Both have a U for unified. So the point is that from the very beginning NV was focused on a unified software stack whereas AMD was focused on hardware engineering.

I.e. NV is a vertically integrated software company, like Apple
 
Reactions: Tlh97 and Mahboi

leoneazzurro

Golden Member
Jul 26, 2016
1,056
1,733
136
Both have a U for unified. So the point is that from the very beginning NV was focused on a unified software stack whereas AMD was focused on hardware engineering.

I.e. NV is a vertically integrated software company, like Apple
Yes to the first part, but in any case hardware architecture and unified software are different things: you can have a unified stack with different architectures, i.e. CUDA works on vastly different hardware implementations across very different generations of products, just like AMD's ROCm does (or should do). AMD being late in getting an optimized integrated stack similar to CUDA has nothing to do with them having different architectures for the consumer and HPC spaces. And they are as much an integrated software company as Nvidia. They are simply late, and worse at it.
 
Last edited:

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96
So basically, are you saying that AMD graphics will improve dramatically over RDNA 2 and 3, now with UDNA and a new series of architectures? Interesting take, and exciting if true, as I already think some aspects of RDNA 3 and RDNA 2 did quite well.
At least they'll stop putting the penny pinching first. Which should have dramatic effects.
Software RT solutions? No, hardware ones.
Minimalistic FSR HW support? No, maximal one.
Smallest, cheapest competitive chip? No, go big.
Go for density? NO GO BIG GO BIG.
And so on.

Just think of it as CDNA being the baseline instead of RDNA. I almost want to say that size/performance won't be a question anymore, only price (and watts obviously).
Essentially they'll stop faking competition with minimalistic responses and start going hard. The fact that they're reunifying means that they feel ready for the big leap IMO. Gloves off and all.
Tried reading last night, but when I realized there's no end in sight

It was a very long night indeed...
 

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96
Do any leaks support that? I only know the 7900 XTX underperforms by >15% for some reason, but I don't know why.
Pretty sure the source was our very own Kepler_L2, but I can't recall. It should be in the RDNA 3 topic anyway.
Also, one strong indicator is the compute perf. RDNA 3 actually trounces Lovelace when it's not software limited (i.e. extra effects, special stuff that it doesn't cover SW- or HW-wise). It's better performance across the board; check the Technotice videos I posted. Compute works fine, it's raster that doesn't.
Yes to the first part but in any case hardware architecture and unified software are different things, you can have an unified stack with different architectures, i.e. CUDA works on vastly different hardware implementations of very different generations of products, just like AMD's ROCm does (or as it should do). AMD being late in getting an optimized integrated stack similar to CUDA has nothing to do with them having different architectures for the consumer and HPC spaces. And they are as an integrated software company as Nvidia. They are simply late and worse at that.
Eh, it's a bit more complex at the driver level. You do get extra code, extra support, extra bugs. This is why for example CDNA got AI support just fine, but RDNA had to get N31, then N32, then N33 opened up bit by bit.
Also, if RDNA has WMMA and CDNA doesn't, you have to do two implementations of matrix ops. It's just more complexity no matter how it's sliced.
 
Reactions: Tlh97 and marees

leoneazzurro

Golden Member
Jul 26, 2016
1,056
1,733
136
Eh, it's a bit more complex at the driver level. You do get extra code, extra support, extra bugs. This is why for example CDNA got AI support just fine, but RDNA had to get N31, then N32, then N33 opened up bit by bit.
Also, if RDNA has WMMA and CDNA doesn't, you have to do two implementations of matrix ops. It's just more complexity no matter how it's sliced.
That's for sure, but Nvidia also has the same issue, even if to a lesser extent: it's not that Ada and Turing work in the same way at a low level, and Nvidia's HPC solutions surely have different optimizations with respect to their consumer counterparts anyway (due to the different workload focus and different memory subsystem).
 

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96
Absolutely not.
So how do you gain more money than the other side while having a worse product?
By selling a token product without actually trying to compete on the consumer side while focusing the real effort on the enterprise side.
Until your enterprise product is mature enough to become your consumer product too. Hence UDNA.
 
Last edited: