Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of code in LLVM, there are a lot of commits. Maybe that's because the US government is starting to prepare the software environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940-specific commits:

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

Ajay · Oct 14, 2023

Frenetic Pony said:
I'm glad AMD finally kicked their (department) head out, they've really needed someone that knows how to compete with Nvidia in mindshare terms and that guy clearly could not.

Well, they need a good halo product each generation. It grabs headlines. They were shooting for that with Big RDNA4; unfortunately, they got behind the eight ball on that.

Oh, I finally ran across that comment (which I think was removed) from the AMD engineer:

@BrockSuire75

I want to put out something about AMD GPU s. AMD is not leaving the high end market. As some have seen all over the internet about AMD to stop making high end GPU s. This is 1000% false. AMD has next Gen cards being validating as we speak. I have a few engineering samples I'm evaluating. Keep dreaming, never let anyone stop you!

That was a few months ago, so I doubt he was talking about RDNA5.

branch_suggestion · Oct 14, 2023

TESKATLIPOKA said:
Thanks for presenting Nvidia's superiority in a thread mostly about RDNA4.
I really appreciate your and the others' effort in spamming this thread with pretty much unrelated stuff.

NV astroturfers already forced the closure of the B3D HW forum. There is basically nowhere that AMD GPUs are discussed that isn't eventually harassed by the usual suspects heralding how RTG can never compete.

TESKATLIPOKA · Oct 15, 2023

Frenetic Pony said:
Anyway, ticking it over in my head, CU counts are probably
40CU/128bit
60CU/192bit
80CU/256bit
120CU/384bit

160 is too big and 100 is too odd, and this setup matches CU to bandwidth perfectly.

You forgot about Infinity Cache, which would skew that CU/BW ratio heavily.

Effective bandwidth amplification with Infinity Cache 2:
96MB: 2304B/clock * 1.88GHz = 4332GB/s * 0.53 = 2296GB/s
76MB: 1824B/clock * 1.88GHz = 3429GB/s * 0.46 = 1577GB/s
64MB: 1536B/clock * 1.88GHz = 2888GB/s * 0.42 = 1213GB/s
48MB: 1152B/clock * 1.88GHz = 2166GB/s * 0.35 = 758GB/s
32MB: 768B/clock * 1.88GHz = 1444GB/s * 0.27 = 390GB/s
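
The arithmetic in the list above can be reproduced with a few lines of Python. This is only a sketch: the bytes-per-clock figures, the 1.88GHz clock, and the multipliers are copied from the list, and reading the multiplier as a cache hit rate is an assumption (the post does not name it).

```python
# Inputs copied from the list above: cache size (MB) -> (bytes per clock, multiplier).
# Treating the multiplier as a hit rate is an assumption, not something the post states.
CLOCK_GHZ = 1.88

INFINITY_CACHE = {
    96: (2304, 0.53),
    76: (1824, 0.46),
    64: (1536, 0.42),
    48: (1152, 0.35),
    32: (768, 0.27),
}


def effective_bandwidth_gbs(bytes_per_clock, hit_rate, clock_ghz=CLOCK_GHZ):
    """Return (peak, effective) bandwidth in GB/s.

    Peak is bytes/clock * GHz; effective scales the peak by the hit rate.
    """
    peak = bytes_per_clock * clock_ghz
    return peak, peak * hit_rate


for size_mb, (bytes_per_clock, hit_rate) in INFINITY_CACHE.items():
    peak, effective = effective_bandwidth_gbs(bytes_per_clock, hit_rate)
    print(f"{size_mb}MB: peak {peak:.0f} GB/s, effective {effective:.0f} GB/s")
```

In every row the bytes-per-clock figure works out to 24 B/clock per MB of cache, so cache sizes not in the list could be estimated the same way, provided a multiplier is supplied for them.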

If you wanted almost linear BW increase from IC, then you would need:
120CU -> 96MB
80CU -> 76MB
60CU -> 64MB
40CU -> 48MB
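
A quick way to check the "almost linear" claim is to divide the effective bandwidth by the CU count for each pairing. The sketch below uses only the CU/cache pairings and the effective GB/s figures quoted in this post; the helper name is made up for illustration.

```python
# (CUs, Infinity Cache MB, effective GB/s), all taken from the figures in this post.
CONFIGS = [
    (120, 96, 2296),
    (80, 76, 1577),
    (60, 64, 1213),
    (40, 48, 758),
]


def gbs_per_cu(cus, effective_gbs):
    """Effective Infinity Cache bandwidth available per compute unit."""
    return effective_gbs / cus


ratios = [gbs_per_cu(cus, bw) for cus, _, bw in CONFIGS]

for (cus, cache_mb, _), ratio in zip(CONFIGS, ratios):
    print(f"{cus}CU / {cache_mb}MB: {ratio:.1f} GB/s per CU")

# Relative gap between the best-fed and worst-fed configuration.
spread = (max(ratios) - min(ratios)) / min(ratios)
print(f"spread between configurations: {spread:.1%}")
```

All four pairings land at roughly 19 to 20 GB/s per CU, which is what makes the scaling close to linear.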

Then the next question is the clock speed for those GPUs.
Even if you kept the clock speed at RDNA3 level, you would need faster VRAM for everything, so GDDR6 is out of the question.
And I would like to see 24Gb modules being used.

Kepler_L2 · Oct 15, 2023

tajoh111 said:
MI300 is coming out too late.

Particularly if you look at Nvidia latest roadmap.

I suspect most of the MI300 this year is going to El Capitan.

Meaning most MI300 sales will happen in 2024.

This is going to be compared against Blackwell, which is being unveiled in March of 2024. With Nvidia going to a yearly release for datacenter, it gives them the ability to use the fastest memory and manufacturing tech, giving AMD brutal competition. Rumors point to Blackwell selling in high volume in the second half of 2024.

This has other consequences for AMD because this type of volume and money may cause AMD to have supply issues with TSMC.

With Nvidia likely being a 100 billion revenue company next year, with net profit in the 40 to 50 billion range (a single quarter being greater than AMD's total profit in the last decade), they will have the grunt to wipe out AMD's AI data center plans through strangulation in the supply chains, developer support and accelerated roadmaps.

Hopper is sold out through 2024, which translates into about 80 billion dollars (2 million units at 40k each). Add in other Nvidia revenue like Blackwell and gaming and it is simply a monstrous amount of revenue. This gives Nvidia the financial horsepower to produce a 3nm data center chip in 2024, which is going to be compared against the 5nm MI300.

By the time AMD gets to 3nm, Nvidia will be on 2nm. AMD is losing its position at TSMC.

TSMC Is Sprinting to 2nm to Satisfy Demand From Nvidia, Apple

Getting ready for 2nm trial production and using Nvidia AI for optimized chip floor planning

www.tomshardware.com

This will have consequences for the rest of AMD's product roadmap as Intel uses TSMC more and AMD loses clout at TSMC. AMD has a hard choice ahead: which division is going to be sacrificed in order for the rest of the products to succeed? I think consumer graphics is going to be the item that gets heavy cuts again, like in the past. The cancellation of Navi 41 was a prelude to this, I think.

AMD needs to have more forward thinking. Nvidia dominated supercomputers in the past and still has 5 of the top 10 super computers in the world.

https://www.top500.org/lists/top500/list/2023/06/

Look at the rest of the list and it is still dominated by Nvidia. Nvidia has just moved on to bigger and better things. A couple of supercomputers from the US government worth 1.2 billion dollars every 5 years is decent money for AMD... but for Nvidia that's soon to be the weekly sales of H100, with better margins to boot.

Cool story but AMD is the first N3E and N2 customer.

TESKATLIPOKA · Oct 15, 2023

As a continuation of my previous post, I would like to see these specs for the GPUs (N44, N43, N42, N41).
It's just speculation, and I calculated everything as a chiplet design.
An MCD is now 12MB + 32-bit GDDR7 paired with a 24Gb module, just so I can make easy cut-down chips.

GPU	Shader Engines (die)	CU (WGP)	ROPs	Frequency	Infinity Cache	Memory bus	Memory speed	VRAM	TBP	Performance at 4K (TPU)
RX 7600	2	32(16)	64	2655 MHz	32 MB	128-bit GDDR6	18 gbps	8 GB	165 W	100%
RX 8600	2 (N44)	36(18)	72	3000 MHz	36 MB	96-bit GDDR7	30 gbps	9 GB	125 W	~127%
RX 8600XT	2 (N44)	40(20)	80	3200 MHz	48 MB	128-bit GDDR7	27 gbps	12 GB	150 W	~151%
RX 7700XT	3	54(27)	96	2544 MHz	48 MB	192-bit GDDR6	18 gbps	12 GB	245 W	169%
RX 7800XT	3	60(30)	96	2430 MHz	64 MB	256-bit GDDR6	19.5 gbps	16 GB	263 W	204%
RX 8700	3 (N43)	54(27)	108	3000 MHz	60 MB	160-bit GDDR7	32 gbps	15 GB	190 W	~227%
RX 8700XT	3 (N43)	60(30)	120	3200 MHz	72 MB	192-bit GDDR7	32 gbps	18 GB	225 W	~269%
RX 7900XT	6	84(42)	192	2400 MHz	80 MB	320-bit GDDR6	20 gbps	20 GB	315 W	261%
RX 7900XTX	6	96(48)	192	2500 MHz	96 MB	384-bit GDDR6	20 gbps	24 GB	355 W	311%
RX 8800	4 (N42)	72(36)	144	3000 MHz	84 MB	224-bit GDDR7	31 gbps	21 GB	250 W	~280%
RX 8800XT	4 (N42)	80(40)	160	3200 MHz	96 MB	256-bit GDDR7	32 gbps	24 GB	300 W	~332%
RX 8900	6 (N41)	108(54)	216	3000 MHz	120 MB	320-bit GDDR7	31 gbps	30 GB	375 W	~420%
RX 8900XT	6 (N41)	120(60)	240	3200 MHz	144 MB	384-bit GDDR7	32 gbps	36 GB	450 W	~498%
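
The Infinity Cache, bus width, and VRAM columns of the speculated RX 8000 rows all follow from the MCD count. Here is a minimal sketch of that derivation. It assumes each MCD carries 12MB of cache and a 32-bit channel, as stated above the table, and that each channel is paired with one 24Gb (3 GB) module, which is what the VRAM column implies. The raw memory bandwidth figure is not in the table; it is added here using the usual bus width times per-pin speed divided by 8. The function and constant names are made up for illustration.

```python
MCD_CACHE_MB = 12   # Infinity Cache per MCD, from the post
MCD_BUS_BITS = 32   # memory channel width per MCD, from the post
MODULE_GBIT = 24    # assumed module density: 24 Gb = 3 GB per module


def memory_config(mcd_count, speed_gbps):
    """Derive cache, bus width, VRAM and raw bandwidth from the MCD count."""
    bus_bits = mcd_count * MCD_BUS_BITS
    return {
        "infinity_cache_mb": mcd_count * MCD_CACHE_MB,
        "bus_bits": bus_bits,
        "vram_gb": mcd_count * MODULE_GBIT // 8,      # Gb -> GB
        "bandwidth_gbs": bus_bits * speed_gbps / 8,   # bits -> bytes
    }


# (name, MCD count, memory speed in Gbps) for a few rows of the table.
for name, mcds, speed in [("RX 8600", 3, 30), ("RX 8700XT", 6, 32), ("RX 8900XT", 12, 32)]:
    print(name, memory_config(mcds, speed))
```

For example, three MCDs give the RX 8600 row's 36 MB, 96-bit and 9 GB, and twelve MCDs give the RX 8900XT row's 144 MB, 384-bit and 36 GB.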

Yeah, I know that VRAM bandwidth plus Infinity Cache bandwidth is low in some cases, and the performance estimate is based on the FLOPS increase over the previous generation, so in real life it would be less.
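
The performance column can be reproduced by scaling a previous-generation card's 4K score by the ratio of CU count times clock. The sketch below does that. The pairing of each speculated card with a baseline (RX 7600, RX 7800XT, RX 7900XTX) is an assumption, inferred only because it reproduces the table's figures; the post does not say which card each estimate was scaled from.

```python
# (CUs, MHz, 4K performance %) for the RDNA3 cards, copied from the table.
BASELINES = {
    "RX 7600": (32, 2655, 100),
    "RX 7800XT": (60, 2430, 204),
    "RX 7900XTX": (96, 2500, 311),
}

# (name, CUs, MHz, assumed baseline). The baseline choice is a guess that
# happens to match the table, not something stated in the post.
SPECULATED = [
    ("RX 8600", 36, 3000, "RX 7600"),
    ("RX 8600XT", 40, 3200, "RX 7600"),
    ("RX 8700", 54, 3000, "RX 7800XT"),
    ("RX 8700XT", 60, 3200, "RX 7800XT"),
    ("RX 8800", 72, 3000, "RX 7900XTX"),
    ("RX 8800XT", 80, 3200, "RX 7900XTX"),
    ("RX 8900", 108, 3000, "RX 7900XTX"),
    ("RX 8900XT", 120, 3200, "RX 7900XTX"),
]


def scaled_performance(cus, mhz, baseline):
    """Scale the baseline's score by the (CU * clock) ratio, a rough FLOPS proxy."""
    base_cus, base_mhz, base_perf = BASELINES[baseline]
    return base_perf * (cus * mhz) / (base_cus * base_mhz)


for name, cus, mhz, baseline in SPECULATED:
    estimate = scaled_performance(cus, mhz, baseline)
    print(f"{name}: ~{estimate:.0f}% (scaled from {baseline})")
```

Because this is pure FLOPS scaling, it ignores bandwidth limits and any per-CU architectural changes, which is why real results would likely come in lower.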

Joe NYC · Oct 15, 2023

tajoh111 said:
This has other consequences for AMD because this type of volume and money may cause AMD to have supply issues with TSMC.

With Nvidia likely being a 100 billion revenue company next year, with net profit in the 40 to 50 billion range (a single quarter being greater than AMD's total profit in the last decade), they will have the grunt to wipe out AMD's AI data center plans through strangulation in the supply chains, developer support and accelerated roadmaps.

Of all the possible challenges AMD might face in taking on Nvidia in datacenter GPUs, strangulation in the supply chain is not one of them: suppliers are not going to get fooled into that by Nvidia.

The major capacity constraint now is CoWoS, but supply is likely going to outpace the AI hype by 2025, and by then AMD will likely not even be using CoWoS for MI400.

jpiniero · Oct 15, 2023

TESKATLIPOKA said:
As a continuation of my previous post, I would like to see these specs for the GPUs (N44, N43, N42, N41).
It's just speculation, and I calculated everything as a chiplet design.
An MCD is now 12MB + 32-bit GDDR7 paired with a 24Gb module, just so I can make easy cut-down chips.

Yeah but the chiplet is dead. At least for RDNA4.

I was gonna say the faster one would be 40 CUs with 12 GB of 128-bit GDDR7, perhaps 10% slower than the 7700 XT, for $399. Even that should be very competitive with Blackwell in raster.

AMD could release a new N31 before then with faster GDDR6, if it exists, and call that high end.

IMO, where Blackwell will get faster is at (well over) a grand.

Timorous · Oct 15, 2023

jpiniero said:
Yeah but the chiplet is dead. At least for RDNA4.

I was gonna say the faster one would be 40 CUs with 12 GB of 128-bit GDDR7, perhaps 10% slower than the 7700 XT, for $399. Even that should be very competitive with Blackwell in raster.

AMD could release a new N31 before then with faster GDDR6, if it exists, and call that high end.

IMO, where Blackwell will get faster is at (well over) a grand.

Allegedly and it makes zero sense.

Dropping the MI300-style monster halo part, sure, I get it, but I don't see that being cheap enough to be useful in the typical high and upper-mid-range part of the market. I also don't see monolithic being cheap enough to make dies bigger than 250mm² or so viable either, at least not for AMD, who can't sell them in professional products like NV can, especially with how poorly cache and IO shrink.

So that still leaves a part of the market that would be best served by chiplets.

Joe NYC · Oct 15, 2023

Timorous said:
Allegedly and it makes zero sense.

Dropping the MI300-style monster halo part, sure, I get it, but I don't see that being cheap enough to be useful in the typical high and upper-mid-range part of the market. I also don't see monolithic being cheap enough to make dies bigger than 250mm² or so viable either, at least not for AMD, who can't sell them in professional products like NV can, especially with how poorly cache and IO shrink.

So that still leaves a part of the market that would be best served by chiplets.

The pressure to go with chiplets is only going to grow. RDNA5 will likely be at least N3E, so the difference between the cost of the compute silicon on N3E and I/O + cache on N6 is going to grow further.

BTW, I wonder what the process node was supposed to be in the cancelled Navi 4c; if by any chance it was N3B, that might have been another reason to cancel it.

jpiniero · Oct 15, 2023

Timorous said:
Allegedly and it makes zero sense.

Could be easily explained by AMD planning on doing scalable chiplets and simply ran out of time making it work.

branch_suggestion · Oct 16, 2023

Joe NYC said:
The pressure to go with chiplets is only going to grow. RDNA5 will likely be at least N3E, so the difference between the cost of the compute silicon on N3E and I/O + cache on N6 is going to grow further.

BTW, I wonder what the process node was supposed to be in the cancelled Navi 4c; if by any chance it was N3B, that might have been another reason to cancel it.

RDNA5 is N3P or N2, maybe a mix.
RDNA4 is likely the same deal as Zen5, originally N3B, backported to N4P.

Joe NYC · Oct 16, 2023

branch_suggestion said:
RDNA5 is N3P or N2, maybe a mix.

I guess it depends on the release date.

But if it follows the same approach as Navi 4c, with only a small part of the overall GPU being on the advanced node, the cost of that node is less of an obstacle.

I wonder what the next version of Strix Halo brings. Maybe a similar approach to Navi 4c, splitting the large SoC on an advanced node (N3E?) into an AID on N6 + SED on an advanced node.

branch_suggestion said:
RDNA4 is likely the same deal as Zen5, originally N3B, backported to N4P.

I wonder if that (having to backport to N4P) wasn't one of the contributing factors to the cancellation of Navi 4c.

MoogleW · Oct 16, 2023

Considering the timeline of RDNA5 apparently moved up, could it be possible that RDNA5 is backported from an original N2 design to N3E? I don't think the RDNA5 we will get is the same RDNA5 that would have existed in the future.

I would be interested to know if the core design is intact, or if some changes will be quietly postponed to RDNA6 so as not to have too many new changes in a short time frame.

MoogleW · Oct 16, 2023

Joe NYC said:
I guess it depends on the release date.

But if it follows the same approach as Navi 4c, with only a small part of the overall GPU being on the advanced node, the cost of that node is less of an obstacle.

I wonder what the next version of Strix Halo brings. Maybe a similar approach to Navi 4c, splitting the large SoC on an advanced node (N3E?) into an AID on N6 + SED on an advanced node.

I wonder if that (having to backport to N4P) wasn't one of the contributing factors to the cancellation of Navi 4c.

Surely not; the available process should be amongst the first things IHVs factor into an architecture. This influences design rules, target efficiency, etc., and these are booked in advance.

moinmoin · Oct 16, 2023

MoogleW said:
Surely not; the available process should be amongst the first things IHVs factor into an architecture. This influences design rules, target efficiency, etc., and these are booked in advance.

While true, AMD is likely used to targeting multiple possible nodes at once. Remember that Zen 2 was originally intended to use GloFo's later cancelled 7nm node before it was eventually revealed that it would use TSMC's N7, seemingly without any delay.

Frenetic Pony · Oct 16, 2023

N3B is awful, so I doubt RDNA4 was ever intended for it, there's a reason Apple is the only major customer for it and N3E was rushed out asap. N3E isn't terribly better, but just better enough to be worth it for the most competitive markets. It's not until N2 that TSMC will get back on track with something like a worthwhile advance.

It'd be more interesting to know how much AMD is looking at Samsung, or even (gasp) Intel! Intel right now is advancing rapidly in silicon foundry tech, not so much in design. Meteor Lake doesn't appear particularly competitive still with AMD, and this is the first one that Pat's been involved with in any meaningful way. If AMD can muscle Intel out of silicon design but use their foundry tech that would be the most optimal outcome for them.

Joe NYC · Oct 16, 2023

MoogleW said:
Considering the timeline of RDNA5 apparently moved up, could it be possible that RDNA5 is backported from an original N2 design to N3E? I don't think the RDNA5 we will get is the same RDNA5 that would have existed in the future.

I would be interested to know if the core design is intact, or if some changes will be quietly postponed to RDNA6 so as not to have too many new changes in a short time frame.

We don't know if and how much RDNA5 is going to be pulled in.

Maybe only RDNA4 is going to have earlier availability. Marketing would force AMD to hold back RDNA4 until the top SKU was ready and released. Since the top SKU was the problematic one holding RDNA4 back, cancelling it may have moved up the schedule of the other RDNA4 parts.

Joe NYC · Oct 16, 2023

Frenetic Pony said:
N3B is awful, so I doubt RDNA4 was ever intended for it, there's a reason Apple is the only major customer for it and N3E was rushed out asap. N3E isn't terribly better, but just better enough to be worth it for the most competitive markets. It's not until N2 that TSMC will get back on track with something like a worthwhile advance.

It'd be more interesting to know how much AMD is looking at Samsung, or even (gasp) Intel! Intel right now is advancing rapidly in silicon foundry tech, not so much in design. Meteor Lake doesn't appear particularly competitive still with AMD, and this is the first one that Pat's been involved with in any meaningful way. If AMD can muscle Intel out of silicon design but use their foundry tech that would be the most optimal outcome for them.

Intel would have to completely divest itself of any control of its foundry (plus a couple of years of good behavior) for AMD, or any other major customer, to trust that foundry.

Samsung seems to be behind in hybrid bonding, but needs to catch up in a hurry in order to be able to make HBM4.

So unless and until Samsung can do 3D stacking of chiplets, Samsung would only be an option for monolithic packages...

As far as TSMC goes, as long as AMD is on par or ahead in design and continues to be ahead in chiplets, there is not much of a reason for AMD to consider an alternative.

AMD is moving to the top of the line with the #1 foundry and has a strategic relationship with TSMC. No reason for AMD to ruin this.

Ajay · Oct 16, 2023

Joe NYC said:
We don't know if and how much RDNA5 is going to be pulled in.

I suppose that it's possible that the quick cancellation of Big RDNA4 allowed AMD to move more engineers onto the RDNA5 team. Still, it's really hard to pull in timelines by adding more engineers, depending on when they are injected into the development process. The later they start, the smaller their impact. The best ways to speed up projects when I was in software were for us to work more hours (oh well) or cut some features (even if they were half done).

jpiniero · Oct 16, 2023

Frenetic Pony said:
N3B is awful, so I doubt RDNA4 was ever intended for it, there's a reason Apple is the only major customer for it and N3E was rushed out asap. N3E isn't terribly better, but just better enough to be worth it for the most competitive markets. It's not until N2 that TSMC will get back on track with something like a worthwhile advance.

N2 has a small quality increase but the density gain is minimal. It's looking like another N3B. AMD might be the only customer of it (and for Turin Dense only)

Joe NYC · Oct 16, 2023

jpiniero said:
N2 has a small quality increase but the density gain is minimal. It's looking like another N3B. AMD might be the only customer of it (and for Turin Dense only)

I guess that would be Venice Dense.

Frenetic Pony · Oct 17, 2023

jpiniero said:
N2 has a small quality increase but the density gain is minimal. It's looking like another N3B. AMD might be the only customer of it (and for Turin Dense only)

Density appears to be done for either way, but unlike N3 (at least N3 vs N4P) the power usage at same frequencies is projected to drop a whole lot. For most parts today that's more than good enough, whether it's a consumer part in likely a mobile device or an enterprise/cloud part that's just going to go into some ridiculous chiplet package and substrate size anyway, nigh the entire spectrum can use better power efficiency well.

Orodruin · Oct 17, 2023

When will RDNA 4 be released? Will it come in 2025?

Frankly, I think it will fall behind Nvidia again, because AMD urgently needs to make some changes and adjustments on the software side. It produces very powerful cards, but it always lags behind Nvidia in software. Under normal circumstances, the RX 7000 series should be on par with Nvidia's RTX 4000 series with technologies such as RTX or DLSS turned on. But somehow it always falls behind.

Joe NYC · Oct 17, 2023

candasulas said:
When will RDNA 4 be released? Will it come in 2025?

Frankly, I think it will fall behind Nvidia again, because AMD urgently needs to make some changes and adjustments on the software side. It produces very powerful cards, but it always lags behind Nvidia in software. Under normal circumstances, the RX 7000 series should be on par with Nvidia's RTX 4000 series with technologies such as RTX or DLSS turned on. But somehow it always falls behind.

RDNA4 should definitely be released in 2024.

jpiniero · Oct 17, 2023

Frenetic Pony said:
Density appears to be done for either way, but unlike N3 (at least N3 vs N4P) the power usage at same frequencies is projected to drop a whole lot.

It's really not. And with minimal density gain, it's going to be stupidly expensive. You'd have to have a product where customers would gladly pay for the small power savings.
