Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,773
6,750
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, there are a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Jul 27, 2020
23,678
16,609
146
If N41 really is canned, N31 sales must have been super disappointing for AMD to pull out. Then again, they paid for their own mistakes so they can't really blame consumers.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,407
7,577
96

TESKATLIPOKA

Platinum Member
May 1, 2020
2,640
3,197
136
He doesn't know anything.
I didn't say he does.
Just that his guess looks correct to me.
Eh you can do a lot with a new uArch and a shrink.
But I digress.
Yeah, you can do a lot with that. But it's a monolith, so it's questionable how many more transistors they can put inside when a big chunk of the chip no longer scales with a better node.
I don't think transistor density for this monolith at N3E would be much better than what the 5nm N31 GCD had.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
I didn't say he does.
Just that his guess looks correct to me.

Yeah, you can do a lot with that. But it's a monolith, so it's questionable how many more transistors they can put inside when a big chunk of the chip no longer scales with a better node.
I don't think transistor density for this monolith at N3E would be much better than what the 5nm N31 GCD had.

The real question for GPUs, especially consumer ones like this, isn't density scaling but power scaling. At least on TSMC at the moment.

It's hard to tell exactly from the outside, but I get the impression that while N3E scales more densely than the 5nm line, the latest node in that line (N4P) has close to the same power efficiency.

If N4P is available and cheaper than N3E, I could easily see RDNA4 going with that. Regardless of how many chiplets there are, they're not maxing out the reticle size, so density isn't an overriding concern as such.
 

blackangus

Senior member
Aug 5, 2022
230
413
106
I read that TSMC sold all N3 capacity for 2024 to Apple, very recently.
So that would seem to indicate no N3 for anyone else next year.
But that could mean all N3 that wasn't already sold; the article was not clear, as it just stated "all N3 capacity" as a blanket claim.

So apparently we would be looking at 2025 before we see an N3 GPU.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,407
7,577
96
I read that TSMC sold all N3 capacity for 2024 to Apple, very recently.
So that would seem to indicate no N3 for anyone else next year.
But that could mean all N3 that wasn't already sold; the article was not clear, as it just stated "all N3 capacity" as a blanket claim.
That's N3B, aka stuff no one uses.
 

gdansk

Diamond Member
Feb 8, 2011
4,040
6,660
136
Interesting. All the articles I have seen just say every N3 wafer they can produce, no process differentiation.
But hey, that is mainstream news, isn't it =)
It's from Wayne Ma originally, who says Apple has "roughly a year" of 3 nanometer exclusivity. The article is more concerned about Apple getting a special deal for helping finance TSMC's bleeding-edge processes.
What they don't say is that the exclusivity is because N3E - the non-bleeding-edge process which everyone else will be using - is about a year after N3B (give or take a quarter).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,640
3,197
136
The real question for GPUs, especially consumer ones like this, isn't density scaling but power scaling. At least on TSMC at the moment.

It's hard to tell exactly from the outside, but I get the impression that while N3E scales more densely than the 5nm line, the latest node in that line (N4P) has close to the same power efficiency.

If N4P is available and cheaper than N3E, I could easily see RDNA4 going with that. Regardless of how many chiplets there are, they're not maxing out the reticle size, so density isn't an overriding concern as such.
We were talking about 2 small RDNA4 monolithic chips and their possible performance.
You are talking about GPUs in general, but in the case of these RDNA4 chips power scaling is not really an issue, since they will likely be under 200W.

I think transistor density is a bigger issue in this case, not because they would max out the size limit, but because of the production cost vs an older process.

The first question is how big an RDNA4 monolith would be. It's supposedly tiny, so I think it would be ~150mm2 for the bigger one.

The second question is whether N3E can provide 180 MTr/mm2.
N33 has a density of 65.2 MTr/mm2 using N6.
N31 has 152.3 MTr/mm2 for its N5 GCD, but only 55.4 MTr/mm2 for its N6 MCD.
Ada has ~120-125 MTr/mm2 using the N4 process.

If yes, that would mean 27B transistors in a 150mm2 chip. That's 2x as many as N33.
With that many transistors, 4SE, 64CU, 4096 shaders, 160 TMUs, 128 ROPs, 64MB IC and 256-bit GDDR6 should be the minimum specs for an RDNA3 chip.
Of course we are talking about RDNA4 here and not RDNA3, so for the same specs it likely needs more transistors, similar to RDNA3, which needed ~20% more than RDNA2.
If they fixed the clocks (3.5GHz instead of 2.5GHz), then even without any other architectural improvement it could actually be faster than N21, if density and die size are comparable to what I wrote.
OK, performance doesn't look as bad as I originally thought.
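
The density arithmetic in this post is easy to check. The following is only a rough sketch in Python that uses the figures quoted above; the function names are made up for illustration, and the 180 MTr/mm2 figure for N3E is the poster's assumption, not a confirmed number.

```python
# Transistor-budget arithmetic using only the figures quoted in the post.
# Function names are illustrative and do not come from any real tool.

def transistors_billion(density_mtr_per_mm2: float, area_mm2: float) -> float:
    """Transistor count in billions for a density (MTr/mm2) and die area (mm2)."""
    return density_mtr_per_mm2 * area_mm2 / 1000.0


def area_mm2_for(transistors_b: float, density_mtr_per_mm2: float) -> float:
    """Die area (mm2) needed to fit a transistor budget (billions) at a density."""
    return transistors_b * 1000.0 / density_mtr_per_mm2


# Hypothetical N3E monolith: 180 MTr/mm2 (assumed) on a 150mm2 die.
budget = transistors_billion(180, 150)
print(f"Budget: {budget:.1f}B transistors")

# How large the same budget would be at the densities quoted for older parts.
for label, density in [
    ("N6, N33 density", 65.2),
    ("N5, N31 GCD density", 152.3),
    ("N3E, assumed density", 180.0),
]:
    print(f"{label}: {area_mm2_for(budget, density):.0f} mm2")
```

The same budget at N33's N6 density would need a die well over 400mm2, which is why the question of what density N3E really delivers matters so much here.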

Then the next question is whether it's better (cheaper) to use N3E and have a 150mm2 chip, or N4 and have a 220mm2 chip. At least the latter should be cheaper to make than N32.

The last question is whether this chip would really be only a midrange part for ~$399.
Its performance is too high for that, so I think I was too optimistic and in reality it would be smaller and have fewer transistors.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Cost is more complicated than you're bringing up.

The first consideration is that N3E would be a brand new node for TSMC, and those are usually in very high demand. The higher the demand, the more TSMC can charge. N4P is obviously an older node, with less demand. Foundries rely on high utilization of their fabs for profit, so any cost advantage N3E brings might be offset by lower N4P prices.

The second is that power efficiency is directly correlated with clockspeeds. As stated, N3E does not appear to be any great advance on N4P in terms of efficiency. Even theoretically achievable clockspeed gains might be minimal, and often there are other limitations, usually the cache/memory hierarchy, that will limit clockspeeds before then.

Speaking of cache hierarchy, because GPUs rely so heavily on cache, there's a lot of it. But SRAM doesn't scale nearly as well as logic on advancing nodes. Thus any density scaling advantage of N3E over N4P would be diminished by the amount of SRAM needed for registers, L1, shared memory, and possibly L2.

If the die size is 150mm2, regardless of whether it is a chiplet or not, it still raises a legitimate question as to whether AMD would care enough to move to N3E. The relatively small efficiency advantage will probably be a consideration for their highest-margin product lines, i.e. servers and MI400, as well as for mobile parts where power efficiency is paramount. But there are already rumors pointing towards Zen 5 desktop being built on 4nm rather than 3nm, because if neither power efficiency nor die size matters quite as much, then price is the ultimate decider. And N4P might be cheaper next year and into 2025.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
My opinion is that AMD is taking their time after MCD did not work out on GPUs the way they may have originally planned, unlike their CPU design, which paid off. It'll be easier to fight at the low and mid range with Intel than with Nvidia, who've got all the good features. XeSS looks better than FSR any day of the week.
 

maddie

Diamond Member
Jul 18, 2010
5,078
5,394
136
My opinion is that AMD is taking their time after MCD did not work out on GPUs the way they may have originally planned, unlike their CPU design, which paid off. It'll be easier to fight at the low and mid range with Intel than with Nvidia, who've got all the good features. XeSS looks better than FSR any day of the week.
First time reading this.

Do you have any info that the MCD design is the issue, or is it "MCDs are novel, so they must be the reason" reasoning?
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,407
7,577
96
My opinion is that AMD is taking their time after MCD did not work out on GPUs the way they may have originally planned, unlike their CPU design, which paid off.
N31/32 issues are unrelated to tiles.
Do you have any info that the MCD design is the issue, or is it "MCDs are novel, so they must be the reason" reasoning?
There is none.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
First time reading this.

Do you have any info that the MCD design is the issue, or is it "MCDs are novel, so they must be the reason" reasoning?
Based on a casual conversation I had with a former AMD engineer who worked on K7 and worked with Jimmy, and whom I know through a mutual industry friend. The approach for a GPU was more novel than for a processor. It's not the choice of going MCD that caused RDNA3 to be 80% there with performance left hanging. K7 was obviously not a GPU family but a processor one. The idea of fusing two GPUs had been done about 25 or 26 years ago by ATI and 3Dlabs, the latter of which most have never heard of. They got gobbled up by Creative Labs and then renamed, and then their IP got gobbled up by Intel. Dual-die GPUs exhibited performance problems, most of which I scarcely remember due to age and it being a long time ago.

Are MCD GPUs the future? Yes, but it's going to take some time and lots of R&D before performance really powers through. I think AMD's design was good. I won't claim it to be flawed. They continue to make efforts to rein in power use and optimize what they've got. As a consumer I'd take it as a good sign that they're laying low for RDNA4 and trying to perfect their designs rather than subject themselves to more humiliation from angry users. I don't think their cards launched at fair prices, but I think the same of Nvidia's.

I'll chide AMD for their driver polish but never their hardware. Yeah, Nvidia's hardware is better, but they spend more on R&D and recoup their costs by driving a dry giant tree up their customers' rear ends, unlubed. Ultimately I feel that when AMD gets this down pat, Nvidia will be in a world of hurt. They will have another hot, power-hungry design, even if it's MCD, because they'll be caught with their pants down. Kinda like GCN making more sense for data than gaming, but worse. Jensen needs a hard spanking to get back down to earth. His massive ego and inflated head keep taking him up to the skies.

If AMD had held off another generation and done MCD only in their big boy parts, such as for LLMs or whatever people buy them for at the mo, then RTG would have had a bigger budget but no experience in designing it for consumer-level hardware. It's a flip-of-the-coin, pick-your-battles situation. I think Nvidia sitting RTX 40 out with a monolithic die was good because the 4090 and 4080 Ti are excellent (the rest not so much for the price), but sitting out may affect how the 50 series pans out for consumers. Right now Jensen the dancing monkey's focus is on this AI drivel crap, so he does not care if a 4090 Ti doesn't come out and if the 50 series is a middle-of-the-road design like the 30 series but still sees a price increase. He knows plenty of morons with more money than sense will buy it.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,640
3,197
136
Cost is more complicated than you're bringing up.

The first consideration is that N3E would be a brand new node for TSMC, and those are usually in very high demand. The higher the demand, the more TSMC can charge. N4P is obviously an older node, with less demand. Foundries rely on high utilization of their fabs for profit, so any cost advantage N3E brings might be offset by lower N4P prices.

The second is that power efficiency is directly correlated with clockspeeds. As stated, N3E does not appear to be any great advance on N4P in terms of efficiency. Even theoretically achievable clockspeed gains might be minimal, and often there are other limitations, usually the cache/memory hierarchy, that will limit clockspeeds before then.

Speaking of cache hierarchy, because GPUs rely so heavily on cache, there's a lot of it. But SRAM doesn't scale nearly as well as logic on advancing nodes. Thus any density scaling advantage of N3E over N4P would be diminished by the amount of SRAM needed for registers, L1, shared memory, and possibly L2.

If the die size is 150mm2, regardless of whether it is a chiplet or not, it still raises a legitimate question as to whether AMD would care enough to move to N3E. The relatively small efficiency advantage will probably be a consideration for their highest-margin product lines, i.e. servers and MI400, as well as for mobile parts where power efficiency is paramount. But there are already rumors pointing towards Zen 5 desktop being built on 4nm rather than 3nm, because if neither power efficiency nor die size matters quite as much, then price is the ultimate decider. And N4P might be cheaper next year and into 2025.
I don't really see any difference between what you wrote and what I wrote; you just added more details.
BTW, the clockspeed increase of 40% (3.5GHz vs 2.5GHz) I mentioned wasn't because of N3E but because they fixed their architecture; I didn't add any additional gains to it.

There is really nothing complicated about production cost. AMD wants it as low as possible, that's all. They will choose the cheaper option unless the costlier one offers significant advantages like higher clocks, better efficiency and so on, which would allow them to sell their product for more.
Of course, in the case of N3E demand will be high and it would be better to allocate it to more profitable products, so the GPU would be left with N4P instead.

The question still stands: which one is better to use for a relatively small monolithic chip?


N4P vs N5 provides an 11% speed improvement, an 18% energy reduction (22% higher power efficiency) and a 6% improvement in transistor density (source: Videocardz).
If I compare N3E FinFlex 2-1 Fin vs N4P, then we end up with:
+47% density, +0% speed, -15% energy
I will use this one in my table.
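
For anyone wondering how an 18% energy reduction becomes 22% higher power efficiency: at equal speed, efficiency scales with the inverse of energy used. A minimal sketch of that conversion, with a helper name chosen here purely for illustration:

```python
# Convert an energy reduction at equal speed into a perf-per-watt gain.
# The helper name is illustrative; the percentages come from the post.

def efficiency_gain(energy_reduction: float) -> float:
    """Return the relative efficiency gain for a fractional energy reduction.

    Example: 0.18 means the chip uses 18% less energy for the same work.
    """
    return 1.0 / (1.0 - energy_reduction) - 1.0


print(f"N4P vs N5:  {efficiency_gain(0.18):.0%} higher efficiency")
print(f"N3E vs N4P: {efficiency_gain(0.15):.0%} higher efficiency")
```

By the same conversion, the -15% energy figure for N3E vs N4P works out to roughly 18% better efficiency at the same clocks.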

SemiAnalysis claims 35% higher cost for N3E than N5. Let's keep the same figure for N3E vs N4P.
As an example, I made a table for a 27B-transistor RDNA4 GPU on different nodes with different transistor densities.
Node | MTr/mm2 | Wafer cost | GPU size | Good dies per wafer | Faulty dies per wafer | Cost per good die | Cost per die (good + faulty)
N4P | 123 (100%) | $15,000 (100%) | 220mm2 | 206 | 50 | $72.8 | $58.6
N3E 2-1 Fin | 180 (147%) | $20,250 (135%) | 150mm2 | 324 | 52 | $62.5 | $53.9
N3E 2-1 Fin | 150 (122%) | $22,500 (150%) | 180mm2 | 265 | 51 | $84.9 | $71.2
My conclusion is that even if AMD used N3E, there would be only a little gain in performance, and even efficiency wouldn't be that much better.
Production cost could end up either cheaper or costlier depending on N3E's price and density improvement.
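
The per-die figures in the table follow directly from wafer cost divided by die count. Here is a small Python sketch that reproduces them from the table's own inputs. The wafer prices and die counts are the poster's assumptions, the class and field names are invented for this example, and treating the "good + faulty" column as "every die gets sold" is one reading of it rather than something the post states.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One row of the table; all inputs are the poster's assumed values."""
    node: str
    wafer_cost: float   # USD per wafer
    good_dies: int      # fully working dies per wafer
    faulty_dies: int    # defective dies per wafer

    @property
    def cost_per_good_die(self) -> float:
        # Whole wafer cost carried by fully working dies only.
        return self.wafer_cost / self.good_dies

    @property
    def cost_per_any_die(self) -> float:
        # Assumes faulty dies are also usable (e.g. as cut-down parts).
        return self.wafer_cost / (self.good_dies + self.faulty_dies)

    @property
    def yield_fraction(self) -> float:
        return self.good_dies / (self.good_dies + self.faulty_dies)


SCENARIOS = [
    Scenario("N4P, 220mm2", 15_000, 206, 50),
    Scenario("N3E, 150mm2", 20_250, 324, 52),
    Scenario("N3E, 180mm2", 22_500, 265, 51),
]

for s in SCENARIOS:
    print(
        f"{s.node}: ${s.cost_per_good_die:.1f} per good die, "
        f"${s.cost_per_any_die:.1f} per die overall, "
        f"yield {s.yield_fraction:.0%}"
    )
```

Changing the wafer price or the die counts in `SCENARIOS` shows how quickly the N3E advantage disappears once the assumed density drops or the wafer price rises.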
 