Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Jul 27, 2020
23,512
16,525
146
Well, it is surprising. What is the investment needed for introducing such a dual-CCD SKU? What else, on top of proper binning and producing a firmware revision, is required?
I think it's more of a "mental" issue than a technical one in greenlighting the dual-CCD SKU for production. The person deciding that (maybe Lisa) probably thinks along these lines:

1 V-cache die == 1 Halo SKU == PROFIT

2 V-cache dies == 1 Halo SKU == 1.25 x PROFIT == Loss of 0.75x PROFIT vs. two separate halo SKUs == Not worth taking the blame for in times of financial distress

2 V-cache dies == 1 Halo SKU == 2 x PROFIT == Angry people == Not worth the bad PR

On that last one: we need to unite, pool our financial resources, and present AMD with upfront funds to get these special CPUs made to order.
 
Jul 27, 2020
23,512
16,525
146
And then you'll see the numbers and you'll complain about performance...
That's not important. What's important is not having to use any band-aid like Game Bar or even driver-level executable process affinity. It will perform fine even on Linux. All threads will get first-class status, instead of the scheduler/Game Bar deciding to demote some threads to the lower-cache CCD, maybe right in the middle of when they really, really need the extra cache. This is what keeps me up at night: CPUs getting bandwidth-starved and threads getting cache-starved.
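
For the curious, this is roughly what the "affinity game" looks like when played by hand on Linux. A minimal sketch, assuming the usual sysfs cache topology; which L3 domain actually sits under the V-cache isn't exposed there, so that choice is an assumption you'd verify with lscpu or hwloc first:

[code]
#!/usr/bin/env python3
# Minimal sketch (Linux-only): group logical CPUs by shared L3 domain
# (one domain per CCX/CCD on Zen), then pin this process to a single
# domain so the scheduler cannot bounce its threads between CCDs.
import os
from pathlib import Path

def l3_domains():
    """Return one set of logical CPUs per L3 cache domain (i.e. per CCX/CCD)."""
    domains = {}
    for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
        shared = cpu / "cache" / "index3" / "shared_cpu_list"
        if not shared.exists():
            continue
        key = shared.read_text().strip()          # e.g. "0-7,16-23"
        cpus = domains.setdefault(key, set())
        for part in key.split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return list(domains.values())

if __name__ == "__main__":
    ccds = l3_domains()
    print("L3 domains:", ccds)
    # Assumption: treating the first domain as the V-cache CCD purely for
    # illustration; verify the real mapping on your own system.
    os.sched_setaffinity(0, ccds[0])              # 0 = current process
    print("pinned to CPUs:", sorted(os.sched_getaffinity(0)))
[/code]

Launch a game from a wrapper like this (or pin an existing PID with os.sched_setaffinity(pid, ...)) and the threads stay put. With two V-cache CCDs, of course, none of this babysitting would matter.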
 
Reactions: biostud

Thibsie

Golden Member
Apr 25, 2017
1,010
1,186
136
Of course you'll still have to play the affinity game: threads will migrate to CCD2, which means onto the other V-cache, then come back and forth.

I don't see the point; it will only ensure the second CCD boosts to a lower frequency.
 
Reactions: Kryohi

mmaenpaa

Member
Aug 4, 2009
116
215
116
I have just installed Server 2022 on a 7950X (128 GB ECC memory).

I have been trying to find out if Microsoft's branch prediction improvement has been included in Windows Server 2022 as well. I think it was mostly about removing unnecessary leftover security mitigations from old processor generations.

KB5041587 was the update for W11, and it is of course included in 24H2.

Does anybody have info on this?
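
In case anyone wants to check their own install, here is a rough sketch of one way to query it, going through PowerShell's Get-HotFix (Win32_QuickFixEngineering). Caveat: cumulative updates don't always show up in that list, so a negative result is not conclusive:

[code]
# Rough sketch: ask Windows whether a given KB appears in the installed
# hotfix list (Get-HotFix / Win32_QuickFixEngineering). Caveat: cumulative
# updates do not always register there, so absence is not proof.
import subprocess

def kb_installed(kb: str) -> bool:
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    ).stdout
    return kb.upper() in out.upper()

print(kb_installed("KB5041587"))
[/code]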
 

gdansk

Diamond Member
Feb 8, 2011
4,030
6,637
136
I have just installed Server 2022 on a 7950X (128 GB ECC memory).

I have been trying to find out if Microsoft's branch prediction improvement has been included in Windows Server 2022 as well. I think it was mostly about removing unnecessary leftover security mitigations from old processor generations.

KB5041587 was the update for W11, and it is of course included in 24H2.

Does anybody have info on this?
AFAIK the regression that it fixed did not impact Windows Server 2022.
 
Reactions: mmaenpaa

DrMrLordX

Lifer
Apr 27, 2000
22,491
12,364
136
Obsessive CCX Disorder?

Obsessive Cache Disorder

More like Obsessive sCheduler Disorder, as in no more threads migrating to the wrong CCD... but yeah, any of those is good enough.

Well, it is surprising. What is the investment needed for introducing such a dual-CCD SKU? What else, on top of proper binning and producing a firmware revision, is required?
Might be a packaging limit thing, might not.
 

gdansk

Diamond Member
Feb 8, 2011
4,030
6,637
136
Niche product, says the company who completely miscalculated 9800X3D demand
A niche product which by its mere existence would consume two 9800X3Ds. And at best the reviews will say to skip it, because it is indistinguishable in most tests and even worse in some others, at a higher cost. It isn't a halo part. It offers no performance over the mixed configuration unless you're running niche work like fluid simulation. It's just an expensive part. AMD has the benchmarks and said "I’m scared to death of it" because it's a very hard product to sell.

The easiest way to avoid reviews is to launch it as part of AM5 Epyc and/or toward the end of Zen 5's life span when no one cares.
 
Last edited:

biostud

Lifer
Feb 27, 2003
19,323
6,340
136
A niche product which by its mere existence would consume two 9800X3Ds. And at best the reviews will say to skip it, because it is indistinguishable in most tests and even worse in some others, at a higher cost. It isn't a halo part. It offers no performance over the mixed configuration unless you're running niche work like fluid simulation. It's just an expensive part. AMD has the benchmarks and said "I’m scared to death of it" because it's a very hard product to sell.

The easiest way to avoid reviews is to launch it as part of AM5 Epyc and/or toward the end of Zen 5's life span when no one cares.
There have been so many "niche" or special-edition CPUs that weren't worth it but still got launched.

All of Intel's KS series, before that the EE series, the 8086K.

As long as you don't promise something it can't deliver, informed buyers can decide for themselves.

They could simply launch it as a 9950X3D2 at a premium price and openly say it's a test vehicle to see if there is a market for those, even if it doesn't bring much extra performance.

No one could give a negative review in that case, because they wouldn't be over-promising anything.
 

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
[Strix Halo, which CCDs did AMD use?]
If it's just a different GMI interconnect, why isn't there a chance it's already on all the normal CCDs? It could have a legacy mode for the old desktop I/O die, and the new mode for higher-spec controllers.
My uninformed guess would be that they are using a "sea of wires" for a reason, and that reason is that they have changed from a "more serial" physical interconnect clocked at 20-something GHz to a "more parallel" one they can clock at 2 GHz. Since they maintain bandwidth, that means they have roughly 10 times more wires, meaning more contacts on the CCD itself. Since I/O takes space, it might be that they have made some frequency/area trade-off on the rest of the chip to accommodate more I/O and keep a sensible V/f curve, since Halo is limited to 5.1 GHz boost.

But yeah, it's just a guess; maybe somebody can correct me if I've missed the picture.
Yes, I/O takes space: in Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/contact pads for hybrid bonding.

Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD. Just like the connections to the X3D cache are present on Granite Ridge's CCD. And those are probably wider (or at least have more bandwidth) than the die-to-die interconnect.

Here is how much — or how little — space the two types of die-to-die interconnect take in Durango, the CCD used in Raphael/Genoa/MI300A/MI300C:

(from computerbase.de's report on AMD's MI300 presentation in San Jose, 12/2023)
 
Last edited:

gdansk

Diamond Member
Feb 8, 2011
4,030
6,637
136
There have been so many "niche" or special-edition CPUs that weren't worth it but still got launched.

All of Intel's KS series, before that the EE series, the 8086K.

As long as you don't promise something it can't deliver, informed buyers can decide for themselves.

They could simply launch it as a 9950X3D2 at a premium price and openly say it's a test vehicle to see if there is a market for those, even if it doesn't bring much extra performance.

No one could give a negative review in that case, because they wouldn't be over-promising anything.
No, those actually provided slightly more clock speed. Moreover, they are simply bins of existing parts; they are making parts good enough for those already, so they may as well skim some off for a higher-MSRP SKU. This dual-cache-CCD part would reduce clock rates & increase cache latency out of the box while requiring twice as many X3D CCDs. It's just a less flexible, more expensive version that will not look good in benches, tests, or reviews.

The demand for this product is the few people running fluid simulations or entirely-in-L3 queries on AM5. And gamers who incorrectly think it is any better at gaming than the mixed-CCD product. But the best configuration for gamers that uses two cache chiplets would be both stacked on one CCD, not spread over two separate CCDs. They're asking for a configuration that's not even gonna help them (unless they're too lazy to use core pinning) and will reduce AMD's ability to supply & eventually discount the 9800X3D.
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
And gamers who incorrectly think it is any better
I'd rather say, gamers who overestimate¹ how much a homogeneous dual-CCX CPU is better at gaming versus a heterogeneous one, plus gamers who overestimate how much a 12+ core CPU is better in their very own choice of games versus an 8-core CPU.² IOW it's a grey area, not black and white.
________
¹) more so, when not just performance, but performance/price is considered
²) plus gamers who have a realistic grasp of that but don't care about CPU price

Edit,
Of course some would [give negative reviews], what do you think?
Most definitely, considering how many reviewers merely measure average video-game FPS in a few standard, undemanding scenes and call it a day. However, if AMD made such a product and called it EPYC 4{4,5}85PX, the "silly YouTube thumbnails effect" could perhaps be well contained.
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
5,075
5,393
136
[Strix Halo, which CCDs did AMD use?]


Yes, I/O takes space: in Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/contact pads for hybrid bonding.

Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD. Just like the connections to the X3D cache are present on Granite Ridge's CCD. And those are probably wider (or at least have more bandwidth) than the die-to-die interconnect.

Here is how much — or how little — space the two types of die-to-die interconnect take in Durango, the CCD used in Raphael/Genoa/MI300A/MI300C:
[attachment 115198: Durango die shot]
(from computerbase.de's report on AMD's MI300 presentation in San Jose, 12/2023)
The RX 7xxx chiplet fan-out connections should be the more accurate comparison: no stacking, but high bandwidth, low latency, and low power, plus much cheaper than 3D stacking.

A reminder of the tech.
 

Win2012R2

Senior member
Dec 5, 2024
792
795
96
Then people like me get angry. We want it somewhat within reach. Epycs and TRs are too expensive for individuals.
There is a 16-core EPYC version with 512 MB of L3 cache at a list price of 4,256 USD, which is perhaps the true reason why they really don't want to release a highly clocked 16-core with 192 MB of cache for something like $1,000.

Right now AMD certainly prefers selling 2x more motherboards for the same 3D chiplets, and twice the users too - that is all installed base that will likely buy again. Still, having a small batch of a halo product that can be done relatively easily in this case is always a good idea: get super-binned chiplets to clock 100 MHz higher, job done.
 
Reactions: r.p

Joe NYC

Platinum Member
Jun 26, 2021
2,931
4,301
106
[Strix Halo, which CCDs did AMD use?]


Yes, I/O takes space: in Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/contact pads for hybrid bonding.

Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD. Just like the connections to the X3D cache are present on Granite Ridge's CCD. And those are probably wider (or at least have more bandwidth) than the die-to-die interconnect.

Here is how much — or how little — space the two types of die-to-die interconnect take in Durango, the CCD used in Raphael/Genoa/MI300A/MI300C:
[attachment 115198: Durango die shot]
(from computerbase.de's report on AMD's MI300 presentation in San Jose, 12/2023)

I think you are right in your speculation. I also think Strix Halo uses the same CCD as desktop and server. In the Chips and Cheese interview, the AMD engineer stressed binning, which can only be done if you have bins for other uses (rather than throwing chips away).

This would imply a dual interface on each Zen 5 CCD, and that way AMD can go wild with binning. Since CCD chiplets only need power and a link to the IOD, there is probably plenty of space to accommodate a dual interface.

As far as the use of die area for GMI vs. fan-out goes, there is a big difference: GMI uses silicon area, while RDL mainly uses space on the other side of the die - the CPU's metal layers connect to the RDL layer (while the transistors above can do their unrelated work).

From the picture of the MI300A die, it looks like there are some transistors in use to create the stop on the ring bus, but this is a small area.

Moving on to Zen 6, GMI goes away from the top of the die, shrinking the die; only the interface for RDL stays.
 

MS_AT

Senior member
Jul 15, 2024
555
1,168
96
Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD.
The interview does not mention that as far as I can tell, so I would be grateful for a source. MI300A is said to use GMI SERDES, while when talking about Halo they use the past tense, as if GMI is the "old" thing.
We are able to get power benefits. Because prior to that, we had a GMI PHY that lived in there and that consumed a whole lot of power in order to be able to send this over high frequencies over short distances. Here we are clocking it at a way less than the 20 gigs that the GMI was being clocked at. This is anywhere between, you know, one to two gigahertz.
And the tech they are supposed to use is the same one that was already linked here, where they can make a much denser interconnect, use more wires, and clock it lower, but still come out ahead on bandwidth. Another analogue would be HBM vs GDDR.

So my line of reasoning for why they are using a different CCD is: they still need to move the same amount of data between the two chips (CCD <-> IOD), and if the link is clocked 10 times slower, they need to make it 10 times wider, unless they play with encoding etc. And since they would need 10x the number of connections to the CCD than before, whatever the before was, people annotating desktop Zen 5 die shots would probably spot it. It should be visible if they are able to spot the vias for the X3D cache.
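
As a sanity check on that factor, some napkin math using the clocks from the interview quote (the 20 GHz and "one to two gigahertz" figures are theirs; the rest is just arithmetic):

[code]
# Napkin math: holding bandwidth constant, wire count scales inversely
# with per-wire clock. Clock figures are the ones quoted above from the
# interview; the exact values are illustrative.
gmi_clock_ghz    = 20.0   # "the 20 gigs that the GMI was being clocked at"
fanout_clock_ghz = 2.0    # "anywhere between ... one to two gigahertz"

width_multiplier = gmi_clock_ghz / fanout_clock_ghz
print(f"~{width_multiplier:.0f}x more wires to keep the same bandwidth")  # ~10x
[/code]

Napkin math, of course; encoding tricks, double pumping, or wider transfers per clock would change the exact multiple.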

As far as usage of die area for GMI vs. fan-out, there is a big difference in that GMI uses silicon area, while RDL mainly using space on the other side of the die - metal layers of the CPU connected to RDL layer (while transistors above can do their unrelated work).
Since the interview says they were able to do away with the SERDES, wouldn't we see differences in the die shots, as those wires should lead directly to structures in silicon without anything doing the translation? And they would need more wires, for the reason mentioned above.
 

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
So my line of reasoning for why they are using a different CCD is: they still need to move the same amount of data between the two chips (CCD <-> IOD), and if the link is clocked 10 times slower, they need to make it 10 times wider, unless they play with encoding etc. And since they would need 10x the number of connections to the CCD than before, whatever the before was, people annotating desktop Zen 5 die shots would probably spot it. It should be visible if they are able to spot the vias for the X3D cache.
While it's ten times wider than what's going from the SERDES to the substrate, it's not terribly wide compared to what is already there on-chip. Actually, what's going from the Coherent Master over to the fan-out contacts is likely of the same order of magnitude in clock and width as what's going from the Coherent Master to the SERDES.

Further, as @Joe NYC pointed out, this stuff may not show up on die shots at all because it happens in the "lowermost" metal layers.
 
Reactions: Tlh97 and Vattila