Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Jul 27, 2020
23,512
16,525
146
Well, it is surprising. What is the investment needed for introducing such a dual-CCD SKU? What else, on top of proper binning and producing a firmware revision, is required?
I think it's more of a "mental" issue than a technical one in greenlighting the dual-CCD SKU for production. The person deciding that (maybe Lisa) probably thinks along these lines:

1 V-cache die == 1 Halo SKU == PROFIT

2 V-cache dies == 1 Halo SKU == 1.25 x PROFIT == Loss of 0.75x PROFIT vs. two separate halo SKUs == Not worth taking the blame for in times of financial distress

2 V-cache dies == 1 Halo SKU == 2 x PROFIT == Angry people == Not worth the bad PR

On that last one: we need to unite, pool our financial resources, and present AMD with upfront funds to get these special CPUs made to order.
 
Jul 27, 2020
23,512
16,525
146
And then you'll see the numbers and you'll complain about performance...
That's not important. What's important is not having to use any band-aid like Game Bar or even driver-level executable process affinity. It will perform fine even on Linux. All threads will get first-class status, instead of the scheduler/Game Bar deciding to demote some threads to the lower-cache CCD, maybe right in the middle of when they really, really need the extra cache. This is what keeps me up at night: CPUs getting bandwidth-starved and threads getting cache-starved.
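
For the curious, this is roughly what the "affinity game" looks like when played by hand on Linux. A minimal sketch, assuming the usual sysfs cache topology; which L3 domain actually sits under the V-cache isn't exposed there, so that choice is an assumption you'd verify with lscpu or hwloc first:

[code]
#!/usr/bin/env python3
# Minimal sketch (Linux-only): group logical CPUs by shared L3 domain
# (one domain per CCX/CCD on Zen), then pin this process to a single
# domain so the scheduler cannot bounce its threads between CCDs.
import os
from pathlib import Path

def l3_domains():
    """Return one set of logical CPUs per L3 cache domain (i.e. per CCX/CCD)."""
    domains = {}
    for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
        shared = cpu / "cache" / "index3" / "shared_cpu_list"
        if not shared.exists():
            continue
        key = shared.read_text().strip()          # e.g. "0-7,16-23"
        cpus = domains.setdefault(key, set())
        for part in key.split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return list(domains.values())

if __name__ == "__main__":
    ccds = l3_domains()
    print("L3 domains:", ccds)
    # Assumption: treating the first domain as the V-cache CCD purely for
    # illustration; verify the real mapping on your own system.
    os.sched_setaffinity(0, ccds[0])              # 0 = current process
    print("pinned to CPUs:", sorted(os.sched_getaffinity(0)))
[/code]

Launch a game from a wrapper like this (or pin an existing PID with os.sched_setaffinity(pid, ...)) and the threads stay put. With two V-cache CCDs, of course, none of this babysitting would matter.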
 
Reactions: biostud

Thibsie

Golden Member
Apr 25, 2017
1,010
1,186
136
Of course you'll still have to play the affinity game: threads will migrate to CCD2, which means onto the other V-cache, then come back and forth.

I don't see the point; it will only ensure the second CCD boosts to a lower frequency.
 
Reactions: Kryohi

mmaenpaa

Member
Aug 4, 2009
116
215
116
I have just installed Server 2022 on a 7950X (128 GB ECC memory).

I have been trying to find out if Microsoft's branch prediction improvement has been included in Windows Server 2022 as well. I think it was mostly about removing unnecessary leftover security mitigations from old processor generations.

KB5041587 was the update for W11, and it is of course included in 24H2.

Does anybody have info on this?
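
In case anyone wants to check their own install, here is a rough sketch of one way to query it, going through PowerShell's Get-HotFix (Win32_QuickFixEngineering). Caveat: cumulative updates don't always show up in that list, so a negative result is not conclusive:

[code]
# Rough sketch: ask Windows whether a given KB appears in the installed
# hotfix list (Get-HotFix / Win32_QuickFixEngineering). Caveat: cumulative
# updates do not always register there, so absence is not proof.
import subprocess

def kb_installed(kb: str) -> bool:
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    ).stdout
    return kb.upper() in out.upper()

print(kb_installed("KB5041587"))
[/code]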
 

gdansk

Diamond Member
Feb 8, 2011
4,030
6,637
136
I have just installed Server 2022 on a 7950X (128 GB ECC memory).

I have been trying to find out if Microsoft's branch prediction improvement has been included in Windows Server 2022 as well. I think it was mostly about removing unnecessary leftover security mitigations from old processor generations.

KB5041587 was the update for W11, and it is of course included in 24H2.

Does anybody have info on this?
AFAIK the regression that it fixed did not impact Windows Server 2022.
 
Reactions: mmaenpaa

DrMrLordX

Lifer
Apr 27, 2000
22,491
12,364
136
Obsessive CCX Disorder?

Obsessive Cache Disorder

More like Obsessive sCheduler Disorder, as in no more threads migrating to the wrong CCD... but yeah, any of those is good enough.

Well, it is surprising. What is the investment needed for introducing such a dual-CCD SKU? What else, on top of proper binning and producing a firmware revision, is required?
Might be a packaging limit thing, might not.
 

gdansk

Diamond Member
Feb 8, 2011
4,030
6,637
136
Niche product, says the company who completely miscalculated 9800X3D demand
A niche product which by its mere existence would consume two 9800X3Ds. And at best the reviews will say to skip it, because it is indistinguishable in most tests and even worse in some others, at a higher cost. It isn't a halo part. It offers no performance over the mixed configuration unless you're running niche work like fluid simulation. It's just an expensive part. AMD has the benchmarks and said "I’m scared to death of it" because it's a very hard product to sell.

The easiest way to avoid reviews is to launch it as part of AM5 Epyc and/or toward the end of Zen 5's life span when no one cares.
 
Last edited:

biostud

Lifer
Feb 27, 2003
19,323
6,340
136
A niche product which by its mere existence would consume two 9800X3Ds. And at best the reviews will say to skip it, because it is indistinguishable in most tests and even worse in some others, at a higher cost. It isn't a halo part. It offers no performance over the mixed configuration unless you're running niche work like fluid simulation. It's just an expensive part. AMD has the benchmarks and said "I’m scared to death of it" because it's a very hard product to sell.

The easiest way to avoid reviews is to launch it as part of AM5 Epyc and/or toward the end of Zen 5's life span when no one cares.
There have been so many "niche" or special-edition CPUs that weren't worth it but still got launched.

All of Intel's KS series, before that the EE series, the 8086K.

As long as you don't promise something it can't deliver, informed buyers can decide for themselves.

They could simply launch it as a 9950X3D2 at a premium price and openly say it's a test vehicle to see if there is a market for those, even if it doesn't bring much extra performance.

No one could give a negative review in that case, because they wouldn't be over-promising anything.
 

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
[Strix Halo, which CCDs did AMD use?]
If it's just a different GMI interconnect, why isn't there a chance it's already on all the normal CCDs? It could have a legacy mode for the old desktop I/O die, and the new mode for higher-spec controllers.
My uninformed guess would be that they are using a "sea of wires" for a reason, and that reason is that they have changed from a "more serial" physical interconnect clocked at 20-something GHz to a "more parallel" one they can clock at 2 GHz. Since they maintain bandwidth, that means they have roughly 10 times more wires, meaning more contacts on the CCD itself. Since I/O takes space, it might be that they have made some frequency/area trade-off on the rest of the chip to accommodate more I/O and keep a sensible V/f curve, since Halo is limited to 5.1 GHz boost.

But yeah, it's just a guess; maybe somebody can correct me if I've missed the picture.
Yes, I/O takes space: in Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/contact pads for hybrid bonding.

Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD. Just like the connections to the X3D cache are present on Granite Ridge's CCD. And those are probably wider (or at least have more bandwidth) than the die-to-die interconnect.

Here is how much — or how little — space the two types of die-to-die interconnect take in Durango, the CCD used in Raphael/Genoa/MI300A/MI300C:

(from computerbase.de's report on AMD's MI300 presentation in San Jose, 12/2023)
 
Last edited:

gdansk

Diamond Member
Feb 8, 2011
4,030
6,637
136
There have been so many "niche" or special-edition CPUs that weren't worth it but still got launched.

All of Intel's KS series, before that the EE series, the 8086K.

As long as you don't promise something it can't deliver, informed buyers can decide for themselves.

They could simply launch it as a 9950X3D2 at a premium price and openly say it's a test vehicle to see if there is a market for those, even if it doesn't bring much extra performance.

No one could give a negative review in that case, because they wouldn't be over-promising anything.
No, those actually provided slightly more clock speed. Moreover, they are simply bins of existing parts; they are making parts good enough for those already, so they may as well skim some off for a higher-MSRP SKU. This dual-cache-CCD part would reduce clock rates & increase cache latency out of the box while requiring twice as many X3D CCDs. It's just a less flexible, more expensive version that will not look good in benches, tests, or reviews.

The demand for this product is the few people running fluid simulations or entirely-in-L3 queries on AM5. And gamers who incorrectly think it is any better at gaming than the mixed-CCD product. But the best configuration for gamers that uses two cache chiplets would be both stacked on one CCD, not spread over two separate CCDs. They're asking for a configuration that's not even gonna help them (unless they're too lazy to use core pinning) and will reduce AMD's ability to supply & eventually discount the 9800X3D.
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
And gamers who incorrectly think it is any better
I'd rather say, gamers who overestimate¹ how much a homogeneous dual-CCX CPU is better at gaming versus a heterogeneous one, plus gamers who overestimate how much a 12+ core CPU is better in their very own choice of games versus an 8-core CPU.² IOW it's a grey area, not black and white.
________
¹) more so, when not just performance, but performance/price is considered
²) plus gamers who have a realistic grasp of that but don't care about CPU price

Edit,
Of course some would [give negative reviews], what do you think?
Most definitely, considering how many reviewers merely measure average video-game FPS in a few standard, undemanding scenes and call it a day. However, if AMD made such a product and called it EPYC 4{4,5}85PX, the "silly YouTube thumbnails effect" could perhaps be well contained.
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
5,075
5,393
136
[Strix Halo, which CCDs did AMD use?]


Yes, I/O takes space: in Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/contact pads for hybrid bonding.

Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD. Just like the connections to the X3D cache are present on Granite Ridge's CCD. And those are probably wider (or at least have more bandwidth) than the die-to-die interconnect.

Here is how much — or how little — space the two types of die-to-die interconnect take in Durango, the CCD used in Raphael/Genoa/MI300A/MI300C:
[attachment 115198: Durango die shot]
(from computerbase.de's report on AMD's MI300 presentation in San Jose, 12/2023)
The RX 7xxx chiplet fan-out connections should be the more accurate comparison: no stacking, but high bandwidth, low latency, and low power, plus much cheaper than 3D stacking.

A reminder of the tech.
 

Win2012R2

Senior member
Dec 5, 2024
792
795
96
Then people like me get angry. We want it somewhat within reach. Epycs and TRs are too expensive for individuals.
There is a 16-core EPYC version with 512 MB of L3 cache at a list price of 4,256 USD, which is perhaps the true reason why they really don't want to release a highly clocked 16-core with 192 MB of cache for something like $1,000.

Right now AMD certainly prefers selling 2x more motherboards for the same 3D chiplets, and twice the users too - that is all installed base that will likely buy again. Still, having a small batch of a halo product that can be done relatively easily in this case is always a good idea: get super-binned chiplets to clock 100 MHz higher, job done.
 
Reactions: r.p

Joe NYC

Platinum Member
Jun 26, 2021
2,931
4,301
106
[Strix Halo, which CCDs did AMD use?]


Yes, I/O takes space: in Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/contact pads for hybrid bonding.

Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD. Just like the connections to the X3D cache are present on Granite Ridge's CCD. And those are probably wider (or at least have more bandwidth) than the die-to-die interconnect.

Here is how much — or how little — space the two types of die-to-die interconnect take in Durango, the CCD used in Raphael/Genoa/MI300A/MI300C:
[attachment 115198: Durango die shot]
(from computerbase.de's report on AMD's MI300 presentation in San Jose, 12/2023)

I think you are right in your speculation. I also think Strix Halo uses the same CCD as desktop and server. In the Chips and Cheese interview, the AMD engineer stressed binning, which can only be done if you have bins for other uses (rather than throwing chips away).

This would imply a dual interface on each Zen 5 CCD, and that way AMD can go wild with binning. Since CCD chiplets only need power and a link to the IOD, there is probably plenty of space to accommodate a dual interface.

As far as the use of die area for GMI vs. fan-out goes, there is a big difference: GMI uses silicon area, while RDL mainly uses space on the other side of the die - the CPU's metal layers connect to the RDL layer (while the transistors above can do their unrelated work).

From the picture of the MI300A die, it looks like there are some transistors in use to create the stop on the ring bus, but this is a small area.

Moving on to Zen 6, GMI goes away from the top of the die, shrinking the die; only the interface for RDL stays.
 

MS_AT

Senior member
Jul 15, 2024
555
1,168
96
Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD.
The interview does not mention that as far as I can tell, so I would be grateful for a source. MI300A is said to use GMI SERDES, while when talking about Halo they use the past tense, as if GMI is the "old" thing.
We are able to get power benefits. Because prior to that, we had a GMI PHY that lived in there and that consumed a whole lot of power in order to be able to send this over high frequencies over short distances. Here we are clocking it at a way less than the 20 gigs that the GMI was being clocked at. This is anywhere between, you know, one to two gigahertz.
And the tech they are supposed to use is the same one that was already linked here, where they can make a much denser interconnect, use more wires, and clock it lower, but still come out ahead on bandwidth. Another analogue would be HBM vs GDDR.

So my line of reasoning for why they are using a different CCD is: they still need to move the same amount of data between the two chips (CCD <-> IOD), and if the link is clocked 10 times slower, they need to make it 10 times wider, unless they play with encoding etc. And since they would need 10x the number of connections to the CCD than before, whatever the before was, people annotating desktop Zen 5 die shots would probably spot it. It should be visible if they are able to spot the vias for the X3D cache.
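
As a sanity check on that factor, some napkin math using the clocks from the interview quote (the 20 GHz and "one to two gigahertz" figures are theirs; the rest is just arithmetic):

[code]
# Napkin math: holding bandwidth constant, wire count scales inversely
# with per-wire clock. Clock figures are the ones quoted above from the
# interview; the exact values are illustrative.
gmi_clock_ghz    = 20.0   # "the 20 gigs that the GMI was being clocked at"
fanout_clock_ghz = 2.0    # "anywhere between ... one to two gigahertz"

width_multiplier = gmi_clock_ghz / fanout_clock_ghz
print(f"~{width_multiplier:.0f}x more wires to keep the same bandwidth")  # ~10x
[/code]

Napkin math, of course; encoding tricks, double pumping, or wider transfers per clock would change the exact multiple.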

As far as usage of die area for GMI vs. fan-out, there is a big difference in that GMI uses silicon area, while RDL mainly using space on the other side of the die - metal layers of the CPU connected to RDL layer (while transistors above can do their unrelated work).
Since the interview says they were able to do away with the SERDES, wouldn't we see differences in the die shots, as those wires should lead directly to structures in silicon without anything doing the translation? And they would need more wires, for the reason mentioned above.
 

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
So my line of reasoning for why they are using a different CCD is: they still need to move the same amount of data between the two chips (CCD <-> IOD), and if the link is clocked 10 times slower, they need to make it 10 times wider, unless they play with encoding etc. And since they would need 10x the number of connections to the CCD than before, whatever the before was, people annotating desktop Zen 5 die shots would probably spot it. It should be visible if they are able to spot the vias for the X3D cache.
While it's ten times wider than what's going from the SERDES to the substrate, it's not terribly wide compared to what is already there on-chip. Actually, what's going from the Coherent Master over to the fan-out contacts is likely of the same order of magnitude in clock and width as what's going from the Coherent Master to the SERDES.

Further, as @Joe NYC pointed out, this stuff may not show up on die shots at all because it happens in the "lowermost" metal layers.
 
Reactions: Tlh97 and Vattila