> Well, it is surprising. What is the investment needed for introducing such a dual-CCD SKU? What else, on top of proper binning and producing a fw revision, is required?

I think it's more of a "mental" issue than a technical one in greenlighting the dual-CCD SKU for production. The person deciding that (maybe Lisa) probably thinks like so: "And then you'll see the numbers and you'll complain about performance..."
> And then you'll see the numbers and you'll complain about performance...

That's not important. What's important is not having to use any band-aid like Game Bar, or even driver-level executable process affinity. It will perform fine even on Linux. All threads will get first-class status, instead of the scheduler/Game Bar deciding to demote some threads to the lower-cache CCD, maybe right in the middle of when they really, really need the extra cache. This is what keeps me up at night: CPUs getting bandwidth starved and threads getting cache starved.
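The "driver-level process affinity" band-aid mentioned above can be applied by hand today. A minimal Linux-only sketch, assuming (hypothetically) that logical CPUs 0-15 map to the V-cache CCD on a dual-CCD X3D part; check `lscpu -e` or the sysfs topology files for the real mapping before relying on it:

```python
import os

# Hypothetical mapping: logical CPUs 0-15 (8 cores + SMT siblings) on CCD0,
# the die carrying the 3D V-cache. Verify on real hardware first.
CACHE_CCD_CPUS = set(range(16))

def pin_to_cache_ccd(pid: int = 0) -> set:
    """Restrict `pid` (0 = the current process) to the cache CCD's CPUs,
    so the scheduler can no longer migrate its threads off the cache die."""
    available = os.sched_getaffinity(pid)
    # Intersect with what this machine actually has, so the sketch also
    # runs (as a no-op fallback) on CPUs with fewer logical processors.
    target = (CACHE_CCD_CPUS & available) or available
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    print(sorted(pin_to_cache_ccd()))
```

The same effect is achieved from a shell with `taskset -c 0-15 ./game`; on Windows, `start /affinity` or the Game Bar toggle covers the same ground, which is exactly the band-aid the post would rather not need.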
Niche product, says the company who completely miscalculated 9800X3D demand.

"We could make a Ryzen 9 9950X3D with 3D V-cache on all 16 cores, AMD tells us" (www.pcgamesn.com): AMD confirms that it could make a limited-edition dual-CCD Ryzen X3D gaming CPU with 3D V-cache on both CCDs, but it would be a niche product.
> Does anybody have info on this?

Compare your CB R23 score with the Win11 branch-prediction-enhanced score.
> I have just installed Server 2022 on a 7950X (128GB ECC memory)

What RAM speed did your mobo settle on with all DIMM slots populated? And which mobo model?
> I have just installed Server 2022 on a 7950X (128GB ECC memory)

AFAIK the regression that it fixed did not impact Windows Server 2022.
I have been trying to find out whether Microsoft's branch-prediction improvement has been included in Windows Server 2022 as well. I think it was mostly removing unnecessary leftover security mitigations for old processor versions.
KB5041587 was the update for W11, and of course it is included in 24H2.
Does anybody have info on this?
Obsessive CCX Disorder?
Obsessive Cache Disorder
> Well, it is surprising. What is the investment needed for introducing such a dual-CCD SKU? What else, on top of proper binning and producing a fw revision, is required?

Might be a packaging limit thing, might not.
> Niche product, says the company who completely miscalculated 9800X3D demand

In for one.
> Niche product, says the company who completely miscalculated 9800X3D demand

A niche product which by its mere existence would consume two 9800X3Ds. And at best the reviews will say to skip it, because it is indistinguishable in most tests, and even worse in some others, at a higher cost. It isn't a halo part. It offers no performance advantage over the mixed configuration unless you're running niche work like fluid simulation. It's just an expensive part. AMD has the benchmarks and said "I'm scared to death of it" because it's a very hard product to sell.
> A niche product which by its mere existence would consume two 9800X3Ds. And at best the reviews will say to skip it, because it is indistinguishable in most tests, and even worse in some others, at a higher cost. It isn't a halo part. It offers no performance advantage over the mixed configuration unless you're running niche work like fluid simulation. It's just an expensive part. AMD has the benchmarks and said "I'm scared to death of it" because it's a very hard product to sell.

There have been so many "niche" or special-edition CPUs that weren't worth it, but were still launched.
The easiest way to avoid reviews is to launch it as part of AM5 Epyc and/or toward the end of Zen 5's life span when no one cares.
If it's just a different GMI interconnect, why isn't there a chance it's already on all the normal CCDs? It could have a legacy mode for the old desktop I/O die, and the new mode for higher-spec controllers.
> My uninformed guess would be that they are using a "sea of wires" for a reason, and that reason is that they have changed from a "more serial" physical interconnect clocked at 20-something GHz to a "more parallel" one they can clock at 2 GHz. Since they maintain bandwidth, that means they have roughly 10 times more wires, meaning more contacts on the CCD itself. Since I/O takes space, it might be that they have made some frequency/space trade-off on the rest of the chip to accommodate more I/O and keep a sensible V/F curve, since Halo is limited to a 5.1 GHz boost. But yeah, it's just a guess; maybe somebody can correct me if I'm missing the picture.

Yes, I/O takes space: in Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/contact pads for hybrid bonding.
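The "roughly 10 times more wires" arithmetic in that guess is easy to sanity-check. A toy calculation with made-up round numbers (the 640 Gb/s link bandwidth is purely illustrative, not an AMD spec):

```python
import math

def wires_needed(total_gbps: float, per_wire_gbps: float) -> int:
    """Wires required to carry `total_gbps` when each wire signals at
    `per_wire_gbps` (ignoring encoding overhead, ECC, clocking, etc.)."""
    return math.ceil(total_gbps / per_wire_gbps)

LINK_GBPS = 640  # hypothetical total link bandwidth, same for both designs

serdes_wires = wires_needed(LINK_GBPS, 20)  # ~20 Gb/s per SERDES lane
fanout_wires = wires_needed(LINK_GBPS, 2)   # ~2 Gb/s per fan-out wire

print(serdes_wires, fanout_wires)  # -> 32 320: 10x the wires at 1/10 the clock
```

Keeping bandwidth constant while dropping the per-wire clock from ~20 GHz to ~2 GHz multiplies the wire (and contact-pad) count by the same factor of ten, which is the trade-off guessed at above.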
> There have been so many "niche" or special-edition CPUs that weren't worth it, but were still launched.

No, those actually provided slightly more clock speed. Moreover, they are simply bins of existing parts: they are making parts good enough for those already, so they may as well skim some off for a higher-MSRP SKU. This dual-cache-CCD part would reduce clock rates and increase cache latency out of the box while requiring twice as many X3D CCDs. It's just a less flexible, more expensive version that will not look good in benches, tests, or reviews.
All of Intel's KS series; before that, the EE series and the 8086K.
As long as you don't promise something it can't deliver, informed buyers can decide for themselves.
They could simply launch it as a 9950X3D2 at a premium price and openly say it's a test vehicle to see if there is a market for those, even if it doesn't bring much extra performance.
No one could give a negative review if that were the case, because they wouldn't be overpromising anything.
> No one could give a negative review if that were the case, because they wouldn't be overpromising anything.

Of course some would, what do you think?
> And gamers who incorrectly think it is any better

I'd rather say: gamers who overestimate¹ how much a homogeneous dual-CCX CPU is better at gaming versus a heterogeneous one, plus gamers who overestimate how much a 12+ core CPU is better in their very own choice of games versus an 8-core CPU.² IOW, it's a grey area, not black and white.
> Of course some would [give negative reviews], what do you think?

Most definitely, considering how many reviewers merely measure average video game FPS in a few standard undemanding scenes and call it a day. However, if AMD made such a product and called it EPYC 4{4,5}85PX, the "silly YouTube thumbnails effect" could perhaps be well contained.
> [Strix Halo, which CCDs did AMD use?]

The RX 7xxx chiplet fan-out connections should be a more accurate comparison: no stacking, but high bandwidth, low latency and low power, plus much cheaper than 3D stacking.
Yes, I/O takes space: In Granite Ridge and Turin, the SERDESes take a lot of space, as do the contact pads towards the substrate. Conversely, in Strix Halo you simply have the same on-die fabric as in Granite Ridge and Turin, but routed to reasonably small vias/ contact pads for hybrid bonding.
Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD, just like the connections to the X3D cache are present on Granite Ridge's CCD. And those are probably wider (or at least have more bandwidth) than the die-to-die interconnect.
Here is how much — or how little — space the two types of die-to-die interconnect take in Durango, the CCD used in Raphael/ Genoa/ MI300A/ MI300C:
View attachment 115198
(from computerbase.de's report on AMD's MI300 presentation in San Jose, 12/2023)
> 2 V-cache dies == 1 Halo SKU == 2 x PROFIT == Angry people == Not worth the bad PR

Call it EPYC and sell it for 3 x PROFIT; that's a normal thing since it's a server chip, so no bad PR.
> Call it EPYC and sell it for 3 x PROFIT; that's a normal thing since it's a server chip, so no bad PR.

Then people like me get angry. We want it somewhat within reach; Epycs and TRs are too expensive for individuals.
> Then people like me get angry. We want it somewhat within reach; Epycs and TRs are too expensive for individuals.

There is a 16-core Epyc version with 512 MB of L3 cache at a list price of 4,256 USD, which is perhaps the true reason why they really don't want to release a highly clocked 16-core part with 192 MB of cache for something like $1,000.
> Strix Halo's die-to-die interconnect takes less space than Granite Ridge's and Turin's. It takes so little space that it is very well possible that it is already present on Granite Ridge's and/or Turin's CCD.

The interview does not mention that as far as I can tell, so I would be grateful for a source. MI300A is said to use GMI SERDES, while when talking about Halo they use the past tense, as if GMI is the "old" thing.
> We are able to get power benefits. Because prior to that, we had a GMI PHY that lived in there and that consumed a whole lot of power in order to be able to send this over high frequencies over short distances. Here we are clocking it at way less than the 20 gigs that the GMI was being clocked at. This is anywhere between, you know, one to two gigahertz.

And the tech they are supposed to use is the same as what was already linked here, where they can make a much denser interconnect, use more wires and clock them lower, but still come out ahead on bandwidth. Another analogue would be HBM vs GDDR.
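The HBM-vs-GDDR analogue can be put in rough numbers. A sketch using nominal per-pin rates (HBM2 at ~2 Gb/s/pin over a 1024-bit stack interface, GDDR6 at ~16 Gb/s/pin over a 32-bit chip interface; real parts vary):

```python
def bandwidth_gbytes(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a bus of `bus_bits` data pins, each
    signalling at `gbps_per_pin` gigabits per second."""
    return bus_bits * gbps_per_pin / 8

hbm2_stack = bandwidth_gbytes(1024, 2.0)   # wide and slow
gddr6_chip = bandwidth_gbytes(32, 16.0)    # narrow and fast

print(hbm2_stack, gddr6_chip)  # -> 256.0 64.0
```

The wide-and-slow interface wins on total bandwidth (and typically on power per bit) despite the much lower per-pin clock, which is the same trade the fan-out interconnect makes against the GMI SERDES.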
> As far as usage of die area for GMI vs. fan-out goes, there is a big difference: GMI uses silicon area, while RDL mainly uses space on the other side of the die - metal layers of the CPU connected to the RDL layer (while the transistors above can do their unrelated work).

Since the interview says they are able to do away with the SERDES, wouldn't we see differences in the die shots, as those wires should lead directly to structures in the silicon without anything doing the translation? And they would need more wires, for the reason mentioned above.
> So my line of reasoning for why they are using a different CCD is: they still need to move the same amount of data between the two chips (CCD <-> IOD). If it is being clocked 10 times slower, they need to make it 10 times wider, unless they play with encoding etc. So since they need 10x the number of connections to the CCD compared to before, whatever the before was, people doing annotations of desktop Zen 5 die shots would probably spot it. It should be visible if they are able to spot the vias for the X3D cache.

While it's ten times wider than what's going from the SERDES to the substrate, it's not terribly wide compared to what is already there on-chip. Actually, what's going from the Coherent Master over to the fan-out contacts is likely at the same order of magnitude of clock and width as what's going from the Coherent Master to the SERDES.