Question Zen 6 Speculation Thread

DavidC1 · Aug 13, 2024

gdansk said:
But your argument doesn't hold up at all because Zen 5, Lion Cove and Zen 6 all deliver meager ST improvements. It isn't a choice of more MT at the expense of ST. MT is the only thing they can increase more than 10-15% next generation.

I believe they can, if they move away from the clockspeed ideology.

In the golden days of scaling, clocks were limited by the uarch, so adding pipeline stages got you a lot more frequency. A 40% increase in pipeline depth might have resulted in, say, a 25-30% increase in clocks. 10 vs 20 stages might be a 60-70% difference in clocks. 3GHz vs 5.xGHz is a lot to overcome.
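
The stage-count arithmetic above can be sketched with a toy model in which cycle time is the total logic delay divided by the number of stages, plus a fixed per-stage latch overhead. The function names and the 2.7% overhead figure are assumptions, chosen only so the model lands inside the ranges quoted above. They are not measured values for any real CPU.

```python
def relative_clock(stages, overhead_fraction=0.027):
    """Clock frequency in arbitrary units for a given pipeline depth.

    Toy model: cycle_time = (total logic delay / stages) + per-stage overhead.
    Total logic delay is normalized to 1.0. overhead_fraction is a
    hypothetical latch/skew cost per stage, as a fraction of that total.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    cycle_time = 1.0 / stages + overhead_fraction
    return 1.0 / cycle_time


def clock_gain(old_stages, new_stages, overhead_fraction=0.027):
    """Fractional clock increase when going from old_stages to new_stages."""
    old = relative_clock(old_stages, overhead_fraction)
    new = relative_clock(new_stages, overhead_fraction)
    return new / old - 1.0


if __name__ == "__main__":
    # 10 -> 14 stages is the "40% increase in pipeline" case.
    print(f"10 -> 14 stages: {clock_gain(10, 14):.0%} more clock")
    # 10 -> 20 stages is the "10 vs 20 stages" case.
    print(f"10 -> 20 stages: {clock_gain(10, 20):.0%} more clock")
```

With that assumed overhead the model gives roughly 29% for 10 to 14 stages and roughly 65% for 10 to 20 stages. The point is only that the gain per added stage shrinks as the fixed overhead starts to dominate the cycle time.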

Now you have 9-10 stage pipeline CPUs reaching 4.4GHz, and above 5.x GHz you run into thermal density issues, so you need to do stupid things like widen the space between transistors to reduce that, which makes the die larger too. And you are doing that even though the 5.x GHz CPU has a near 20-stage pipeline. You have chips like Raptor Lake literally frying themselves with extra voltage to get to 6GHz.

And uop caches are better avoided. The reason? The more the cores are limited by power, die size, and slower scaling, the less speculative gains are worth it. A uop cache only pays off when it hits, while dropping it and spending the budget elsewhere is a guaranteed gain. Branch predictors will never hit 100% accuracy, so there's always room for uncertainty, which makes avoiding those extra stages worth it. Remember that the uop cache itself adds 2 extra stages on a miss, which is why we went from 14 stages on Core to 14-18 on Sandy Bridge.
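
To put rough numbers on the "chance on hit" argument, here is a minimal sketch of expected pipeline depth as a function of uop cache hit rate. The 14 and 18 stage defaults are the two ends of the 14-18 range quoted above. The hit rates, the branch frequency, the mispredict rate, and the function names are all hypothetical, and treating pipeline depth as the mispredict refill penalty is a simplification.

```python
def expected_stages(hit_rate, hit_stages=14, miss_stages=18):
    """Average pipeline depth seen by an instruction stream.

    hit_stages applies when the uop cache hits, miss_stages when it
    misses and the legacy decode path is used. Defaults are the ends
    of the 14-18 range quoted in the post.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_stages + (1.0 - hit_rate) * miss_stages


def mispredict_cycles_per_instr(hit_rate, branches_per_instr=0.2,
                                mispredict_rate=0.02):
    """Rough cycles lost per instruction to branch mispredicts.

    Assumes the refill penalty equals the expected pipeline depth.
    branches_per_instr and mispredict_rate are illustrative guesses.
    """
    penalty = expected_stages(hit_rate)
    return branches_per_instr * mispredict_rate * penalty


if __name__ == "__main__":
    for rate in (0.5, 0.8, 0.95):
        print(f"hit rate {rate:.0%}: "
              f"{expected_stages(rate):.1f} stages on average, "
              f"{mispredict_cycles_per_instr(rate):.4f} cycles/instr lost")
```

A design without the uop cache would sit at one fixed depth regardless of hit rate, which is the "guarantee" side of the trade-off. To model the "2 extra stages on a miss" figure instead, pass `miss_stages=16`.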

The OC headroom for modern CPUs is zero for this reason as well. While clocks have been creeping up painfully slowly, above 5GHz has always been the domain of exotic cooling, regardless of pipeline stages. What happened was that cooling has not only advanced, but become significantly larger too. You should see how small "power hungry" Prescott heatsinks are compared to the modern literal aluminum bricks. Or how water cooling has become common, when it used to be in the exotic cooling domain too.

LightningZ71 · Aug 13, 2024

With memory bandwidth not slated to increase much for Zen6 (we're assuming one more AM5 generation), it stands to reason that they aren't going to be targeting massive MT throughput improvements. It stands to reason that there will not be a core count increase as it just won't be accommodated in most MT tasks. The only thing that would throw a spanner in the works is a healthy dose of MALL cache. Even then, it would need to be quite large to have broad applicability.
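
As a back-of-the-envelope check on the bandwidth argument, the sketch below computes theoretical peak DRAM bandwidth and divides it across cores. The DDR5-5600 dual-channel configuration and the core counts are assumptions for illustration, not Zen 6 specifications, and the function names are made up for this example.

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    transfer_rate_mts: memory speed in MT/s (5600 for DDR5-5600).
    channels: number of 64-bit channels (2 on a typical desktop board).
    bytes_per_transfer: 8 bytes for a 64-bit channel.
    """
    return transfer_rate_mts * 1e6 * channels * bytes_per_transfer / 1e9


def bandwidth_per_core_gbs(transfer_rate_mts, cores, channels=2):
    """Peak bandwidth split evenly across cores, ignoring caches."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return peak_bandwidth_gbs(transfer_rate_mts, channels) / cores


if __name__ == "__main__":
    speed = 5600  # assumed DDR5-5600, dual channel
    print(f"Peak: {peak_bandwidth_gbs(speed):.1f} GB/s")
    for cores in (16, 24, 32):  # hypothetical core counts
        per_core = bandwidth_per_core_gbs(speed, cores)
        print(f"{cores} cores: {per_core:.2f} GB/s per core")
```

Under these assumptions the per-core share drops from 5.6 GB/s at 16 cores to 2.8 GB/s at 32 cores. Real workloads hit the caches first, so this is an upper bound on DRAM traffic per core rather than a prediction of scaling.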

gdansk · Aug 13, 2024

StefanR5R said:
After we concluded that gratis cores must be provided, does it follow that we are entitled to get host consolidation for free too?

It costs them, what, $20 extra to swap a low-bin Turin-D CCD in place of a Granite Ridge CCD? And they can charge $150 more for the part, increasing their ASP. Choose any numbers you like. It is genuinely in AMD's best financial interest to make such a part. Maybe in time for Arrow Lake refresh.

inquiss · Aug 13, 2024

gdansk said:
Not at all. I said that you cannot discount AMD making a 8+16 part. Even if memory bandwidth doesn't increase, they did so on Strix Point already. It isn't a fairy tale or Santa's wishlist it is literally one AMD exec wanting to increase their client group ASP by 0.1% away from existing. If Intel provides the motivation... so it may be.

And in Zen 6 it is *inevitable* even if some SKUs launch in AM5 with the same memory bandwidth that the core count increases. 10% IPC generation has to deliver something.

Nah, AMD use smaller cores for efficiency only. Why would they bring that to desktop? They won't. They have enough of a scheduling issue on laptop, why bring that to desktop when you haven't got the upside of the efficiency?

inquiss · Aug 13, 2024

Fjodor2001 said:
You have to pay overhead for each additional node. Additional PSU, chassis, motherboard, etc.

Better to have a single node with X cores, than 2 nodes with X/2 cores.

Also, not all workloads even support or are suitable for multiple nodes. So it's DOA for those use cases. Additionally, a lot of people think it's too much of a hassle to bother with multiple nodes. Messier to configure, takes up more space, latency when communicating between nodes etc.

If someone is having use cases where they really want a huge number of cores, then I can understand that multiple nodes could be a good solution. Or going cloud and rent whatever you like. But not if you're looking for a 24/32C type of system (or even 64C).

If you want it all in one system, and you have a use for those cores, you get Threadripper or EPYC. It really is that simple.

gdansk · Aug 13, 2024

inquiss said:
Nah, AMD use smaller cores for efficiency only. Why would they bring that to desktop? They won't. They have enough of a scheduling issue on laptop, why bring that to desktop when you haven't got the upside of the efficiency?

Intel is on N3B shortly. Big efficiency gains. 2024 Arrow Lake will necessitate the launch of the X3D parts. Intel will have Arrow Lake Refresh in 2025. And they've shown they don't shy away from pushing parts to the limit for a refresh.

If you're AMD do you simply sit idle for 18+ months constantly decreasing your average selling price? Perhaps. Or maybe a 9950XT would be enough. But that might take some of the best binned parts from Turin, wasteful. So why not take a mediocre Turin-D CCD and go back to the 1800X/3950X moar cores no games strategy for a single part in your entire line-up which no one is forced to buy against their will and which you can conceivably sell for more than a 9950X (in small quantities)?

Just don't write it off.

DavidC1 · Aug 13, 2024

gdansk said:
If you're AMD do you simply sit idle for 18+ months constantly decreasing your average selling price?

Isn't that exactly what happened in the previous generations?

The common leakers are saying 8+32 ARL is canned, so an ARL refresh may be at best a 14900K-like update, which AMD will be able to counter easily with current Zen 5 parts and X3D.

gdansk · Aug 13, 2024

DavidC1 said:
Isn't that exactly what happened in the previous generations?

X3D "saved" them. But go look at their client group revenue. Are they happy with being the lowest margin segment? And this time they need X3D even to compete with base ARL. Are we looking forward to negative margin?

DavidC1 · Aug 13, 2024

gdansk said:
X3D "saved" them. But go look at their client group revenue. Are they happy with being the lowest margin segment? And this time they need X3D even to compete with base ARL.

This is Intel's only saving grace, which is now being hit by the overvoltage issues.

Weaknesses are always going to exist. AMD is doing much better on the laptop and the server market anyway.

gdansk · Aug 13, 2024

DavidC1 said:
This is Intel's only saving grace, which is now being hit by the overvoltage issues.

Weaknesses are always going to exist. AMD is doing much better on the laptop and the server market anyway.

Yes. But people continue to write it off categorically rather than entertain the possibility that AMD may make a part almost no one should ever buy (2990WX, 3800XT, 5900XT...).

And I'll add that the 7950X remained competitive in MT even with 14900K. 9950X will be less competitive with ARL in MT than that was and it only gets worse with refresh.

StefanR5R · Aug 13, 2024

[again straying off topic]
– The Turin-dense CCD doesn't look to me as if it would physically fit into the AM5 package (along with a classic CCD and the cIOD).
– Having to keep Turin-dense CCX L3 tags would be something new. Raphael's ( = Granite Ridge's) IOD may or may not be capable to do that.

gdansk · Aug 13, 2024

StefanR5R said:
[again straying off topic]
– The Turin-dense CCD doesn't look to me as if it would physically fit into the AM5 package (along with a classic CCD and the cIOD).
– Having to keep Turin-dense CCX L3 tags would be something new. Raphael's ( = Granite Ridge's) IOD may or may not be capable to do that.

Thank you! It only took 5 pages for someone to say some actual reason(s) it might not happen. Much better arguments than memory bandwidth or no one will buy it.

FlameTail · Aug 13, 2024

DavidC1 said:
Now you have 9-10 stage pipeline CPUs reaching 4.4GHz

Apple M4?

Hitting 4.4 GHz with only 10 pipeline stages sounds impressive.

poke01 · Aug 13, 2024

Going back on track, I hope the rumours about AMD separating core designs for each segment are true for Zen6.

FlameTail · Aug 13, 2024

poke01 said:
Going back on track, I hope the rumours about AMD separating core designs for each segment are true for Zen6.

Wasn't that supposed to be with Zen7?

Zen6 will use the same uarch in both client and server iirc. The difference is that client and server use different CCDs.

inquiss · Aug 13, 2024

FlameTail said:
Wasn't that supposed to be with Zen7?

Zen6 will use the same uarch in both client and server iirc. The difference is that client and server use different CCDs.

Interesting. Don't know anything about zen 7. Client (desktop and luggages) supposedly the same thing in zen 6. Server gets more specialisation there. Different memory technologies

soresu · Aug 13, 2024

poke01 said:
when do we get OoO superscalar architectures from AMD?

Everything AMD has done since K7 at least has been OoO superscalar.

soresu · Aug 13, 2024

StefanR5R said:
At this point I am completely lost as to what people really want. Seemingly not just more cores, but gratis cores.

Expecting sensible discourse from a new µArch thread 16+ months out from release is a fool's errand.

Just watch and enjoy the crazy train 😂

gdansk · Aug 13, 2024

soresu said:
Expecting sensible discourse from a new µArch thread 16+ months out from release is a fool's errand.

We have a pretty good idea from that slide which nailed Zen 5. Bigger core complex. 10%+ IPC.

If anyone says there will be more IPC, you can basically ignore them, like we should have in the Zen 5 speculation thread.

soresu · Aug 13, 2024

gdansk said:
Also if SMT is enough to transform x64 from wildly inefficient to competitive performance per watt in some workloads then why hasn't ARM pursued it for use in servers?

They did with E1/A65, which according to their PR seemed to be as good as A510 in ST perf and probably better in MT while being significantly more efficient too.

Exactly why they abandoned that path for A510 is uncertain, but what I definitely do know is the Neoverse cores are closely matched to the Cortex cores.

Even though some Neoverse cores have had extra functionality vs their Cortex counterparts, the core itself seems largely the same.

If SMT is a non trivial addition I can see why they wouldn't put it in Neoverse V or N because the need for it in smartphone or tablet cores is pretty low.

That being said - the shift to WoA/PC SoC's might change some attitudes going forward, who knows what the future will bring.

soresu · Aug 13, 2024

gdansk said:
We have a pretty good idea from that slide which nailed Zen 5. Bigger core complex. 10%+ IPC.

I meant about the thread in general.

Also I have serious doubts about the accuracy of that slide vis a vis post Zen 5 info.

It was many months before Zen 5 - now while that's a stretch it is possible that they had an accurate idea of Zen 5's perf.

Add a good 16 months on top of that for Zen 6 readiness/release and the slide is way too far out from Zen 6 actually being operational. It would also be extremely weird for such a cagey company as AMD to have divulged that much information about future products, unless it was for semi-custom clients thinking about what IP to put in their future SoC designs. Even then, I doubt this would be disclosed in slide form rather than directly by word of mouth from Lisa Su or another exec.

DavidC1 · Aug 13, 2024

poke01 said:
when do we get OoO superscalar architectures from AMD?

What? Surely you meant something else? PC chips have been superscalar since the original Pentium and x86 has been OoOE since Pentium Pro (Pentium II for client). Superscalar merely means the core can issue more than 1 instruction per cycle.

StefanR5R said:
– The Turin-dense CCD doesn't look to me as if it would physically fit into the AM5 package (along with a classic CCD and the cIOD).
– Having to keep Turin-dense CCX L3 tags would be something new. Raphael's ( = Granite Ridge's) IOD may or may not be capable to do that.

Yes, it assumes the CCDs are perfectly identical. It is very possible that it's not.

soresu said:
If SMT is a non trivial addition I can see why they wouldn't put it in Neoverse V or N because the need for it in smartphone or tablet cores is pretty low.

SMT the way AMD/Intel use it barely takes up any space, and is actually quite efficient perf/W wise.

It has always been a risk in terms of execution, because it complicates validation, which potentially increases the risk of the project slipping, which may be worth way more than other factors.

Intel held off on SMT until Nehalem because the Oregon team had experience with it from building NetBurst. The Haifa team didn't, hence Core 2 skipped it.

If the particular design team sees it as a big risk then they won't use it, plain and simple.

soresu said:
Also I have serious doubts about the accuracy of that slide vis a vis post Zen 5 info.

Those differences won't make Zen 6 go from 10% to 32%. The seeds of hype can be planted in many ways, including the post above.

SpudLobby · Aug 14, 2024

soresu said:
I meant about the thread in general.

Also I have serious doubts about the accuracy of that slide vis a vis post Zen 5 info.

It was many months before Zen 5 - now while that's a stretch it is possible that they had an accurate idea of Zen 5's perf.

I mean, so? Sure they could have had a good idea of the performance. Exist50 did and told me all about it even while I was skeptical it’d be that middling (relative to the structure changes and what they need anyways).

Doing “we just don’t know” on some level is pretty much just giving cover to circle jerks here that will evolve. I’d go ahead and bet on an 8-13% integer gain with Zen 6, minimal clock changes, some power gains and a pretty similar ST perf/W gap with Apple and Arm, Qualcomm as usual.

poke01 · Aug 14, 2024

DavidC1 said:
What? Surely you meant something else? PC chips have been superscalar since the original Pentium and x86 has been OoOE since Pentium Pro (Pentium II for client). Superscalar merely means the core can issue more than 1 instruction per cycle.

oh I was way off. I thought it meant high IPC with low clocks. Apologies.

MadRat · Aug 14, 2024

AMD already supports more bandwidth. You are simply not maximizing the existing IF subsystem currently because it desynchronizes memory clocks. You would have to design an external memory controller to do it, and then you could raise bandwidth to the core system significantly using caches, maximum FCLK, and memory channel interleaving. But it will cost you.
