Question Zen 6 Speculation Thread


OneEng2

Senior member
Sep 19, 2022
209
314
106
N3E is not that good, honestly. We know Apple shifted to a high-performance library instead of a high-density library, and they only went from 4 to 4.5 GHz, which is not that impressive considering this. Intel is clocking at 5.7 GHz on N3B, so no, it won't be a speed demon of a process node.
I think that if you are willing to give up die space for IPC, and pay for a high transistor budget on a premium fab node, you can get very high performance at low power from a single core. Apple products are much more expensive than PCs. This is one major contributor to that cost, IMO.
I meant software adoption. And I mentioned that AMD will get it for free, as Intel is actively driving SIMD library efforts, which can hardly be said about AMD. They cannot even ensure their CPUs have reasonable support in mainstream compilers 3 months past launch...
Agree. For decades, Intel has driven new instruction sets and their adoption... except when they didn't (e.g. 3DNow! and x64) and were forced to by the industry. In fact, by being in a near-monopoly position, Intel could release new instruction sets, get the optimizations in place, and enjoy an entire design cycle of being ahead of AMD on this one advantage alone.

It seems like Intel has run out of new tricks for new instructions as of late, though. You don't hear about new miracle instructions so much anymore. Sure, AVX-512 rocked with its crazy-wide execution path; how could it not improve performance... albeit at a real cost in die size, power, etc. Still, it did improve IPC by a BIG margin in the loads that can use it. Then Intel got rid of it on desktop because they didn't want to pay the price in power and die space for the feature... while AMD still has it. You don't suppose Intel can use their monopoly power to get compilers to remove the support so they don't look so bad, do you?
 
Reactions: poke01

Doug S

Platinum Member
Feb 8, 2020
2,867
4,876
136
It isn't though. It's 10%. After 2 years. Not gonna cut it except in the server market where they can spam more cores to make up for it and they're still not at the frequency wall.

The solution to your issue is "do better than 10% if you're taking two years to get there", not "do two designs and hope that magically you'll end up with >10% cumulative with the second one".
 

gdansk

Diamond Member
Feb 8, 2011
3,188
5,041
136
The solution to your issue is "do better than 10% if you're taking two years to get there", not "do two designs and hope that magically you'll end up with >10% cumulative with the second one".
That isn't relevant to Zen 6 as we already know it's about ~10%. And this is the Zen 6 thread so it colored my line of argument heavily. The only way to prevent stagnation is to deliver it quicker. It is a problem for AMD if they want to be relevant in the laptop market.

Disregarding Zen 6, you can look at the last two years and see that ARM delivered 1.1x and 1.15x IPC gains, with clock rate increases on top of that. In the same period AMD only delivered Zen 5 at 1.17x (more like 1.15x, but I'll be charitable) at the same clock rate.
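Putting those numbers together (IPC only, taking the figures above at face value):

```latex
\text{ARM: } 1.10 \times 1.15 \approx 1.27 \;(\approx 12\%/\text{yr}), \qquad
\text{AMD: } 1.17 \text{ over the same two years} \;(\approx 8\%/\text{yr})
```

and the ARM side excludes the clock gains mentioned above.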
 
Reactions: FlameTail

soresu

Diamond Member
Dec 19, 2014
3,303
2,570
136
while AMD still has it. You don't suppose Intel can use their monopoly power to get software compilers to remove the support so they don't look so bad do you?
The reason they removed it from desktop/mobile was the lack of common support in the E cores of Alder Lake.

The point of AVX10 is to give 256-bit implementations feature parity with AVX-512 so that they can bring it back.

Besides, 512-bit SIMD is the one major advantage they have against (most) ARM-based solutions at the moment, while even Cortex-X and Neoverse V are stuck on 128-bit SVE2.
 
Reactions: OneEng2

yuri69

Senior member
Jul 16, 2013
567
1,010
136
I have observed the same situation for Zen 4. Actually this time it was better, as the initial enablement in GCC landed 6 months ahead of release IIRC, but it contained some misinformation about CPU capabilities... Then after the CPUs were released, GCC maintainers did some benchmarks to provide znver5-specific tunings. In clang, they added the support a month after the GR release, almost missing the 19.1.0 release window and prompting questions from maintainers as to why they did not bring the support sooner, since GCC got it half a year earlier [the issue was that the support patches landed at a time when the release branch was supposed to accept bug fixes only]. Oh, and the patch did not contain any tunings, so znver5 in clang is basically znver4. This is even funnier since AMD's own compiler is based on clang/LLVM... MSVC does not contain any chip-specific tunings or enablement, so there was nothing to enable there.

The contrast with Intel is stark: Diamond Rapids is getting enablement patches right now.
Yea, this is hilarious since AMD has been boasting they were a software company.
 

poke01

Platinum Member
Mar 8, 2022
2,479
3,287
106
Yea, this is hilarious since AMD has been boasting they were a software company.
We will know how accurate that statement is come RDNA4: their AI upscaling should be on par with XeSS at least, and if it's on par with DLSS or even better, then they are finally moving towards software being the focus.
 

inquiss

Senior member
Oct 13, 2010
236
339
136
That isn't relevant to Zen 6 as we already know it's about ~10%. And this is the Zen 6 thread so it colored my line of argument heavily. The only way to prevent stagnation is to deliver it quicker. It is a problem for AMD if they want to be relevant in the laptop market.

Disregarding Zen 6 you can look at the last two years and see ARM delivered 1.1x and 1.15x and clock rate increases on top of that. In the same time period AMD only delivered Zen 5 with 1.17x (more like 1.15x, but I'll be charitable) at the same clock rate.
How do we know that Zen 6 will only raise performance by 10%?

We don't know the clock rate, and we don't know the impact of the new topology, which should reduce latency and improve bandwidth. Even if the clock rate doesn't increase (which it will), the new structure should also have some impact, and we get a peek into that when Halo launches.
 

moinmoin

Diamond Member
Jun 1, 2017
5,118
8,164
136
The solution to your issue is "do better than 10% if you're taking two years to get there", not "do two designs and hope that magically you'll end up with >10% cumulative with the second one".
AMD is supposed to have several teams working on the core independently. In the ideal case this would mean one team's delay wouldn't affect the next one. But it seems AMD likes moving things around, like the widely reported Zen 3 features that appeared in Zen 2 already.

Without knowing anything more, I'd say this is one of the reasons the cadence seems to be getting longer instead of shorter again, as would be expected of multiple independent teams. If AMD continues like this it could end up in situations like Intel's, where it bit off more than it could chew.
 
Reactions: Tlh97 and Racan

Meteor Late

Member
Dec 15, 2023
95
83
51
How did we know that zen 6 will only raise performance by 10%?

We don't know the clock rate and we didn't know the impact of the new topology which should reduce latency and improve bandwidth. Even if clock rate doesn't increase, which it will, the new structure should also have some impacts and we get a peek into that when halo launches.

Yeah, Zen 6 will definitely improve clock speed; at minimum the jump will be to N3P, which is a good jump in performance. There is no way Zen 6 doesn't clock at 6 GHz single-core IMO; I would be very surprised if it didn't.
 

Gideon

Golden Member
Nov 27, 2007
1,825
4,326
136
Yeah, Zen 6 will definitely improve clock speed, at minimum the jump will be to N3P, which is a good jump in performance, there is no way Zen 6 doesn't clock at 6GHz single core IMO, I would be very surprised if that is the case.
I know it ain't happening, but if I had a choice between 30% more IPC at 5 GHz vs 10% more IPC at 6 GHz I'd take the former any day of the week. 6 GHz essentially only helps desktop and desktop-replacement SKUs. IPC would help laptops, servers, MT loads... everything else as well.
 

Meteor Late

Member
Dec 15, 2023
95
83
51
I know it ain't happening, but If I had a choice between 30% more IPC at 5Ghz vs 10% more IPC at 6 Ghz I'd take the former any day of the week. 6Ghz essentially only helps desktop and desktop-replacement SKUs. IPC would help laptops, servers, MT load ... everything else as well.

Not necessarily, because then 5.5 GHz, 5 GHz etc. would consume substantially less power, so it depends on how it is done. We are assuming the 30% more IPC design would be more efficient than 10% more IPC at 6 GHz, but that's not always the case.
 

eek2121

Diamond Member
Aug 2, 2005
3,131
4,476
136
AMD does not use the leading edge for client. Zen 6 client will be a variant of N3. Server might get N2, or might not.

I disagree that Apple and ARM are only making small improvements YoY. CAGR is what matters.

Geekbench 6 Single Core | 2022 | 2023 | 2024
ARM | 2000 (Cortex X3, 8G2) | 2300 (Cortex X4, 8G3) | 2900 (Cortex X925, D9400)
Apple | 2600 (M2) | 3200 (M3) | 4000 (M4)
Qualcomm | - | - | 3200 (Oryon-L, 8E)
Intel | 3100 (13900K) | 3200 (14900K) | 3400 (285K)
AMD | 3000 (7950X) | - | 3500 (9950X)

If this trend continues, ARM vendors will outpace x86 sooner rather than later.
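Annualizing the table's figures (a rough back-of-the-envelope over the two-year span shown):

```latex
\text{ARM: } \sqrt{2900/2000} \approx 1.20/\text{yr}, \quad
\text{Apple: } \sqrt{4000/2600} \approx 1.24/\text{yr}, \quad
\text{Intel: } \sqrt{3400/3100} \approx 1.05/\text{yr}, \quad
\text{AMD: } \sqrt{3500/3000} \approx 1.08/\text{yr}
```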

I used Geekbench 6 for convenience, but I am sure this trend can be seen in SPEC2017 too.

Sure, ARM CPUs having 20% faster single core isn't going to lead to them taking 50% of the market overnight. x86 CPUs are protected by a moat of advantages such as compatibility and modularity. But what about the long term? Those advantages won't last in perpetuity. App compatibility on ARM is improving day by day, and one day there will be socketable desktop ARM CPUs.

X86 vendors will have to quicken their pace and/or deliver larger improvements with each generation, to keep up.
Now do Geekbench 5 and get back to us.
Cadence is essentially reaction time or latency. And if it's filler, you can see a company scrambling for something to offer where the intended products didn't work out as planned (as has happened to Intel plenty of times in the past decade).


No, at the higher end of this range it is not. There's a danger of AMD falling behind in CAGR, and the slower the cadence the harder it will be for AMD to catch up in CAGR.

I personally was hoping that the cadence entries running late would be subsequently followed by entries coming sooner, keeping the cadence average low. But that doesn't seem to be the case at all.


Compared to the previous M-series gens, the M4 had some surprisingly solid IPC improvements on top of that, though.
AMD is doing just fine.

Note that we aren’t done with the Zen 5 launch cycle yet. As we go into 2025, we haven’t even seen the fastest chips launch.
That isn't relevant to Zen 6 as we already know it's about ~10%.
No, we don’t know that. AMD has not released anything suggesting that. Further, we are now seeing evidence that Zen 5 is starved for memory bandwidth. Zen 6 will very likely resolve this issue, so even if the IPC gains are “only” 10%, actual performance gains will be greater.
 

Gideon

Golden Member
Nov 27, 2007
1,825
4,326
136
Not necessarily, because then 5.5GHz, 5GHz etc would consume substantially less power, so it depends how it is done. We are assuming the 30% more IPC would be more efficient than 10% and 6GHz, but it's not always the case.
Their fastest mobile chip is stuck at 5 GHz on TSMC 4nm. I'm pretty sure Strix Halo will reach a bit higher (as it uses desktop dies optimized for higher clock speeds), but I seriously doubt any N3E chip will do anywhere near 6 GHz in mobile processors.
 
Reactions: Tlh97 and marees

FlameTail

Diamond Member
Dec 15, 2021
4,197
2,548
106
but If I had a choice between 30% more IPC at 5Ghz vs 10% more IPC at 6 Ghz
Brainiacs and Speed Demons (terminology from the 1990s).

The problem with speed demons (lower IPC, higher frequency) is that they consume a lot of power. Also now that Moore's Law is dead, scaling up clock speed has become very difficult. This type of core is suitable only for desktops and workstations.
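For a rough sense of why, the usual first-order CMOS relation (a textbook approximation, not a figure for any particular core):

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f, \qquad
V \text{ must rise with } f \text{ near the top of the curve} \;\Rightarrow\; P_{\text{dyn}} \sim f^{3}
```

so the last few hundred MHz cost disproportionately more power than the performance they return.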

Brainiacs (higher IPC, lower frequency) are well suited for smartphones, laptops and servers, due to their lower power consumption.

There are signs that the two philosophies are converging, and this classification is mostly a relic from the past.
 
Reactions: Tlh97

OneEng2

Senior member
Sep 19, 2022
209
314
106
That isn't relevant to Zen 6 as we already know it's about ~10%. And this is the Zen 6 thread so it colored my line of argument heavily. The only way to prevent stagnation is to deliver it quicker. It is a problem for AMD if they want to be relevant in the laptop market.

Disregarding Zen 6 you can look at the last two years and see ARM delivered 1.1x and 1.15x and clock rate increases on top of that. In the same time period AMD only delivered Zen 5 with 1.17x (more like 1.15x, but I'll be charitable) at the same clock rate.
In DC, performance per watt always becomes the limiting factor for scaling. Zen 6 will have 32-core CCDs, giving a potential DC part with 384 cores (12 CCDs) and 768 threads on a 1 kW socket. I'll bet you a Coke it improves performance by more than 10% in this market... the highest-growth, highest-margin market.

In laptop (the next most important market) it will be more about battery life and cost than outright performance. I agree that ARM may well pose a threat in this market in 2026 (especially in thin-and-light).

It is only in the rapidly shrinking performance desktop market that outright performance is demanded... and guys, gamers only account for a fraction of the desktop market. Granted, it is still high-profit business, just small compared to DC and laptop (and the non-gaming rest of desktop).
I know it ain't happening, but If I had a choice between 30% more IPC at 5Ghz vs 10% more IPC at 6 Ghz I'd take the former any day of the week. 6Ghz essentially only helps desktop and desktop-replacement SKUs. IPC would help laptops, servers, MT load ... everything else as well.
Performance per watt is going to be increasingly important in designs moving forward. You have to get the heat out somehow, and more and more devices run on batteries.
 
Reactions: Tlh97 and Thibsie

StefanR5R

Elite Member
Dec 10, 2016
6,012
9,040
136
Brainiacs and Speed Demons (terminology from the 1990s).

The problem with speed demons (lower IPC, higher frequency) is that they consume a lot of power.
High IPC has its own power costs: Overhead to manage out-of-order execution, energy wasted whenever you guessed wrong in speculative execution...

There are signs that the two philosophies are converging,
Necessarily, I'd say.
Another approach, of limited applicability, is to turn to SIMD whenever you can, i.e. to turn your problems into low-IPC problems.
 

Josh128

Senior member
Oct 14, 2022
505
856
106
I mused about this in the Zen 5 speculation thread, but I doubt it will happen this gen on the dual-CCD SKUs. It NEEDS to happen by Zen 6 though. This could be a game changer for a LOT of workloads, not just gaming. Arguably more important than the additional L3 cache is creating an effective 16-core CCX from two 8-core CCDs; looking at the gains made from going from a 4-core CCX to an 8-core CCX between Zen 2 and Zen 3, it was very significant. This would combine the huge cache with a unified 16-core CCX to make by far the most powerful 16-core x86 chip. It's potentially an entire gen-on-gen gain just from unifying the L3. The new beneath-the-CCD setup seems perfect for doing this, to my uninformed mind.


L3 latency would increase, but it should be far from "horrible". If the chiplets are laid out correctly, it would be not much worse than the latency of the L3 on the 16-core CCX of Zen 5c. If the chiplets are laid out as below, with a single L3 SRAM die beneath them, there you have it. At worst, the latency would double from far core to far core/L3, but that's still about the same as a single monolithic chip with one long L3 layout between the cores, which does exist. If you disagree, please refer to the figures below and explain why.

I'm no CPU engineer, but I don't see why this wouldn't be quite possible, and quite beneficial, in making a killer CPU. It may not happen on AM5 due to pin layouts, etc., but I think it WILL happen eventually.

Theoretical 9950X3D: [attached figure]

Existing Zen 5c CCD: [attached figure]
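For what it's worth, latency claims like the one above are usually sanity-checked with a pointer-chase microbenchmark. A minimal sketch (buffer size, iteration count and the random permutation are illustrative choices, not a calibrated tool); sized into L3 and pinned to different cores it gives a rough near- vs far-slice comparison:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (4 * 1024 * 1024 / sizeof(void *))  /* ~4 MiB: past L2, inside a big L3 */
#define ITERS (50L * 1000 * 1000)

int main(void)
{
    void **buf = malloc(N * sizeof(void *));
    size_t *order = malloc(N * sizeof(size_t));
    if (!buf || !order) return 1;

    /* Build a random cyclic permutation so the hardware prefetcher can't follow it. */
    for (size_t i = 0; i < N; i++) order[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < N; i++)
        buf[order[i]] = &buf[order[(i + 1) % N]];

    struct timespec t0, t1;
    void **p = &buf[order[0]];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERS; i++)
        p = (void **)*p;                          /* each iteration is one dependent load */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load-to-use latency: %.1f ns (%p)\n", ns / ITERS, (void *)p);
    return 0;
}
```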
 
Reactions: Tlh97

MS_AT

Senior member
Jul 15, 2024
347
768
96
It seems like Intel has run out of new tricks for new instructions as of late though. You don't hear about new miracle instructions so much anymore. I mean, sure AVX512 rocked with its crazy wide execution path. How could it not improve performance ... albeit at a good cost in die size, power, etc. Still, it did improve IPC in the loads it can be used in by a BIG margin. Then Intel got rid of it on desktop because they didn't want to pay the price in power and die space for the feature.... while AMD still has it. You don't suppose Intel can use their monopoly power to get software compilers to remove the support so they don't look so bad do you?
The crazy-wide execution of AVX-512 is arguably not the biggest advantage it brought to the table for x64, but people focus on that number because it is in the name, and because Intel in its infinite wisdom mandated that 512-bit execution must be supported while 256-bit and 128-bit are optional extensions.
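To make the "not just the width" point concrete, here is a minimal sketch (plain AVX-512F intrinsics; the function and array names are illustrative) of one commonly cited benefit, per-lane masking, which lets the loop remainder be handled without a scalar tail regardless of whether the hardware executes 512 bits natively or double-pumps 256-bit units:

```c
/* A sketch assuming AVX-512F support; compile with e.g. -mavx512f. */
#include <immintrin.h>
#include <stddef.h>

void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {              /* full 16-float (512-bit) vectors */
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(dst + i, _mm512_add_ps(va, vb));
    }
    if (i < n) {                                /* remainder handled by a mask, no scalar tail */
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1u);
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        _mm512_mask_storeu_ps(dst + i, m, _mm512_add_ps(va, vb));
    }
}
```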

Intel got rid of it because E-cores implementing AVX-512 to specification would either be terribly slow or grow in area to the point where E-cores could no longer be spammed in the quantities Intel needed. Of course, they got rid of the support in the release cycle right after introducing AVX-512 on client, to make software developers happier... AVX10 is the effort to get the benefits back without having to pay the cost of 512-bit shuffle units.

I also don't understand the comment about compiler support. Two of the three biggest C/C++ compilers are open source, and Intel has no power to force them to do anything. Not to mention you could fork them and add the support back if you needed to. If anything, AMD is successful partly thanks to Intel's work adding AVX-512 support to those compilers. And Intel's ICX (a clang fork; AOCC is AMD's fork) still compiles code fine for Zen; for example, y-cruncher uses it for its Zen 5 optimized binaries because, in Mysticial's evaluation, it still does a better job than upstream clang. So while I understand Intel's CPU execution leaves a lot to be desired, their software efforts in the open-source domain are more prominent than AMD's.

But yes, that's probably off-topic here.
 

naukkis

Senior member
Jun 5, 2002
961
828
136
The goal is maximizing performance after all, so it isn't "gimping" L1 if that change allows other changes that lead to higher overall performance. Apple is leading right now because of the combination of IPC and frequency. If you're afraid to make any changes that will reduce IPC (which increasing L1 latency clearly does) your ability to increase frequency will be greatly diminished.

You know that complex addressing modes are one of those. The minimum L1 latencies are only available for simple memory addressing; complex memory addressing adds a cycle on both ARM and x86. So why not have an ISA that only supports the addressing modes with minimum access latency?
 

maddie

Diamond Member
Jul 18, 2010
4,913
5,035
136
High IPC has its own power costs: Overhead to manage out-of-order execution, energy wasted whenever you guessed wrong in speculative execution...


Necessarily, I'd say.
Another approach, but of limited applicability, is to turn to SIMD whenever you can = to turn your problems into low-IPC problems.
True.

Also, IPC increases with transistor budget, but with diminishing returns (roughly an inverse power relationship to the number of transistors). With node gains shrinking, that alone might hamper future IPC gains. Power consumed, and maybe more importantly power to be dissipated, will slow advances.
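That diminishing-returns relationship is often summarized as Pollack's rule (an empirical rule of thumb, not a hard law):

```latex
\text{single-thread performance} \;\propto\; \sqrt{\text{transistors spent on the core}}
\quad\Rightarrow\quad 2\times \text{transistors} \approx 1.4\times \text{performance}
```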
 

Doug S

Platinum Member
Feb 8, 2020
2,867
4,876
136
You know that complex addressing modes are one of that. Those minimum L1 latencies are only available for simple memory addressing, complex memory addressing adds one cycle on both ARM and x86. So why not to have ISA that does only support those addressing modes with minimum access latencies?

If you do that, then with that ISA you'll need to do the register addition (or whatever) to calculate your address before executing those "simple" memory addressing instructions. There is latency in that additional instruction, so the total latency will be the same at best. It could be worse if you need more than one additional instruction.

And while I suppose your IPC will be artificially increased (it now takes two or more instructions to accomplish what an ISA with more complex addressing does in one), in reality you're worse off. You have to fetch two or more instructions instead of just one, which reduces the effective size of your instruction cache, increases power due to more fetch/decode, etc.
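A small illustration of that folding (a sketch; the assembly in the comments is typical x86-64 codegen, not output from any particular compiler run):

```c
#include <stdint.h>

/* With a complex x86-64 addressing mode the index math folds into the load:
 *     mov  eax, [rdi + rsi*4]    ; one instruction, address formed in the AGU
 * An ISA limited to simple register-indirect addressing needs the address first:
 *     lea  rcx, [rdi + rsi*4]    ; or a shift + add pair on a pure load/store ISA
 *     mov  eax, [rcx]            ; the load waits on that extra instruction
 * Same work either way; only where the add happens changes, which is the
 * point about total latency above. */
int32_t load_indexed(const int32_t *base, int64_t idx)
{
    return base[idx];
}
```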

There's a reason why the "pure RISC" philosophy (which is what you're advocating for) fell out of favor once transistor budgets rose to the point where supporting more complex addressing modes no longer had any meaningful cost associated with it.
 

Thibsie

Senior member
Apr 25, 2017
902
1,014
136
There's a reason why the "pure RISC" philosophy (which is what you're advocating for) fell out of favor once transistor budgets rose to the point where supporting more complex addressing modes no longer had any meaningful cost associated with it.
Isn't this what RISC-V (with its pros and cons) is all about?
 