Question Zen 6 Speculation Thread

inquiss · Dec 31, 2024

yuri69 said:
As @Hitman928 stated the profiling of Zen 5 done so far had revealed the frontend bandwidth - decoders busy, uOp busy - is NOT the bottleneck. The frontend latency - a L1i cache miss, fetch after a mispredicted branch, ITLB misses, etc. - often is.

So doubling the decoder width won't help that much.

The slide contains marketing speak since the slide was not internal but presented under NDA. This means they definitely do not need to sandbag or anything like that.

* For the then-existing gens, they list the real marketing IPC figure.
* For Zen 5 they listed 10-15+% IPC goal
* For Zen 6 they listed 10+% IPC goal

Compare Zen 5 with Zen 6. If they knew the IPC goal would be higher than 15%, they would present it as such. Higher is better. But they did not.

Would items like reduced latency from the new packaging be included in the IPC number though? That's what I'm curious about. Core could be 10% but benefits from IOD and packaging impact on performance might not be...

OneEng2 · Dec 31, 2024

gdansk said:
Note: This problem only applies to x64 in the near term. Apple and ARM will keep delivering decent improvements every year until they hit the power and frequency wall. Probably only a few years later but still.

Physics and economics apply to ARM as well. I disagree.

Win2012R2 said:
With greatly diminishing probability for each +1%, whereas 10%+ makes it very likely to be in the range of 10-15% (mid is only 12.5%), but a lot less so to be in 10-20%.

Since Zen 5 underperformed (I reckon dual branch predictor not working as intended so in effect it's less wide than Zen 4) - if that's fixable and gets fixed in Zen 6 then maybe we will see better improvement than on that slide.

I'll be happy if they add Intel APX to it

Indeed, doubling the GPRs would help in many situations and mostly across the board.

It was my understanding that x86 had mostly mitigated this through the use of extended registers and register renaming?

Win2012R2 said:
On a plus side - if they do so well with 4-wide decoder and another one disabled in non-SMT mode, then imagine how much better it will be with just 5-wide decoders? Maybe primary (most likely branch) decoder should be 5-6 wide and secondary is 4-wide...

Hitman928 said:
From the (granted limited) publicly available profiling done on Zen 5, the decoder width was seldom the bottleneck for the Zen 5 core.

I think that a new IOD and faster memory will open up some use cases so that fixing the decoder will actually help.

I still think 10-15% is a good bet.

gdansk · Dec 31, 2024

OneEng2 said:
Physics and economics apply to ARM as well. I disagree

But less immediately when their performance per watt is 3x AMD and Intel. How else did Apple go from merely matching Zen 3 with M1 to eclipsing Zen 5 by 15% (or more) with M4? It's easier for them to boost clock rates than it is for AMD/Intel to find IPC improvements.

MS_AT · Dec 31, 2024

OneEng2 said:
Indeed, doubling the GPRs would help in many situations and mostly across the board.

It was my understanding that x86 had mostly mitigated this through the use of extended registers and register renaming?

Register renaming solves a different problem. The compiler does not know your register file has hundreds of entries, as this is an implementation detail; it will spill as soon as you run out of architectural GPRs (16 for x64, 32 for aarch64, 32 for x64 + APX).
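
To make the spilling point concrete, here is a rough Python sketch. It is a toy model and not how any real compiler allocates registers: each value is a hypothetical live range `(start, end)`, and the function names are invented for illustration. It only counts how many values cannot sit in an architectural register at the point of peak pressure, regardless of how large the physical register file behind the renamer is.

```python
def max_live(intervals):
    """Peak number of simultaneously live values for half-open (start, end) ranges."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # value becomes live
        events.append((end, -1))    # value dies, its register is free again
    # At equal positions, process deaths before births so a freed register can be reused.
    events.sort(key=lambda e: (e[0], e[1]))
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def spilled_at_peak(intervals, arch_regs):
    """Lower bound on values that must live in memory at the point of peak pressure."""
    return max(0, max_live(intervals) - arch_regs)


# Hypothetical hot loop that keeps 24 values live at the same time.
hot_loop = [(i, 100) for i in range(24)]

for name, regs in [("x64", 16), ("aarch64", 32), ("x64 + APX", 32)]:
    print(f"{name}: {spilled_at_peak(hot_loop, regs)} values spilled at peak")
```

With these made-up live ranges, only the 16-register case has to spill; the physical register count never enters the calculation, which is the point being made above.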

Bigos · Dec 31, 2024

Zen 2 and 4+ can rename simple memory write/read pairs, possibly alleviating the lack of architectural registers a bit (at the cost of unnecessary memory instructions, which take cache space and require load/store resources).

igor_kavinski · Dec 31, 2024

inquiss said:
Would items like reduced latency from the new packaging be included in the IPC number though? That's what I'm curious about. Core could be 10% but benefits from IOD and packaging impact on performance might not be...

Yeah, I think MT throughput doesn't figure into IPC calculations. IPC is usually related to the ST throughput. Reduced latency and higher internal bandwidth could allow the cores to work even better together.

Win2012R2 · Dec 31, 2024

gdansk said:
But less immediately when their performance per watt is 3x AMD and Intel. How else did Apple go from merely matching Zen 3 with M1 to eclipsing Zen 5 by 15% (or more) with M4?

N3E is lovely

Maybe in mobile phones ARM is at 3x, but in laptops that's not the case, and certainly not in the server space where wattage also matters. Also, Apple isn't ARM: it's a very special sauce, for a very special, very high price. If only AMD could sell you a laptop where they could charge $600 for a couple of TBs of subpar NAND...

Just look at Ampere chips (at their reviews, since they aren't actually easy to buy): they matched core counts, but with all that PCIe 5 etc. their wattage is up there with AMD, and the price isn't exactly "dirt cheap enough to port code to a different arch". It basically did not go anywhere other than maybe Oracle, who (I think) funds it in the first place.

P.S. I am a very happy Apple M4 user.

OneEng2 said:
It was my understanding that x86 had mostly mitigated this through the use of extended registers and register renaming?

Having lots of hidden registers to rename isn't exactly a cheap thing to have. Perhaps explicit support might help reduce the need for such a large number; it's certainly a LOT better for the programmer.

Tuna-Fish · Dec 31, 2024

MS_AT said:
Register renaming solves a different problem. The compiler does not know your register file has hundreds of entries, as this is an implementation detail; it will spill as soon as you run out of architectural GPRs (16 for x64, 32 for aarch64, 32 for x64 + APX).

Register renaming allows the compiler to just keep a live set, and offload finding ILP to the CPU. Modern compilers are absolutely assuming renaming and hundreds of registers. Before renaming was common, compilers were designed to try to extract ILP in ways that increased register pressure, through aggressive unrolling and interleaving and the like.

gdansk · Dec 31, 2024

Win2012R2 said:
N3E is lovely

Possibly, but like I said: cost concerns impact ARM less because the phone market justifies designing the core for the latest processes. Physics are the same for both of them but their design allows them to avoid some of the worst of it (for now).

Win2012R2 · Dec 31, 2024

gdansk said:
cost concerns impact ARM less because have the phone market

I disagree - the mobile market is slowing down, new stuff just isn't offering much better features, and it's got REALLY expensive too - not just Apple phones, Samsung is also at the same high level. The whole mobile market most likely peaked - it has already switched to a model of "sell old stuff that retained value OKish", which is like the used car market; we might see the same with GPUs going forward.

ARM is getting a very tiny amount per unit anyway, even with their price hikes (apart from the desire to take a percentage of the final price - but I think that went nowhere). They have a problem getting into new markets, and in the server space the new stuff is GPUs, which get all the money for the moment.

gdansk · Dec 31, 2024

Win2012R2 said:
I disagree - mobile market is slowing down

Go plot the GB6 1T or SPECint scores of the A and X series. It really isn't slowing down like x64. ARM might suck at doing it in a decent area but their "partners" are doing better.

Zen 6 is a 10% generation so I'm pretty sure AMD won't even catch up to M4 by 2026. And by that fall Cortex X will be ahead of them in performance and performance per watt (probably not on area, however, which matters for server).

Win2012R2 · Dec 31, 2024

gdansk said:
It really isn't slowing down like x64

The market is slowing down - people can't drop a grand and a half every year on a new toy which isn't much different from the one they've got, purchase cycles have lengthened, and the "AI" stuff isn't working yet to get people buying.

gdansk said:
Go plot the GB6 1T or SPECint scores of A and X series

Yeah, it's amazing what one can do when selling 200 mln+ premium devices per year - personally I'd prefer if they got battery life in my iPhone to 7 days.

igor_kavinski · Dec 31, 2024

gdansk said:
Zen 6 is a 10% generation so I'm pretty sure AMD won't even catch up to M4 by 2026.

Most x86 users won't care. At most, they will get an ARM device to "feel" the snappy performance of the latest ARM SoC from whoever. The number of x86 power users who will ditch their existing x86 device and go full ARM is going to be minuscule, especially if they are not mobile warriors.

For ARM to cause the premature death of x86, it needs to offer at least 90% emulated x86 performance. Some may even be OK with 50% emulated performance, as long as the emulation is so robust and solid that it runs almost any x86 executable flawlessly, save for the odd ones.

igor_kavinski · Dec 31, 2024

Win2012R2 said:
P.S. I am a very happy Apple M4 user.

What's your use case?

I have an M1 and so far I'm only using it for viewing movies/TV shows. Don't see the need to upgrade coz I can't think of running anything on it that would fill me with utter, inexplicable bliss.

gdansk · Dec 31, 2024

igor_kavinski said:
Most x86 users won't care. At most, they will get an ARM device to "feel" the snappy performance of the latest ARM SoC from whoever. The number of x86 power users who will ditch their existing x86 device and go full ARM is going to be minuscule, especially if they are not mobile warriors.

For ARM to cause the premature death of x86, it needs to offer at least 90% emulated x86 performance. Some may even be OK with 50% emulated performance, as long as the emulation is so robust and solid that it runs almost any x86 executable flawlessly, save for the odd ones.

I didn't say anything about the death of x64. Just an argument that ARM vendors will not slow down until later while the x64 vendors already have:

To reiterate:

OneEng2 said:
I think we need to start getting used to smaller improvements and longer times between generations. Damn Physics!

gdansk said:
Note: This problem only applies to x64 in the near term. Apple and ARM will keep delivering decent improvements every year until they hit the power and frequency wall. Probably only a few years later but still.

I.e. A better ISA and more money from an unassailable, captive and massive phone market means they can keep improving for longer.

igor_kavinski · Dec 31, 2024

gdansk said:
Just that the argument that ARM vendors will not slow down until later while the x64 vendors already have:

They can't keep going forever. They will hit a plateau sooner or later but yes, there is a good chance that they may leave the x86 players quite far behind and then those players will spend quite some time catching up. It's actually good for x86 because ARM is showing them that more performance is within reach. Lunar Lake's performance is miraculous compared to Meteor Lake and we probably wouldn't have seen it materialize without M1.

adroc_thurston · Dec 31, 2024

gdansk said:
Just an argument that ARM vendors will not slow down until later

They'll slow down once they run out of clock bumps. Whatever.

igor_kavinski · Dec 31, 2024

If Zen 6 goes 24C/48T or even 20C/40T, is the ARM camp gonna have a competitively priced MT monster to challenge it? This is what I'm interested in seeing and both Qualcomm and Nvidia better be ready to take the challenge or go home.

gdansk · Dec 31, 2024

igor_kavinski said:
They will hit a plateau sooner or later but yes

It is definitely later. x64 plateaued in 2024.

adroc_thurston said:
They'll slow down once they run out of clock bumps. Whatever.

Yep. But that's a lot of room compared to AMD.

igor_kavinski said:
If Zen 6 goes 24C/48T or even 20C/40T, is the ARM camp gonna have a competitively priced MT monster to challenge it?

Yeah, the M6 Max.

igor_kavinski · Dec 31, 2024

gdansk said:
Yeah, the M6 Max.

Come on. That's anything but competitively priced. Apple has no interest in competing. They go directly for the "I don't know how but I keep getting richer" crowd.

gdansk · Dec 31, 2024

igor_kavinski said:
Come on. That's anything but competitively priced. Apple has no interest in competing. They go directly for the "I don't know how but I keep getting richer" crowd.

It includes a GPU of about 400 mm² of leading-edge silicon with raytracing and AI acceleration. How much will that cost in 2026/2027? Jensen's Pricing Law suggests $1500 to $2000 for that alone.

igor_kavinski · Dec 31, 2024

gdansk said:
It includes a GPU of about 400 mm² of leading-edge silicon with raytracing and AI acceleration. How much will that cost in 2026/2027? Jensen's Pricing Law suggests $1500 to $2000 for that alone.

That's mostly of interest to developers at this point. Apple isn't even trying seriously to entice studios to port their games to Apple Silicon. Like maybe offering to cover 50% of the porting costs for only AAA games, for a start. They have the money but don't want to spend it because they are haunted by Steve Jobs' ghost at night if they try to think seriously about targeting the games industry.

gdansk · Dec 31, 2024

igor_kavinski said:
That's mostly of interest to developers at this point. Apple isn't even trying seriously to entice studios to port their games to Apple Silicon. Like maybe offering to cover 50% of the porting costs for only AAA games, for a start. They have the money but don't want to spend it because they are haunted by Steve Jobs' ghost at night if they try to think seriously about targeting the games industry.

Not the subject of this thread nor do I care.

Zen 6 will need to surpass AMD's own expectations to match a 5.1mm thin tablet from 2024 in 1T performance in 2026.
Commodity ARM cores will offer as much 1T performance as Zen 6 in 2026 at lower power. May be costly.
People who want workstation MT will have it and (in whole system analysis) in a price competitive way.

Whether people switch to ARM or not, I do not care. Personally I'd rather stay on x64 because I have old programs which currently fail to run under WoA emulation. But it's like living in Detroit in the 60s/70s, watching everything turn to crap.

Win2012R2 · Dec 31, 2024

igor_kavinski said:
What's your use case?

Typical tablet - browsing, video watching, works great, potentially it can do 10x perf editing videos and stuff, but I don't use it for that - would have preferred to cut those bits off and save on battery life!

gdansk said:
Zen 6 is a 10% generation so I'm pretty sure AMD won't even catch up to M4 by 2026.

I am ok with that. x86 has been around far longer than the current crop of ARMs and it has to deal with backwards compatibility going back to the 80s; obviously it's easier to achieve uplifts in a CPU arch where you control the whole stack vertically and can jettison old stuff if necessary. As far as I am concerned they've won the mobile market (phone, tablet), but for a proper laptop, desktop and especially servers it's x86 all the way. But if I were a hyperscaler I'd also use ARM as leverage to prevent Intel/AMD from charging me their outrageous full list prices...

The thing that worries me most about x86 is COST - not in client mind you, that's very reasonable, but in servers it's totally outrageously too expensive - like 15 grand for a single bloody chip? Pricing is way out of whack, power usage is also going up to crazy levels, so again - cost, but ongoing.

adroc_thurston · Dec 31, 2024

gdansk said:
But that's a lot of room compared to AMD.

Not really. Apple mobile is just 600MHz lower than Strix now.
Everyone will ship 5 GHz cores and then it'll get hard.

gdansk said:
Zen 6 is a 10% generation

Oh come on. Who forgets the frequency like that.
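
The frequency point can be put as arithmetic: single-thread performance scales roughly with IPC times clock, so the two gains multiply rather than add. A minimal Python sketch, where the 10% IPC figure is the slide number discussed in this thread and the 5% clock gain is a made-up placeholder, not a leak or a claim from anyone here (`combined_uplift` is a hypothetical helper name):

```python
def combined_uplift(ipc_gain, clock_gain):
    """Generational 1T uplift when IPC and clock both improve (gains as fractions)."""
    return (1 + ipc_gain) * (1 + clock_gain) - 1


ipc_gain = 0.10    # "10+% IPC goal" from the slide discussed in the thread
clock_gain = 0.05  # placeholder assumption, purely for illustration

print(f"IPC only:    {combined_uplift(ipc_gain, 0.0):.1%}")
print(f"IPC + clock: {combined_uplift(ipc_gain, clock_gain):.1%}")
```

Swap in whatever clock gain you believe in; the only point is that calling it "a 10% generation" ignores the second factor.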
