Question Zen 6 Speculation Thread


MS_AT

Senior member
Jul 15, 2024
449
972
96
AVX10.1/256 is AVX512VL without the 512-bit registers. That is, AMD already implements it, it's a subset of AVX512.
I really hope they will not drop AVX512 support and that they will support AVX10/512 (if I am not mistaken this implies that AVX10/256 will also be supported) on client cores. I mean they already have a superior physical implementation regardless of whether we look at half or full rate. I think it would be a waste if they dropped 512b support on client cores [I would have preferred they keep full rate, but if they deem it too much, I really hope 512b half-rate will be used] as this would give the advantage back to Intel for seemingly no gain.
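
A minimal sketch of how the feature flags relate, assuming the CPUID bit positions as I recall them from Intel's public AVX-512 and AVX10.1 documentation (verify against the current spec before relying on them). The function name and dictionary keys are made up for illustration, and the code decodes raw register values passed in by the caller rather than executing CPUID itself.

```python
# Bit positions recalled from Intel's public documentation (assumption: the
# AVX10.1-era layout of CPUID leaf 0x24). Check the current spec before use.
AVX512F_BIT = 16    # CPUID.(EAX=7,ECX=0):EBX
AVX512VL_BIT = 31   # CPUID.(EAX=7,ECX=0):EBX
AVX10_BIT = 19      # CPUID.(EAX=7,ECX=1):EDX
AVX10_256_BIT = 17  # CPUID.(EAX=0x24,ECX=0):EBX
AVX10_512_BIT = 18  # CPUID.(EAX=0x24,ECX=0):EBX


def bit(value, position):
    """Return True if the given bit is set in a register value."""
    return (value >> position) & 1 == 1


def describe_vector_support(leaf7_0_ebx, leaf7_1_edx, leaf24_0_ebx):
    """Hypothetical helper: summarize AVX-512 / AVX10 flags from raw CPUID values."""
    has_avx10 = bit(leaf7_1_edx, AVX10_BIT)
    return {
        "avx512f": bit(leaf7_0_ebx, AVX512F_BIT),
        "avx512vl": bit(leaf7_0_ebx, AVX512VL_BIT),
        # The low byte of leaf 0x24 EBX holds the AVX10 version number.
        "avx10_version": (leaf24_0_ebx & 0xFF) if has_avx10 else 0,
        "avx10_256": has_avx10 and bit(leaf24_0_ebx, AVX10_256_BIT),
        "avx10_512": has_avx10 and bit(leaf24_0_ebx, AVX10_512_BIT),
    }
```

A core that reports AVX512F and AVX512VL but no AVX10 flag would show up here with `avx10_version` equal to 0, which is the "already implements the subset, just not under that name" situation described above.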
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
AVX10.1/256 is AVX512VL without the 512-bit registers. That is, AMD already implements it, it's a subset of AVX512.
Yes, but if there are no 512 bit registers then how would legacy AVX-512 work on that CPU? Double pumping still (I think) requires full registers - if /256 isn't compatible with AVX-512 then it's DOA

I really hope they will not drop AVX512 support
That would be an incredibly stupid thing to do now that it finally gets proper traction in software support, and results are great already.

Even if AVX10.2 is great, when will it appear, sometime in 2026? Who knows what Intel will look like by that point
 

gdansk

Diamond Member
Feb 8, 2011
3,768
6,015
136
They can abandon 512 bit registers and add more 256 bit execution units. This would allow increased throughput for 128/256 bit operations and still have the same throughput (but with more latency) for 512 bit operations. But I think it takes more transistors overall.
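
A toy throughput model of this proposal, assuming perfectly independent operations and ignoring latency, scheduling, and register file ports. The unit counts used below (2 wide units versus 4 narrow ones) are hypothetical numbers picked for the arithmetic, not real Zen figures.

```python
import math


def cycles_needed(num_ops, op_bits, unit_bits, num_units):
    """Cycles to issue num_ops independent vector ops on num_units execution units.

    An op wider than a unit is "pumped" through it in several passes.
    """
    passes_per_op = math.ceil(op_bits / unit_bits)
    total_passes = num_ops * passes_per_op
    return math.ceil(total_passes / num_units)


# Hypothetical baseline: 2 units, each 512 bits wide.
# Hypothetical proposal: 4 units, each 256 bits wide.
for op_bits in (256, 512):
    wide = cycles_needed(8, op_bits, unit_bits=512, num_units=2)
    narrow = cycles_needed(8, op_bits, unit_bits=256, num_units=4)
    print(f"{op_bits}-bit ops: 2x512 takes {wide} cycles, 4x256 takes {narrow} cycles")
```

Under these assumptions the narrow configuration finishes 256-bit work in half the cycles and 512-bit work in the same number of cycles, which is the throughput claim made above. The model says nothing about the extra latency per 512-bit op or the transistor cost.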
 
Reactions: Tlh97 and Win2012R2

gdansk

Diamond Member
Feb 8, 2011
3,768
6,015
136
Would that still work with AVX-512 code?
Yes, I mean going back to a Zen 4 like AVX-512 but with the load/store bandwidth increases of Zen 5 and more execution units + larger queues so there could be more "double pumped" operations in flight at the same time.

It should be better for client workloads and also still be good for HPC. And on c cores they could not add the extra FPU pipes/queues to further save area if they wanted to cheap out (and I think they do).
 
Reactions: Tlh97 and Win2012R2

GTracing

Senior member
Aug 6, 2021
276
645
106
Yes, I mean going back to a Zen 4 like AVX-512 but with the load/store bandwidth increases of Zen 5 and more execution units + larger queues so there could be more "double pumped" operations in flight at the same time.

It should be better for client workloads and also still be good for HPC. And on c cores they could not add the extra FPU pipes/queues to further save area if they wanted to cheap out (and I think they do).
Double pumping doesn't reduce the size of the registers. Double pumped AVX-512 requires the full 512 bit registers.
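
A small sketch of why that is, modelling a 512-bit register as 16 lanes of 32 bits. The function names are invented for illustration and this is a conceptual model, not a description of AMD's actual datapath.

```python
LANE_BITS = 32
LANES_512 = 512 // LANE_BITS  # 16 lanes in a ZMM-sized value
LANES_256 = 256 // LANE_BITS  # 8 lanes per pass through the narrow unit


def add_256(a, b):
    """Hypothetical 256-bit execution unit: lane-wise add with 32-bit wraparound."""
    assert len(a) == len(b) == LANES_256
    return [(x + y) & 0xFFFFFFFF for x, y in zip(a, b)]


def double_pumped_add_512(zmm_a, zmm_b):
    """Run one 512-bit add as two passes through the 256-bit unit."""
    assert len(zmm_a) == len(zmm_b) == LANES_512
    low = add_256(zmm_a[:LANES_256], zmm_b[:LANES_256])    # first pass
    high = add_256(zmm_a[LANES_256:], zmm_b[LANES_256:])   # second pass
    # Both inputs and the result are still full 16-lane (512-bit) values:
    # only the execution width was halved, not the register storage.
    return low + high
```

The execution unit only ever sees 8 lanes at a time, but the operands and the result each hold all 16 lanes, so the storage for a full 512-bit value has to exist somewhere.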
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
"It is worth noting that despite Intel AVX10/512 including all of Intel's AVX-512 instructions, applications compiled to Intel AVX-512 with vector lengths limited to 256-bit are not guaranteed to work with an AVX10/256 processor due to differences in the supported mask register width."


Just more problems, double pumping is great approach, this is just more problems.
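
To make the mask width issue concrete, here is a small sketch of the arithmetic. It assumes one mask bit per vector element and, as I understand the AVX10 documents, that a 256-bit-only part would expose 32-bit mask registers while AVX-512 (with byte/word support) has 64-bit ones. The helper names are hypothetical.

```python
def mask_bits_needed(vector_bits, element_bits):
    """One mask bit per element, so the count is just the number of lanes."""
    return vector_bits // element_bits


def fits_in_mask(vector_bits, element_bits, mask_register_bits):
    """True if a per-element mask for this vector fits in the mask register."""
    return mask_bits_needed(vector_bits, element_bits) <= mask_register_bits


# 512-bit vector of bytes needs 64 mask bits; 256-bit vector of bytes needs 32.
print(mask_bits_needed(512, 8), mask_bits_needed(256, 8))
# Assumed widths: 64-bit masks on AVX-512, 32-bit masks on a 256-bit-only part.
print(fits_in_mask(256, 8, 32), fits_in_mask(512, 8, 32))
```

So 256-bit vector code itself never needs more than 32 mask bits, but a binary compiled for AVX-512 is free to use 64-bit mask instructions anyway, which is where the "not guaranteed to work" in the quote would come from.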
 

OneEng2

Senior member
Sep 19, 2022
385
590
106
This is wrong on so many levels I'm gonna weep.

Sure does, Granite Ridge still shares the core CCD design with Turin Classic.
Please explain the price pressure you feel AMD and Intel feel in DC chips. Intel just released Granite Rapids which underperforms Turin by a significant margin and still is priced significantly higher. Can you provide some evidence to the contrary?

Good point on Granite Ridge, but I am unsure if this will continue going forward with Zen 6.
 

gdansk

Diamond Member
Feb 8, 2011
3,768
6,015
136
Double pumping doesn't reduce the size of the registers. Double pumped AVX-512 requires the full 512 bit registers.
Does it? I thought they paired two registers together and presented it as ZMM. Since they have way more than 32 registers internally - for rename - I'm not sure.
 

DrMrLordX

Lifer
Apr 27, 2000
22,369
12,175
136
Yeah... is it a good thing though?

Maybe. Unification of CCD design between desktop and server meant that desktop would always be dragged forward into the future no matter what, but brought with it power consumption penalties and GMI link limitations. Separating the two gives AMD more opportunities to neglect desktop if they feel it is to their advantage. Or we may just get more-efficient designs that target end users better than repurposed workstation parts.

It doesn't today with Turin D which has a bunch of N3E based 16 core CCDs.
The core design is the same. Regardless, Turin does share CCDs with Granite Ridge, so (for now) the legacy continues.
 

GTracing

Senior member
Aug 6, 2021
276
645
106
Does it? I thought they paired two registers together and presented it as ZMM. Since they have way more than 32 registers internally I'm not sure how it would be distinguishable anyway.

[with double pumping] they can implement the 16 x 256-bit registers of AVX2, but only need to implement 128-bit floating point and integer units. This comes with a cost: while you save on die space and power, some workloads may see significant performance regressions.
 

gdansk

Diamond Member
Feb 8, 2011
3,768
6,015
136
So it must be. But the general idea remains. 2x execution units but double pumped allows more throughput in existing software without hurting AVX-512 throughput (it would hurt latency). And since they'll have more logic transistors to play with it seems like the path forward for AMD (unless it totally explodes scheduling and register file ports)
 

OneEng2

Senior member
Sep 19, 2022
385
590
106
Thanks!

The part I particularly want to point out is this one:

In other words, you should expect implementations to use AVX10.N with 512-bit vectors. This ensures that any existing AVX-512 code is fully supported, and continues the legacy of backward compatibility for x86_64.
CSPs literally make their own (heavily subsidized by ARM) stuff lmao.

>list prices
you can't be this new can you
... and you believe that ARM competes with high end x86 servers today? Also, creating your own CPU costs lots and lots of money. Unless you make a metric crap ton of them, it doesn't pay off.

Yes, list prices. It is pointless to discuss it any other way. There is no way to know how OEM deals are structured between companies and how much CPU unit price is agreed on.

You should have more class than to belittle someone as you did. I am not, in fact, "new" at all and have been a PC enthusiast since the TRS80 was introduced. Show a little respect. These are all interesting conversations to have, and disagreements can be had without disrespect.
 

Thunder 57

Diamond Member
Aug 19, 2007
3,283
5,389
136
Thanks!

The part I particularly want to point out is this one:

In other words, you should expect implementations to use AVX10.N with 512-bit vectors. This ensures that any existing AVX-512 code is fully supported, and continues the legacy of backward compatibility for x86_64.

... and you believe that ARM competes with high end x86 servers today? Also, creating your own CPU costs lots and lots of money. Unless you make a metric crap ton of them, it doesn't pay off.

Yes, list prices. It is pointless to discuss it any other way. There is no way to know how OEM deals are structured between companies and how much CPU unit price is agreed on.

You should have more class than to belittle someone as you did. I am not, in fact, "new" at all and have been a PC enthusiast since the TRS80 was introduced. Show a little respect. These are all interesting conversations to have, and disagreements can be had without disrespect.

I had to look up the TRS80. I don't even think my parents had met when it came out, let alone me existing.

It would be nice if people would be a bit more respectful here. This isn't Reddit or WCCFTech and I think most would like to keep it that way. Sorry for the OT.
 

reaperrr3

Member
May 31, 2024
55
188
66
You're not getting that either. Double wide stuff is squarely for -halo.
Makes me wonder if boards with soldered-on Medusa-Halo will come to desktop anyway, as literal premium halo product.

Because if there's no downsides to core count, core clocks, V-Cache etc. and twice the memory bandwidth, I can absolutely see some enthusiasts and semi-professionals being willing to pay 1K+ if that gets them twice the RAM bandwidth and a beefy IGP.
 
Reactions: Tlh97 and Win2012R2

Joe NYC

Platinum Member
Jun 26, 2021
2,790
4,103
106
they did that. Just uh, not in client. You guys are too poor for SoIC everywhere.

I have not noticed before, but it looks like AMD has a higher ASP on the DIY side than Intel. Primarily due to SoIC chips.

BTW, there was a time, just past the Zen 3 release, when Lisa said that she wants to make AMD into a premium brand. That was a little short lived, after:
- Alder Lake release
- delay of 5800x3d way past Alder Lake release
- Rembrandt debacle (partially caused by DDR5 only choice at the time of weak market)

Maybe AMD is trying again in client market - with Zen 5 V-Cache, Strix Halo.

If AMD sells enough high ASP parts, more products with SoIC will become possible. One obvious such part would be Strix Halo with full graphics, one CCD and V-Cache - for gaming Mini PCs

 

MS_AT

Senior member
Jul 15, 2024
449
972
96
They can abandon 512 bit registers and add more 256 bit execution units. This would allow increased throughput for 128/256 bit operations and still have the same throughput (but with more latency) for 512 bit operations. But I think it takes more transistors overall.
The register file is already partitioned on Strix, meaning that there are more 256b entries than 512b entries available for rename. The cleverness of Zen4/Zen5 half-rate 512b implementation depends on the fact that they are tracking the instruction as single uOP through the pipeline and the split is only happening at the execution unit, in contrast with Zen1 implementation of AVX2, where AVX2 ops were internally split into 2 128b uops and were carried out as if they were 2 separate instructions. So in theory if I understood your proposal it should work, but for some reason AMD decided not to go with that route. I mean it might be that with AVX2 you would not be able to make efficient use of those execution units due to architectural register limit. Or that the control structures for the additional execution units were more trouble than they are worth.
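
A rough sketch of the bookkeeping difference between the two approaches, counting scheduler entries versus execution passes. This is a simplified model with invented names, assuming every op needs exactly op width divided by datapath width passes; it does not reflect real queue sizes.

```python
def pipeline_cost(num_ops, op_bits, datapath_bits, split_at_decode):
    """Count tracked uops and execution passes for a stream of vector ops.

    split_at_decode=True  models the Zen 1 AVX2 style: each op is cracked into
                          narrower uops that travel the pipeline separately.
    split_at_decode=False models the Zen 4/Zen 5 half-rate 512b style: one uop
                          is tracked and only the execution unit does two passes.
    """
    passes_per_op = -(-op_bits // datapath_bits)  # ceiling division
    entries_per_op = passes_per_op if split_at_decode else 1
    return {
        "scheduler_entries": num_ops * entries_per_op,
        "execution_passes": num_ops * passes_per_op,
    }


print(pipeline_cost(10, 512, 256, split_at_decode=True))
print(pipeline_cost(10, 512, 256, split_at_decode=False))
```

Both variants do the same amount of execution work, but the single-uop variant occupies half as many tracking entries in this model, which is the advantage described above.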
 

fastandfurious6

Senior member
Jun 1, 2024
305
398
96
I have not noticed before, but it looks like AMD has a higher ASP on the DIY side than Intel. Primarily due to SoIC chips.

BTW, there was a time, just past the Zen 3 release, when Lisa said that she wants to make AMD into a premium brand. That was a little short lived, after:
- Alder Lake release
- delay of 5800x3d way past Alder Lake release
- Rembrandt debacle (partially caused by DDR5 only choice at the time of weak market)

Maybe AMD is trying again in client market - with Zen 5 V-Cache, Strix Halo.

If AMD sells enough high ASP parts, more products with SoIC will become possible. One obvious such part would be Strix Halo with full graphics, one CCD and V-Cache - for gaming Mini PCs


yeah alder lake literally saved intel, the last 'fast' architecture from them and they have nothing significantly 'better' even in the pipeline... arrow lake is on par with alder lake, even worse in some cases

amd is already working on zen 6 and it will be significantly faster than zen 5
 
Reactions: Joe NYC

Nothingness

Diamond Member
Jul 3, 2013
3,183
2,233
136
CSPs literally make their own (heavily subsidized by ARM) stuff lmao.
Do you really think Arm has the money to heavily subsidize hyperscalers? They might get some interesting reduction, but it's unlikely to reach the level Intel can afford. Anyway lower cost is only part of the reason why Arm makes inroads in that market (TCO goes beyond chip price, flexibility to add accelerators on SoC, etc.).
 