Question Zen 6 Speculation Thread


marees

Senior member
Apr 28, 2024
420
484
96
Not sure why the hate for multithreading.

For small cores it gets forward movement when one context is stalled for Free*. Nice for real-time, too.

For large cores it gets you high utilization of functional units for Free*.

* Terms and conditions apply, but the costs are pretty small. Marvell called it a 5% area impact for TX3.
Intel's stated reasoning was power constraints and/or area constraints.

I can see consoles going this way, as they are always power constrained

Intel wants to compete with low-power ARM APUs, so it makes sense for them. Not sure if it makes sense for AMD.
 

gdansk

Platinum Member
Feb 8, 2011
2,894
4,381
136
Not sure why the hate for multithreading.

For small cores it gets forward movement when one context is stalled for Free*. Nice for real-time, too.

For large cores it gets you high utilization of functional units for Free*.

* Terms and conditions apply, but the costs are pretty small. Marvell called it a 5% area impact for TX3.
For the core farms, 5% less area means they can fit more cores. And what percent of those customers are using SMT? I don't know but it seems like a security risk with all these bleeds and breaks.

I think they could discard it for core farm SKUs and it would be a net benefit for most customers.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Intel's stated reasoning was power constraints and/or area constraints.

I can see consoles going this way, as they are always power constrained

Intel wants to compete with low-power ARM APUs, so it makes sense for them. Not sure if it makes sense for AMD.

I saw what Intel said. I thought the way it was put was kind of strange, but the gist seems to have been "we care about ST for these cores, because for MT we have a flock of smaller cores." Works for me - if you have small cores around for throughput loads, the SMT value calculation does change, not least because scheduling gets annoying; with ADL they did the whole "1 Atom core = 1 SMT thread" equivalence, which probably simplified scheduling but is a bit troublesome if it's a performance point that designs are supposed to target.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,475
1,978
136
Not sure why the hate for multithreading.

For small cores it gets forward movement when one context is stalled for Free*. Nice for real-time, too.

For large cores it gets you high utilization of functional units for Free*.

* Terms and conditions apply, but the costs are pretty small. Marvell called it a 5% area impact for TX3.

It reduces the performance available per thread.

If you only consider the CPU, this still seems like a huge win for throughput loads. But an active process requires pretty much the same amount of RAM regardless of how fast it runs. Back when DRAM was comparatively cheap and CPUs were expensive, this wasn't a big deal. But today, if you spec out a normal server for a throughput load, >50% of the cost of that server is going to be DRAM. And supporting twice the threads means you have to double the most expensive component on your BOM, for >50% extra cost, and you are not getting even 20% extra speed. It just makes more sense to get more cores, to more efficiently utilize the most expensive part of the server.

(None of this applies to workloads that don't use much RAM per thread. Congrats, they are super cheap to host now.)

(edit: ) I don't see AMD outright removing SMT2 any time soon; it does help on some workloads. But it's much less important, and clients are increasingly turning it off.
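
A rough back-of-the-envelope sketch of that cost argument in Python, with purely illustrative numbers (the DRAM share, the SMT uplift, and the assumption that DRAM scales with thread count are all made up for the example):

# Illustrative cost model for the SMT-vs-DRAM argument above.
# All numbers are assumptions chosen for the example, not real prices.
base_server_cost = 100.0   # arbitrary units, non-SMT configuration
dram_share = 0.55          # >50% of the bill of materials is DRAM
smt_speedup = 1.20         # ~20% extra throughput from enabling SMT2

dram_cost = base_server_cost * dram_share
other_cost = base_server_cost - dram_cost

# Twice the threads at the same RAM per thread means doubling the DRAM.
smt_server_cost = other_cost + 2 * dram_cost

extra_cost = smt_server_cost / base_server_cost - 1
perf_per_cost_change = (smt_speedup / smt_server_cost) / (1.0 / base_server_cost) - 1

print(f"extra server cost with SMT: {extra_cost:.0%}")              # ~55%
print(f"throughput per dollar change: {perf_per_cost_change:.0%}")  # ~ -23%

Under these assumptions, throughput per dollar actually goes down once the DRAM is doubled, which is the point being made above.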
 
Reactions: moinmoin

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
For the core farms, 5% less area means they can fit more cores. And what percent of those customers are using SMT? I don't know but it seems like a security risk with all these bleeds and breaks.

I think they could discard it for core farm SKUs and it would be a net benefit for most customers.

As far as I know, hyperscalers typically do run with SMT. To the best of my knowledge, one "vCPU" on AWS is an SMT thread with x86 parts, for instance.

If we're throwing out the performance baby with the side-channel bathwater, we need to have a talk about branch prediction, OoO, shared caches across multiple cores, etc.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
It reduces the performance available per thread.

If you only consider the CPU, this still seems like a huge win for throughput loads. But an active process requires pretty much the same amount of RAM regardless of how fast it runs. Back when DRAM was comparatively cheap and CPUs were expensive, this wasn't a big deal. But today, if you spec out a normal server for a throughput load, >50% of the cost of that server is going to be DRAM. And supporting twice the threads means you have to double the most expensive component on your BOM, for >50% extra cost, and you are not getting even 20% extra speed. It just makes more sense to get more cores, to more efficiently utilize the most expensive part of the server.

(None of this applies to workloads that don't use much RAM per thread. Congrats, they are super cheap to host now.)

(edit: ) I don't see AMD outright removing SMT2 any time soon; it does help on some workloads. But it's much less important, and clients are increasingly turning it off.

The same argument would apply against aggressive manycore designs, though. If it's really about maximizing the RAM-to-thread ratio - and I have not seen that as a central element of capacity planning, either for conventional enterprise stuff or EDA flows - everyone should be building hulking monster-cores like Z, not tossing hundreds of cores on a device.
 

Nothingness

Diamond Member
Jul 3, 2013
3,066
2,060
136
As far as I know, hyperscalers typically do run with SMT. To the best of my knowledge, one "vCPU" on AWS is an SMT thread with x86 parts, for instance.

If we're throwing out the performance baby with the side-channel bathwater, we need to have a talk about branch prediction, OoO, shared caches across multiple cores, etc.
Yeah, I can confirm that lower-cost AWS instances have SMT enabled. In my case, security is not a concern since these instances are secured (read: no one external to my company can run jobs on them), but you get horrible performance. Definitely a showstopper for my needs.
 

StefanR5R

Elite Member
Dec 10, 2016
5,914
8,826
136
I can confirm that lower-cost AWS instances have SMT enabled. [...] you get horrible performance. Definitely a showstopper for my needs.
Purely WRT your application performance, in this case it's still better for you to rent VMs with twice the vCPUs with SMT enabled than to rent SMT-disabled VMs. Pricing of either variant is another (but not entirely technical) question. Maybe the latter can be had at a better price (I haven't checked) because of lower potential power use.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,479
3,380
106
That's the trajectory but the crossover timeline seems off.
Maybe a comboPHY for DT/luggable with two platforms at once?
A platform supporting 2x LPDDR5/6 LPCAMM for Strix Halo - as a parallel platform to AM5 - would be interesting.
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
A platform supporting 2x LPDDR5/6 LPCAMM for Strix Halo - as a parallel platform to AM5 - would be interesting.
The only way it will appear on desktop is as Mac Mini/Mac Studio competition.
 
Reactions: Joe NYC

Doug S

Platinum Member
Feb 8, 2020
2,750
4,681
136
The only way it will appear on desktop is as Mac Mini/Mac Studio competition.

I think within a few years desktops will be 100% LPCAMM. If you want traditional DIMMs you'll be on server platforms, or on workstation level desktops that use server CPUs like Intel's Xeon workstations or server CPUs in desktop clothing like Threadripper.

A single LPCAMM that's 192 bits wide (i.e. the high end, there will be narrower ones) is the equivalent of six DDR channels bandwidth wise. Outside of those server class CPUs mentioned above, how many have six channels? Zero, AFAIK. The single LPCAMM will also take up less board space than 6 (let alone 12) DIMM slots. I wouldn't be surprised to see boards in SFF setups install the CAMM on the bottom to further reduce footprint in a way DIMMs cannot.

Almost no one wants a tower case PC. Those have been the standard for years because of the need to fit 3.5" and 5.25" drives, but those are gone. So you can have a Mac Mini type form factor for the PC for the average person, and a Mac Studio like form factor (expanded to a cube) for those who want a discrete GPU. Tower-style PCs that aren't on those server/workstation platforms and still use DIMMs are going to be a niche by the end of the decade, if they exist at all.

I wouldn't look for much in the way of options for two LPCAMMs like JoeNYC wants. That requires 384 bits of memory controller - just look at the M3 Pro die shots and double the area devoted to memory controllers - plus maybe more since LPDDR6 is more complex than LPDDR5. Those don't shrink as much as logic does, either, so don't expect much help from N2. That's a lot of area, and that's equivalent to 12 DDR channels. i.e. Threadripper territory.
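
As a quick sanity check on those width figures, here is a tiny Python helper (the 32-bit DDR5 channel is the assumed unit of comparison; the function name is just for illustration):

# Bus-width sanity check, using 32-bit DDR5 channels as the unit of comparison.
DDR5_CHANNEL_BITS = 32

def ddr5_channel_equivalent(total_bits):
    """How many 32-bit DDR5 channels a given bus width corresponds to."""
    return total_bits / DDR5_CHANNEL_BITS

print(ddr5_channel_equivalent(192))  # one high-end LPCAMM  -> 6.0 channels
print(ddr5_channel_equivalent(384))  # two such LPCAMMs     -> 12.0 channels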
 

Joe NYC

Platinum Member
Jun 26, 2021
2,479
3,380
106
That's too big and too niche for DT. Forget about that.
DT has discrete graphics, a proper one.

The memory controller could support 2 of the big channels; there could be pins for 2 channels, but a lower-end platform would only support 1 channel.

OTOH, the same platform, with both channels supported, could be a revival of the HEDT platform.

But the point would be to move to LPDDR memories for desktops / APUs / NUCs / corporate desktops
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,129
15,275
136
Yeah, I can confirm that lower-cost AWS instances have SMT enabled. In my case, security is not a concern since these instances are secured (read: no one external to my company can run jobs on them), but you get horrible performance. Definitely a showstopper for my needs.
What CPUs are in those boxes?
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,545
5,107
96
The memory controller could support 2 of the big channels; there could be pins for 2 channels, but a lower-end platform would only support 1 channel.
Lotta engineering for not much gained.
But the point would be to move to LPDDR memories for desktops / APUs / NUCs / corporate desktops
Can do that with bog standard 128b (then 192b-ish) for L6.
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
I think within a few years desktops will be 100% LPCAMM. If you want traditional DIMMs you'll be on server platforms, or on workstation level desktops that use server CPUs like Intel's Xeon workstations or server CPUs in desktop clothing like Threadripper.

A single LPCAMM that's 192 bits wide (i.e. the high end, there will be narrower ones) is the equivalent of six DDR channels bandwidth wise. Outside of those server class CPUs mentioned above, how many have six channels? Zero, AFAIK. The single LPCAMM will also take up less board space than 6 (let alone 12) DIMM slots. I wouldn't be surprised to see boards in SFF setups install the CAMM on the bottom to further reduce footprint in a way DIMMs cannot.

Almost no one wants a tower case PC. Those have been the standard for years because of the need to fit 3.5" and 5.25" drives, but those are gone. So you can have a Mac Mini type form factor for the PC for the average person, and a Mac Studio like form factor (expanded to a cube) for those who want a discrete GPU. Tower-style PCs that aren't on those server/workstation platforms and still use DIMMs are going to be a niche by the end of the decade, if they exist at all.

I wouldn't look for much in the way of options for two LPCAMMs like JoeNYC wants. That requires 384 bits of memory controller - just look at the M3 Pro die shots and double the area devoted to memory controllers - plus maybe more since LPDDR6 is more complex than LPDDR5. Those don't shrink as much as logic does, either, so don't expect much help from N2. That's a lot of area, and that's equivalent to 12 DDR channels. i.e. Threadripper territory.
I didn't mean LPCAMM; I meant Strix Halo appearing on desktop as Mac Mini/Mac Studio competition.

In essence, I agree. LPCAMM is the future, because it will simplify manufacturing for desktops and mobile platforms, and allow the use of mobile chips straight up on desktops.

With the way things are going, with AI pushed everywhere, memory bandwidth and NPUs will be a requirement, and the need to compete with Apple/Microsoft will be stronger than anything else, which will drive the adoption of LPCAMM everywhere.

DIY will remain on desktop, but only as the highest end of the highest end.

Everything below that will be APUs with PCIe expansion, or Mac Studio-like large APUs without internal expansion, on desktops. That's the direction of the market, since desktop in general is dying.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,479
3,380
106
I think within a few years desktops will be 100% LPCAMM. If you want traditional DIMMs you'll be on server platforms, or on workstation level desktops that use server CPUs like Intel's Xeon workstations or server CPUs in desktop clothing like Threadripper.

A single LPCAMM that's 192 bits wide (i.e. the high end, there will be narrower ones) is the equivalent of six DDR channels bandwidth wise. Outside of those server class CPUs mentioned above, how many have six channels? Zero, AFAIK. The single LPCAMM will also take up less board space than 6 (let alone 12) DIMM slots. I wouldn't be surprised to see boards in SFF setups install the CAMM on the bottom to further reduce footprint in a way DIMMs cannot.

That's very interesting, that there will be a 192-bit-wide version. I don't know who has a device on their roadmap for this width at this time.

Strix Halo will be 256 bits wide, supporting LPDDR5.

Almost no one wants a tower case PC. Those have been the standard for years because of the need to fit 3.5" and 5.25" drives, but those are gone. So you can have a Mac Mini type form factor for the PC for the average person, and a Mac Studio like form factor (expanded to a cube) for those who want a discrete GPU. Tower-style PCs that aren't on those server/workstation platforms and still use DIMMs are going to be a niche by the end of the decade, if they exist at all.

I wouldn't look for much in the way of options for two LPCAMMs like JoeNYC wants. That requires 384 bits of memory controller - just look at the M3 Pro die shots and double the area devoted to memory controllers - plus maybe more since LPDDR6 is more complex than LPDDR5. Those don't shrink as much as logic does, either, so don't expect much help from N2. That's a lot of area, and that's equivalent to 12 DDR channels. i.e. Threadripper territory.

I see now, after a bit of Googling, Binging, and Co-Piloting, that LPDDR6 is going to be 192 bits wide instead of 128 bits wide. That would seem sufficient/workable for a Strix Halo-like CPU.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,479
3,380
106
LPDDR6 is 24b subchannels for 192b total in a standard quadchannel setup.
DDR6 is the opposite and shrinks the subchannel to 16b from 32b of DDR5 and there you go, still 128b.

That's going to be some nice bandwidth uplift. Looks like LPDDR6 is going to be 2x LPDDR5.

Comparing the 256-bit LPDDR5 of Strix Halo vs. a 192-bit LPDDR6 LPCAMM, it will be:
2 x 192 / 256 = 2 x 3/4 = 1.5
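
(A quick check of that ratio in Python, assuming LPDDR6 delivers roughly twice the per-pin data rate of LPDDR5 as discussed above; the 2x factor is an assumption, not a spec number:)

# 192-bit LPDDR6 vs 256-bit LPDDR5, assuming ~2x per-pin data rate for LPDDR6.
lpddr5_bus_bits, lpddr6_bus_bits = 256, 192
per_pin_uplift = 2.0  # assumed LPDDR6-vs-LPDDR5 data-rate ratio
print(per_pin_uplift * lpddr6_bus_bits / lpddr5_bus_bits)  # 1.5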

Seems like a bright future for a company that can make a competent APU, and not so bright for a company selling mobile dGPUs.
 
Reactions: lightmanek

Tuna-Fish

Golden Member
Mar 4, 2011
1,475
1,978
136
That's very interesting, that there will be a 192-bit-wide version. I don't know who has a device on their roadmap for this width at this time.

Strix Halo will be 256 bits wide, supporting LPDDR5.

Strix Halo will be old news by the time LPDDR6 is out in volume. It's a Zen5 product, while even Zen6 is probably too early for LP6.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,479
3,380
106
Strix Halo will be old news by the time LPDDR6 is out in volume. It's a Zen5 product, while even Zen6 is probably too early for LP6.

Strix Halo is a proof of concept, something to build on in future versions.

It's not really breaking completely new ground, since Apple is already doing it: building good APUs, replacing GDDR with LPDDR, unified memory.
 

inquiss

Member
Oct 13, 2010
186
266
136
Strix Halo is a proof of concept, something to build on in future versions.

It's not really breaking completely new ground, since Apple is already doing it: building good APUs, replacing GDDR with LPDDR, unified memory.
Let's hope it sells well so they fund the rest of the roadmap...
 

Doug S

Platinum Member
Feb 8, 2020
2,750
4,681
136
LPDDR6 is 24b subchannels for 192b total in a standard quadchannel setup.
DDR6 is the opposite and shrinks the subchannel to 16b from 32b of DDR5 and there you go, still 128b.

There's no way DDR6 will have 16-bit subchannels. ECC, at least as an option, is a hard requirement for DDR6. You'd take a 50% bit penalty to do ECC in 16-bit chunks. DDR4's 64-bit channels allowed 72 bits to handle ECC. With DDR5, ECC DIMMs went 80 bits wide to handle ECC for two channels. I don't see them doubling down on that. Since non-ECC DDR6 DIMMs will be a lot rarer, assuming everything below "high end workstation" is using CAMMs, they might end up really high priced due to how little demand there is, so I'm not convinced ECC will be optional with DDR6. Maybe it'll be something you can disable in the BIOS, but I'd say there are better than even odds that every DDR6 DIMM is an ECC DIMM.

Now I suppose they could do something funky and fiddle with the burst length to get enough extra bits to do ECC, rather than doing a wider data path as with DDR5 (which technically is what they did with LPDDR6), but it would make a lot more sense to make DDR6 DIMMs 96 bits wide, in 24-bit-wide channels, following the LPDDR6 plan.
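
To put numbers on the overhead argument above, here is a small Python illustration (assuming 8 check bits per data channel, as on DDR4/DDR5 ECC DIMMs; the 16-bit case is the hypothetical DDR6 layout being argued against):

# ECC storage overhead per channel width, assuming 8 check bits per channel.
ECC_BITS = 8
for data_bits in (64, 32, 16):
    total = data_bits + ECC_BITS
    print(f"{data_bits}-bit channel -> {total} bits with ECC, "
          f"{ECC_BITS / data_bits:.1%} overhead")
# 64-bit (DDR4):              72 bits, 12.5% overhead
# 32-bit (DDR5):              40 bits, 25.0% overhead (80-bit DIMM for two channels)
# 16-bit (hypothetical DDR6): 24 bits, 50.0% overhead, the "50% bit penalty"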
 