Question Zen 6 Speculation Thread

Page 78 - AnandTech Forums

MS_AT

Senior member
Jul 15, 2024
534
1,124
96
The concept is much simpler. In CXL.mem, the memory of a CXL device can be mapped into another device, so it appears as ordinary memory in that device's address space
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent, and llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but spills to system memory instead. I'd need to dig through the docs to double-check that I'm not hallucinating any of this.
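For what it's worth, the llama.cpp side of this is the `-ngl`/`n_gpu_layers` knob, which keeps only the first N layers in VRAM and runs the rest from system RAM. A minimal sketch of the sizing arithmetic behind picking N (all sizes here are made-up illustration values, not from any real model):

```python
# Sketch of partial-offload sizing: how many whole transformer layers
# fit in a VRAM budget; the remainder stays in system RAM.
# All sizes below are hypothetical illustration values.

def layers_that_fit(vram_bytes: int, bytes_per_layer: int, reserve_bytes: int = 0) -> int:
    """Layers that fit in VRAM after reserving room for KV cache/scratch."""
    usable = max(vram_bytes - reserve_bytes, 0)
    return usable // bytes_per_layer

GB = 1024 ** 3
# Example: 24 GB card, ~1.5 GB per layer, 4 GB reserved for the KV cache.
n_gpu_layers = layers_that_fit(24 * GB, int(1.5 * GB), 4 * GB)
print(n_gpu_layers)  # 13 -> pass as -ngl 13; remaining layers run from system RAM
```

The NVIDIA control-panel setting being referred to is presumably the CUDA sysmem-fallback policy, which lets allocations overflow into system memory instead of failing outright.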
 

DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,722
136
Unified Memory rules.
Currently there is no cheap but fast solution

A 1024-bit-wide APU to attach the 1 TB+ needed for the likes of full-blown DeepSeek models ❌ not AMD's style; the GPU on an APU is too small
An MI325X-style interconnect to get 768 GB ❌ too expensive

Before CXL can be bottlenecked by the 256 GB/s of PCIe 5.0 x16 bandwidth, the system-RAM bottleneck is already hit at 128 GB/s on DDR5-8000.
We need wider buses.
The poor man's LLM is obviously poor in performance.
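The 128 GB/s DDR5 figure is simple arithmetic, assuming a standard dual-channel client platform with 64 bits per channel:

```python
# Peak theoretical bandwidth of dual-channel DDR5-8000 (client platform).
mt_per_s = 8000               # mega-transfers per second per channel
bytes_per_transfer = 64 // 8  # 64-bit channel -> 8 bytes per transfer
channels = 2
gb_per_s = mt_per_s * bytes_per_transfer * channels / 1000  # MT/s * B = MB/s
print(gb_per_s)  # 128.0 GB/s peak, before real-world efficiency losses
```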

Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent, and llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but spills to system memory instead. I'd need to dig through the docs to double-check that I'm not hallucinating any of this.
CXL allows normal memory mapping with no software tricks, and it works for any GPU workload.

On MI300C/X with a matching Turin it should spill to RAM normally (it has IF 4.0) if the model doesn't fit in the HBM cache. It has unified memory, after all.

Do you have a link to someone running this on desktop CPUs with client GPUs?

All I have seen is purely-CPU runs or super-distilled stuff.
 

branch_suggestion

Senior member
Aug 4, 2023
611
1,330
96
Not sure I agree that N2 is worth the cost bump over N3 for all applications. If AMD can compete with N3P on client, why would they pay for N2?
Every other client SoC of worth is using N2 in some form in the same timeframe.
N3 is N20, N2 is N16, it is betterer and gooder with no downsides, perf/$ is at least equal.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,913
4,284
106
Of course having lots of VRAM would be better than having CXL-backed VRAM.
But it does not swap. A memory access at an address mapped to another device is simply routed there, just like any NUMA access.
This is not my interpretation; the standard defines it that way.

Also, most competitive models these days use MoE, so the entire model is not activated. Case in point: DeepSeek. There are hotspots, but you don't need to access everything; the entire model has to be loaded, yet only a fraction of it gets accessed, depending on which experts are engaged.

The main advantage is being able to load the full model without getting distilled to oblivion.

I wonder if, in the long run, VRAM loses out to LPDDR.

GDDR has big disadvantages in:
- Cost
- chip size
- power consumption
- scale of production

It is hard to believe GDDR is still in the game...
 

DrMrLordX

Lifer
Apr 27, 2000
22,479
12,335
136
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent, and llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but spills to system memory instead. I'd need to dig through the docs to double-check that I'm not hallucinating any of this.
NV has had proprietary implementations of UVM (unified virtual memory) since before the AI craze started. Though I was under the impression that you wanted/needed NVLink in hardware to make it work.
 

mikegg

Golden Member
Jan 30, 2010
1,883
495
136
Ahh... I am the head of a software department (and hardware). Exactly what percentage of computer sales do you think people like you and me represent?

In my office, most people use Windows PCs... even in development. Note, we are heavily embedded vs. cloud and mobile apps.

The mobile group does use mostly Macs (because Apple is such a PITA and won't allow cross-compiling like EVERY other OS on the planet). THIS and THIS alone is why our mobile developers use Macs. The cloud group is a mix of Mac, Windows, and Linux. Embedded is 100% PC, and client computing is about 80% Windows.

Still, the VAST majority of business computer users are on Windows. Look it up (around 85% IIRC).
I didn't say anything about total Mac vs. Windows business market share. All I said was that software development on Windows sucks.
 

Win2012R2

Senior member
Dec 5, 2024
748
756
96
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent
AFAIK (could be wrong here) that approach requires the CPU to be involved, whereas with CXL the device can talk to the memory directly, bypassing the CPU completely.
 

Glo.

Diamond Member
Apr 25, 2015
5,898
4,937
136
Currently there is no cheap but fast solution

A 1024-bit-wide APU to attach the 1 TB+ needed for the likes of full-blown DeepSeek models ❌ not AMD's style; the GPU on an APU is too small
An MI325X-style interconnect to get 768 GB ❌ too expensive

Before CXL can be bottlenecked by the 256 GB/s of PCIe 5.0 x16 bandwidth, the system-RAM bottleneck is already hit at 128 GB/s on DDR5-8000.
We need wider buses.
The poor man's LLM is obviously poor in performance.


CXL allows normal memory mapping with no software tricks, and it works for any GPU workload.

On MI300C/X with a matching Turin it should spill to RAM normally (it has IF 4.0) if the model doesn't fit in the HBM cache. It has unified memory, after all.

Do you have a link to someone running this on desktop CPUs with client GPUs?

All I have seen is purely-CPU runs or super-distilled stuff.
I do not believe there will EVER be a solution that is both "cheap" and "fast".

EOT.
 

OneEng2

Senior member
Sep 19, 2022
456
681
106
Cuz it's better.

Every other client SoC of worth is using N2 in some form in the same timeframe.
N3 is N20, N2 is N16, it is betterer and gooder with no downsides, perf/$ is at least equal.
If AMD felt this strategy was a good path, Zen 5 desktop would be on N3B, not N4P.

Since this is a "speculation" thread, my "speculation" is that AMD will NOT spend the cash and risk production volume limitations with N2 for client on Zen 6.
 
Reactions: yuri69

Win2012R2

Senior member
Dec 5, 2024
748
756
96
But what about Zen 5c, which supposedly uses N3E?
The C design had to use N3, as otherwise it wouldn't be dense enough. Since it had to be exactly the same architectural design, they could not implement new features in it versus the N4 version; they shrank the design they had on hand, which was one that had to work on N4.
 

OneEng2

Senior member
Sep 19, 2022
456
681
106
N3B was delayed (and was crap too); they had to de-risk and change to N4. That's most likely why Zen 5 fell behind expected gains; it would have been great on N3E

Plus Intel's 18A might be competitive next time around
Possibly so, but AMD is currently more than competitive.

If you assume that Intel's transistor budget going from N3B to 18A will be about the same as AMD will get going from N4P to N3P, seems to me like AMD would have a better than average chance of maintaining their performance lead while also having a substantially lower cost.

Also, AMD is certainly going to have N2 compute chiplets with 32 cores, giving them a clear path to a desktop CPU with 12 P cores and 32 E cores (44 cores total, 88 threads). Pretty easy stitch job for the IOD.
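The parenthetical math, assuming SMT2 on both core types (my assumption; the core counts themselves are the post's speculation):

```python
# Speculative Zen 6 desktop config from the post above, assuming SMT2.
p_cores, e_cores = 12, 32
total_cores = p_cores + e_cores
total_threads = total_cores * 2    # two threads per core with SMT
print(total_cores, total_threads)  # 44 88
```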

Seems like a pretty great compromise and cost balance.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,269
7,392
96
If you assume that Intel's transistor budget going from N3B to 18A will be about the same as AMD will get going from N4P to N3P, seems to me like AMD would have a better than average chance of maintaining their performance lead while also having a substantially lower cost
They do not care about Intel.
Also, AMD is certainly going to have N2 compute chiplets with 32 cores, giving them a clear path to a desktop CPU with 12 P cores and 32 E cores (44 cores total, 88 threads). Pretty easy stitch job for the IOD.
The Venice-D CCD is for, well, Venice-D.
They're not doing dense-core spam on DT.
 
Reactions: Tlh97 and Joe NYC

branch_suggestion

Senior member
Aug 4, 2023
611
1,330
96
Possibly so, but AMD is currently more than competitive.
For now; 2026 will have quite a bit of bleeding due to the competition's roadmap being offset.
If you assume that Intel's transistor budget going from N3B to 18A will be about the same as AMD will get going from N4P to N3P, seems to me like AMD would have a better than average chance of maintaining their performance lead while also having a substantially lower cost.
Glymur and N1X are the parts AMD has to snuff out with Medusa, Intel is in the rear view mirror.
Also, AMD is certainly going to have N2 compute chiplets with 32 cores, giving them a clear path to a desktop CPU with 12 P cores and 32 E cores (44 cores total, 88 threads). Pretty easy stitch job for the IOD.
Dense CCDs will never be in client, only cheap mono parts.
Ironically the Venice-D CCD is the priciest CCD ever.
Seems like a pretty great compromise and cost balance.
Enough big-boy cores is enough; if you want more than 48 threads, you need more memory channels.
Just buy Threadripper.
 
Reactions: inquiss

LightningZ71

Platinum Member
Mar 10, 2017
2,067
2,508
136
It's too bad that AMD can't do a Strix Point refresh on a more recent node. N3E should be a bit less expensive by then, and with FinFlex they should be able to manage a decent clock-speed improvement all around.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,913
4,284
106
It's too bad that AMD can't do a Strix Point refresh on a more recent node. N3E should be a bit less expensive by then, and with FinFlex they should be able to manage a decent clock-speed improvement all around.

Since AMD decided to abandon the monolithic die for the next-gen Medusa Point, it makes a lot more sense to put all the effort into the Medusa Point IOD (SoC) instead.

Imagine AMD did that, and the Medusa Point SoC became a reality in Q4 2025. Then two options would present themselves:
- if Zen 6 is early (H1 2026), release as planned
- if Zen 6 is late (H2 2026), release the Medusa Point SoC + Strix Halo CCD (assuming the connection is compatible)
 
Reactions: bearmoo

poke01

Diamond Member
Mar 8, 2022
3,330
4,583
106
Glymur and N1X are the parts AMD has to snuff out with Medusa, Intel is in the rear view mirror.
N1X isn't a threat IMO; it comes with stock Arm cores from 2024.

Glymur, at least on the CPU side, is going to be very good, but it won't come out till H1 2026. Medusa will likely launch by mid-2026.
 