Question Zen 6 Speculation Thread

Page 78 - AnandTech Forums

MS_AT

Senior member
Jul 15, 2024
534
1,124
96
The concept is much simpler. In CXL.mem, the memory of a CXL device can be mapped into another device, so it appears as ordinary memory in that device's address space
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent, and llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but spills to system memory instead. I'd need to dig through the docs to double-check that I'm not hallucinating any of this.
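For what it's worth, the llama.cpp side of this is the `-ngl`/`n_gpu_layers` knob, which keeps only the first N layers in VRAM and runs the rest from system RAM. A minimal sketch of the sizing arithmetic behind picking N (all sizes here are made-up illustration values, not from any real model):

```python
# Sketch of partial-offload sizing: how many whole transformer layers
# fit in a VRAM budget; the remainder stays in system RAM.
# All sizes below are hypothetical illustration values.

def layers_that_fit(vram_bytes: int, bytes_per_layer: int, reserve_bytes: int = 0) -> int:
    """Layers that fit in VRAM after reserving room for KV cache/scratch."""
    usable = max(vram_bytes - reserve_bytes, 0)
    return usable // bytes_per_layer

GB = 1024 ** 3
# Example: 24 GB card, ~1.5 GB per layer, 4 GB reserved for the KV cache.
n_gpu_layers = layers_that_fit(24 * GB, int(1.5 * GB), 4 * GB)
print(n_gpu_layers)  # 13 -> pass as -ngl 13; remaining layers run from system RAM
```

The NVIDIA control-panel setting being referred to is presumably the CUDA sysmem-fallback policy, which lets allocations overflow into system memory instead of failing outright.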
 

DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,722
136
Unified Memory rules.
Currently there is no cheap but fast solution

A 1024-bit-wide APU to attach the 1 TB+ needed for the likes of full-blown DeepSeek models ❌ not AMD's style; the GPU on an APU is too small
An MI325X-style interconnect to get 768 GB ❌ too expensive

Before CXL can be bottlenecked by the 256 GB/s of PCIe 5.0 x16 bandwidth, the system-RAM bottleneck is already hit at 128 GB/s on DDR5-8000.
We need wider buses.
The poor man's LLM is obviously poor in performance.
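The 128 GB/s DDR5 figure is simple arithmetic, assuming a standard dual-channel client platform with 64 bits per channel:

```python
# Peak theoretical bandwidth of dual-channel DDR5-8000 (client platform).
mt_per_s = 8000               # mega-transfers per second per channel
bytes_per_transfer = 64 // 8  # 64-bit channel -> 8 bytes per transfer
channels = 2
gb_per_s = mt_per_s * bytes_per_transfer * channels / 1000  # MT/s * B = MB/s
print(gb_per_s)  # 128.0 GB/s peak, before real-world efficiency losses
```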

Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent, and llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but spills to system memory instead. I'd need to dig through the docs to double-check that I'm not hallucinating any of this.
CXL allows normal memory mapping with no software tricks, and it works for any GPU workload.

On MI300C/X with a matching Turin it should spill to RAM normally (it has IF 4.0) if the model doesn't fit in the HBM cache. It has unified memory, after all.

Do you have a link to someone running this on desktop CPUs with client GPUs?

All I have seen is purely-CPU runs or super-distilled stuff.
 

branch_suggestion

Senior member
Aug 4, 2023
611
1,330
96
Not sure I agree that N2 is worth the cost bump over N3 for all applications. If AMD can compete with N3P on client, why would they pay for N2?
Every other client SoC of worth is using N2 in some form in the same timeframe.
N3 is N20, N2 is N16, it is betterer and gooder with no downsides, perf/$ is at least equal.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,913
4,284
106
Of course having lots of VRAM would be better than having CXL-backed VRAM.
But it does not swap. A memory access at an address mapped to another device is simply routed there, just like any NUMA access.
This is not my interpretation; the standard defines it that way.

Also, most competitive models these days use MoE, so the entire model is not activated. Case in point: DeepSeek. There are hotspots, but you don't need to access everything; the entire model has to be loaded, yet only a fraction of it gets accessed, depending on which experts are engaged.

The main advantage is being able to load the full model without getting distilled to oblivion.

I wonder if, in the long run, VRAM loses out to LPDDR.

GDDR has big disadvantages in:
- Cost
- chip size
- power consumption
- scale of production

It is hard to believe GDDR is still in the game...
 

DrMrLordX

Lifer
Apr 27, 2000
22,479
12,335
136
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent, and llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but spills to system memory instead. I'd need to dig through the docs to double-check that I'm not hallucinating any of this.
NV has had proprietary implementations of UVM (unified virtual memory) since before the AI craze started. Though I was under the impression that you wanted/needed NVLink in hardware to make it work.
 

mikegg

Golden Member
Jan 30, 2010
1,883
495
136
Ahh... I am the head of a software department (and hardware). Exactly what percentage of computer sales do you think people like you and me represent?

In my office, most people use Windows PCs... even in development. Note, we are heavily embedded vs. cloud and mobile apps.

The mobile group does use mostly Macs (because Apple is such a PITA and won't allow cross-compiling like EVERY other OS on the planet). THIS and THIS alone is why our mobile developers use Macs. The cloud group is a mix of Mac, Windows, and Linux. Embedded is 100% PC, and client computing is about 80% Windows.

Still, the VAST majority of business computer users are on Windows. Look it up (around 85% IIRC).
I didn't say anything about total Mac vs. Windows business market share. All I said was that software development on Windows sucks.
 

Win2012R2

Senior member
Dec 5, 2024
748
756
96
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent
AFAIK (could be wrong here) that approach requires the CPU to be involved, whereas with CXL the device can talk to the memory directly, bypassing the CPU completely.
 

Glo.

Diamond Member
Apr 25, 2015
5,898
4,937
136
Currently there is no cheap but fast solution

A 1024-bit-wide APU to attach the 1 TB+ needed for the likes of full-blown DeepSeek models ❌ not AMD's style; the GPU on an APU is too small
An MI325X-style interconnect to get 768 GB ❌ too expensive

Before CXL can be bottlenecked by the 256 GB/s of PCIe 5.0 x16 bandwidth, the system-RAM bottleneck is already hit at 128 GB/s on DDR5-8000.
We need wider buses.
The poor man's LLM is obviously poor in performance.


CXL allows normal memory mapping with no software tricks, and it works for any GPU workload.

On MI300C/X with a matching Turin it should spill to RAM normally (it has IF 4.0) if the model doesn't fit in the HBM cache. It has unified memory, after all.

Do you have a link to someone running this on desktop CPUs with client GPUs?

All I have seen is purely-CPU runs or super-distilled stuff.
I do not believe there will EVER be a solution that is both "cheap" and "fast".

EOT.
 

OneEng2

Senior member
Sep 19, 2022
456
681
106
Cuz it's better.

Every other client SoC of worth is using N2 in some form in the same timeframe.
N3 is N20, N2 is N16, it is betterer and gooder with no downsides, perf/$ is at least equal.
If AMD felt this strategy was a good path, Zen 5 desktop would be on N3B, not N4P.

Since this is a "speculation" thread, my "speculation" is that AMD will NOT spend the cash and risk production volume limitations with N2 for client on Zen 6.
 
Reactions: yuri69

Win2012R2

Senior member
Dec 5, 2024
748
756
96
But what about Zen 5c, which supposedly uses N3E?
The C design had to use N3, as otherwise it wouldn't be dense enough. Since it had to be exactly the same architectural design, they could not implement new features in it versus the N4 version; they shrank the design they had on hand, which was one that had to work on N4.
 

OneEng2

Senior member
Sep 19, 2022
456
681
106
N3B was delayed (and was crap too); they had to de-risk and change to N4. That's most likely why Zen 5 fell behind expected gains; it would have been great on N3E

Plus Intel's 18A might be competitive next time around
Possibly so, but AMD is currently more than competitive.

If you assume that Intel's transistor budget going from N3B to 18A will be about the same as AMD will get going from N4P to N3P, seems to me like AMD would have a better than average chance of maintaining their performance lead while also having a substantially lower cost.

Also, AMD is certainly going to have N2 compute chiplets with 32 cores, giving them a clear path to a desktop CPU with 12 P cores and 32 E cores (44 cores total, 88 threads). Pretty easy stitch job for the IOD.
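The parenthetical math, assuming SMT2 on both core types (my assumption; the core counts themselves are the post's speculation):

```python
# Speculative Zen 6 desktop config from the post above, assuming SMT2.
p_cores, e_cores = 12, 32
total_cores = p_cores + e_cores
total_threads = total_cores * 2    # two threads per core with SMT
print(total_cores, total_threads)  # 44 88
```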

Seems like a pretty great compromise and cost balance.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,269
7,392
96
If you assume that Intel's transistor budget going from N3B to 18A will be about the same as AMD will get going from N4P to N3P, seems to me like AMD would have a better than average chance of maintaining their performance lead while also having a substantially lower cost
They do not care about Intel.
Also, AMD is certainly going to have N2 compute chiplets with 32 cores, giving them a clear path to a desktop CPU with 12 P cores and 32 E cores (44 cores total, 88 threads). Pretty easy stitch job for the IOD.
The Venice-D CCD is for, well, Venice-D.
They're not doing dense-core spam on DT.
 
Reactions: Tlh97 and Joe NYC

branch_suggestion

Senior member
Aug 4, 2023
611
1,330
96
Possibly so, but AMD is currently more than competitive.
For now; 2026 will have quite a bit of bleeding due to the competition's roadmap being offset.
If you assume that Intel's transistor budget going from N3B to 18A will be about the same as AMD will get going from N4P to N3P, seems to me like AMD would have a better than average chance of maintaining their performance lead while also having a substantially lower cost.
Glymur and N1X are the parts AMD has to snuff out with Medusa, Intel is in the rear view mirror.
Also, AMD is certainly going to have N2 compute chiplets with 32 cores, giving them a clear path to a desktop CPU with 12 P cores and 32 E cores (44 cores total, 88 threads). Pretty easy stitch job for the IOD.
Dense CCDs will never be in client, only cheap mono parts.
Ironically the Venice-D CCD is the priciest CCD ever.
Seems like a pretty great compromise and cost balance.
Enough big-boy cores is enough; if you want more than 48 threads, you need more memory channels.
Just buy Threadripper.
 
Reactions: inquiss

LightningZ71

Platinum Member
Mar 10, 2017
2,067
2,508
136
It's too bad that AMD can't do a Strix Point refresh on a more recent node. N3E should be a bit less expensive by then, and with FinFlex they should be able to manage a decent clock-speed improvement all around.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,913
4,284
106
It's too bad that AMD can't do a Strix Point refresh on a more recent node. N3E should be a bit less expensive by then, and with FinFlex they should be able to manage a decent clock-speed improvement all around.

Since AMD decided to abandon the monolithic die for the next-gen Medusa Point, it makes a lot more sense to put all the effort into the Medusa Point IOD (SoC) instead.

Imagine AMD did that, and the Medusa Point SoC became a reality in Q4 2025. Then two options would present themselves:
- if Zen 6 is early (H1 2026), release as planned
- if Zen 6 is late (H2 2026), release the Medusa Point SoC + Strix Halo CCD (assuming the connection is compatible)
 
Reactions: bearmoo

poke01

Diamond Member
Mar 8, 2022
3,330
4,583
106
Glymur and N1X are the parts AMD has to snuff out with Medusa, Intel is in the rear view mirror.
N1X isn't a threat IMO; it comes with stock Arm cores from 2024.

Glymur, at least on the CPU side, is going to be very good, but it won't come out till H1 2026. Medusa will likely launch by mid-2026.
 