Zen 6 Speculation Thread


MS_AT

Senior member
Jul 15, 2024
525
1,107
96
The concept is much simpler. In CXL.mem, the memory of a CXL device can be mapped into another device's address space, where it appears as ordinary memory.
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent; llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but instead spills to system memory. I would need to dig through the docs to double-check that I am not hallucinating anything.
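For intuition, a back-of-envelope sketch of the layer-offload idea (all sizes below are hypothetical examples, not from llama.cpp's docs): estimate how many layers fit in VRAM and keep the rest in system RAM.

Code:
# Rough estimate of how many transformer layers fit in VRAM.
# All numbers here are hypothetical examples, not measured values.
def layers_that_fit(vram_gb, model_gb, n_layers, reserve_gb=1.5):
    # Assume weights are spread evenly across layers and keep some
    # VRAM in reserve for the KV cache and activations.
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))

# e.g. a 40 GB quantized model with 80 layers on a 24 GB GPU:
print(layers_that_fit(vram_gb=24, model_gb=40, n_layers=80))  # -> 45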
 

DisEnchantment

Golden Member
Mar 3, 2017
1,769
6,709
136
Unified Memory rules.
Currently there is no cheap-but-fast solution:

1024b-wide APU to attach the 1 TB+ needed for full-blown DeepSeek models ❌ not AMD's style, the GPU on an APU is too small
MI325X-style interconnect to get 768 GB ❌ too expensive

CXL over a PCIe5 x16 link tops out at ~64 GB/s per direction, and even the system RAM behind it is limited to ~128 GB/s on dual-channel DDR5-8000.
Need wider buses.
The poor man's LLM is obviously poor in performance.
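The spec-sheet arithmetic behind those figures, for anyone who wants to check:

Code:
# Peak-bandwidth sanity check (spec-sheet math, 128b/130b encoding overhead ignored).
pcie5_x16_gbs = 32 * 16 / 8           # 32 GT/s x 16 lanes / 8 bits -> ~64 GB/s per direction
ddr5_8000_gbs = 8000 * 8 * 2 / 1000   # 8000 MT/s x 8 B/channel x 2 channels -> 128 GB/s
print(pcie5_x16_gbs, ddr5_8000_gbs)   # 64.0 128.0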

Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent; llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but instead spills to system memory. I would need to dig through the docs to double-check that I am not hallucinating anything.
CXL allows normal memory mapping with no software tricks, and it works for any GPU workload.

On MI300C/X with a matching Turin, it should spill normally to RAM if a model doesn't fit in the HBM cache (it has IF 4.0). It has Unified Memory, after all.

Do you have a link to someone running this on desktop CPUs with client GPUs?

All I have seen is purely CPU inference, or people running heavily distilled stuff.
 

branch_suggestion

Senior member
Aug 4, 2023
607
1,319
96
Not sure I agree that N2 is worth the cost bump over N3 for all applications. If AMD can compete with N3P on client, why would they pay for N2?
Every other client SoC of worth is using N2 in some form in the same timeframe.
N3 is the new N20 and N2 is the new N16: better all around with no downsides, and perf/$ is at least equal.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,906
4,270
106
Of course having lots of VRAM would be better than having CXL-backed VRAM.
But it does not swap; a memory access to an address mapped to another device is simply routed there, just like any NUMA access.
That is not my interpretation, it is how the standard works.
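A minimal sketch of what "it is just NUMA" looks like in practice on Linux, assuming the CXL memory is exposed as a CPU-less NUMA node (the node ID below is hypothetical) and libnuma is installed:

Code:
import ctypes

# CXL.mem typically shows up on Linux as a CPU-less NUMA node, so the
# ordinary NUMA APIs reach it; no special device-driver calls needed.
numa = ctypes.CDLL("libnuma.so.1")
numa.numa_alloc_onnode.restype = ctypes.c_void_p
numa.numa_alloc_onnode.argtypes = [ctypes.c_size_t, ctypes.c_int]
numa.numa_free.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

assert numa.numa_available() >= 0  # -1 means no NUMA support
size = 1 << 20
node = 2  # hypothetical node ID of the CXL-attached memory
buf = numa.numa_alloc_onnode(size, node)
# Loads/stores to buf are plain memory accesses, routed to the device.
numa.numa_free(buf, size)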

Also, most competitive models these days use MoE, so the entire model is never activated at once. Case in point: DeepSeek. There are hotspots, but you don't need to access everything; the entire model has to be loaded, yet only a fraction of it gets accessed, depending on which experts are engaged.
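Putting numbers on that with DeepSeek-V3's published figures (671B total parameters, ~37B active per token):

Code:
# Fraction of an MoE model's weights actually touched per token.
total_params = 671e9    # DeepSeek-V3 total parameters
active_params = 37e9    # parameters activated per token
print(f"{active_params / total_params:.1%}")  # -> 5.5%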

The main advantage is being able to load the full model without getting distilled to oblivion.

I wonder if, in the long run, VRAM loses out to LPDDR.

With GDDR's big disadvantages in:
- Cost
- chip size
- power consumption
- scale of production

It is hard to believe GDDR is still in the game...
 

DrMrLordX

Lifer
Apr 27, 2000
22,472
12,324
136
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent; llama.cpp probably documents what to set so that a model too large for the GPU doesn't throw an out-of-memory error but instead spills to system memory. I would need to dig through the docs to double-check that I am not hallucinating anything.
NV has had proprietary implementations of UVM (unified virtual memory) since before the AI craze started, though I was under the impression that you wanted/needed NVLink in hardware to make it work.
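For reference, a minimal sketch of what UVM looks like from CUDA Python (numba here; whether oversubscription beyond VRAM works depends on GPU generation and OS, and the array size is just an example):

Code:
import numpy as np
from numba import cuda

@cuda.jit
def double(a):
    i = cuda.grid(1)
    if i < a.size:
        a[i] *= 2.0

# Managed (unified) memory: the driver migrates pages between host and
# device on demand, so allocations can exceed VRAM on supported setups.
a = cuda.managed_array(1 << 20, dtype=np.float32)
a[:] = 1.0
double.forall(a.size)(a)
cuda.synchronize()
print(a[:4])  # [2. 2. 2. 2.]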
 

mikegg

Golden Member
Jan 30, 2010
1,881
490
136
Ahh... I am the head of a software department (and hardware). Exactly what percentage of computer sales do you think people like you and I represent?

In my office, most people use Windows PCs, even in development. Note, we are heavily embedded vs. cloud and mobile apps.

The mobile group does mostly use Macs (because Apple is such a PITA and won't cross-compile like EVERY other OS on the planet). THIS, and THIS alone, is why our mobile developers use a Mac. The cloud group is a mix of Mac, Windows, and Linux. Embedded is 100% PC, and client computing is about 80% Windows.

Still, the VAST majority of business computer users are on Windows. Look it up (around 85% IIRC).
I didn't say anything about total Mac vs. Windows business market share. All I said was that software development on Windows sucks.
 
Reactions: Nothingness

Win2012R2

Senior member
Dec 5, 2024
741
740
96
Do you need CXL for that? I think the NVIDIA control panel has an option that is functionally equivalent
AFAIK (could be wrong here) that approach requires the CPU to be involved, whereas with CXL the device can talk to the memory directly, bypassing the CPU completely.
 

Glo.

Diamond Member
Apr 25, 2015
5,895
4,934
136
Currently there is no cheap-but-fast solution:

1024b-wide APU to attach the 1 TB+ needed for full-blown DeepSeek models ❌ not AMD's style, the GPU on an APU is too small
MI325X-style interconnect to get 768 GB ❌ too expensive

CXL over a PCIe5 x16 link tops out at ~64 GB/s per direction, and even the system RAM behind it is limited to ~128 GB/s on dual-channel DDR5-8000.
Need wider buses.
The poor man's LLM is obviously poor in performance.


CXL allows normal memory mapping with no software tricks, and it works for any GPU workload.

On MI300C/X with a matching Turin, it should spill normally to RAM if a model doesn't fit in the HBM cache (it has IF 4.0). It has Unified Memory, after all.

Do you have a link to someone running this on desktop CPUs with client GPUs?

All I have seen is purely CPU inference, or people running heavily distilled stuff.
I do not believe there will EVER be a solution that is both "cheap" and "fast".

EOT.
 

OneEng2

Senior member
Sep 19, 2022
451
669
106
Cuz it's better.

Every other client SoC of worth is using N2 in some form in the same timeframe.
N3 is the new N20 and N2 is the new N16: better all around with no downsides, and perf/$ is at least equal.
If AMD felt this strategy was a good path, Zen 5 desktop would be on N3B, not N4P.

Since this is a "speculation" thread, my "speculation" is that AMD will NOT spend the cash and risk production volume limitations with N2 for client on Zen 6.
 
Reactions: yuri69

Win2012R2

Senior member
Dec 5, 2024
741
740
96
If AMD felt this strategy was a good path, Zen 5 desktop would be on N3B
N3B was delayed (and was crap too), so they had to de-risk and switch to N4; that's most likely why Zen 5 fell behind expected gains. It would have been great on N3E.

Plus, Intel's 18A might be competitive next time around.
 
Reactions: Thibsie

Win2012R2

Senior member
Dec 5, 2024
741
740
96
But what about Zen 5c, which supposedly uses N3E?
The C design had to use N3, as otherwise it wouldn't be dense enough. Since it had to be exactly the same architectural design, they could not implement anything new in it versus the N4 version; they shrank the design they had on hand, which was one that had to work on N4.
 