Zen 6 Speculation Thread


poke01

Diamond Member
Mar 8, 2022
3,301
4,546
106
But do I want to? I can easily pop in a 9950X3D and my gaming PC becomes a workstation. This is the true benefit of the PC: the open ecosystem. Something we should keep a hold of.

Good on AMD for creating Zen; without it, x86 would HAVE been dead, because Intel is doing jack sh*t.

You see, going with Apple would be fine if they were an open and repair-friendly company. Let’s just say there is more to the PC ecosystem than just CPU performance.
 

mikegg

Golden Member
Jan 30, 2010
1,881
490
136
Hard disagree (and I use Linux anyway). Windows has WSL2 to run everything Linux at native speeds. Macs don’t run full-blown Visual Studio, nor do they run a bunch of commercial CRMs and other software. None of the games I play are on the Mac. Linux runs more software than the Mac thanks to more ports and also WINE. WINE on Mac is much more limited for a few reasons.

When I travel for client work, I take my PC. It has full compatibility with all the software I use.

Many of the companies I’ve worked with have transitioned away. Don’t get me wrong, Apple has its niche, but let’s not pretend they are bigger or more relevant than they are.

Regardless, this is a Zen 6 thread, not an Apple one.
Hard disagree. No one in Silicon Valley, where I work, uses a Windows computer for software development. If you agree that the best software comes out of Silicon Valley, then you'd agree with me. If not, there's no point in arguing.

WSL2 is much better than the native Windows terminal, but it's not native. You're going to run into issues: a slower file system due to 9P overhead, higher RAM usage, a separate virtual network adapter, no native systemd support, no native GUI app support, compatibility issues with software that uses low-level Linux kernel features, and extra performance overhead because it's literally virtualizing Linux.
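
If you want to see the filesystem part of that for yourself, here is a minimal sketch (the /mnt/c mount point and a writable C:\Temp are assumptions about your setup, and the exact numbers will vary) that times small-file churn on the 9P-backed Windows drive versus WSL2's own ext4:

Code:
import os
import tempfile
import time

def time_small_files(directory, count=200, size=4096):
    """Create, write, and delete `count` small files in `directory`; return seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"bench_{i}.tmp")
        with open(path, "wb") as f:
            f.write(payload)
        os.remove(path)
    return time.perf_counter() - start

if __name__ == "__main__":
    # ext4 inside the WSL2 VM vs. the 9P/drvfs-backed Windows drive (paths assumed).
    native_dir = tempfile.mkdtemp(dir=os.path.expanduser("~"))
    windows_dir = tempfile.mkdtemp(dir="/mnt/c/Temp")  # assumes C:\Temp exists and is writable
    try:
        print(f"native ext4 (~)   : {time_small_files(native_dir):.2f} s")
        print(f"/mnt/c (9P/drvfs) : {time_small_files(windows_dir):.2f} s")
    finally:
        os.rmdir(native_dir)
        os.rmdir(windows_dir)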
 

poke01

Diamond Member
Mar 8, 2022
3,301
4,546
106
Hard disagree. No one in Silicon Valley, where I work, uses a Windows computer for software development. If you agree that the best software comes out of Silicon Valley, then you'd agree with me. If not, there's no point in arguing.
You know what’s funny: this is actually true. AMD’s head of GPU software engineering was using an M4 Max MacBook as his main laptop until the Framework Strix Point laptop came out.

So it’s ridiculous when anyone here says Macs cannot be used for software development at all.
 

mikegg

Golden Member
Jan 30, 2010
1,881
490
136
So it’s ridiculous when anyone here says Macs cannot be used for software development at all.
Anyone who says that doesn't know what they're talking about. No one uses a Windows computer for software development at any Silicon Valley company. If you find one, that engineer is actually just using it for Linux.

The only people in Silicon Valley who use Windows are finance people because of Excel.
 

Jan Olšan

Senior member
Jan 12, 2017
506
978
136
Despite considerable changes in Zen 5, the Zen microarchitecture is just getting long in the tooth. Here's how often AMD has started from scratch (first level indicates new designs, second level shows iterations):
  • K5 - 1996
  • K6 - 1997
  • K7 (Athlon) - 1999
  • K8 (Athlon 64) - 2003
    • K10 - 2007
  • Bulldozer - 2011
    • Piledriver - 2012
    • Steamroller - 2014
    • Excavator - 2015
  • Zen - 2017
    • Zen 2 - 2019
    • Zen 3 - 2020
    • Zen 4 - 2022
    • Zen 5 - 2024
    • Zen 6 - 2026
Two things. First, K8 was pretty much an iteration of K7, and there was another one between them (Athlon XP gets lumped in with the original K7, but the core is updated).

Second, I think it is a bit of a reductionist viewpoint, and it fails to see the differences in how far the "iterations" go. Is it a tick or a tock? Even Intel's tocks were iterations. Zen 3 and 5 are pretty extensive tocks... (Edit: actually, Zen 2/4 are quite extensive too, certainly closer to the idea of a tock than a tick.)

You would probably do better to just treat "development lineage reboots" as an entirely different thing. That is going to be a rare occasion, probably more of an answer to the company getting into unforeseen trouble. Having to throw away the uarch lineage and start largely anew is actually likely to be something the engineers want to avoid, because it means you need more time and more resources to successfully deploy that particular step of the roadmap (and likely the ones after it too), and you will be fighting various regressions and catching up with all the features of the old lineage.

Something to think about: How many times has Apple done such a reboot, actually, hmmm? Doing that frequently is certainly not part of their success recipe.
 
Last edited:

soresu

Diamond Member
Dec 19, 2014
3,613
2,926
136
Here's how often AMD has started from scratch (first level indicates new designs, second level shows iterations):
  • K5 - 1996
  • K6 - 1997
  • K7 (Athlon) - 1999
  • K8 (Athlon 64) - 2003
    • K10 - 2007
  • Bulldozer - 2011
    • Piledriver - 2012
    • Steamroller - 2014
    • Excavator - 2015
  • Zen - 2017
    • Zen 2 - 2019
    • Zen 3 - 2020
    • Zen 4 - 2022
    • Zen 5 - 2024
    • Zen 6 - 2026
Now that I'm more familiar with its history, the K8-derived family looks like this fully expanded:
  • K8 (Athlon 64) - 2003
    • Barcelona (Phenom/Agena) - 2007
    • Shanghai (Phenom II/Deneb) - 2008
    • Istanbul (Phenom II X6/Thuban)
    • Husky (Llano APU)
Not sure about Husky having significant µArch improvements, but Shanghai and Istanbul definitely did.
 
Reactions: Gideon

StefanR5R

Elite Member
Dec 10, 2016
6,319
9,715
136
I hope one thing for the Zen 6 platform will come to fruition:
EPYC CXL.mem coming to DT.
Turin can already interleave CXL and DDR memory regions, so ideally a GPU CXL device attached to the root can see the memory of the host.

Linux is getting patches for address translation for Zen 5. Hopefully Zen 6 goes further in this direction.

With such a setup we could install 1 TB of DDR on the CPU and let the GPU use all of it for LLMs and other interesting use cases. It could turn your Linux PC into an LLM monster.
It will not be the most performant, but at least it can run something interesting.
Is this something akin to Memory Caching from the Apple M3 and M4 architectures?
Do you mean "Dynamic Caching" of the M3/M4 iGPU?
If so, no.
– Apple's Dynamic Caching is about the GPU's own resource management.
– CXL is about data sharing between CPUs and devices. CXL adopts the PCIe physical layer but runs an alternate transaction layer over it, one which, in contrast to PCIe's transaction layer, notably has a cache coherence protocol baked in. AFAIU: Cache-coherent DMA can be performed over the PCIe transaction layer too, but it is costly and therefore used only for smaller data like commands, status messages, buffer descriptors, and such, whereas all the usual data transfers from and to PCIe devices happen with cache-incoherent DMA. An introductory article to CXL with some slides: servethehome

Now I don't know how LLMs* work to be able to tell whether or not
[host CPU and lots of RAM] <--- CXL ---> [discrete GPU with some VRAM]​
would indeed enable more capable LLMs* than
[host CPU and lots of RAM] <--- PCIe ---> [discrete GPU with some VRAM]​
with the difference between the two that CPU and GPU access the host RAM in cache coherent manner in the CXL case but not in the PCIe case. In either case, the application would have to swap data between RAM and VRAM.
*) or other GPGPU applications

If yes, then the next question would be whether or not CPU vendors and GPU vendors recognize a market potential in CXL-enabled client products.
 
Reactions: Glo. and Joe NYC

DisEnchantment

Golden Member
Mar 3, 2017
1,769
6,709
136
Do you mean "Dynamic Caching" of the M3/M4 iGPU?
If so, no.
– Apple's Dynamic Caching is about the GPU's own resource management.
– CXL is about data sharing between CPUs and devices. CXL adopts the PCIe physical layer but runs an alternate transaction layer over it, one which, in contrast to PCIe's transaction layer, notably has a cache coherence protocol baked in. AFAIU: Cache-coherent DMA can be performed over the PCIe transaction layer too, but it is costly and therefore used only for smaller data like commands, status messages, buffer descriptors, and such, whereas all the usual data transfers from and to PCIe devices happen with cache-incoherent DMA. An introductory article to CXL with some slides: servethehome

Now I don't know how LLMs* work to be able to tell whether or not
[host CPU and lots of RAM] <--- CXL ---> [discrete GPU with some VRAM]​
would indeed enable more capable LLMs* than
[host CPU and lots of RAM] <--- PCIe ---> [discrete GPU with some VRAM]​
with the difference between the two that CPU and GPU access the host RAM in cache coherent manner in the CXL case but not in the PCIe case. In either case, the application would have to swap data between RAM and VRAM.
*) or other GPGPU applications

If yes, then the next question would be whether or not CPU vendors and GPU vendors recognize a market potential in CXL-enabled client products.
The concept is much simpler. With CXL.mem, the memory of one CXL device can be mapped into another device, so it appears as memory in that device's address space.

For instance, a GPU with 16 GB of VRAM can map another device's memory, e.g. the host CPU's memory, as its own. Let's say it maps the host's 1 TB as its own; it will then have 1 TB + 16 GB in its addressable region.
Shaders can access this entire thing as if it were local VRAM, without doing SDMA.
Now you can load the DeepSeek 670B model in RAM, and to the GPU it all appears to be local VRAM.

But since the GPU is also an active device, it needs CXL.cache to maintain cache coherency with the CPU (e.g., its L3) in case there is concurrent access.

Memory expander devices like Samsung's don't need CXL.cache, because the memory is not doing write operations on its own; they just need CXL.mem.
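
To make the expander case concrete, here is a minimal sketch of how this already surfaces on Linux: CXL.mem that has been onlined through the dax/kmem driver shows up as system RAM on a CPU-less NUMA node, so a quick sysfs walk lists it (treating a CPU-less node as CXL-backed is an assumption you would confirm per platform):

Code:
# List NUMA nodes and flag memory-only (CPU-less) ones, which is how Linux
# currently surfaces CXL.mem regions onlined via the dax/kmem driver.
import glob
import os
import re

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = os.path.basename(node_dir)
    with open(os.path.join(node_dir, "cpulist")) as f:
        cpulist = f.read().strip()
    with open(os.path.join(node_dir, "meminfo")) as f:
        meminfo = f.read()
    total_kb = int(re.search(r"MemTotal:\s+(\d+) kB", meminfo).group(1))
    kind = "memory-only (possibly CXL-backed)" if not cpulist else "regular"
    print(f"{node}: {total_kb // 1024} MiB, cpus=[{cpulist or 'none'}] -> {kind}")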
 
Reactions: Glo. and Joe NYC

StefanR5R

Elite Member
Dec 10, 2016
6,319
9,715
136
Shaders can access this entire thing as if it were local VRAM, without doing SDMA.
Now you can load the DeepSeek 670B model in RAM, and to the GPU it all appears to be local VRAM.
Logically, yes. Performance-wise, there is still a large cliff between on-board and off-board memory. I am guessing that a naive dGPU implementation with CXL access might easily end up performing worse than a pure CPU implementation. Hence my thought that a CXL-enabled implementation still needs to explicitly "swap" between local and remote RAM.

Edit: Regarding the addressing of host memory, I do admit that I am only familiar with the CPU view on classic PCIe DMA, not with the GPU view.
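
To put a very rough number on that cliff, a back-of-envelope sketch (every figure below is an assumption for illustration, not a measurement: on-board VRAM bandwidth, a PCIe 5.0 x16-class CXL link, and the bytes of weights read per token for a large MoE model):

Code:
# Back-of-envelope: time to read the active weights for one token, depending
# on where they live. All numbers are assumptions for illustration only.
vram_bw_gbps = 800.0   # assumed on-board GDDR bandwidth, GB/s
link_bw_gbps = 64.0    # assumed PCIe 5.0 x16-class CXL link, GB/s, one direction
active_gb    = 37.0    # assumed active weights read per token for a large MoE, GB

print(f"from local VRAM : {active_gb / vram_bw_gbps * 1e3:6.1f} ms per token")
print(f"over the link   : {active_gb / link_bw_gbps * 1e3:6.1f} ms per token")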
 
Last edited:
Reactions: Joe NYC

DisEnchantment

Golden Member
Mar 3, 2017
1,769
6,709
136
Logically, yes. Performance-wise, there is still a large cliff between on-board and off-board memory. A naive dGPU implementation with CXL access might easily end up performing worse than a pure CPU implementation. Hence my thought that a CXL-enabled implementation still needs to explicitly "swap" between local and remote RAM.
Of course, having lots of VRAM would be better than having CXL-backed VRAM.
But it does not swap. A memory access at any address mapped to another device is simply routed; it is just like any NUMA access.
This is not my interpretation; the standard is defined that way.

Also, most competitive models these days use MoE, and the entire model is not activated. Case in point: DeepSeek. There are hotspots, but you don't need to access everything; the entire model has to be loaded, but only a fraction of it, depending on which experts are engaged, actually gets accessed (see the sketch below).

The main advantage is being able to load the full model without getting distilled to oblivion.
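
Here is a toy sketch of that routing (not DeepSeek's actual architecture; the layer sizes, expert count, and top-k value are made up) just to show that only the chosen experts' weights are ever read for a given token:

Code:
# Toy MoE top-k routing: per token, only top_k of n_experts expert FFNs are
# touched, so most of the (huge) weight set is merely resident, not read.
import numpy as np

d_model, d_ff, n_experts, top_k = 64, 256, 8, 2
rng = np.random.default_rng(0)

router_w   = rng.standard_normal((d_model, n_experts)) * 0.02
experts_w1 = rng.standard_normal((n_experts, d_model, d_ff)) * 0.02
experts_w2 = rng.standard_normal((n_experts, d_ff, d_model)) * 0.02

def moe_forward(x):
    """x: (d_model,) one token. Route to top_k experts and mix their outputs."""
    logits = x @ router_w
    chosen = np.argsort(logits)[-top_k:]          # indices of the active experts
    gate = np.exp(logits[chosen])
    gate /= gate.sum()                            # softmax over the chosen experts
    out = np.zeros(d_model)
    for g, e in zip(gate, chosen):
        h = np.maximum(x @ experts_w1[e], 0.0)    # only these experts' weights are read
        out += g * (h @ experts_w2[e])
    return out, chosen

token = rng.standard_normal(d_model)
_, active = moe_forward(token)
print(f"active experts for this token: {sorted(active.tolist())} of {n_experts}")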
 
Last edited:
Reactions: Joe NYC

OneEng2

Senior member
Sep 19, 2022
451
669
106
The most expensive node per wafer yielded is 18A, one of many reasons why it has little traction.
N2 is worth the cost bump over N3.
Not sure I agree that N2 is worth the cost bump over N3 for all applications. If AMD can compete with N3P on client, why would they pay for N2?
Silicon Valley. No software engineer here uses a Windows computer. It's 95% Macs and 5% Linux.

Hard disagree. No one in Silicon Valley, where I work, uses a Windows computer for software development. If you agree that the best software comes out of Silicon Valley, then you'd agree with me. If not, there's no point in arguing.
Ahh... I am the head of a software department (and hardware). Exactly what percentage of computer sales do you think people like you and I represent?

In my office, we have mostly people using Windows PCs... even in development. Note, we are heavily embedded vs. cloud and mobile apps.

The mobile group does use mostly Macs (because Apple is such a PITA and won't cross-compile like EVERY other OS on the planet). THIS and THIS alone is why our mobile developers use a Mac. The cloud group is a mix of Mac, Windows and Linux. Embedded is 100% PC, and client computing is about 80% Windows.

Still, the VAST majority of business computer users are on Windows. Look it up (around 85%, IIRC).
 