It's really the other way around: millions of people are using GPT and other commercial/closed models, whereas most LLM questions, and even a lot of image work, will run just fine on a 16GB GPU from 2020 using distilled models. Meanwhile, look at Apple's AI: they critically depend on GPT simply because they don't have any models or public-facing tech of their own.
So in this world the real OEMs are HP, Dell, Lenovo, etc., and their customers have a pressing need for high-bandwidth matrix calculations. That job can either be done by a giant GPU chip using special GDDR RAM, or you can put an NPU into an SoC, save tons of power, use the same RAM chips already on the mobo, and still get good-enough performance. And if everyone has AI on their machine, it becomes easy for developers to make AI-based apps and programs they can sell to people. This is both Microsoft's and Apple's angle.
Right now it's awful for me to try and run AI on my non-accelerated CPU, and I have to fully commit my GPU and a lot of power for good performance. NPUs on CPUs, by contrast, really open the door to things like games using AI for NPCs, or digital media where the NPU frees up compute for effects and transitions. For development it really blows the hinges off, because running one AI into another, locally, suddenly becomes feasible, and that's where things really shine. I for one enjoy using different system prompts on the same model to get more varied results: e.g., one programmer bot thinks it's a webdev, the other I tell it's John Carmack, now give me code for X. It would be super nice to get both results at the same time for comparison.
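A minimal sketch of that side-by-side persona idea, assuming an OpenAI-style chat payload format (the model name and persona strings here are placeholders I made up; in real use each payload would be POSTed to a local server such as llama.cpp or Ollama):

```python
# Sketch: query one local model under two different system prompts
# and collect both answers for comparison.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical personas; swap in whatever framing you want to compare.
PERSONAS = {
    "webdev": "You are a senior web developer.",
    "carmack": "You are John Carmack.",
}

def build_request(persona: str, task: str) -> dict:
    """Build an OpenAI-style chat payload for one persona."""
    return {
        "model": "local-model",  # placeholder; use your local model's name
        "messages": [
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": task},
        ],
    }

def compare(task: str) -> dict:
    # With an NPU handling inference, both requests could genuinely
    # run concurrently; here we just build the two payloads in parallel.
    with ThreadPoolExecutor() as pool:
        pairs = pool.map(lambda p: (p, build_request(p, task)), PERSONAS)
        return dict(pairs)

reqs = compare("Write a fast inverse square root in C.")
```

The interesting part is that nothing model-side changes between the two runs; only the system message differs, so any divergence in the answers comes purely from the persona framing.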
Also, look at the GPU market lately: it's not like there's a crypto rush right now, so those sales and that scalping are legit pressure from the AI market. And if AMD can leverage even a tiny bit of that onto desktop CPUs, that becomes a win for potential volume and margins.