Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

poke01 · Aug 1, 2024

The better question if you are going solder RAM, you could save board space and have bigger busses if you use on-package but what are the disadvantages of soldering memory to the board vs MoP?

I can’t think of any for STX halo.

adroc_thurston · Aug 1, 2024

The Hardcard said:
I am a straight up bubble huffer. I say that AI functions will be the main reason for sales of computing devices by 2040. I still find it bizarre there’s so many people on various text site forums don’t see that AI is THE future of computing in society.

matrix math isn't all that useful.

The Hardcard said:
I believe the next 30 years will yield a bigger technological and social transformation than than the period from the dawn of civilization to 2024.

xir we're launching TLAMs at Taiwan in a few years.
You sure?

poke01 said:
I can’t think of any for STX halo.

MoP means SKU spam.
Suboptimal for a new swimlane new part new-new-new.

poke01 · Aug 1, 2024

adroc_thurston said:
MoP means SKU spam.

but amd loves that

adroc_thurston · Aug 1, 2024

poke01 said:
but amd loves that

Do they? Strix has whole two.

itsmydamnation · Aug 1, 2024

The Hardcard said:
I am a straight up bubble huffer. I say that AI functions will be the main reason for sales of computing devices by 2040. I still find it bizarre there’s so many people on various text site forums don’t see that AI is THE future of computing in society.

To be sure, new algorithms are needed. The current ones are just stopgaps, as impressive as they may be at these small simple tasks. But, just as current computers are many orders of magnitude more complex and capable than a Commodore 64, neural networks are going to follow the same trajectory, at about triple the speed.

Research is already underway to boost reasoning capacity by 100 to 1000 fold. Those algorithms will be here in a few short years, ones that will make large language models feel like stone age technology. Critically, those won’t be the last ones.

I believe the next 30 years will yield a bigger technological and social transformation than than the period from the dawn of civilization to 2024.

Yeah, I’m huffing big time.

this shows a complete lack of understanding about the hardware your algorithms are running on and the evolution of hardware and what the future looks like.

Your Computations are already running on hyper targeted and optimised hardware that has the biggest BOM's out side of mainframes we have ever seen. Also who exactly is making any money from AI outside Nvida? Im watching this daily as i want to move before the A** falls out of the market.

The Hardcard · Aug 2, 2024

adroc_thurston said:
matrix math isn't all that useful.

and yet is the main focus of every major chipmaker on the planet, as well as nearly all of the minor ones. From CEOs, CTOs, PhD fellows all the way down to interns, it is the most important aspect of every architecture and chip at every stage from concept through design, production and validation. Hmmmmmmmm.

adroc_thurston said:
xir we're launching TLAMs at Taiwan in a few years.
You sure?

Surely, you’ve noticed that multiple countries have begun spending hundreds of billions of dollars to ensure that they have angstrom scale fabs inside their borders. Each of those countries will massively increase spending and several other nations will join them.

itsmydamnation said:
this shows a complete lack of understanding about the hardware your algorithms are running on and the evolution of hardware and what the future looks like.

Your Computations are already running on hyper targeted and optimised hardware that has the biggest BOM's out side of mainframes we have ever seen. Also who exactly is making any money from AI outside Nvida? Im watching this daily as i want to move before the A** falls out of the market.

Well obviously, I think the lack of understanding is fully in your camp.

Llama 3 405B has for the last week been used and tested by thousands of people who have been heavily using LLMs for two to three years now. So far it consistently demonstrates effectively equal capabilities as ChatGPT 4o. And it will run on 2 Macbook Pros connected with a single Thunderbolt cable.

More importantly, LLMs are not the AI revolution. They are the precursor.

adroc_thurston · Aug 2, 2024

The Hardcard said:
and yet is the main focus of every major chipmaker on the planet, as well as nearly all of the minor ones. From CEOs, CTOs, PhD fellows all the way down to interns, it is the most important aspect of every architecture and chip at every stage from concept through design, production and validation. Hmmmmmmmm.

First time?

The Hardcard said:
Surely, you’ve noticed that multiple countries have begun spending hundreds of billions of dollars to ensure that they have angstrom scale fabs inside their borders. Each of those countries will massively increase spending and several other nations will join them.

That's not how it works.
Gigafabs are Taiwan-only and R&D expertise is also pretty much non-transferrable.

gdansk · Aug 2, 2024

Republic of China? No, it's tAIwan now.

Is there going to be a Zen 5 version of the MI300A?

csbin · Aug 2, 2024

oh, nooooooooo

https://twitter.com/x/status/1819215703142187164

Translated added -

As to why the equivalent delay is 1 and not 0.5, this is a major problem I'm having at the moment.The current version of the microcode seems that a single thread cannot see two decoders no matter what, that is, after the op$ is released or turned off, the front end directly becomes 4-wide and can only take 1 per cycle (regardless of whether there is a branch jump or not). This is obviously inconsistent with AMD's propaganda that a single thread can use two decoders, and more investigation is needed.

Mod DAPUNISHER

mostwanted002 · Aug 2, 2024

csbin said:
oh, nooooooooo

https://twitter.com/x/status/1819215703142187164

Translated added -

Mod DAPUNISHER

>RDNA double-issue flashbacks.

branch_suggestion · Aug 2, 2024

mostwanted002 said:
>RDNA double-issue flashbacks.

Well yeah, AMD is doing that funny PPA trick of stripping out hardware, adding double pump logic cause it is cheap and then having to rely on compilers to actually utilise the hardware.
As it turns out, the software is lagging the hardware and holding it back, as is tradition for AMD.
Whether there is a new AGESA in the next week or so that magically enables the core to use dual decoders for compatible 1t workloads, well that would make any review delay justified.
Yes I am coping.

misuspita · Aug 2, 2024

poke01 said:
If AMD wants to keep it around $2000

At 2000 it's a goner. For it to be successful it needs to be at 1200-1500€. There is already a product of Minisforum which combines a 7945hx and a 7600 XT which they are selling now for 1200€ (1000 right now). The price can't be more then cpu+gpu of same perf, or it will flop.

https://store.minisforum.com/products/atomman-g7-pt?mc_cid=66480a38a3

Geddagod · Aug 2, 2024

mostwanted002 said:
>RDNA double-issue flashbacks.

>+30% ipc is back on the menu boys 🤡

mostwanted002 · Aug 2, 2024

branch_suggestion said:
Well yeah, AMD is doing that funny PPA trick of stripping out hardware, adding double pump logic cause it is cheap and then having to rely on compilers to actually utilise the hardware.
As it turns out, the software is lagging the hardware and holding it back, as is tradition for AMD.
Whether there is a new AGESA in the next week or so that magically enables the core to use dual decoders for compatible 1t workloads, well that would make any review delay justified.
Yes I am coping.

^we are coping.

:copiumhuff:

misuspita · Aug 2, 2024

So by the time Zen6 launches, we'll have a Zen5 performance liftup?

AMDs "age like fine wine" strategy at work... 🤦🏻‍♂️

itsmydamnation · Aug 2, 2024

The Hardcard said:
and yet is the main focus of every major chipmaker on the planet, as well as nearly all of the minor ones. From CEOs, CTOs, PhD fellows all the way down to interns, it is the most important aspect of every architecture and chip at every stage from concept through design, production and validation. Hmmmmmmmm.

Surely, you’ve noticed that multiple countries have begun spending hundreds of billions of dollars to ensure that they have angstrom scale fabs inside their borders. Each of those countries will massively increase spending and several other nations will join them.

Well obviously, I think the lack of understanding is fully in your camp.

Llama 3 405B has for the last week been used and tested by thousands of people who have been heavily using LLMs for two to three years now. So far it consistently demonstrates effectively equal capabilities as ChatGPT 4o. And it will run on 2 Macbook Pros connected with a single Thunderbolt cable.

More importantly, LLMs are not the AI revolution. They are the precursor.

So one model that cant do anything useful is more efficient then another model that cant do anything useful....

how many more 10x's of everything do we need to get somewhere useful.

We kind of had 10x scaling in hardware moving from CPU SIMD to MIMD to GEMM engines to hardware aware sparsity, memory compression etc etc. Now we have massive clusters with chips at retile limits that are limited by the speed of light across derivatives of Clos fabrics In a word where CMOS scaling is dead.

I've seen one good use case of AI that works today in my field (high-end tech) and its not replacing a single job or driving large efficiencies. It will just give better situational awareness during failure conditions.

gdansk · Aug 2, 2024

Somehow 9% improvement in INT 1T doesn't sound as bad if it's still (apparently) 4-wide decode.
Didn't think they could get any more out of that.

adroc_thurston · Aug 2, 2024

gdansk said:
Didn't think they could get any more out of that.

int PRF is barely a bump over Z4.
They have many many more avenues to bloat left.

MS_AT · Aug 2, 2024

branch_suggestion said:
Well yeah, AMD is doing that funny PPA trick of stripping out hardware, adding double pump logic cause it is cheap and then having to rely on compilers to actually utilise the hardware.
As it turns out, the software is lagging the hardware and holding it back, as is tradition for AMD.
Whether there is a new AGESA in the next week or so that magically enables the core to use dual decoders for compatible 1t workloads, well that would make any review delay justified.
Yes I am coping.

I am afraid it's about bad marketing message and miscommunication. The materials were mentioning that decoders are statically partitioned in SMT mode. Now traditionally when you wanted to turn off SMT, you went to BIOS and disabled it. Now the question is, is the SMT mode static when enabled [If SMT is on in the BIOS is the core always in SMT mode] or is it dynamic like the interviews are leading us to believe.

CouncilorIrissa · Aug 2, 2024

I guess it's the reason why GCC still treats Zen 5 as 4-wide decode uarch.

gdansk · Aug 2, 2024

CouncilorIrissa said:
I guess it's the reason why GCC still treats Zen 5 as 4-wide decode uarch.

🤔 It seems that the AMD employee who submitted the patch knew it was 4-wide all along.
We were bamboozled by Mike Clark yet again.

coercitiv · Aug 2, 2024

Let me guess, gonna' fix it in Zen 5+.

CouncilorIrissa · Aug 2, 2024

gdansk said:
🤔 It seems that the AMD employee who submitted the patch knew it was 4-wide all along.
We were bamboozled by Mike Clark yet again.

Someone needs to run GNR with SMT off once it's released to see what's really going on.

coercitiv said:
Let me guess, gonna' fix it in Zen 5+.

View attachment 104336

Zen 5.5* 🤣

MS_AT · Aug 2, 2024

gdansk said:
🤔 It seems that the AMD employee who submitted the patch knew it was 4-wide all along.
We were bamboozled by Mike Clark yet again.

It might be other things too. These were early patches, they might have wanted just to get znver5 option added that would not dramatically break the situation for people using --march=native rather than to give accurate representation of the core. They might have wanted not to share everything or they might have forgot to update. Don't forget that CPUs are designed to handle less than ideal code [OoO, branch prediction etc.] so this won't have terrible effect. It would be much worse if somebody forgot to turn on all available instructions sets as that would hamper the generated code more.

tsamolotoff · Aug 2, 2024

The Hardcard said:
To be sure, new algorithms are needed.

The real AI has never been tried, just wait 2-3 weeks (insert 💲💲 here). Well, seriously, what kind of use the current transformers have apart from replacing the politicians (as they can lie and get even more delusional than the most flamboyant political figures around the world). And also, why do we need it integrated into general-purpose CPUs and GPUs at all, it could be just a separate addon board or card and we won't need to sacrifice 16 mb of cache and employ a weird dual ccd setup for this deadweight silicon

Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Golden Member

Platinum Member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Member

Diamond Member

Platinum Member

Senior member

Member

Senior member

Senior member

Golden Member

Member

Senior member

Platinum Member

Platinum Member

Diamond Member

Senior member

Senior member

Platinum Member

Diamond Member

Senior member

Senior member

Member