Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

JustViewing · Aug 18, 2024

MS_AT said:
About why Zen5 SIMD is memory bottlenecked I will let myself cite Y-Cruncher author:

The biggest bottleneck is L3 bandwidth. 32B/cycle is simply not enough for full AVX-512 throughput. The capacity of 32MB is good enough for non-streaming AVX-512 workloads on 8 cores.

Shmee · Aug 18, 2024

I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance with Zen 4 and 5, at least so far. Apparently when using the built in administrator account, performance is slightly better. They mention that it may be possible this is also present in other CPUs, perhaps Zen 3 / Intel etc, but don't know yet.

I am wondering, does anyone know more about this and is this just for Windows 11 or is it present in 10 as well?

marees · Aug 18, 2024

Shmee said:
I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance with Zen 4 and 5, at least so far. Apparently when using the built in administrator account, performance is slightly better. They mention that it may be possible this is also present in other CPUs, perhaps Zen 3 / Intel etc, but don't know yet.

I am wondering, does anyone know more about this and is this just for Windows 11 or is it present in 10 as well?

Super admin privilege is required to allocate extra memory.

https://twitter.com/x/status/1824936985737187515

jdubs03 · Aug 18, 2024

Shmee said:
I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance with Zen 4 and 5, at least so far. Apparently when using the built in administrator account, performance is slightly better. They mention that it may be possible this is also present in other CPUs, perhaps Zen 3 / Intel etc, but don't know yet.

I am wondering, does anyone know more about this and is this just for Windows 11 or is it present in 10 as well?

Hardware Unboxed mentioned, I think in a Twitter post, that some of their Discord members were reporting the same thing in Windows 10 as well.

marees · Aug 18, 2024

Shmee said:
I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance with Zen 4 and 5, at least so far. Apparently when using the built in administrator account, performance is slightly better. They mention that it may be possible this is also present in other CPUs, perhaps Zen 3 / Intel etc, but don't know yet.

I am wondering, does anyone know more about this and is this just for Windows 11 or is it present in 10 as well?

This is probably not a Windows bug but a "feature/bug" of AMD's CPU testing, where they always enabled this mode.

Shmee · Aug 18, 2024

Hmm interesting. So possibly just better performance in admin mode due to more memory access?

marees · Aug 18, 2024

So far, it has been a Jekyll & Hyde story for zen 5

Zen, 5 excuses vs zen 5$/day crypto mining

https://twitter.com/x/status/1824693836230345079

AMD Ryzen 9 7950X used by cryptominers, making $3 per day in profit from their CPUs

AMD's current-gen Ryzen 9 7950X processor is being used for cryptomining, as it's more profitable than GPU mining while Bitcoin rides new highs.

www.tweaktown.com

StefanR5R · Aug 18, 2024

This and other threads of the CPU subforum are steadfastly spammed by the claim that "many" parallel applications work on small data. (Generally without saying how many. I'll hand them that perhaps "many" popular benchmarks work on small data.) Now the thing is: If you feed an algorithm small datasets, then the typical outcome is that this algorithm scales poorly to high thread counts. [Edit: Because the dataset has to be sliced and diced into more and more tiny pieces, leaving little computation per piece while increasing the control and synchronization overhead.]
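To put rough numbers on that, here is a toy model (a sketch under stated assumptions, not a measurement). It assumes the work splits evenly into one chunk per thread and that control/synchronization cost grows linearly with the number of chunks. The function name `speedup` and the microsecond figures are made up for illustration.

```python
def speedup(total_work_us: float, threads: int, per_chunk_overhead_us: float) -> float:
    """Speedup over a single thread in a toy fork/join model.

    Assumptions (hypothetical): work divides evenly, one chunk per thread,
    and every chunk adds a fixed control/synchronization cost that is
    paid serially.
    """
    serial_time = total_work_us + per_chunk_overhead_us
    parallel_time = total_work_us / threads + per_chunk_overhead_us * threads
    return serial_time / parallel_time


if __name__ == "__main__":
    for work_us in (100.0, 1_000_000.0):  # small vs. large dataset
        for n in (1, 4, 16, 64):
            print(f"work={work_us:>9.0f}us threads={n:>2} "
                  f"speedup={speedup(work_us, n, 5.0):.2f}")
```

With the small dataset the speedup peaks at a handful of threads and then falls again, while the large dataset keeps scaling almost linearly. The exact crossover depends entirely on the assumed overhead.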

I'll just leave this here, fully aware that the spam will blissfully keep flowing no matter what.

CouncilorIrissa · Aug 18, 2024

vanplayer said:
There would be a core latency patch by the end of August, likely released with new chipset X870/B860. Typical AMD that release software after hardware launch. LOL.

Do you have a source for this?

PJVol · Aug 18, 2024

DrMrLordX said:
MLID didn't support his claim at all, so why is anyone defending it?

The question is, should he really?

DrMrLordX · Aug 18, 2024

PJVol said:
The question is, should he really?

Who, MLID? No, he's a leaker. He just says stuff and asks you to believe it (or not). Not supporting his leaks is his job, and it's on everyone else to be skeptical of his claims, rather than claim that they're backed somehow by veracity (or anything else).

StefanR5R · Aug 18, 2024

For a change, let's consider Ethernet FPS instead of Video Game FPS.

Michael Larabel (Phoronix) reports on a Linux kernel patch submission from AMD which shall enable support for Smart Data Cache Injection (SDCI) in future CPUs.

SDCI allows devices, for example network interface controllers, to place data into CPU caches instead of into RAM. Into the level 3 cache, notably. That is, Direct Memory Access turns into direct cache access.

Apparently, SDCI was at some point mentioned as a possible feature of Genoa and Bergamo, but if there was indeed such a plan, it didn't make it out of the labs until now.

The timing of the Linux kernel code submission leads me to guess that this is already for Zen 5/Turin, not for Zen 6.

Edit: Now what I am wondering is whether this wasn't supposed to be a Genoa feature to begin with, or if it couldn't be made to work with Genoa's CCDs or with Genoa's IOD, or…?

Mahboi · Aug 18, 2024

marees said:
https://twitter.com/x/status/1824693836230345079

Their childish whining is really grating.
Although I do think that we should summarise what we know about Zen 5 at this point. So much info has been flying around, we should recollect what we know.

PJVol · Aug 18, 2024

StefanR5R said:
Edit: Now what I am wondering is whether this wasn't supposed to be a Genoa feature to begin with, or if it couldn't be made to work with Genoa's CCDs or with Genoa's IOD, or…?

I wonder whether it's technically different from the L3 allocation that assigns (MMIO?) memory address space to the L3 partition, and which is available to users since... (don't remember which Zen gen)?

PJVol · Aug 18, 2024

Mahboi said:
Their childish whining is really grating.
Although I do think that we should summarise what we know about Zen 5 at this point. So much info has been flying around, we should recollect what we know.

They forgot to add "2% faster than a.. umm... 7800x3d"

Tuna-Fish · Aug 18, 2024

marees said:
Super admin privilege is required to allocate extra memory.

Not "extra memory" but more efficient paging. Normally, all memory on x86 Windows is accessed through 4kB pages. If you want to access 4GB of space, you need to set up a million PTEs, which is a problem because the TLB can only cache a few thousand. The hardware also supports 2MB pages, which are a lot more reasonable, and on Zen they use the same TLB entries, so the TLB can cover 6GB on Zen 4.
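The arithmetic behind those figures can be checked with a few lines. This is only a sketch: the 3072-entry TLB size is an assumption inferred from the 6GB figure above (6GB / 2MB), and the helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Assumption: number of TLB entries implied by "6GB with 2MB pages".
ASSUMED_TLB_ENTRIES = 3072


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of page table entries needed to map a region (rounded up)."""
    return -(-region_bytes // page_bytes)


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of address space the TLB can cover without a miss."""
    return entries * page_bytes


if __name__ == "__main__":
    print("PTEs for 4GB with 4kB pages:", pages_needed(4 * GIB, 4 * KIB))
    print("PTEs for 4GB with 2MB pages:", pages_needed(4 * GIB, 2 * MIB))
    print("TLB reach with 4kB pages (MB):",
          tlb_reach(ASSUMED_TLB_ENTRIES, 4 * KIB) // MIB)
    print("TLB reach with 2MB pages (GB):",
          tlb_reach(ASSUMED_TLB_ENTRIES, 2 * MIB) // GIB)
```

Under that assumption the same TLB covers only about 12MB with 4kB pages, which is why a game with a multi-gigabyte working set can benefit from large pages.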

IIRC large page support was previously available in Windows, but a horrible bug was found in it (because very few people actually used it), at which point it was moved to admin only. Linux supports not only normal large pages, but also has transparent huge page support, meaning that if it's turned on, software that was not designed for large pages can make use of it.
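For the Linux side, a minimal sketch of how a program can opt in when transparent huge pages are set to `madvise` mode in `/sys/kernel/mm/transparent_hugepage/enabled` (in `always` mode no code change is needed). This is Linux-only and only a hint: the kernel may still back the region with 4kB pages. The helper name `map_with_thp_hint` is hypothetical.

```python
import mmap

HUGE_PAGE_BYTES = 2 * 1024 * 1024  # 2MB, the x86-64 large page size discussed above


def map_with_thp_hint(size_bytes: int):
    """Create an anonymous private mapping and ask the kernel for huge pages.

    Returns (mapping, hinted), where hinted is True only if the
    MADV_HUGEPAGE advice was accepted by the kernel.
    """
    # Round the size up to a whole number of 2MB pages.
    size = -(-size_bytes // HUGE_PAGE_BYTES) * HUGE_PAGE_BYTES
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    hinted = False
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # only defined on Linux
    if advice is not None:
        try:
            buf.madvise(advice)
            hinted = True
        except OSError:
            # Kernel built without THP, or THP disabled for this process.
            pass
    return buf, hinted


if __name__ == "__main__":
    region, ok = map_with_thp_hint(64 * 1024 * 1024)
    region[0:1] = b"x"  # touch the first page so it is actually faulted in
    print("mapped", len(region), "bytes; THP hint accepted:", ok)
    region.close()
```

Whether huge pages were really used can be checked afterwards via the `AnonHugePages` line in `/proc/meminfo`.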

PJVol · Aug 18, 2024

Tuna-Fish said:
Not "extra memory" but more efficient paging. Normally, all memory on x86 Windows is accessed through 4kB pages. If you want to access 4GB of space, you need to set up a million PTEs, which is a problem because the TLB can only cache a few thousand. The hardware also supports 2MB pages, which are a lot more reasonable, and on Zen they use the same TLB entries, so the TLB can cover 6GB on Zen 4.

IIRC large page support was previously available in Windows, but a horrible bug was found in it (because very few people actually used it), at which point it was moved to admin only. Linux supports not only normal large pages, but also has transparent huge page support, meaning that if it's turned on, software that was not designed for large pages can make use of it.

Does allocating huge pages using rebar require the same PVL?

igor_kavinski · Aug 18, 2024

CouncilorIrissa said:
The problem with high core count parts on mainstream platforms is that they raise the power delivery system requirements for motherboard manufacturers, because the CPU needs to have relatively high all-core boost clocks to make sense to begin with. Which in turn means general public would need to pay more for motherboards to essentially subsidise this small portion of the desktop market, which is a relatively small market on its own. Which is why I think AMD is reluctant to increase core counts on desktop: not only such an SKU would serve a relatively small niche of workloads that scale to high core counts AND don't need memory bandwidth, it would also require everyone else to pay for it.

OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPU on only the X670E/X870E mobos? Those are expensive to begin with so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo but those that do, they should get something in return for their dollars, such as support for higher core counts.

Timmah! · Aug 18, 2024

igor_kavinski said:
OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPU on only the X670E/X870E mobos? Those are expensive to begin with so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo but those that do, they should get something in return for their dollars, such as support for higher core counts.

IMO any board capable of running 16C should be able to run 24C… just at lower clocks.

blackangus · Aug 18, 2024

igor_kavinski said:
OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPU on only the X670E/X870E mobos? Those are expensive to begin with so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo but those that do, they should get something in return for their dollars, such as support for higher core counts.

They already do get something for their dollars.
Better looking
Better power delivery
More ports
Better audio
Better networking
Better heat management
Better Overclocking

And most importantly:
Better bragging rights

StefanR5R · Aug 18, 2024

(SDCI)

PJVol said:
I wonder whether it's technically different from the L3 allocation that assigns (MMIO?) memory address space to the L3 partition, and which is available to users since... (don't remember which Zen gen)?

Hmm, do you have a pointer to a description of this functionality? Was this one for CPU initiated accesses perhaps? SDCI is for device initiated accesses, the device being the writer.

Heartbreaker · Aug 18, 2024

Shmee said:
Hmm interesting. So possibly just better performance in admin mode due to more memory access?

Will just running games in Admin mode (click "run as administrator") help, or do you need to run from that hidden super admin account?

MS_AT · Aug 18, 2024

LightningZ71 said:
The y-cruncher example is essentially a worst case scenario. It's also next to impossible to achieve. In reality, MOST, but not all, AVX-512 workloads that are not purely synthetic will not be constantly streaming the maximum amount of data continuously. They will digest chunks, manipulate it, test the results, then store the results of the manipulation or the findings of the test, then either wait on the non AVX-512 portion of the code to do things, or move on to the next chunk of data.

I think you misunderstood. The quote was saying that this "digest chunk, manipulate it, test the results" part has to be extremely large between loads to avoid a memory bandwidth bottleneck if you want to load all cores, which makes it impractical. Once again, I underline: all cores. Sure, you can find workloads that are single-threaded by nature or that, for one reason or another, won't hit the memory bottleneck. But that doesn't mean the memory bottleneck doesn't exist. If the memory BW were sufficient, then we could start to talk about the backend bottleneck etc. The thing is that the core has a much more capable backend than the memory BW available to it.

LightningZ71 said:
32MB of l3 for 8 cores is plenty for most tasks, and represents as much or more l3 per core than any Intel avx-512 enabled product ever produced. The X3d parts will have 3x that amount. Yes, main memory bandwidth is limiting in synthetic or academic scenarios, but it isn't the end of the story.

You forget that L3 is a victim cache for both Skylake-X and Zen architectures. That means you cannot prefetch into it. If your algorithm won't reuse the memory locations that got evicted from L2 to L3, then the importance of L3 is reduced. Once again, it depends on the algorithm in question.

JustViewing said:
The biggest bottleneck is L3 bandwidth. 32B/cycle is simply not enough for full AVX-512 throughput. The capacity of 32MB is good enough for non-streaming AVX-512 workloads on 8 cores.

Due to the above (the L3 being a victim cache of L2), what the biggest bottleneck is depends on the algorithm. That's why 32MB might be good enough or might be too little. Statements like this are a bit too general and lose the nuance of the problem. For streaming workloads, the GMI link bandwidth is the problem, as it's lower than the L2-to-L1 bandwidth. Once you equalize them, the L2 will be the bottleneck, and so on. If you have a non-streaming workload, then the size of your working set and how you access the data will determine whether 32MB is good enough.
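A minimal sketch of that "narrowest link wins" reasoning. All bandwidth figures here are placeholders picked for illustration, not measured Zen 5 numbers, and the helper names `streaming_bottleneck` and `required_reuse` are hypothetical.

```python
def streaming_bottleneck(path):
    """Return (link_name, bytes_per_cycle) for the narrowest link on the path."""
    return min(path.items(), key=lambda item: item[1])


def required_reuse(demand_bytes_per_cycle, supply_bytes_per_cycle):
    """How many times each supplied byte must be reused to keep the consumer busy."""
    return demand_bytes_per_cycle / supply_bytes_per_cycle


# Placeholder per-core figures in bytes per core clock (assumptions only).
# A streaming load comes from memory over the GMI link into L2 and then L1;
# it does not pass through the victim L3 on the way in.
STREAMING_PATH = {
    "gmi_link": 8,
    "l2_to_l1": 64,
    "l1_loads": 128,
}

if __name__ == "__main__":
    print("narrowest link:", streaming_bottleneck(STREAMING_PATH))

    # Widen the GMI link and the bottleneck simply moves to the next link.
    widened = dict(STREAMING_PATH, gmi_link=128)
    print("after widening GMI:", streaming_bottleneck(widened))

    # Non-streaming case: if the core could consume 128 B/cycle and L3
    # supplied 32 B/cycle, each byte would have to be reused this many times.
    print("reuse factor needed:", required_reuse(128, 32))
```

The point is only the shape of the argument: fixing one link exposes the next one, and for non-streaming code the required reuse factor decides whether a given cache level is "good enough".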

marees said:
https://twitter.com/x/status/1824693836230345079

I see HUB is milking the Zen 5 release to the fullest. If I am not mistaken, their own video was the source of the "admin rights give Zen CPUs a boost" claim, and now they will do another video to dismiss the idea that it is Zen-specific [which is funny, as I saw other reviewers doing similar tests and showing that Intel was largely unaffected, but this is beside the point]. At the same time, they had the reviewer guide in hand, which claims the gaming uplift is <= 5%. It would be more useful if they asked AMD why the review guide doesn't agree with the promotional material and then did a video about that. I mean, they know that gaming performance won't improve no matter which weird trick they try next; it only paints the release in an even worse light for more clicks... And except for one outlet, I still haven't seen anyone benchmark whether core parking does anything for performance. But maybe, as someone already suggested, HUB will follow up with a video, "Zen 5 gaming performance doesn't improve during a full moon..."

igor_kavinski said:
OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPU on only the X670E/X870E mobos? Those are expensive to begin with so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo but those that do, they should get something in return for their dollars, such as support for higher core counts.

I can already see the outcry on social media about artificial segmentation to milk customers

Heartbreaker said:
Will just running games in Admin mode (click "run as administrator") help, or do you need to run from that hidden super admin account?

it's sufficient

MoistOintment · Aug 18, 2024

Hans Gruber said:
I think you have it backwards. The high end Intel chips will be made with 20A silicon. I think the low end Intel offerings are any CPU's below what we know as i3 CPU's. Those could be made with TSMC silicon.

Based on how silicon has been measured historically. Intel 20A silicon is essentially 5nm. I didn't say it. That is what has been published all over the web for several years. Intel has said for many years that their silicon offers much more density than TSMC silicon. Intel is halving their process node from 10nm to 5nm. That is a huge jump in size compared to TSMC going from 7nm to 5nm to 3nm.

There is no comparing the 14th generation to Arrow Lake 20A. The performance uplift and power efficiency gains may put it ahead of what N4P has done for Zen 5. That is what a lot of people have ignored. Many assume that Arrow Lake is Raptor Lake's next act. The reality is totally new silicon with a different architecture scheme.

People who do not take sides in the AMD vs Intel battle have been waiting for Arrow Lake because of the new silicon node. Like me, they want to see what it can do. The upcoming Arrow Lake CPU's are said to be from 65w TDP to 150w TDP for the highest end CPU's. I have heard the non K series CPU's will be 65w up to at least Intel 7 series.

I said before Zen 5 was released that a Zen 5+ with N3P would be necessary because of 20A and 18A further down the road from Intel.

I think it's best to just always use the correct node names to avoid confusion.

There's nothing inherent in 20A that would make calling it "5nm" more accurate. All node names should be treated as product names to denote where the fab believes they compete, relative to other fabs, or to denote mild improvements (TSMC N6 wasn't a shrink of N7, for example).

If TSMC N3 is better than Intel 3, but they're both within single digit % of each other, I would still say referring to both as "3nm nodes" is fine. Identical performance wouldn't be a realistic expectation.

Heartbreaker · Aug 18, 2024

MS_AT said:
it's sufficient

Link?
