Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,690
2,671
146
I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance on Zen 4 and 5, at least so far. Apparently, when using the built-in administrator account, performance is slightly better. They mention it may also be present on other CPUs, perhaps Zen 3 / Intel etc., but they don't know yet.

I am wondering, does anyone know more about this, and is it just a Windows 11 thing or is it present in 10 as well?
 
Reactions: Hotrod2go

marees

Senior member
Apr 28, 2024
374
436
96
I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance on Zen 4 and 5, at least so far. Apparently, when using the built-in administrator account, performance is slightly better. They mention it may also be present on other CPUs, perhaps Zen 3 / Intel etc., but they don't know yet.

I am wondering, does anyone know more about this, and is it just a Windows 11 thing or is it present in 10 as well?
Super admin privilege is required to allocate extra memory.

 
Reactions: igor_kavinski

jdubs03

Senior member
Oct 1, 2013
712
316
136
I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance on Zen 4 and 5, at least so far. Apparently, when using the built-in administrator account, performance is slightly better. They mention it may also be present on other CPUs, perhaps Zen 3 / Intel etc., but they don't know yet.

I am wondering, does anyone know more about this, and is it just a Windows 11 thing or is it present in 10 as well?
Hardware Unboxed mentioned, I think in a Twitter post, that some of their Discord members reported seeing the same thing in Windows 10 as well.
 

marees

Senior member
Apr 28, 2024
374
436
96
I am not sure if this has been mentioned yet in this thread or somewhere else, but it looks like there is a weird account bug in Windows 11 which can decrease gaming performance on Zen 4 and 5, at least so far. Apparently, when using the built-in administrator account, performance is slightly better. They mention it may also be present on other CPUs, perhaps Zen 3 / Intel etc., but they don't know yet.

I am wondering, does anyone know more about this, and is it just a Windows 11 thing or is it present in 10 as well?
This is probably not a Windows bug but a "feature/bug" of AMD's CPU testing, where they always ran with this mode enabled.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,690
2,671
146
Hmm interesting. So possibly just better performance in admin mode due to more memory access?
 
Reactions: marees

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,763
136
This and other threads of the CPU subforum are steadfastly spammed by the claim that "many" parallel applications work on small data. (Generally without saying how many. I'll grant them that perhaps "many" popular benchmarks work on small data.) Now the thing is: if you feed an algorithm small datasets, the typical outcome is that the algorithm scales poorly to high thread counts. [Edit: Because the dataset has to be sliced and diced into more and more tiny pieces, leaving little computation per piece while increasing the control and synchronization overhead.]

I'll just leave this here, fully aware that the spam will blissfully keep flowing no matter what.
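A toy illustration of that scaling argument (my own sketch, not anything from the thread): summing a deliberately small array with more and more threads. The per-thread slice shrinks while the thread create/join and reduction overhead stays roughly constant, so the wall time stops improving and eventually gets worse.

```cpp
// Sum a small array with 1, 2, 4, ... threads and time each run.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<double> data(1 << 14, 1.0);            // deliberately tiny: 16 Ki elements
    unsigned max_t = std::max(1u, std::thread::hardware_concurrency());

    for (unsigned t = 1; t <= max_t; t *= 2) {
        std::vector<double> partial(t, 0.0);
        auto start = std::chrono::steady_clock::now();

        std::vector<std::thread> workers;
        size_t chunk = data.size() / t;
        for (unsigned i = 0; i < t; ++i) {
            size_t lo = i * chunk;
            size_t hi = (i + 1 == t) ? data.size() : lo + chunk;
            workers.emplace_back([&, i, lo, hi] {
                // Each thread gets an ever smaller slice as t grows.
                partial[i] = std::accumulate(data.begin() + lo, data.begin() + hi, 0.0);
            });
        }
        for (auto& w : workers) w.join();
        double sum = std::accumulate(partial.begin(), partial.end(), 0.0);

        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%2u threads: sum=%.0f, %lld us\n", t, sum, (long long)us);
    }
    return 0;
}
```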
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,763
136
For a change, let's consider Ethernet FPS instead of Video Game FPS.

Michael Larabel (Phoronix) reports on a Linux kernel patch submission from AMD that would enable support for Smart Data Cache Injection (SDCI) in future CPUs.

SDCI allows devices, for example network interface controllers, to place data directly into the CPU's level 3 cache instead of into RAM. That is, direct memory access turns into direct cache access.

Apparently, SDCI was at some point mentioned as a possible feature of Genoa and Bergamo, but if there was indeed such a plan, it didn't make it out of the labs until now.

The timing of the Linux kernel code submission makes me guess that this is already for Zen 5 / Turin, not for Zen 6.

Edit: Now what I am wondering is whether this wasn't supposed to be a Genoa feature to begin with, or if it couldn't be made to work with Genoa's CCDs or with Genoa's IOD, or…?
 
Last edited:

PJVol

Senior member
May 25, 2020
698
621
136
Edit: Now what I am wondering is whether this wasn't supposed to be a Genoa feature to begin with, or if it couldn't be made to work with Genoa's CCDs or with Genoa's IOD, or…?
I wonder whether it's technically different from the L3 allocation that assigns (MMIO?) memory address space to the L3 partition, and which is available to users since... (don't remember which Zen gen)?
 

PJVol

Senior member
May 25, 2020
698
621
136
Their childish whining is really grating.
Although I do think that we should summarise what we know about Zen 5 at this point. So much info has been flying around, we should recollect what we know.
They forgot to add "2% faster than a.. umm... 7800x3d"
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,474
1,966
136
Super admin privilege is required to allocate extra memory.

Not "extra memory", but more efficient paging. Normally, all memory on x86 Windows is accessed through 4 KB pages. If you want to access 4 GB of space, you need to set up a million PTEs, which is a problem because the TLB can only cache a few thousand of them. The hardware also supports 2 MB pages, which are a lot more reasonable, and on Zen they use the same TLB entries, so the TLB can cover 6 GB on Zen 4.
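The arithmetic behind that, as a quick sanity check (the 3072-entry figure is my assumption for the shared L2 DTLB on Zen 4, not something stated above):

```cpp
// Page-table entries needed for a 4 GiB working set, and TLB reach with 2 MiB pages.
#include <cstdio>

int main() {
    constexpr long long working_set = 4LL * 1024 * 1024 * 1024;  // 4 GiB
    constexpr long long page_4k     = 4LL * 1024;                // 4 KiB page
    constexpr long long page_2m     = 2LL * 1024 * 1024;         // 2 MiB large page
    constexpr long long tlb_entries = 3072;                      // assumed L2 DTLB size

    std::printf("4 KiB PTEs needed: %lld\n", working_set / page_4k);  // ~1 million
    std::printf("2 MiB PTEs needed: %lld\n", working_set / page_2m);  // 2048
    std::printf("TLB reach with 2 MiB pages: %lld GiB\n",
                tlb_entries * page_2m / (1024LL * 1024 * 1024));      // 6 GiB
    return 0;
}
```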

IIRC large page support was previously available in Windows, but a horrible bug was found in it (one that had gone unnoticed because very few people actually used it), at which point it was restricted to admin only. Linux supports not only normal large pages but also transparent huge pages, meaning that if the feature is turned on, software that was not designed for large pages can still make use of it.
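For anyone who wants to poke at the Windows side of this, a minimal sketch of the large-page path (VirtualAlloc with MEM_LARGE_PAGES plus the "Lock pages in memory" / SeLockMemoryPrivilege grant, which is what effectively gates it behind an administrator; the buffer size below is arbitrary):

```cpp
// Allocate a buffer backed by 2 MiB large pages on Windows.
#include <windows.h>
#include <cstdio>

static bool enable_lock_memory_privilege() {
    HANDLE token = nullptr;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return false;

    TOKEN_PRIVILEGES tp{};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    if (!LookupPrivilegeValue(nullptr, SE_LOCK_MEMORY_NAME,
                              &tp.Privileges[0].Luid)) {
        CloseHandle(token);
        return false;
    }
    // Only succeeds if the account was granted "Lock pages in memory" beforehand.
    bool ok = AdjustTokenPrivileges(token, FALSE, &tp, sizeof(tp), nullptr, nullptr)
              && GetLastError() == ERROR_SUCCESS;
    CloseHandle(token);
    return ok;
}

int main() {
    if (!enable_lock_memory_privilege()) {
        std::puts("SeLockMemoryPrivilege not available (not granted / not elevated)");
        return 1;
    }
    SIZE_T large = GetLargePageMinimum();       // typically 2 MiB on x86-64
    SIZE_T size  = 64 * large;                  // request 128 MiB
    void* p = VirtualAlloc(nullptr, size,
                           MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                           PAGE_READWRITE);
    if (!p) {
        std::printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }
    std::printf("Got %zu bytes backed by %zu-byte pages\n", size, large);
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
```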
 

PJVol

Senior member
May 25, 2020
698
621
136
Not "extra memory", but more efficient paging. Normally, all memory on x86 Windows is accessed through 4 KB pages. If you want to access 4 GB of space, you need to set up a million PTEs, which is a problem because the TLB can only cache a few thousand of them. The hardware also supports 2 MB pages, which are a lot more reasonable, and on Zen they use the same TLB entries, so the TLB can cover 6 GB on Zen 4.

IIRC large page support was previously available in Windows, but a horrible bug was found in it (one that had gone unnoticed because very few people actually used it), at which point it was restricted to admin only. Linux supports not only normal large pages but also transparent huge pages, meaning that if the feature is turned on, software that was not designed for large pages can still make use of it.
Does allocating huge pages using ReBAR require the same privilege level?
 
Jul 27, 2020
19,613
13,481
146
The problem with high core count parts on mainstream platforms is that they raise the power delivery system requirements for motherboard manufacturers, because the CPU needs relatively high all-core boost clocks to make sense in the first place. That in turn means the general public would need to pay more for motherboards to essentially subsidise this small portion of the desktop market, which is a relatively small market on its own. Which is why I think AMD is reluctant to increase core counts on desktop: not only would such an SKU serve a relatively small niche of workloads that scale to high core counts AND don't need memory bandwidth, it would also require everyone else to pay for it.
OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPUs on only the X670E/X870E mobos? Those are expensive to begin with, so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo, but those who do should get something in return for their dollars, such as support for higher core counts.
 

Timmah!

Golden Member
Jul 24, 2010
1,510
824
136
OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPUs on only the X670E/X870E mobos? Those are expensive to begin with, so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo, but those who do should get something in return for their dollars, such as support for higher core counts.
IMO any board capable of running 16C should be able to run 24C… just at lower clocks.
 
Reactions: Josh128

blackangus

Member
Aug 5, 2022
143
193
86
OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPUs on only the X670E/X870E mobos? Those are expensive to begin with, so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo, but those who do should get something in return for their dollars, such as support for higher core counts.
They already do get something for their dollars.
Better looking
Better power delivery
More ports
Better audio
Better networking
Better heat management
Better Overclocking

And most importantly:
Better bragging rights
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,763
136
(SDCI)
I wonder whether it's technically different from the L3 allocation that assigns (MMIO?) memory address space to the L3 partition, and which is available to users since... (don't remember which Zen gen)?
Hmm, do you have a pointer to a description of that functionality? Was it perhaps for CPU-initiated accesses? SDCI is for device-initiated accesses, the device being the writer.
 

MS_AT

Senior member
Jul 15, 2024
210
507
96
The y-cruncher example is essentially a worst case scenario. It's also next to impossible to achieve. In reality, MOST, but not all, AVX-512 workloads that are not purely synthetic will not be constantly streaming the maximum amount of data continuously. They will digest chunks, manipulate it, test the results, then store the results of the manipulation or the findings of the test, then either wait on the non AVX-512 portion of the code to do things, or move on to the next chunk of data.
I think you misunderstood. The quote was saying that this "digest chunk, manipulate it, test the results" part has to be extremely large between loads to avoid a memory bandwidth bottleneck if you want to load all cores, which makes it impractical. Once again, I underline all cores. Sure, you can find workloads that are single-threaded by nature or that for one reason or another won't hit a memory bottleneck. But that doesn't mean the memory bottleneck doesn't exist. If the memory BW were sufficient, then we could start to talk about the backend bottleneck, etc. The thing is that the core has a much more capable backend than the memory BW available to it.
32MB of l3 for 8 cores is plenty for most tasks, and represents as much or more l3 per core than any Intel avx-512 enabled product ever produced. The X3d parts will have 3x that amount. Yes, main memory bandwidth is limiting in synthetic or academic scenarios, but it isn't the end of the story.
You forget that the L3 is a victim cache on both Skylake-X and Zen architectures. That means you cannot prefetch into it. If your algorithm won't reuse the memory locations that got evicted from L2 to L3, then the importance of the L3 is reduced. Once again, it depends on the algorithm in question.
Biggest bottleneck is L3 bandwidth. Simply put, 32 B/cycle is not enough for full AVX-512 throughput. The capacity of 32 MB is good enough for 8 cores on AVX-512 non-streaming workloads.
Due to the above, the L3 being a victim cache of the L2, what the biggest bottleneck is depends on the algorithm. That's why 32 MB might be good enough or might be too little; statements like this are a bit too general and lose the nuance of the problem. For streaming workloads the GMI link bandwidth is the problem, as it's lower than the L2-to-L1 bandwidth. Once you equalize those, the L2 will be the bottleneck, and so on. If you have a non-streaming workload, then the size of your working set and how you access the data will determine whether 32 MB is good enough.
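To put rough numbers on the streaming case (a back-of-envelope sketch; the clock, FMA rate, traffic-per-FMA and DDR5 figures are my assumptions, not AMD specs):

```cpp
// Estimate the DRAM bandwidth demanded by 8 cores streaming a triad-style
// kernel (y[i] = a * x[i] + y[i]) with 512-bit vectors, versus what a
// dual-channel DDR5 setup can supply.
#include <cstdio>

int main() {
    const double ghz           = 5.0;    // assumed core clock
    const double fma_per_cycle = 1.0;    // assume just one 512-bit FMA/cycle on streamed data
    const double bytes_per_fma = 192.0;  // 2 x 64 B loads + 1 x 64 B store, no reuse
    const int    cores         = 8;

    double demand_gbs = ghz * fma_per_cycle * bytes_per_fma * cores;  // GB/s wanted
    double ddr5_gbs   = 96.0;  // ~dual-channel DDR5-6000 theoretical peak

    std::printf("DRAM traffic to keep %d cores streaming: ~%.0f GB/s\n", cores, demand_gbs);
    std::printf("Dual-channel DDR5 supplies:              ~%.0f GB/s (~%.0fx short)\n",
                ddr5_gbs, demand_gbs / ddr5_gbs);
    return 0;
}
```

Even with deliberately conservative per-core assumptions, a pure streaming kernel wants on the order of eighty times more DRAM bandwidth than the socket has, which is why the data either has to be reused out of cache or the cores sit idle.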
I see HUB is milking the Zen 5 release to the fullest. If I am not mistaken, their own video was the source of the "admin rights give Zen CPUs a boost" claim, and now they will do another video to dismiss it as something Zen-specific [which is funny, as I saw other reviewers doing similar tests and showing that Intel was largely unaffected, but that is beside the point]. At the same time, they had the reviewer guide in hand that claims the gaming uplift is <= 5%. What would be more useful is if they brought the question to AMD of why the review guide doesn't agree with the promotional material, and then did a video about that. I mean, they know that gaming performance won't improve no matter what weird trick they try next, only to paint the release in an even worse light for more clicks... And except for one outlet, I still haven't seen anyone try to benchmark whether core parking is doing anything for performance. But maybe, as someone already suggested, HUB will follow up with a video: "Zen 5 gaming performance doesn't improve during a full moon either..."
OK, that's a valid point. But couldn't AMD support the higher core count (supposedly 24C/48T or 28C/56T) CPUs on only the X670E/X870E mobos? Those are expensive to begin with, so they should have the necessary power delivery components already in place. Not everyone is paying $400 for a mobo, but those who do should get something in return for their dollars, such as support for higher core counts.
I can already see the outcry on social media about artificial segmentation to milk customers
Will just running games in Admin mode (click "run as administrator") help, or do you need to run from that hidden super admin account?
It's sufficient.
 

MoistOintment

Junior Member
Jul 31, 2024
11
22
36
I think you have it backwards. The high-end Intel chips will be made with 20A silicon. I think the low-end Intel offerings are any CPUs below what we know as i3 CPUs; those could be made with TSMC silicon.

Based on how silicon has been measured historically, Intel 20A silicon is essentially 5nm. I didn't say it; that is what has been published all over the web for several years. Intel has said for many years that their silicon offers much more density than TSMC silicon. Intel is halving their process node from 10nm to 5nm. That is a huge jump compared to TSMC going from 7nm to 5nm to 3nm.

There is no comparing the 14th generation to Arrow Lake on 20A. The performance uplift and power efficiency gains may put it ahead of what N4P has done for Zen 5. That is what a lot of people have ignored. Many assume that Arrow Lake is Raptor Lake's next act. The reality is totally new silicon with a different architecture scheme.

People who do not take sides in the AMD vs Intel battle have been waiting for Arrow Lake because of the new silicon node. Like me, they want to see what it can do. The upcoming Arrow Lake CPUs are said to range from 65 W TDP up to 150 W TDP for the highest-end CPUs. I have heard the non-K series CPUs will be 65 W up to at least Intel 7 series.

I said before Zen 5 was released that a Zen 5+ with N3P would be necessary because of 20A and 18A further down the road from Intel.
I think it's best to just always use the correct node names to avoid confusion.

There's nothing inherent in 20A that would make calling it "5nm" more accurate. All node names should be treated as product names, used to denote where the fab believes they compete relative to other fabs, or to denote mild improvements (TSMC N6 wasn't a shrink of N7, for example).

If TSMC N3 is better than Intel 3, but they're both within single-digit percentages of each other, I would still say referring to both as "3nm nodes" is fine. Identical performance wouldn't be a realistic expectation.
 