@TheStilt .. Sorry if I've missed any of this. I can't stand Win 10, so I don't like to use it except on a tablet.
Does parking a Core with Ryzen occur in logical pairs or one at a time?
What conditions have to be present in Win10 for the Core to be parked?
Are loads juggled about in Win 10? And seriously, why would they be if Cores are supposed to be idle->parked for efficiency? That makes no sense.
Core Park Manager and other tools can monitor and disable this feature, but so can the High Performance setting or changing min/max power values.
What's the DRAM and inter-CCX bandwidth/latency with a few Cores parked? i.e. does it change?
Phenom's changed drastically.
What did you attain for L1, L2, L3 access latency? (tried Franck@CPUID's tool?)
What is Ryzen's idling frequency/voltage?
What are idle/load stock CPU temps like? Do the sensors seem accurate? (compared to K10 gen)
Can you monitor core throttling? Have you tried loading stock Ryzen to see if it throttles?
Trying to explain performance issues...
Two questions for @The Stilt
1. Since the frequency of the data fabric is fixed to the memory frequency at a ratio of 1:2, does this mean that using faster memory would result in much faster performance, and in which tasks would a faster fabric frequency be most beneficial?
2. It seems that fixing the data fabric frequency to the memory frequency imposes a significant restriction on the data fabric. In the future, could the data fabric frequency be decoupled from the frequency of the memory controller, or could the ratio be changed to allow a higher data fabric frequency?
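To put rough numbers on the 1:2 coupling asked about above, here is a minimal Python sketch. It assumes the "1:2" ratio means the fabric clock is half the DDR4 effective transfer rate (so equal to the real memory clock); the function names are made up for illustration and nothing here is measured data.

```python
# Sketch only. Assumption: fabric clock = DDR4 transfer rate / 2,
# which is one reading of the "1:2" ratio mentioned in the question.

def fabric_clock_mhz(ddr_rate_mts: float) -> float:
    """Data fabric clock (MHz) for a DDR4 transfer rate given in MT/s."""
    return ddr_rate_mts / 2.0


def fabric_speedup(slow_mts: float, fast_mts: float) -> float:
    """Relative fabric clock gain when moving from slow to fast memory."""
    return fabric_clock_mhz(fast_mts) / fabric_clock_mhz(slow_mts)


if __name__ == "__main__":
    for rate in (2133, 2400, 2666, 3200):
        print(f"DDR4-{rate}: fabric ~{fabric_clock_mhz(rate):.1f} MHz")
    gain = (fabric_speedup(2133, 3200) - 1.0) * 100.0
    print(f"DDR4-2133 -> DDR4-3200: fabric clock +{gain:.0f}%")
```

Under that assumption the fabric clock scales linearly with the memory speed, so any gain would show up mostly in work that crosses the fabric (DRAM access and CCX-to-CCX traffic). How much real performance that buys is exactly the open question.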
1. In every ideal arch, these areas are on separate power planes and clock domains... Completely separate voltage islands.
This is a major shortsightedness by AMD with any 'IMC problems'. Phenom suffered due to IMC clocking and power shenanigans. That was 2006.
Having a linked CCX power plane is still backwards.
It's entirely possible that clocking might be impaired more by their CCX design than purely by the 14nm LPP process. It was like that with Phenom's IMC/L3 previously.
AMD implemented decoupling characteristics back in 2007/2008 silicon and saved a ton of power and gained performance.
It is never a case of decoupling alone and it just works, though. DRAM<->L3<->Fabric is very tricky to get right. Separate clocks and power domains generally pose major sync and timing issues that have to be solved to avoid corrupted data. Ryzen's implementation simply looks like a quick and easy job done under time constraints. It's obvious they couldn't afford to spend as much time as they needed on it.
Going forward, it would be the first area they would look to change.
2. These 'Windows issues' are AMD issues. We had them with Phenom for Christ's sake! When the workload bounced, with CnQ active, performance sucked and stuttered. AMD would have KNOWN about these issues during 'design considerations' 5 years back. It's AMD who has to adapt and get these fixed. Borked chip releases are only destroying their own image and income.
Secondly, Core Parking has to be implemented by AMD. If you don't have a working driver for your test OS, why allow this?
Telling reviewers to switch to the High Performance profile, which doesn't park the cores, is just a band-aid at best and very misleading about real-world performance. Ryzen obviously has issues sleeping and waking the cores. Again, a Phenom issue.
3. Having two CCX at low interconnect bandwidth/high latency is an even bigger flaw, but this again will not be by design. This is going to pose a huge problem on Server workloads unless fixed. Forget HPC altogether.
4. Now you see why Server was not launched. AMD chose their smallest, lowest-risk market to troubleshoot the chip.
5. Intellectual prediction won't magically gain 10-15% performance -- this is wishful thinking, like the pre-release hype. Every one of which turned out wrong.
Hmm, all because of Windows' odd behavior vis-a-vis thread allocation. Windows NT, at least back to 3.5, switches threads between cores (then CPUs) according to some algorithm (always looked random to me). I remember a spat on COMP.ARCH between some server dude and Dave Cutler over this on a dual CPU system - task manager showed exactly 50% utilization on each CPU when running a single threaded process. AMD, apparently, couldn't afford to design and implement two separate CPUs for client and server (with both being monolithic).
re: 2) As Stilt pointed out, a ring 0 proggie can change this via core parking, or the /affinity switch can be used from the command prompt. I've been using Process Lasso for years to deal with this issue in some high-performance programs (and to manipulate priority levels). It sounds like MS may be waiting for the April launch of Redstone 2 ("Creators Update") to fix this within the Windows scheduler when a Zeppelin CPU is detected. I would guess that the fix may already be in the latest Windows "Fast Ring" updates or will be in the next couple of weeks.
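For anyone wanting to try the /affinity route, the switch on cmd's `start` takes a hexadecimal bitmask with one bit per logical processor. A small Python sketch for building that mask follows. The helper names and `game.exe` are hypothetical, and the idea that logical processors 0-7 map to one CCX (with SMT on) is an assumption you should check on your own system.

```python
# Sketch only: builds the hex mask that cmd's "start /affinity" expects.
# Bit N set means logical processor N is allowed.

def affinity_mask(logical_cpus):
    """Return an integer bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        if cpu < 0:
            raise ValueError("logical CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def start_command(program, logical_cpus):
    """Return a cmd.exe command line pinning `program` to the given CPUs."""
    # The empty "" is the window title argument that `start` accepts first.
    return f'start "" /affinity {affinity_mask(logical_cpus):X} "{program}"'


if __name__ == "__main__":
    # Assumption: logical CPUs 0-7 belong to one CCX on an 8C/16T part.
    print(start_command("game.exe", range(8)))
```

This only restricts where the scheduler may place the process; it doesn't touch core parking, which is a power plan setting.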
I'm sure K10 had an app that could change the skew when different PStates are entered, and even force certain PStates.
And it would show the separate power plane portions.