@kjboughton
What happens if the other 2 cores are enabled?
I am working on a theory which should allow me to quite accurately model core power under a variety of scenarios.
As others have found, there is no readily available means by which to defeat the processor's internal power control circuitry... at least as far as I am aware.
I disabled two cores (per CPU) for a couple of reasons...
Empirical testing showed better x264 encoding performance (i.e. higher average fps during conversion) than with all cores enabled. Presumably this is due to the higher overall sustained core frequencies with two fewer cores to power. As you know, circuit power consumption is a function of both frequency and voltage (which, for the group, is why we reduce Vcc_in) and with increasing frequency there is a reduction in power/performance efficiency (incrementally more power is required for diminishing frequency increases). Now, those that OC without a power ceiling don't care as they're looking to maximize performance independent of power draw. Here we are limited and so strive for the highest power/performance point in order to maximize computational throughput.
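To put rough numbers on the frequency/voltage point above, here is a minimal sketch of the usual dynamic-power approximation, P = C * V^2 * f, with voltage assumed to rise linearly with frequency. Every constant in it (v0, dv_per_ghz, c_eff) is a made-up placeholder, not a measurement from this system, and it only illustrates the diminishing-returns trend; it does not try to reproduce the 16 vs. 18 core result.

```python
def core_power(freq_ghz, v0=0.80, dv_per_ghz=0.10, c_eff=4.0):
    """Dynamic power (W) of one core using P = C * V^2 * f.

    v0, dv_per_ghz and c_eff are hypothetical constants chosen only
    to show the shape of the curve, not values for any real CPU.
    """
    volts = v0 + dv_per_ghz * freq_ghz  # assumed linear V/f relationship
    return c_eff * volts ** 2 * freq_ghz


def perf_per_watt(freq_ghz):
    """Treat frequency as a stand-in for performance (GHz per watt)."""
    return freq_ghz / core_power(freq_ghz)


for f in (2.0, 2.5, 3.0, 3.5, 4.0):
    print(f"{f:.1f} GHz: {core_power(f):6.2f} W/core, "
          f"{perf_per_watt(f):.3f} GHz/W")
```

Under these assumptions each extra 500MHz costs more power than the previous one, and GHz per watt falls steadily as frequency climbs, which is the trade-off a power-capped chip has to work within.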
If you look at a functional block diagram of the HCC Haswell-EP/EX die you'll note it's broken into two rings with an uneven number of cores per ring (8 cores in Ring1, 10 cores in Ring2).
So what I found was maximum power/performance, and thus performance, came when I reduced core count from 18 to 16 per CPU. (As an aside, with 32 real cores I saw no need to overtax the Windows scheduler by doubling this to 64 and so elected to keep HTT disabled.)
I could run the additional cores for effect but there would be no benefit (in fact, a reduction in performance). As well, in order to allow all cores to work on a single instance of x264, I have disabled NUMA, forming a single superset of 32 cores instead of two 16-core nodes. This would be the equivalent of running a Threadripper in "Content Creation" mode. For gaming, I find there is no need to switch configurations as six or eight (or whatever the game utilizes) cores run just fine up at 4GHz.
Do you see any WHEA corrections at 105MHz BCLK?
Increased BCLK is not a problem. Too high a frequency is a problem. Eventually you simply need a higher core voltage to keep climbing.
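For reference, core frequency is simply BCLK times the core multiplier, so raising BCLK raises the frequency the core has to sustain at a given voltage. A quick sketch, assuming a 38x turbo multiplier (inferred from a 3.8GHz turbo at a 100MHz BCLK, so treat it as an assumption):

```python
def core_freq_mhz(bclk_mhz, multiplier):
    """Core frequency in MHz for a given base clock and multiplier."""
    return bclk_mhz * multiplier


# 38x is an assumed multiplier, inferred from 3.8GHz at 100MHz BCLK.
print(core_freq_mhz(100, 38))  # 3800
print(core_freq_mhz(105, 38))  # 3990
```

So a 5% BCLK bump takes a 3.8GHz turbo to just under 4GHz with the multiplier unchanged.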
Can check under sensors of HWiNFO.
Something funky with my system leads to freezes when I try to read sensors on my board whether that be in AIDA64, HWiNFO64, etc. Not sure what to make of this thus far... ask me again later.
The question was regarding package C-States, in particular C6. It can be checked in HWiNFO under package C6 residency IIRC, or using ThrottleStop, and I seem to recall an Intel app but don't remember its name.
I believe you may be referring to Intel XTU (Extreme Tuning Utility?), which I've never used. As far as I can tell, C6 is enabled and working properly with my system.
I am running Haswell microcode 0x1F. I regularly see a significant portion of core time spent in C6 (vs. C0/C1) when the system is idle.
Also do you see RAPL measurements for DRAM? Without the mod my board shows nothing but with the mod it appears to work. Might be linked somehow with PC6 problem and DRAM setup?
I do recall seeing RAPL figures before using the EFI driver with your PowerCut™ feature built in. I now no longer see this data (in addition to the problems I am having in general reading some sensor data). If I am remembering correctly it was on the order of 5-7W per stick of 8GB DDR4-2133R SR memory.
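As a back-of-the-envelope check on what that per-stick figure would mean for the whole system, here is a small sketch. It assumes the full 64GB is made up of identical 8GB sticks (so eight DIMMs), which is an inference rather than something confirmed by a sensor reading.

```python
def dram_power_range_w(total_gb=64, stick_gb=8, watts_per_stick=(5.0, 7.0)):
    """Return (stick count, low estimate, high estimate) in watts.

    Assumes every DIMM is the same size and draws within the recalled
    5-7W range; both are assumptions, not measurements.
    """
    sticks = total_gb // stick_gb
    low, high = watts_per_stick
    return sticks, sticks * low, sticks * high


print(dram_power_range_w())  # (8, 40.0, 56.0)
```

That would put total DRAM power somewhere around 40-56W across both sockets if the recalled per-stick figure is right.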
It certainly would have been interesting to see how these CPUs performed unlocked (E5-1691v3?) but perhaps segmentation, profit and Broadwell got in the way.
OK. So allow me to now spill my guts on this. I think it's odd how I always end up settling into the ultimate system no matter the latest generation release. This time is no different.
Some months ago (not sure if it's still true today due to price and demand) I purchased a pair of used E5-2696 v3 (OEM only, mind you) Xeons off eBay for approximately $800 total ($400 each).
A refurbished ASUS Z10PE-D8 put me out another $400. Throw in another $800 for 64GB of DDR4 (new) from a reputable company and you have a barebones that can run the pants off both the i9-7980XE (unreleased) and the RYZEN Threadripper 1950X with room to spare. Now that's a $2,000 spending spree but what's an i9-7980XE due to cost alone? $1,999? I think we know who's going to win that one...
My recommendation for anyone who needs to CRUNCH is to go this route. Disable NUMA: this creates a single set of cores that can be thrown at even the most NUMA-unaware application out there. If it's parallel, like x264 encoding, disable NUMA... sure, you lose some compute efficiency as on average 50% of the memory operations will be to far memory, the other operations being to near memory. That aside, double the number of cores will drown out any effect here (unless we're talking latency-sensitive applications... we're not; we're talking bandwidth-sensitive, if that). The point being, there is no replacement for displacement; at least not when comparing v3 Xeons to "v5" Xeons. With this hack applied my dual 2696 v3 system effectively performs at the level of a 2696 v4 (hrmp, imagine that) AND I get 3.8 (or ~4GHz) turbo (depending on if I want to push BCLK or not, typically not). Given that the i9-7980XE is essentially a ~HCC "v5" Xeon, in this case two top-end "v4" Xeons can certainly outpace a single mid-range "v5" Xeon and there you are. Winner: a three-generation-old dual Xeon system with parts that are just absolutely spilling into the market as those bringing up the rear finally ditch their v3 Xeons to scoop up used v4 Xeons being rapidly discarded by large enterprise as they fulfill orders for the new Xeon Scalable Processor series.
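The 50% far-memory point can be turned into a rough average-latency estimate. The sketch below is only illustrative: the 90ns near and 140ns far latencies are hypothetical placeholders, not measurements from this board, and the function name is made up for the example.

```python
def avg_latency_ns(near_ns, far_ns, far_fraction=0.5):
    """Expected memory latency when a fraction of accesses go to far memory."""
    return (1 - far_fraction) * near_ns + far_fraction * far_ns


# Hypothetical latencies: 90ns to local memory, 140ns across the socket link.
print(avg_latency_ns(90, 140))                    # 115.0 (NUMA disabled, 50/50 split)
print(avg_latency_ns(90, 140, far_fraction=0.0))  # 90.0 (ideal NUMA-aware placement)
```

With numbers in that range the average access gets slower but nowhere near twice as slow, which is why a throughput-bound job spread across double the cores can still come out ahead.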
By the way, AMD's use of "Content Creation" mode, which disables NUMA, is precisely what I am recommending to those that may build a system similar to mine. And unless you're talking benchmarking and other synthetic focus, there's zero reason to reboot to go between modes (i.e. enable NUMA)... and I'm talking Xeon here... as 3.8 (or 4GHz) is plenty fast, especially with 32 REAL cores on tap, to play even the most demanding of today's games.
TL;DR
Don't buy i9 or Threadripper, pick up a pair of E5-2696 v3s and beat the snot out of both.