Non-consumer workloads shouldn't suffer much from being split across two NUMA nodes, because they are carefully parallelized to avoid inter-thread dependencies as much as possible. If, however, Windows assigns consumer tasks with fewer than 8 threads to cores across two different NUMA nodes, that would simply be a bad scheduler implementation on Microsoft's part.
I don't really get why Windows moves tasks across cores anyway. Sure, that distributes heat better, but it also causes a lot of unnecessary cache misses and context switches. They should rather keep a task on one core and avoid interrupting it where possible.
Thread migration doesn't happen to distribute heat; it is more about being as energy efficient as possible.
In the Windows 7 era, tablets were not the priority, and Windows ran mostly on desktops and laptops.
With Windows 8, and later Windows 10, Microsoft shifted its focus toward energy efficiency, since Windows 10 was also designed as a mobile OS. That means saving as much power as possible.
Of course, the competing programming groups at Microsoft tend to keep good ideas from fully coming to fruition in the long run.
This is kind of why:
https://www.gamersnexus.net/news-pc/2870-ryzen-power-plan-update-min-frequency-90-pct
“Win7 keeps all physical cores awake, and parks SMT cores. Win10 keeps one physical and one logical core awake (Core0+1), then parks the rest as often as possible. This change alone is what’s responsible for the cases where Win7 was faster than Win10 gaming performance, not the scheduler as the community thought.”
And my reasoning about it:
At first sight this is strange behavior for Windows 10 compared to Windows 7, except from an energy-efficiency perspective.
I can imagine that if a thread runs on a logical core, Windows 10 would migrate it to a physical core instead once core utilization jumps to the maximum.
I can imagine that the kernel has performance counters to track utilization. I mean, Task Manager shows it, so it makes sense that the kernel uses it as well.
So, for energy efficiency: keep threads on one physical core and its SMT logical core, and keep all other physical cores (and their logical siblings) parked. When that core's utilization is getting maxed out, move the thread from the busy logical core to a free but parked physical core, because a physical core and its SMT logical core share the same hardware. That is the thread migration.
Just an idea; I do not know if this is really the case.
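To make the hypothesis above a bit more concrete, here is a toy Python model of it. This is only a sketch of the guessed policy, not how the Windows scheduler actually works: the function name place_threads, the per-thread load values, and the threshold parameter are all made up for illustration. Core0 and its SMT sibling always stay awake, other physical cores start parked, and when the combined load on a physical core exceeds the threshold, the thread on its SMT sibling is moved to a parked physical core, which gets unparked.

```python
from typing import Dict, List, Tuple

# (physical core index, logical index): logical 0 = primary, 1 = SMT sibling
Slot = Tuple[int, int]


def place_threads(loads: List[float], num_physical: int,
                  threshold: float = 1.0) -> Tuple[Dict[int, Slot], List[int]]:
    """Toy model of the guessed parking/migration policy (hypothetical).

    loads[i] is thread i's demand as a fraction of one physical core.
    Returns (thread id -> logical core, sorted list of unparked physical cores).
    """
    unparked = [0]               # assumption: Core0 (+ its SMT sibling) never parked
    slots: Dict[int, Slot] = {}  # thread id -> logical core it runs on

    def core_load(p: int) -> float:
        # SMT siblings share one physical core, so their demand adds up
        return sum(loads[t] for t, (q, _) in slots.items() if q == p)

    def unpark() -> int:
        # Wake the lowest-numbered parked physical core
        p = next(c for c in range(num_physical) if c not in unparked)
        unparked.append(p)
        return p

    for tid in range(len(loads)):
        used = set(slots.values())
        free = [(p, s) for p in unparked for s in (0, 1) if (p, s) not in used]
        if free:
            # Prefer logical cores on physical cores that are already awake
            slots[tid] = free[0]
        elif len(unparked) < num_physical:
            slots[tid] = (unpark(), 0)
        else:
            raise RuntimeError("more threads than logical cores")

        # Migration: if a physical core is saturated, move the thread on its
        # SMT sibling to a parked physical core, if one is still available
        for p in list(unparked):
            smt_thread = next((t for t, s in slots.items() if s == (p, 1)), None)
            if (smt_thread is not None and core_load(p) > threshold
                    and len(unparked) < num_physical):
                slots[smt_thread] = (unpark(), 0)

    return slots, sorted(unparked)
```

For example, place_threads([0.3, 0.3], num_physical=4) keeps both light threads on Core0 and its sibling with every other core parked, while place_threads([0.9, 0.9], num_physical=4) saturates Core0, so the second thread migrates to Core1 and Core1 gets unparked.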