Hi everyone,
Just to update on a post I made earlier this year:
http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=threads/ryzen-strictly-technical.2500572/page-9#post-38776310
I took a crack at solving this issue, and the problem was indeed due to false sharing. CS:GO uses a lightmap baker very similar to the one in the SDK 2013 branch of Valve's Source engine, so it's evident that the issue is present there as well.
Here are the results with the false sharing removed vs. the original:
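For anyone unfamiliar with false sharing, here is a minimal C++17 sketch of the general pattern. To be clear, this is not the code from vrad or from the pull request: the names (PackedCounter, PaddedCounter, run_counters) are made up for illustration, and the 64-byte cache line is an assumption that holds for typical x86 CPUs. Each thread only ever touches its own counter, yet in the packed layout several counters sit on the same cache line, so the cores keep invalidating each other's copy of that line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical for x86, not guaranteed everywhere).
constexpr std::size_t kCacheLine = 64;

// Packed layout: eight of these fit in one cache line, so counters owned by
// different threads end up sharing a line (false sharing).
struct PackedCounter {
    std::uint64_t value = 0;
};

// Padded layout: alignas pushes every counter onto its own cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Hypothetical helper: each thread increments only its own slot.
// Returns the sum of all slots so the result can be checked.
template <typename Counter>
std::uint64_t run_counters(unsigned num_threads, std::uint64_t iterations) {
    assert(num_threads > 0);
    std::vector<Counter> slots(num_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&slots, t, iterations] {
            // volatile keeps a real load and store in every iteration
            // instead of letting the compiler collapse the loop.
            volatile std::uint64_t* p = &slots[t].value;
            for (std::uint64_t i = 0; i < iterations; ++i) {
                *p = *p + 1;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::uint64_t total = 0;
    for (const auto& s : slots) {
        total += s.value;
    }
    return total;
}
```

Both run_counters<PackedCounter> and run_counters<PaddedCounter> return the same total; only the memory layout differs. Timing the two with a large iteration count is one way to see how much the shared cache lines cost on a given CPU.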
(Note: CPU is a Ryzen Threadripper - affinity masks are set to 8 threads, 16 threads, and 32 threads respectively to simulate 1 CCX, 2 CCX, and 4 CCX processors.)
The negative scaling I experienced in the dual Xeon tests (the machine I previously used) is also eliminated (not shown below).
Lower is better!
Here is the pull request with the fix (literally two lines of code):
https://github.com/ValveSoftware/source-sdk-2013/pull/436
It seems that AMD CPUs, particularly AMD FX CPUs but also Ryzen, are much more susceptible to the effects of false sharing than Intel CPUs.
Also, I'd like to point out that 'vrad' has been used in the past both as a benchmark and to show off the benefits of multi-core processors:
https://www.anandtech.com/show/2489/11
It's very likely that this issue was present, even back then.
EDIT: Also, the comparatively poor scaling from 16 to 32 threads is likely due to SMT and Amdahl's law (or rather, the rest of the code, which used to take a comparatively small portion of the run time, now takes a relatively large share of it and also seems to have some scaling issues of its own).
The speedup in the fixed section of the code is likely a fair bit higher than what's shown here.
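To put the Amdahl's law point in concrete terms, here is a small C++ sketch of the formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that scales with threads and n is the thread count. The function name and the p = 0.9 used below are purely illustrative; I have not measured the parallel fraction of vrad.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p (0..1) of the run time
// is spread perfectly across n threads and the rest stays serial.
double amdahl_speedup(double p, unsigned n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / static_cast<double>(n));
}
```

With a hypothetical p of 0.9, amdahl_speedup(0.9, 16) is 6.4 and amdahl_speedup(0.9, 32) is roughly 7.8, so doubling the thread count buys only about 22% even before SMT comes into it.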