So instead, the programmer needs to be even smarter. They need to find a good division of work across the cores in every situation, adapting dynamically as the load changes. Keeping those hungry cores fed is a lot of work. And even when you can keep them fed, Amdahl's law kicks in: those tasks may not be as parallelizable as the programmer hoped. Worse, if you overdo the threads, you risk bogging down the main thread handling player movement, and that would be very detrimental to gameplay. That is a lot of money to pay programmers for very little user benefit, and it could possibly make things worse.
You definitely want to stick with smarter programming. The goal is good performance, not keeping cores busy. We don't want to turn it into busy work that punishes lower core counts just to make higher core counts look better, when smart programming would have given good performance everywhere.
Too many times that is how new tech is marketed. I remember Microsoft first pushing DX10 (and Vista) by funding DX10 modes in some games that made DX9 look worse in comparison, except it was all BS: modders enabled the same features in DX9, and they looked and performed just as well.
Or HDR monitors that make SDR content look worse, so HDR seems better than it really is:
https://www.youtube.com/watch?v=cgBzpYTn_8c&index=3&list=LLAZUTUJGRCVV5Jvkk5k69wQ
Too often new features are oversold by purposefully sabotaging the current feature set. Thankfully there isn't much sign that this is happening for multi-cores.
In reality, the proper approach to performance improvement is profiling the code to see where the bottlenecks are and addressing them the best way possible, which won't always be throwing more cores at the problem.
Obviously you get it, but many don't get the simple truth of Amdahl's law (it's actually VERY simple from a math perspective): much of gaming won't benefit dramatically from big core counts.
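For anyone who hasn't seen it written out, Amdahl's law is just speedup = 1 / ((1 - p) + p/n), where p is the fraction of the work that parallelizes and n is the core count. A quick sketch in Python (the 50% parallel fraction is purely an illustrative assumption, not a measured number from any game):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: overall speedup when a fraction p of the work
    runs perfectly in parallel on n cores and the rest stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Hypothetical workload where half the frame time parallelizes (p = 0.5)
for cores in (1, 2, 4, 8, 16):
    print(f"{cores:2d} cores: {amdahl_speedup(0.5, cores):.2f}x")
# Output:
#  1 cores: 1.00x
#  2 cores: 1.33x
#  4 cores: 1.60x
#  8 cores: 1.78x
# 16 cores: 1.88x
```

Notice the curve flattening: going from 4 to 8 cores buys you barely 11% here, no matter how well written the parallel code is, because the serial half caps the whole thing at 2x.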
There is also the assumption that games are hard coded for 4 cores because we had 4 cores for a long time.
This really isn't the case. Once you discover the portions of the code that are suitable for parallel execution, and do the work of making that section parallel, it is ready for any number of threads. No programmer worth their salt would write a section of parallel code and hard-code it to 4 threads. They would read the number of available cores from the OS and set thread counts accordingly, or use OS/language constructs that simply let the OS decide the appropriate number of threads for the parallel construct, like Apple's GCD.
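As a sketch of that pattern (Python's `ThreadPoolExecutor` standing in for GCD-style constructs here; the summing workload is just a made-up example):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Size the pool from the machine rather than hard-coding 4 threads
workers = os.cpu_count() or 4  # fall back if the OS won't report a count

def work(chunk):
    # Stand-in for a parallelizable slice of the real workload
    return sum(x * x for x in chunk)

data = list(range(100_000))
# Strided split so every worker gets a chunk, whatever `workers` is
chunks = [data[i::workers] for i in range(workers)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    total = sum(pool.map(work, chunks))
```

The same binary naturally uses 4 threads on a quad-core and 8 on an octa-core; nothing in the code cares which machine it landed on.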
We aren't seeing big performance lifts in modern games because A) the GPU is usually the real bottleneck, and B) Amdahl's law means there will typically be very little uplift in moving from 4 to 8 cores in a mixed serial/parallel workload.
It has nothing to do with games targeting 4 cores.
Modern games written with parallel code to take advantage of 4 cores will automatically take advantage of 8 cores. The lack of a performance boost is simply Amdahl's law.