You have exactly the same issue with big.LITTLE. If a scheduler is theoretically capable of detecting high utilization threads and moving them from little to big cores it's also theoretically capable of moving them from SMT shared to dedicated physical cores. It's all a software problem.
I haven't seen a scheduler move low-utilization threads to a logical core over a physical core, ever. At least not under Linux. Which scheduler actually does this?
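For anyone who wants to check their own box: here is a minimal sketch, assuming a Linux system that exposes `topology/thread_siblings_list` under `/sys/devices/system/cpu`. It only lists which logical CPUs are SMT siblings of the same physical core; it says nothing about where the scheduler actually places threads. The function names are my own, not part of any library.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,8' or '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_smt_siblings(sibling_lists):
    """Group logical CPUs by physical core.

    sibling_lists maps a logical CPU number to the raw contents of that
    CPU's thread_siblings_list file. Each core is keyed by its lowest
    logical CPU number (a convention chosen here for illustration).
    """
    cores = {}
    for text in sibling_lists.values():
        siblings = parse_cpu_list(text)
        if siblings:
            cores[siblings[0]] = siblings
    return cores


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU that exposes it (Linux only)."""
    result = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        # path looks like <root>/cpuN/topology/thread_siblings_list
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        cpu = int(cpu_dir[len("cpu"):])
        with open(path) as f:
            result[cpu] = f.read()
    return result


if __name__ == "__main__":
    for core, siblings in sorted(group_smt_siblings(read_sibling_lists()).items()):
        print(f"core keyed by cpu{core}: logical CPUs {siblings}")
```

With that mapping in hand you can pin a thread (e.g. with `taskset`) and watch whether it ends up sharing a core with something else.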
Did you see AMD going with SMT2 before they announced it? Did anybody expect AMD's very first implementation to beat Intel's HT?
The only thing that was clear was that Zen would not be a CMT design. So the next logical conclusion was an SMT2 implementation, at least to copy Intel.
That's completely beside the point. Does the majority of desktop users need AVX2? Most very likely do not.
Depends on which users we're talking about here. Anyone who does encoding or rendering will like it. There are use cases. SMT4 though? Maaaaaybe, maybe not. I'd have to see some benchmarks to really understand how AMD's implementation of SMT4 would work before I was sold on it. If you asked me, do I want 8c/16t Zen3 or 4c/16t Zen3 on my desktop, obviously I'd prefer the former. 8c/32t probably moves me into a different price bracket/power envelope, making it maybe not an option for me anymore.
That's actually wrong unless you are talking about HPC specifically. Servers in general are all about over-provisioning all kinds of resources, being prepared for the worst case resource usage scenarios.
HPC is one of the server applications where you'd want SMT4, so I was sort of erring on that side. Might be useful in high-utilization databases as well.
Patently wrong. The more cores a chip contains in one shared envelope, the more the cores' activity will affect each other. The more cores that can be put into a deep sleep state, the more headroom the other cores can make use of. And as we know, AMD developed Zen's microcode in PB in a way that dynamically makes use of more headroom, so it already profits from that now.
Now you're arguing thermals though, which is missing the point I'm making, since I'm assuming high CPU utilization overall for servers in scenarios where SMT4 might make sense. If all your cores are routinely sitting at 75% or higher utilization, then no, you do not worry about how the scheduler wakes up particular cores, since they aren't sleeping anyway.
But in the last two years AMD did the opposite of "selling more of the same since it works". Zen to Zen 2 completely changed the MCM topology. SMT is still very new to AMD, having been introduced only two years ago. Software support didn't prevent AMD from launching any of the Ryzen or Threadripper chips. The Windows scheduler had serious issues with TR 1's NUMA, then again with TR 2 WX's unbalanced NUMA.
They sold SMT2 between 2017 and 2019. WRT SMT (or alternate strategies), that's "more of the same". They improved the individual cores and rejiggered IF links, but they didn't change their SMT strategy at all. They didn't go asynchronous core, they didn't go SMT4, they didn't kill SMT altogether, they didn't resurrect CMT (thank goodness), etc.
resource allocation can be changed even after the creation of a VM,
In a matter of seconds? Milliseconds?
And disabling SMT/HT for cloud providers is due to them specifically offering resources per single vCPU, and you don't want this vCPU resource being a variable that depends on how many concurrent threads are on it. But that doesn't prevent server providers offering computing resources per CCX (or comparable big.LITTLE blocks) instead where SMT could be left enabled.
Okay, fair point. Some cloud providers might like SMT4. Others might not.
You yourself were arguing for the cat cores before.
Ah, but you have missed the larger point. Yes, I mentioned that a cut-down core or updated cat core might make more sense than moving to a wider Zen3 + SMT4. I'm still willing to acknowledge that it's 99.999999% unlikely that AMD would ever do such a thing. Making it even less likely that AMD will adopt SMT4.
...which is part of the uncore and offers intra chip connectivity that one always needs on any chip...
You may notice that not everyone has these problems in their design.
And Zen cores can power gate everything except the shared L3$. (I think I remember the APUs can even power gate the L3$ itself since it's not shared due to its single CCX nature, not sure.)
If the L3 is inclusive then I don't think they can. ARM manages to gate off parts of the L3 by using one that's exclusive or . . . pseudo-exclusive or something.