Misunderstood in which sense? I thought that Zen 5 has fairly accurate predictors, they are just slow. Or did you mean something other than accuracy? Genuinely asking.
What's the difference between a Fetch Predictor and a Branch Predictor?
When should a Branch Predictor deliver predictions?
What's the data provided by a Fetch Predictor?
Why am I using the term Fetch Predictor, not BTB?
Sorry, but there is something about Fetch, even more than the rest of the CPU, that makes the x86 contingent lose their goddamn minds. EVERY TIME I try to discuss the issue, it's like talking to a brick wall. I'm sick of it, and sick of wasting time on it.
Here's the proof. Verilator is by far the toughest "easy-ish to measure" load from the point of view of the FRONT-END.
SPEC is useless for testing the front end; the code working set is tiny.
"Server" workloads are what you want, but most of those are difficult to run; Verilator is the one fairly easy case.
This is from James Aslan's site, https://zhuanlan.zhihu.com/p/704707254, which, being a mainland Chinese site, is a freaking pain in the ass to deal with! You will have to register if you want to see anything, and you will never be able to comment, because commenting needs a second stage of registration that requires a mainland phone number.
Regardless, the point is that when we push the front-end hard, ARM (and especially team Apple/ex-Apple) does vastly better than team x86. [Zen 4 is basically the same sort of level as Intel].
And yet team x86 refuse to listen every time you tell them they are doing it wrong...
(Firestorm was M1, Avalanche was M2. Even Blizzard, the M2 small core, does slightly better than Raptor Cove! That's on a different graph, but it achieves 2.50 on VTop1.)
OK, with all that rant out of the way, go read:
name99-org/AArch64-Explore on GitHub (github.com),
especially volume 4. That will tell you how to handle instruction flow PROPERLY.