I guess this is what Mike Clark was so excited about several years ago. Zen 5 was not meant to be one single generation to rule them all; it might be better seen as a foundation for the coming years, one that reaps the benefits once process technology lets them make better use of that new front-end. IMHO it is quite possible they had to scale Zen 5 back a bit because N3 was not financially feasible.
I hope some of the folks who do microbenchmarks (like David or Cheese) put up a new article once the 9000 series is available.
According to David, the bottleneck is somewhere down on the cache/memory side, which could explain some of the weaknesses of Zen 5. Of course, fixing that would only uncover the next bottleneck.
BP seems to be very very good.
From his benchmark, I am wondering whether a 2MiB L2, or even a 1.5MiB L2, would improve the situation on Zen 5 for the many apps with a bigger hot code footprint.
2MiB of L2 would raise the core area by roughly another mm².
The only exception observed in the chart is the test with one branch per 64 bytes, where the latency spiked after exceeding 16384 branches. A quick calculation gives 16384 * 64B = 1 MiB, so the latency increases right where the code footprint exceeds the L2 capacity.
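For anyone who wants to redo that arithmetic for other cache sizes, here is a minimal sketch. It assumes the code footprint is simply branch count times spacing, and it ignores that the L2 is shared with data. The function names are made up for illustration, and the 1.5 MiB and 2 MiB sizes are hypothetical, not anything AMD has announced.

```python
LINE_BYTES = 64        # one branch per 64-byte line, as in the test above
MIB = 1024 * 1024

def code_footprint_bytes(branches, spacing_bytes=LINE_BYTES):
    """Bytes of code touched when branches sit spacing_bytes apart."""
    return branches * spacing_bytes

def max_branches_in_l2(l2_bytes, spacing_bytes=LINE_BYTES):
    """Largest branch count whose footprint still fits in l2_bytes."""
    return l2_bytes // spacing_bytes

# 1 MiB matches the L2 size discussed above; 1.5 and 2 MiB are hypothetical.
for l2_mib in (1.0, 1.5, 2.0):
    l2_bytes = int(l2_mib * MIB)
    limit = max_branches_in_l2(l2_bytes)
    print(f"{l2_mib} MiB L2: up to {limit} branches at 64 B spacing")
```

Under these assumptions a larger L2 would only move the point where the spike appears; whether real apps would benefit depends on how their hot code is laid out.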
Another thing is that the unified ALU scheduler costs an additional cycle in some cases, but the advantage is that AMD could leverage a smaller PRF across the 6 ALUs.
The Int PRF increase was a grand 16 entries only, which is very surprising, whereas the FP PRF doubled. Also no change in the number of available ALU scheduling entries (it actually decreased a bit, but it is shared with the AGUs in Z4).
L2 to L3 is still at 32B/cycle, which is another factor. I thought this would finally be 64B/cycle.
The 32B/cycle fabric, at an fclk of around 2000MHz, remains a key factor for latency. Strix probably has a much lower fabric clock, I imagine. Inter-CCX latency is not addressed.
I had hoped that at least the L2 would go to 2M, or that the BW from L2 to L3 would increase. At 64B/cycle they could make do with 1M of L2 and still have access to the massive L3, including V-Cache.
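To put rough numbers on the 32B/cycle vs 64B/cycle point: peak bandwidth is just bytes per cycle times clock. A quick sketch, where the 2000 MHz fclk is the figure mentioned above and the 5000 MHz core clock is purely my assumption for illustration. These are theoretical peaks only and say nothing about latency or sustained throughput.

```python
def peak_bandwidth_gb_s(bytes_per_cycle, clock_mhz):
    """Theoretical peak in GB/s: bytes per cycle times cycles per second."""
    return bytes_per_cycle * clock_mhz / 1000

FCLK_MHZ = 2000        # fabric clock quoted in the thread
CORE_CLK_MHZ = 5000    # assumed core clock, for illustration only

fabric_32 = peak_bandwidth_gb_s(32, FCLK_MHZ)
l2_l3_32 = peak_bandwidth_gb_s(32, CORE_CLK_MHZ)
l2_l3_64 = peak_bandwidth_gb_s(64, CORE_CLK_MHZ)

print(f"Fabric, 32 B/cycle at fclk:      {fabric_32} GB/s")
print(f"L2<->L3, 32 B/cycle at core clk: {l2_l3_32} GB/s")
print(f"L2<->L3, 64 B/cycle at core clk: {l2_l3_64} GB/s")
```

Doubling the width doubles the peak at the same clock, which is the whole argument for 64B/cycle in a nutshell.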