Could it simply be that it was not worth it, and that is why it was disabled? Quoting clamchowder from C&C:
"My view is tunnel visioning on the decoders misses the elephant in the room. Backend memory access latency and frontend latency are holding back perf. You can find frontend bandwidth bound slots but there aren’t a lot of them. If the frontend was struggling to feed a 4-wide decoder due to BTB/iTLB/L1i miss latency, it’s not clear how much benefit you’d get from adding more decode slots that you also can’t feed. Also the uop cache covers most of the instruction stream even with kernel compilation."

I mean, if using the 2 decoders on 1 HW thread would require extra validation, and they were seeing diminishing returns because the other parts of the system were not keeping up with demand, maybe they simply gave up.
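To put rough numbers on the "decode slots you can't feed" argument, here is a toy back-of-the-envelope model in Python. Everything in it is an assumption for illustration only: the function name `frontend_ipc`, the instruction count, the number of frontend misses and the miss latency are all made up, and it ignores the uop cache, the backend and any overlap between stalls and decode. It only shows how fixed frontend stall time caps the gain from doubling decode width.

```python
def frontend_ipc(decode_width, instrs=10_000, misses=200, miss_latency=20):
    """Toy model: instructions per cycle the frontend can deliver.

    All parameters are hypothetical, not measured on any real core.
    Decode time shrinks as width grows; stall time (BTB/iTLB/L1i misses)
    does not, so it increasingly dominates.
    """
    decode_cycles = instrs / decode_width   # cycles spent decoding at full width
    stall_cycles = misses * miss_latency    # cycles the decoder sits idle, unfed
    return instrs / (decode_cycles + stall_cycles)


if __name__ == "__main__":
    for width in (4, 8):
        ideal = frontend_ipc(width, misses=0)
        stalled = frontend_ipc(width)
        print(f"{width}-wide: ideal {ideal:.2f} IPC, with stalls {stalled:.2f} IPC")
    speedup = frontend_ipc(8) / frontend_ipc(4)
    print(f"8-wide vs 4-wide speedup with stalls: {speedup:.2f}x")
```

With these made-up numbers, doubling the decode width gives well under a 2x gain (roughly 1.24x), because the stall cycles are untouched. Real hardware is far more complicated, but that is the shape of the argument in the quote.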
Also a question: which stage is actually doing the decoding now? Funnily, it seems not to be the decoders themselves, since according to the diagrams both the decoders and the uop cache send instructions down to rename.
Is the decoder only doing part of the job? IIRC macro ops are supposed to be the "decoded" instructions that are later turned into uops before execution. Of course, I might have misunderstood something. But the software optimization guide makes a clear wording distinction: the OpCache now holds "instructions", vs. macro ops in Zen4. So, if my understanding is correct, what the "decoders" are actually doing is identifying the instruction boundaries (the hardest part of the job, I guess).