Question Zen 6 Speculation Thread


FlameTail

Diamond Member
Dec 15, 2021
4,238
2,594
106
That's an interesting point.

ARM was able to reduce the decoder size by 4x in the Cortex X2/X3, by ditching AArch32.
 

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
That's an interesting point.

ARM was able to reduce the decoder size by 4x in the Cortex X2/X3, by ditching AArch32.
Weren't 32 bit ARM and 64 bit ARM more dissimilar than x86 is to x64?

And because Windows users have a bunch of 32-bit binaries with no source, neither AMD nor Intel feels free to remove 32-bit mode.
Unless Microsoft writes another WoW64-style JIT converting x86 to x64? But that seems unlikely.
 
Reactions: lightmanek

Tuna-Fish

Golden Member
Mar 4, 2011
1,505
2,080
136
IIRC out of order is why CPUs need a branch predictor?

Nope, there are plenty of in-order CPUs that have branch predictors.

The reason you need a predictor is that modern CPUs are deeply pipelined. That is, it takes something like 10-20 cycles to completely execute a single instruction from start to finish. Since an instruction can still depend on a value produced by the previous instruction and execute on the very next cycle, most arithmetic and the like does not see this latency at all, and it sort of looks like instructions take one cycle to complete. But this doesn't work for branches; for them you hit the full latency. So either you twiddle your thumbs for 10-20 clocks every time there is a branch, or you guess. Early predictors were really dumb (you can get a net win with something really stupid, like the classic "all backwards branches taken, all forwards branches not taken"), but since improving prediction quality is both a performance and a power optimization, we have come a really long way from that.
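To put rough numbers on the "stall or guess" trade-off, here is a minimal Python sketch of the "backwards taken, forwards not taken" rule on a single loop branch. The `Branch` record, the addresses, the 15-cycle penalty, and the helper names are all made up for illustration; this is a toy cost model, not how any real core accounts for cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    pc: int       # address of the branch instruction (hypothetical)
    target: int   # address it jumps to when taken (hypothetical)
    taken: bool   # what the branch actually did


def predict_btfn(branch):
    """Static rule: backward branches taken, forward branches not taken."""
    return branch.target < branch.pc


def branch_cycles(trace, penalty, predictor=None):
    """Cycles lost on branches in this toy model.

    With no predictor the pipeline waits the full penalty on every branch.
    With a predictor only wrong guesses pay the penalty.
    """
    lost = 0
    for b in trace:
        if predictor is None or predictor(b) != b.taken:
            lost += penalty
    return lost


def loop_trace(iterations):
    """Back-edge of a loop: taken on every iteration except the last."""
    return [
        Branch(pc=0x40, target=0x10, taken=(i < iterations - 1))
        for i in range(iterations)
    ]


if __name__ == "__main__":
    trace = loop_trace(100)
    print("always stall:", branch_cycles(trace, penalty=15))
    print("static BTFN: ", branch_cycles(trace, penalty=15, predictor=predict_btfn))
```

In this toy loop the static rule only misses on the final iteration, when the loop exits, which is why even a really stupid predictor can come out as a net win.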

why can’t AMD/Intel design a 10-wide decoder instead of these 2x4 and 3x3 ones?

x86 instructions can be anywhere from one to 15 bytes long. The possible starting points for the nth instruction cover a huge space that grows massively as n grows. This is the one big advantage ARM has over x86; ARM instructions are fixed size.
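A small Python sketch of why this hurts wide decode. It uses a toy encoding where the first byte of each instruction is simply its total length; that is an assumption for illustration only and nothing like real x86 encoding, where the length depends on prefixes, opcode, ModRM and so on. The function names are hypothetical too.

```python
def variable_starts(code):
    """Instruction start offsets in the toy variable-length encoding.

    Each start depends on the length of the instruction before it,
    so the walk is inherently serial.
    """
    starts = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if not 1 <= length <= 15:
            raise ValueError(f"bad length byte {length} at offset {pos}")
        starts.append(pos)
        pos += length
    return starts


def fixed_starts(code, width=4):
    """With fixed-size instructions every start is known up front."""
    return list(range(0, len(code), width))


def candidate_starts(n, min_len=1, max_len=15):
    """Byte offsets where instruction number n (0-based) could begin."""
    return range(n * min_len, n * max_len + 1)


if __name__ == "__main__":
    code = bytes([3, 0, 0, 1, 5, 0, 0, 0, 0, 2, 0])
    print(variable_starts(code))          # found one after another
    print(fixed_starts(bytes(12)))        # computed without looking at the bytes
    for n in range(4):
        print(n, len(candidate_starts(n)), "possible start offsets")
```

With fixed 4-byte instructions the 10th decoder slot just looks at byte 36. With 1-15 byte instructions it has to wait for, or speculate on, every earlier length, and the window of offsets it might have to pick from keeps widening with n.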
 
Jul 27, 2020
20,921
14,496
146
x86 instructions can be anywhere from one to 15 bytes long. The possible starting points for the nth instruction cover a huge space that grows massively as n grows. This is the one big advantage ARM has over x86; ARM instructions are fixed size.
How about they design a new layer that converts the variable instruction sizes to fixed sizes and create a large cache to index those translations? It will cost more transistors but this seems to be the only way x86 can get rid of its "variable size instruction" headache and baggage. Future compilations of applications can then just generate the fixed length instructions to bypass the translation layer. In time, only unsupported legacy applications will need to depend on the translation layer.
 

Nothingness

Diamond Member
Jul 3, 2013
3,137
2,153
136
How about they design a new layer that converts the variable instruction sizes to fixed sizes and create a large cache to index those translations? It will cost more transistors but this seems to be the only way x86 can get rid of its "variable size instruction" headache and baggage. Future compilations of applications can then just generate the fixed length instructions to bypass the translation layer. In time, only unsupported legacy applications will need to depend on the translation layer.
You basically described the uop cache 😀

Another possible trick is to store instruction boundary markers in the icache (and possibly in the L2 cache). The problem is that you can't tell whether an icache line has data embedded in the middle of the code, which would produce wrong boundary information. This can be very messy.

But neither of these features will remove the need to find the instruction boundaries in the first place, which is quite expensive given how irregular x86 encoding is.
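A rough Python sketch of that last point, treating the uop cache as nothing more than a lookup table keyed by fetch address. The class, the `toy_decode` stand-in, and the addresses are hypothetical; real uop caches are organized very differently (ways, line limits, invalidation), so take this only as an illustration of where the decode cost goes.

```python
class DecodedOpCache:
    """Toy uop cache: remembers decoded ops per fetch address."""

    def __init__(self, decode):
        self._decode = decode   # the slow path: full boundary finding + decode
        self._lines = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        if address in self._lines:
            self.hits += 1      # hit: the decoders are bypassed
        else:
            self.misses += 1    # miss: the expensive decode still has to run
            self._lines[address] = self._decode(address)
        return self._lines[address]


def toy_decode(address):
    """Stand-in for the real decoder; returns a fake fixed-format uop."""
    return (f"uop@{address:#x}",)


if __name__ == "__main__":
    cache = DecodedOpCache(toy_decode)
    for addr in (0x10, 0x14, 0x10, 0x10):
        cache.fetch(addr)
    print("hits:", cache.hits, "misses:", cache.misses)
```

The cache only saves work on code it has already seen. Every first touch, and everything that gets evicted, still goes through the full variable-length decode.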
 
Jul 27, 2020
20,921
14,496
146
You basically described the uop cache 😀
But but but, the fixed length ISA is what's missing from that cache's function.

Create a new fixed-length ISA for x86. Translate variable-length instructions to fixed-length instructions and cache those translations for future reference to reduce latency. If, however, software has been recompiled to use the fixed-length ISA, it bypasses the translation overhead. I understand that this is probably a gargantuan task, but it is the only option I can think of. When x86's main competitors (ARM and RISC-V) are making so much progress mainly due to fixed-length instructions, x86 has no choice but to fall in line while keeping backward compatibility in place; otherwise it will be left hopelessly behind and never be able to catch up to those simpler designs in power efficiency.
 

GTracing

Member
Aug 6, 2021
168
396
106
But but but, the fixed length ISA is what's missing from that cache's function.
I think you're misunderstanding what a micro op cache is. Whether an instruction is variable length or fixed length makes no difference once it's translated to micro ops. Every x86 CPU since the original 8086 has used micro ops.

Create a new fixed-length ISA for x86. Translate variable-length instructions to fixed-length instructions and cache those translations for future reference to reduce latency. If, however, software has been recompiled to use the fixed-length ISA, it bypasses the translation overhead. I understand that this is probably a gargantuan task, but it is the only option I can think of. When x86's main competitors (ARM and RISC-V) are making so much progress mainly due to fixed-length instructions, x86 has no choice but to fall in line while keeping backward compatibility in place; otherwise it will be left hopelessly behind and never be able to catch up to those simpler designs in power efficiency.
If they're "translating" variable-length instructions to fixed-length ones automatically at the hardware level, that's essentially the same as supporting two ISAs. It would increase decoding overhead, not decrease it. Not to mention the nightmare it would be to implement.

If they do it at a software level, then it's essentially the same as Rosetta 2 or Prism. Microsoft would have to be on board. And Intel would likely sue AMD (or vice versa). Even if they do come to an agreement to emulate newer x86 extensions, a fixed length x86 would still be a new ISA. At that point you might as well redesign it from the ground up.
 

GTracing

Member
Aug 6, 2021
168
396
106
Or AMD could implement it at the chipset driver level, without Microsoft's blessing.
How would the chipset driver know if a specific program is the old x86-64 or the new fixed-length ISA? At some level, the OS would have to be involved. And while I don't know the terms of their cross-licensing agreement, I doubt it allows them to create x86 emulators.
 
Jul 27, 2020
20,921
14,496
146
How would the chipset driver know if a specific program is the old x86-64 or the new fixed length ISA?
Can't the driver read the first few instructions of the executable file to identify? Assuming mixing of new and old instructions in the same EXE isn't allowed.
 

GTracing

Member
Aug 6, 2021
168
396
106
Can't the driver read the first few instructions of the executable file to identify? Assuming mixing of new and old instructions in the same EXE isn't allowed.
Can they? I'm not super familiar with chipset drivers, but that sounds pretty far-fetched. I don't think the chipset drivers "know" when a new program is run. Even if they do, that would add a delay each time a program is started. Not to mention the whole translation aspect.
 
Jul 27, 2020
20,921
14,496
146
Even if they do, that would add a delay each time a program is started. Not to mention the whole translation aspect.
I think Rosetta was the same at first program execution. Subsequent executions were faster.

I don't see why x86 can't do the same while moving to a better ISA without x86 limitations.
 

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
Look up WoW64. Microsoft can handle running binaries with different instruction sets in that layer.

That said, there is almost no point in removing the x86 encoding since x64 simply builds on it.

And there is no point in adding a fixed-size "version of x86." That is not happening. It would be done by switching to ARM. There is no point in adding another RISC variant that complicates the front end. Or, if it's the only instruction set supported, then it requires writing another JIT and another layer for running x64 code on that instruction set (like they have already done for ARM, so just use that).
 

GTracing

Member
Aug 6, 2021
168
396
106
I think Rosetta was the same at first program execution. Subsequent executions were faster.

I don't see why x86 can't do the same while moving to a better ISA without x86 limitations.
The OS can run a translation layer, yes, but not the chipset driver.
 

Nothingness

Diamond Member
Jul 3, 2013
3,137
2,153
136
I think Rosetta was the same at first program execution. Subsequent executions were faster.
Yes, Rosetta 2 is a mix of static recompilation and dynamic translation. That's the FX!32 of the 21st century.

I don't see why x86 can't do the same while moving to a better ISA without x86 limitations.
Rosetta 2 only takes care of user code, which is easier to run quickly than having to emulate system code (OS/kernel drivers).
 
Jul 27, 2020
20,921
14,496
146
The OS can run a translation layer, yes, but not the chipset driver.
Don't the drivers run in kernel mode? If so, then they can do what the OS can do, unless the OS is limiting the drivers, even ones from the CPU manufacturer that should be rock solid by design.
 

GTracing

Member
Aug 6, 2021
168
396
106
Don't the drivers run in kernel mode? If so, then they can do what the OS can do, unless the OS is limiting the drivers, even ones from the CPU manufacturer that should be rock solid by design.
That's not how that works. Windows provides an API. Drivers can only implement functionality that the API allows.

 
Reactions: igor_kavinski

Gideon

Golden Member
Nov 27, 2007
1,842
4,380
136
Interesting!

AMD will provide an update on their long-term open-source firmware strategy at the Open-Source Firmware Conference in September, focusing on their OpenSIL project, which is expected to eventually replace AGESA on future Ryzen and EPYC platforms. They aim for OpenSIL to be ready for production by 2026, spanning both client and server platforms.

If I'm reading this correctly, Zen 6 will have open-source firmware instead of (or, more likely, as an alternative to) AGESA.
 

static shock

Member
May 25, 2024
133
60
61
My bet for Medusa Ridge performance: 60% faster than Zen 5 at the same clock on SPECint rate-1. Doubled L2 and L3 sizes.
Same dual 4-wide decode plus the same 6 FPU / 6 ALU layout, on TSMC 2nm. The IOD is on TSMC 3nm. The IPC gains come from extracting more IPC out of the already wide ALU and FPU resources. I bet on AMD widening less this time and digging out more IPC instead. The Zen 5 diagram already shows how wide the core is and how huge the IPC uplift must have been on Zen 5.

Launch in Q1 2026

I will cram this IPC gain to just 60% more IPC over Zen4.
 
Reactions: SteinFG