New Zen microarchitecture details


superstition

Platinum Member
Feb 2, 2008
2,219
221
101
x86 chips aren't intrinsically less efficient than ARM chips.
In some ways x86 is much more inefficient:

agner in 2009 said:
The current x86 instruction set is the result of a long evolution which has involved many short-sighted decisions and patches.

agner in 2009 said:
The problem with the overcrowded instruction code space has been dealt with from time to time by several workarounds and patches. Today, there are far more than a thousand different instruction codes, and many of them use complicated combinations of escape codes, prefix bytes, and postfix bytes to distinguish the different instructions. This makes instructions longer than necessary and, more importantly, it makes the decoding of the instructions complicated.

agner in 2009 said:
PR considerations often have more weight than technical considerations. Currently, we have far more than a thousand instructions in the x86 instruction set. This is more than any programmer can memorize. It would be better to have fewer instructions and make each instruction more flexible so that it would cover more applications. But there is an obvious PR value in announcing that the newest processor has a bazillion new instructions. The weird and sometimes deliberately misleading names of the instruction set extensions are obviously decided by PR people rather than by technicians.

agner in 2009 said:
Unfair competition. The market often favors Intel instructions rather than AMD or VIA instructions for compatibility reasons. The latter companies can only copy new Intel instructions with a delay of a few years. AMD does not have access to a fair share of the opcode space to use for their innovations. Historically, AMD has used small corners of the opcode space to avoid the risk that Intel might assign another instruction to the same code. There is no part of the huge VEX opcode space that AMD can safely use without permission from Intel.

Feedback from users is always too late. When a new instruction set is published, there is often public criticism, but then it is too late to change anything. The secrecy around innovations makes it impossible to involve the larger software community in the decision making process.

Sub-optimal solutions. Some instructions could be implemented better at no extra costs. For example, the PANDN and PALIGNR instructions would be more efficient if the two operands were swapped. A public discussion would have corrected such lapses before it was too late.

The competition in the microprocessor market has certainly been good for the price and performance of CPUs, but it has not been good for the compatibility. In May 2009, AMD published a revision of their plans where they modified the coding scheme for better compatibility with AVX. In addition to a full support of AVX, the revised AMD plan contains most of the original SSE5 instructions under the new name XOP and with the new coding scheme. Unfortunately, Intel had changed their plans in the meantime! In December 2008, Intel published a revision of their plans which involved a change of the coding of the fused multiply-and-add (FMA) instructions. Now it was too late for AMD to change their design once more, so the first AMD processors with FMA will follow the premature Intel specification rather than Intel's later revision. It is difficult to obtain compatibility when you are following a moving target.

Most programmers don't care what is going on at the machine code level, so they can't see all the ridiculous consequences that this war has.
 

cdimauro

Member
Sep 14, 2016
163
14
61
First, I don't think that x86 has a thousand instructions. I don't know how Agner counted them.

Second, his criticism is mostly focused on instruction decoding, but for several years x86 cores have had a uop cache, which lets them skip several pipeline stages, even more than ARM cores do, with great benefits for power consumption and performance.

Third, much of the x86 "legacy" isn't normally used. Most code uses only a subset of the available instructions.

Finally, I've found the ExtremeTech article.
 

KTE

Senior member
May 26, 2016
478
130
76
Moreover, why are you posting intel marketing slides in a thread about AMD Zen? Are you just here to thread crap and push an agenda, along with your absurd marketing slides?
Yes, that must be it !

Uh, what mobile phone chip contracts would those be? From what I've seen over the last 10 years
Stick with us, champ. We were talking about future litho...

For instance:
http://fortune.com/2016/08/17/intel-arm-10nm/
http://www.eetimes.com/document.asp?doc_id=1330311

Sent from HTC 10
(Opinions are own)
 

bjt2

Senior member
Sep 11, 2016
784
180
86
x86 chips aren't intrinsically less efficient than ARM chips. There's a nice ExtremeTech article that analyzed it.

Could you please post the link? I would eagerly read it... EDIT: you posted it...

x86 isn't that easy to emulate.

PowerPC is a relatively easy architecture to emulate. x86 and x64 are different beasts.

I graduated from the University of Naples in 2001, and in the late '90s and early 2000s the multimedia alley there had iMacs with emulator software, Virtual PC if I remember correctly... They ran Windows 95, and judging from the low performance, a full-blown x86 PC with its BIOS and peripherals was being emulated rather than translated Rosetta-style, or JIT-compiled, in Java jargon...
Now we have much more powerful CPUs and JIT compilers, and anyway, if they switch to ARM CPUs, x86 software will be phased out very fast, as happened with PowerPC software... And new software will use fat binary technology, with combined x86/ARM executables that should also work on x86 machines...
 

cdimauro

Member
Sep 14, 2016
163
14
61
Sure. That's what I expect.

Regarding Rosetta, it ran at about half the speed of a PowerPC. And, as I said, PowerPC is easier to emulate.

There's no doubt that current processors can run a JITer much better, but x86/x64 is nevertheless much more difficult to emulate, and I want to see how smooth the emulation is.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,358
8,447
126
this has been exceedingly interesting the last couple pages but maybe you guys should break it into its own thread?
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Finally, I've found the ExtremeTech article.

I think I already read this article... Anyway, this confirms that an x86 CPU with the same or better efficiency than an ARM CPU can be built...

Sure. That's what I expect.

Regarding Rosetta, it ran at about half the speed of a PowerPC. And, as I said, PowerPC is easier to emulate.

There's no doubt that current processors can run a JITer much better, but x86/x64 is nevertheless much more difficult to emulate, and I want to see how smooth the emulation is.

So do I... Anyway, I would love information about the x86 layer in Itanium CPUs... From x86 to VLIW was a long stretch...
 

bjt2

Senior member
Sep 11, 2016
784
180
86
this has been exceedingly interesting the last couple pages but maybe you guys should break it into its own thread?

Maybe you are right, but we are not completely off topic... We tried to estimate Zen clocks from ARM clocks, and information about ARM vs x86 efficiency is the missing ingredient in the recipe...
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
unlocked multiplier: if the chipset supports it

That's not what I wanted to read. Same crap as with Intel; this will be a fail if we cannot OC with every chipset.
 
Mar 10, 2006
11,715
2,012
126
That's not what I wanted to read. Same crap as with Intel; this will be a fail if we cannot OC with every chipset.

Why? It's good old fashioned segmentation at work. Want the additional features? Buy the better motherboards with better chipsets. This helps AMD's MB partners more than anything else, gives them incentive to design good enthusiast boards.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Why? It's good old fashioned segmentation at work. Want the additional features? Buy the better motherboards with better chipsets. This helps AMD's MB partners more than anything else, gives them incentive to design good enthusiast boards.

I'm not an AMD OEM motherboard partner; as a consumer, I want to OC on any hardware available.
 

DrMrLordX

Lifer
Apr 27, 2000
22,035
11,620
136
Dude I OCed the hell out of my old Sempron 2800+ with a Chaintech VNF3-250. It wasn't bottom-of-the-barrel but it was a budget fighter, that's for sure. 720 MHz OC on a budget part, yeehaw. Not gonna see those days again.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
That's not what I wanted to read. Same crap as with Intel; this will be a fail if we cannot OC with every chipset.

Why? It's good old fashioned segmentation at work. Want the additional features? Buy the better motherboards with better chipsets. This helps AMD's MB partners more than anything else, gives them incentive to design good enthusiast boards.

I wonder how this might affect boards that are probably going to be chipset-less (e.g., X300).
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
+65% per-clock comparison with PD, if true it might be quite a leap compared to +40% against Excavator, because there might be <20% difference between PD and EXV.
The whole post is a joke (as in, it's actually a gag). There's no actual data. He's saying that the webinar had no information in a cute way. It's still 40% over Excavator based on internal estimates.
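
For what it's worth, the arithmetic behind the quoted +65% and +40% figures can be checked with a quick sketch. It assumes per-clock gains compound multiplicatively; the function names are made up for illustration, and the two inputs are just the numbers quoted above, not measured data.

```python
def compound_gain(first, second):
    """Combined fractional gain of two successive gains, e.g. 0.40 then 0.18."""
    return (1.0 + first) * (1.0 + second) - 1.0


def implied_gain(total, known):
    """Gain still needed on top of `known` to reach `total` overall."""
    return (1.0 + total) / (1.0 + known) - 1.0


if __name__ == "__main__":
    # Figures quoted in the thread: +65% over Piledriver, +40% over Excavator.
    exv_over_pd = implied_gain(0.65, 0.40)
    print(f"Implied Excavator over Piledriver: {exv_over_pd:.1%}")
    print(f"Check, compounded back: {compound_gain(0.40, exv_over_pd):.1%}")
```

Under that assumption the implied Excavator-over-Piledriver gap is 1.65 / 1.40 - 1, or roughly 18%, which is consistent with the "<20%" in the quote.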
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,475
1,978
136
A single, self written, znver1 optimized small loop fitting in the uop$ with a L1D$ friendly working set for those assuming an "up to" statement.

If I'm allowed to write a loop that tries to be pessimal on BD, I think I could make it be ~10-20 times faster on Zen (or Intel, k10, or whatever). Just write one byte to every cache line in a tight loop with no other instructions.
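
A rough sketch of that access pattern, written in Python only to show the shape of the loop. Interpreter overhead would swamp any cache effect here, so a real test would be a tight loop in C or assembly. The 64-byte line size and the buffer size are assumptions, and `touch_every_line` is a made-up name.

```python
CACHE_LINE = 64  # assumed cache line size in bytes


def touch_every_line(buf, line_size=CACHE_LINE):
    """Write one byte at the start of every line-sized chunk of buf.

    Returns the number of writes performed.
    """
    writes = 0
    for offset in range(0, len(buf), line_size):
        buf[offset] = 1  # a single one-byte store per line, nothing else
        writes += 1
    return writes


if __name__ == "__main__":
    buf = bytearray(1 << 20)  # 1 MiB working set (arbitrary choice)
    print(touch_every_line(buf), "lines touched")
```

The point of the pattern is that every store lands on a different line, so no two writes can be merged into the same line.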
 

Elixer

Lifer
May 7, 2002
10,371
762
126
The most interesting thing that pops out in this interview is this:
Papermaster: We’re committed to open source software. You look at our microprocessor, we have an LLVM open source compiler to optimize the performance you get out of the CPU. When you look at accelerators, GPUs, we took our stack and put in open source. If you go to www.gpuopen.com you’ll see the software and the tools it takes to accelerate using our Radeon technology.

One does have to wonder: is the reason for the 40% increase in IPC that the internal tests were done with an LLVM compiler optimized for Zen?

If that is the case, what would be the IPC increase for code that hasn't been optimized for Zen?
 