One guy can't design a CPU. It takes thousands of people.

Oh really? See here: http://www.megaprocessor.com/progress.html
One guy.
Or this: https://www.parallella.org/2016/10/05/epiphany-v-a-1024-core-64-bit-risc-processor/

There's a report about Parallella that details the work done. Impressive.

There must be some kind of unknown advantage with Zen, though. It is designed by Keller, after all.
I'm just hoping he transferred some of the ideas of the Apple Ax(X) cores to the Zen and K12 core designs, as those cores have some impressive single-thread results: http://browser.primatelabs.com/ios-benchmarks
(Calculating it to IPC doesn't make it any less impressive, only that the OS isn't comparable.)

As far as I know, Keller was project manager/team leader on the Zen project, not a design engineer or something. Though I'm sure he had a few good inputs for the Zen design.
I can imagine that Apple would sue everyone for using transistors in a chip... But I'm not saying that AMD needs to copy stuff from others, just look at good examples.

Apple would sue AMD into oblivion if any of Apple's IP ended up in Zen. It's the norm for employment contracts to contain provisions that say something along the lines of "anything you invent while working for us, we own".
Most of the discussion about this already happened back then, but it is an interesting thought.

I am confident that Keller has gone for an Apple Cyclone-like design, given that he architected Cyclone and that Zen and K12 are going to be similar. In fact, there are a few similarities between Zen/K12 and Cyclone:
http://www.anandtech.com/show/7910/apples-cyclone-microarchitecture-detailed
Zen has 6 integer pipelines and 3 FP pipelines. Cyclone has 4 integer ALU pipes, 2 load/store pipes and 3 FP pipes. I am sure Zen has a similar design. I am also thinking that Zen can execute and retire up to 6 instructions/micro-ops per clock. As for decode and issue, I think Zen might be 4-5 wide while K12 will be 5-6. I am willing to bet Zen is going to be very high IPC and competitive with Skylake.
That high level stuff should be irrelevant (state of the art). Or did someone get sued for building a car with 6 tyres instead of 4?
However, going back in time, this post from @raghu78 from 1.5 years ago could have some truth in it.
-Those figures are from where? I would love to see some evidence if you have any...

If you think that in a process with 1/6 the leakage, -20% capacitance, 1.5x conductance and 20% less Vcore, you can't do an 8-core CPU in 95W at over 3.2GHz, well, I am not trying to convince you anymore... I quit. BELIEVE (this is a correct term here) what you want...
RWT Bulldozer said:
AMD was unwilling to share any specifics on gate delays, although some discussions at comp.arch suggest a target of ~17 gate delays vs. ~23 for Istanbul.
Yea, we can all remember this 1.15V 2.8GHz 95W CPU:

RWT Barcelona said:
The device described at ISSCC was targeted at 2.2-2.8GHz at 1.15V, while operating within a 95W maximum thermal envelope. AMD claims that their 65nm process has a 15ps FO4 inversion delay, which suggests that Barcelona's pipeline is just a little less than 24 FO4 delays.
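The arithmetic in that RWT quote is easy to check: cycle time is FO4 depth per stage times the FO4 inversion delay. A quick sketch, using only the two figures from the quote above (15ps FO4 delay, ~24 FO4 per stage):

```python
# Cycle-time arithmetic from the RWT Barcelona quote:
# cycle time = FO4 depth per stage * FO4 inversion delay.
fo4_delay_ps = 15.0   # AMD's claimed FO4 inversion delay on 65nm
fo4_depth = 24        # ~24 FO4 delays per pipeline stage

cycle_time_ps = fo4_depth * fo4_delay_ps   # 360 ps per cycle
max_clock_ghz = 1000.0 / cycle_time_ps     # ps -> GHz

print(round(max_clock_ghz, 2))  # ~2.78, right at the quoted 2.2-2.8GHz target
```

So the quoted FO4 figures and the 2.8GHz target are mutually consistent.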
Where's the evidence for that?

It's clearly for the whole frequency range.
-1/6 leakage, etc... at what frequency?
1/6 leakage was from the comparison of a NEON FPU implemented on 28nm bulk HP (the same used for XV) and on 14nm FF LPP, at the same 330mW total. Leakage went from about 100mW (of 330mW total) to 18mW (of 330mW total). At this power the NEON FPU went from 1.something GHz at about 1V to 2.41GHz at about 0.9V. I don't have a link at hand; you can search yourself.
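The quoted figures are at least internally consistent; a quick check using only the numbers from the post above:

```python
# Check the quoted NEON FPU leakage figures (28nm bulk HP vs 14nm FF LPP)
total_mw = 330.0        # same total power budget in both cases
leak_28nm_mw = 100.0    # quoted leakage on 28nm bulk HP
leak_14nm_mw = 18.0     # quoted leakage on 14nm FF LPP

ratio = leak_14nm_mw / leak_28nm_mw    # ~0.18, close to the claimed 1/6
share_28 = leak_28nm_mw / total_mw     # ~30% of the budget lost to leakage
share_14 = leak_14nm_mw / total_mw     # ~5.5% of the budget lost to leakage

print(round(ratio, 2), round(share_28, 2), round(share_14, 3))
```

In other words, at the same 330mW budget the leakage share drops from roughly a third to a twentieth, which is where the "1/6" comes from.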
-These process metric figures have always scaled as such or better. Look at Intel's. They are transistor-level figures. They have never directly translated from transistor level to chip level, as you are incorrectly assuming.
-Less Vcore means nothing on its own. It's a function of smaller processes, as current isn't scaling with it (due to leakage problems). Power used to scale due to Vcore and current scaling. Now, best case, they lower one and raise the other, so current requirements are growing much higher than they used to be. This limits clock speeds/heat.
-Leakage is in amps, i.e. current. And a chip has different types that constitute power draw to reach anywhere near the published TDP values. Even a hugely exaggerated 1/6 of... 20mA of idle leakage is only 3.3mA of leakage.
-LPP is traditionally for good leakage power control and lower clocks, using denser cells... for mobile. That's by design.
-Where do you get your gate delay info from?
-Gate delays can hint at clocks only IF the process is maturing, yielding and performing well.
-BD was a speed demon design. Are you saying Zen is a design for higher clocks?
Looking at the 40% IPC claim, this cannot be the case, as a lower delay per stage would mean more pipeline stages, and so higher instruction latencies. That would mean lower IPC (more gate depth). It's either one or the other; you can't have both. I would assume at least a 20 FO4 logic depth delay in the critical timing paths for Zen. Willamette was 16-ish.
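The tradeoff being argued here can be made concrete with a toy model: hold the total logic depth of some operation fixed, shrink the per-stage FO4 budget, and the stage count (i.e. latency in cycles) grows even as the clock ceiling rises. All numbers below are illustrative, not actual Zen figures:

```python
# Toy model: fixed total logic depth, shrinking per-stage FO4 budget.
# All numbers are illustrative - not actual Zen figures.
import math

TOTAL_LOGIC_FO4 = 180     # total FO4 of logic for some fixed operation
LATCH_OVERHEAD_FO4 = 3    # per-stage latch/clocking overhead, in FO4
FO4_DELAY_PS = 15.0       # assumed FO4 inversion delay

rows = []
for per_stage_fo4 in (23, 20, 16):
    logic_per_stage = per_stage_fo4 - LATCH_OVERHEAD_FO4
    stages = math.ceil(TOTAL_LOGIC_FO4 / logic_per_stage)   # latency in cycles
    freq_ghz = 1000.0 / (per_stage_fo4 * FO4_DELAY_PS)      # clock ceiling
    rows.append((per_stage_fo4, stages, round(freq_ghz, 2)))

for row in rows:
    print(row)   # smaller FO4 budget -> higher clock but more stages
```

Going from a 23 FO4 budget to 16 lifts the clock ceiling by ~40% but adds five stages of latency to the same operation, which is exactly the frequency-vs-IPC tension described above.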
-If they fail to get higher than 3.2GHz at launch, what will that mean? Is that a disaster in your books?
The rest of your post was complete pseudoscience garbage. Like saying Mars looks black today, so it will definitely be 50°C in Berlin.
Sent from HTC 10
(Opinions are own)
The fact that a low FO4 Intel design had poor IPC was due to two small 16-bit double-pumped ALUs (awful with 64-bit code), few FPU pipelines, the lack of an L1 instruction cache, a too-small L0 uop cache... and other things that maybe I have forgotten.

Just to point out, support for 64-bit code was added on the P4's Prescott, which had 32-bit ALUs, as well as double the L1 data cache and double the L2 cache.
Leakage is a function of temperature and voltage. In the case that interests us, if leakage is 1/6 at, say, 0.9V, it will also be 1/6 at 1V, and so on for the whole operating range...
And frequency. Because higher frequency requires higher voltage/current, which causes more leakage, and so more heat.
Not frequency, only voltage... You can supply a gate with, say, 1V and it will drain a current. If you don't clock the circuit, so the frequency is 0 (since it's static), it will leak the same as if it were clocked at, say, 1GHz...
His point is that if and when you need to clock a transistor lower, you can also lower the Vcore... It would be inefficient not to do so...
Also, the voltage domains are not that many, so if we e.g. clock-gate an FPU during an INT-only calculation, that FPU is fed with the high Vcore anyway and so it leaks more...

Of course, but if the circuit is not clocked the leakage will still remain, so it's independent of frequency. That one needs to increase voltage, and hence leakage, to clock at a higher frequency is another matter...
This is the direct voltage -> leakage relationship. KTE actually added the frequency -> voltage(min) relationship. Both make f -> V_min -> I_leak.
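That summary (f -> V_min -> I_leak) fits a one-line toy model: dynamic power scales as C·V²·f, while leakage scales with V (and temperature) but has no direct f term. A minimal sketch, with every coefficient invented for illustration:

```python
# Minimal power model: P = C*V^2*f (switching) + V*I_leak (static).
# All coefficients are invented for illustration; only the structure
# (leakage independent of f at fixed V, f raising V_min) is the point.
C = 1.0e-9      # effective switched capacitance in farads (illustrative)
I_LEAK = 2.0    # leakage current in amps at this corner (illustrative)

def v_min(f_ghz):
    """Assumed linear frequency -> minimum stable voltage relationship."""
    return 0.7 + 0.1 * f_ghz

def power_w(f_ghz, v):
    """Return (switching, static) power in watts."""
    switching = C * v * v * f_ghz * 1e9
    static = v * I_LEAK       # note: no f term at a fixed voltage
    return switching, static

# At a fixed 1.0V the static term is identical whether idle or at 3GHz:
assert power_w(0.0, 1.0)[1] == power_w(3.0, 1.0)[1]

# But chasing 4GHz instead of 3GHz raises V_min, and leakage with it:
for f in (3.0, 4.0):
    v = v_min(f)
    switching, static = power_w(f, v)
    print(f, round(v, 2), round(switching, 2), round(static, 2))
```

So both posters are right in their own terms: leakage is frequency-independent at a fixed voltage, but reaching a higher frequency forces a higher minimum voltage, which drags leakage up with it.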
The scalability you have with finFETs is really quite a large range because it has very little leakage. When you turn off your clocks—when you are not doing active work—you can get very close to nil energy, and leakage is lower than previous technologies.
Yes, but that is all isolated FET-level performance in unknown conditions vs. a mass-production chip of over a billion transistors under standard conditions. The two do not correlate directly. There are far too many other factors, too.

For the other figures, they were given by another user in this thread a few pages ago... Maybe you missed them... I don't know the source, but it seems reasonable, given the HUGE differences between a bulk process (gate on one side, very thin conducting layer) and a FinFET process (gate on three sides, almost fully conductive channel)... It is not necessary to be an electronics engineer to predict more transconductance (if you know what it is)... And lower capacitance is normal if you make everything smaller: capacitance is proportional to transistor area...
Does not correlate to the performance metrics of a full equivalent processor.

Intel 45nm to 32nm said:
The decreased oxide thickness and reduced gate length enable a >22% transistor performance gain in terms of drive current. These transistors provide the highest drive currents and tightest gate pitch reported in the industry. Leakage current can also be optimized for a >5x reduction in leakage over 45nm for NMOS transistors, and a >10x reduction in leakage for PMOS transistors.
For the FO4 again... If I increased the latencies and the number of stages, I'd have to be an awful engineer to also increase the FO4...
It's perfectly feasible to do a low FO4 and medium IPC design.
Certainly, but leakage is very low with FinFETs. Typically leakage is about 10^-6 times the switching current; it's low, but it involves all the transistors in the CPU, while switching does not. Hence with planar transistors leakage will be something like 20-30% of the losses, for instance.

Leakage has many different components. Static and dynamic, just to oversimplify.
-Low IPC cores in mW is a completely different matter to dealing with high IPC Cores at 1-20W.
-mW gain is noise at this level with a CPU at 95W.
-Such gains also do not scale with size/speed/power. Every process has a sweet range.
-LP low-IPC chips at low GHz are making the same frequency/efficiency gains which x86 made 10-20 years ago. That's the low-hanging fruit, and it was A LOT easier to attain than at today's x86 level. HP processors made these same gains from 1980-2000. But those gains are COMPLETELY incomparable to today's uarch/process changes.
Hence why the A15 is at higher/near Bobcat power with similar performance but FAR lower efficiency than XV/SKYL. It's a bit of a DUH moment for x86.
Yea, if you add pipe stages in the critical timing logic, the latencies automatically increase. If you began with a 23 FO4 design, for instance, you would add stages/latches to increase clocks, with a bit of timing overhead, which means less logic per pipeline stage, so the FO4 logic depth per stage would decrease (lower than 23). Remember also the extra inverters for buffering/saving clock and data signals at each stage. That's the IPC vs. frequency compromise.
Depth latencies are in ps depending on the GHz... 2.5 FO4 is what is seen as the minimum for a flop, around 5 for a P4 ALU, for example. Research on optimal FO4 at current nodes tends to be outdated, being based on Willamette and Alpha examples using SPEC95/2000 (Alpha achieved a 27.8x SPECint95 performance improvement in 6 years, with an 8.3x improvement in cycle time and 3.5x in architecture).
The IBM study you are quoting (A. Hartstein et al.?), which used time per instruction execution to define performance, was again using outdated workloads and architectures. It also assumed an infinite cache model and used workloads with a lot of ILP and minimal stalls. They also found the optimum to be very different for the given workloads (10-28 stages for traditional vs. modern vs. old SPEC).
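The shape of that optimum-depth result is easy to reproduce with a toy sweep: deeper pipelines shorten the cycle (less logic per stage) but pay a flush penalty roughly proportional to depth, so time-per-instruction has a minimum somewhere in between. The constants below are invented for illustration and are not taken from the Hartstein et al. paper:

```python
# Toy pipeline-depth sweep: per-stage overhead vs hazard (flush) penalty.
# Constants are invented for illustration, not taken from the IBM paper.
T_LOGIC_PS = 360.0      # total logic delay around the critical loop
T_OVERHEAD_PS = 30.0    # latch/skew overhead added per stage
MISPREDICT_RATE = 0.03  # mispredictions per instruction (illustrative)

best = None
for stages in range(4, 40):
    cycle_ps = T_LOGIC_PS / stages + T_OVERHEAD_PS
    # a flush costs roughly one pipeline depth worth of cycles
    cycles_per_instr = 1.0 + MISPREDICT_RATE * stages
    time_per_instr = cycle_ps * cycles_per_instr
    if best is None or time_per_instr < best[1]:
        best = (stages, time_per_instr)

print(best[0])  # prints 20 with these constants
```

With these made-up constants the optimum lands at 20 stages, inside the 10-28 range the study reported; change the mispredict rate or per-stage overhead and the optimum moves, which is exactly why the result is workload-dependent.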
What is low FO4 and what is medium IPC?
Because for us, high and low IPC are relative to the competition and era. Skylake would be high right now. And IPC is also heavily influenced by caches, trace/uop caches and branch prediction.