New Zen microarchitecture details

Page 109
Mar 10, 2006
11,715
2,012
126
Yeah, it's easy to see the word ripoff as a negative. It's actually a compliment.

What's the limitation of the process?

I said "riff on" not "rip off."

A link that you may find helpful:

One source of the word riff is the music world, in which it's not uncommon for a musician to take a tune s/he has heard, and then perform a little variation on that tune. It happens all the time in some jazz sessions.

To "riff on Hamlet," then, would be for a person to take a line from the Bard and play with it, explore it, have some fun with it, look at it in various ways, explore it for levels of meaning and possible connections to other concepts and ideas, and so on.

http://english.stackexchange.com/questions/225963/what-does-a-riff-on-shakespeare-mean
 

DrMrLordX

Lifer
Apr 27, 2000
21,813
11,168
136
we all can admit Intel probably found the best optimized way to do things and AMD is simply adopting the same techniques.

Well, it's either that, or AMD decided that approaching the existing x86 market with something too different from Intel's offerings would probably bite them in the ass (again). People currently expect x86 CPUs to behave like Intel processors. There are too many circumstances where AMD CPUs from the last 5+ years "could have done better" but didn't thanks to stuff like FMA4, XOP, ST performance, blah blah blah.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
I said "riff on" not "rip off."
You're right. My brain substituted a common expression for an uncommon one. I'll try to read more slowly.
DrMrLordX said:
but didn't thanks to stuff like FMA4, XOP, ST performance, blah blah blah.
The first two were due to AMD trying to adapt to Intel's changes concerning SSE5 specs, right?
 

KTE

Senior member
May 26, 2016
478
130
76
I'm still curious if Zen, if primarily server centric, uses any localized interconnect buffering. That would really up the ante.

Sent from HTC 10
(Opinions are own)
 

Shamrock

Golden Member
Oct 11, 1999
1,439
560
136
I happened upon this article today (published Aug. 18th)

http://venturebeat.com/2016/08/18/amds-takes-biggest-jab-at-intel-in-years-with-zen-processor/

Some informative claims, and some we already know, but a few stuck out for me.

AMD says the “breakthrough performance” of Zen can challenge Intel’s fastest processor to date: the 10-core Broadwell-E processor.

It did mention the 3 GHz test with the Intel chip downclocked. So, is this claiming AMD will ship at more than 3 GHz?
AMD said that power would be competitive, the frequency we saw would be even higher at production than what we saw, and production of what we saw could be produced at scale.
 

KTE

Senior member
May 26, 2016
478
130
76
^Even with equal IPC and equal power usage, AMD would need a base 4GHz Zen to challenge Intel's fastest in 2016.

Forget 2017.

Sent from HTC 10
(Opinions are own)
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
I for one quite like what has been revealed about Zen so far. I would definitely consider buying one depending on the price/performance ratio compared to Intel processors next year.

My prediction would be that, in terms of IPC, it won't beat Haswell but it will come close, depending on the benchmark. In floating point heavy code, it will be further behind than in integer heavy code. This might mean that game performance won't get the major performance boost we are hoping for.

A lot depends on the clockspeeds that they can achieve. If they can only manage around a 3GHz base clock for the top SKU, they are in trouble. If they can hit 3.4GHz base and maybe 3.7GHz turbo boost for the top SKU, I think they will be in with a very good chance. If I were AMD, I'd probably rather shoot for higher clockspeeds even if it harms efficiency. At least, definitely on the desktop CPUs.
 
Reactions: Gikaseixas

hrga225

Member
Jan 15, 2016
81
6
11
Scheduler per pipe still baffles me. Is this their solution for handling SMT? Does anyone have any idea?

Edit: Or the high level diagram is confusing me?
 

deasd

Senior member
Dec 31, 2013
555
870
136
An interesting read from WCCF. The commenter implies that Zen's SMT mechanism is much more like IBM Power's than that of Intel Nehalem and its successors; that is, it provisions more resources for executing a second thread. But I take it with a grain of salt:

http://wccftech.com/amd-zen-architecture-hot-chips/#comment-2855691209

We'll have to wait for benchmarks, but I'm growing ever more suspicious that Zen's SMT implementation is more like Power8's than anything Intel's produced so far. Intel's approach has been to allow a second thread to use unused CPU resources, but doesn't really over-provision those resources (a single thread can very nearly saturate the whole CPU). On Power8, they can scale up to 8 threads per core (Zen will only do 2), but they make that viable by doubling down on key CPU resources in the first place (Instruction Cache, rename registers, etc.). The end result is that the second SMT thread on Intel increases overall performance by around 15-25%, but on Power8 the second SMT thread can increase overall performance by around 60% in some workloads. In Layman's terms, Power8's 'hyperthreads' are more useful than Intel's.

AMD haven't talked about rename registers yet, but they have revealed that the instruction cache is 64KB per core; perhaps not-so-coincidentally, that's double the size of Skylake's instruction cache, and the same size as Power8's. The L1 Data cache is only 32K in all of these processors, but it's rather odd in processor design to have your instruction cache be twice the size of your L1 data cache -- unless you have a good reason. There are only two reasons I can think of -- either that second thread chews through a lot more instructions than in competing SMT designs, or possibly the uOp Cache can spill to L1. Looking at the slide from HotChips that shows which CPU resources are exclusive, competitively shared, or arithmetically arbitrated has me leaning toward the former, though they might not have overprovisioned CPU resources enough to match Power8 fully. There were also rumors months back about Zen doing some really novel things with SMT, which would seem to back that up.

The implication of that would be that Zen could run at a lower clockspeed than Intel's current Broadwell DE but still match it in overall threaded performance (while perhaps giving up 10-15% single-threaded performance, not clock-normalized). For the mainstream, they could release a quad-core CPU at similar clocks to Skylake, and outperform it in threaded workloads. In gaming workloads, since current consoles make 6-7 threads available to games, a quad-core Zen with 4 hyper-threads giving ~60% additional performance would give a lot better performance than a quad-core i7 with 4 hyper-threads giving ~20% additional performance. In fact, that Zen would have a throughput comparable to 6-7 dedicated cores.

We won't know until someone does an architecture deep-dive or we have benches showing SMT gains much larger than Intel's. But it's looking increasingly likely from what I see.
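The arithmetic behind the "6-7 dedicated cores" claim in the quoted comment can be sketched as follows. This is only a back-of-the-envelope check under the comment's own assumptions (a ~60% SMT gain for Zen and ~20% for Intel, both unconfirmed, and every core running two threads); the function name `core_equivalents` is made up for illustration.

```python
def core_equivalents(cores, smt_gain):
    """Throughput in units of one single-threaded core.

    Assumes every core runs two threads and that the second thread adds
    `smt_gain` (e.g. 0.60 for +60%) on top of the first. Both gain figures
    used below are the commenter's guesses, not measured values.
    """
    return cores * (1.0 + smt_gain)


zen_quad = core_equivalents(4, 0.60)    # hypothetical quad-core Zen, +60% from SMT
intel_quad = core_equivalents(4, 0.20)  # quad-core i7, +20% from SMT

print(f"Zen-like quad:   {zen_quad:.1f} core-equivalents")
print(f"Intel-like quad: {intel_quad:.1f} core-equivalents")
print(f"Ratio:           {zen_quad / intel_quad:.2f}x")
```

Under those assumptions the quad-core Zen lands at roughly 6.4 core-equivalents against 4.8, which is where the "6-7 dedicated cores" figure comes from. The result is only as good as the assumed SMT gains.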

But I partly agree with him that an L1 instruction cache twice the size of the L1 data cache is unusually large. This is very similar to Excavator, and even Bulldozer has a 64:16 ratio of L1 inst/data, and those designs have much more resources and even a dedicated pipeline for the other 'thread'.
 

hrga225

Member
Jan 15, 2016
81
6
11
But I partly agree with him that an L1 instruction cache twice the size of the L1 data cache is unusually large. This is very similar to Excavator, and even Bulldozer has a 64:16 ratio of L1 inst/data, and those designs have much more resources and even a dedicated pipeline for the other 'thread'.
Yes, that part is a given, but the performance figures in the conclusion are a bit over the top. He forgets that Power is SMT4; of course firing up a 2nd thread gives a huge boost in performance.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,422
1,759
136
Scheduler per pipe still baffles me. Is this their solution for handling SMT? Does anyone have any idea?

The architecture of the schedulers has no impact on SMT. At that point, the uops coming in are already effectively anonymous -- the schedulers don't know or care which thread they come from.

Having many 1-way schedulers has one big disadvantage -- each uop is assigned to a scheduler on dispatch, and each scheduler can only execute one op per clock. Imagine a situation like:

1: add eax, ebx
2: add ecx, eax
3: add edx, eax

Both 2 and 3 depend on the result of 1, but are independent of each other. If they all get assigned to a single scheduler, this piece of code takes 3 cycles to run, while a monolithic scheduler could run 2 and 3 simultaneously on different execution units. This disadvantage is not as bad as it sounds, as the simple solution of just spreading the workload evenly across the queues is optimal most of the time. It does occasionally lose a cycle or two to scheduling losses, though.
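The three-instruction case above can be checked with a toy model. This is a minimal sketch, not a model of Zen's real hardware: it assumes every op has 1-cycle latency, each queue picks oldest-first, and results become usable the cycle after an op executes. The names `cycles_to_finish`, `queue_of` and `width` are invented for the example.

```python
def cycles_to_finish(ops, deps, queue_of, width):
    """Return the cycle in which the last op executes.

    ops      -- op ids in program order (oldest first)
    deps     -- op id -> ids of the ops whose results it reads
    queue_of -- op id -> scheduler queue chosen at dispatch
    width    -- how many ops each queue may issue per cycle
    Assumes 1-cycle latency for every op (a simplification).
    """
    executed_in = {}  # op id -> cycle in which it executed
    cycle = 0
    while len(executed_in) < len(ops):
        cycle += 1
        issued = {}  # queue -> ops issued from it this cycle
        for op in ops:  # oldest-first pick
            if op in executed_in:
                continue
            # Ready only if every producer executed in an earlier cycle.
            ready = all(executed_in.get(d, cycle) < cycle for d in deps.get(op, ()))
            q = queue_of[op]
            if ready and issued.get(q, 0) < width:
                executed_in[op] = cycle
                issued[q] = issued.get(q, 0) + 1
    return cycle


OPS = [1, 2, 3]
DEPS = {2: [1], 3: [1]}  # ops 2 and 3 both read the result of op 1

same_queue = cycles_to_finish(OPS, DEPS, {1: 0, 2: 0, 3: 0}, width=1)
spread_out = cycles_to_finish(OPS, DEPS, {1: 0, 2: 1, 3: 2}, width=1)
monolithic = cycles_to_finish(OPS, DEPS, {1: 0, 2: 0, 3: 0}, width=3)

print(same_queue, spread_out, monolithic)
```

With all three ops stuck in one 1-wide queue the chain takes 3 cycles. Spreading them across queues, or using one wide shared scheduler, brings it down to 2, which is the scheduling loss described above.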

The main advantage is that they are much easier to make larger, easier to clock higher, and use less power. Intel Haswell has a 60-entry scheduler that is shared between all instruction types. Zen has 6x14 entries for integer, plus an unannounced separate queue for FP. The total Zen scheduler window is likely even bigger than Skylake's 97 entries.
 
Reactions: Ajay and ElFenix

hrga225

Member
Jan 15, 2016
81
6
11
Thank you!
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Scheduler per pipe still baffles me. Is this their solution for handling SMT? Does anyone have any idea?

Edit: Or the high level diagram is confusing me?
The separate schedulers should be fine, as this simplifies the design, allows higher frequencies, and saves power. And integer/AGU ops usually have low latencies, so that each scheduler doesn't need to look at too many ops to fill a latency induced gap. FP instructions have longer latencies, thus the FPU has a unified scheduler.
 

hrga225

Member
Jan 15, 2016
81
6
11
The separate schedulers should be fine, as this simplifies the design, allows higher frequencies, and saves power. And integer/AGU ops usually have low latencies, so that each scheduler doesn't need to look at too many ops to fill a latency induced gap. FP instructions have longer latencies, thus the FPU has a unified scheduler.
Yes, I am aware of that now (with a little help from Tuna-Fish and you; I like your blog, by the way, and majord's diagram). My confusion also came from a mix-up in terminology: dispatcher vs. scheduler.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
^Even with equal IPC and equal power usage, AMD would need a base 4GHz Zen to challenge Intel's fastest in 2016.

Forget 2017.

Sent from HTC 10
(Opinions are own)
Yeah, 3.75 GHz assuming perfect scaling. I wish AMD really had a choice of foundry. Zen looks like an excellent achievement by AMD with the potential for some solid wins if marketing does its job. But, having come this far, AMD really needs to be able to push uArch updates through quickly to maintain pace. That, and GFL needs to ramp up its 14nm process for higher clocks as well. It's a lot to ask, but it sure would be nice to have a competitive choice in CPUs. Can't wait till 1Q17 to see real benchmarks!
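For reference, the 3.75 GHz figure falls out of a simple cores-times-clock estimate. The sketch below assumes an 8-core Zen against a 10-core part at a 3.0 GHz base clock, with equal IPC and perfect scaling across cores. Those inputs are assumptions chosen to reproduce the number above, and `required_clock_ghz` is a made-up helper name.

```python
def required_clock_ghz(rival_cores, rival_clock_ghz, own_cores):
    """Clock needed to match a rival's aggregate throughput.

    Assumes equal IPC and perfect multi-core scaling, so throughput is
    simply cores * clock. Real workloads rarely scale this cleanly.
    """
    return rival_cores * rival_clock_ghz / own_cores


# Assumed inputs: 10-core rival at 3.0 GHz base, 8-core Zen.
needed = required_clock_ghz(rival_cores=10, rival_clock_ghz=3.0, own_cores=8)
print(f"{needed:.2f} GHz")
```

Any turbo headroom on the 10-core part, or less-than-perfect scaling, would shift the break-even clock, which is presumably why the 4 GHz estimate quoted above is higher.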
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Now we have established some kind of agreement that a 95 W TDP, ~180 mm2 CPU on a low-frequency process is not 2% faster than a 140 W TDP, 240 mm2 die on a high-frequency process costing 1100 USD.

It would be a bit more interesting, to say the least, if we got some info about the efficiency.
The coverage of it is very slim, IMO, relative to its crucial importance.
 
Reactions: Phynaz

Abwx

Lifer
Apr 2, 2011
11,172
3,868
136
It would be a bit more interesting, to say the least, if we got some info about the efficiency.
The coverage of it is very slim, IMO, relative to its crucial importance.

Sous Blender, qui profite pleinement du multi-threading, Zen était légèrement devant tout en consommant un petit peu moins selon AMD

http://www.hardware.fr/news/14749/amd-dit-peu-plus-zen.html

Literally translated:

Under Blender, which benefits fully from multithreading, Zen was slightly ahead while consuming a little less, according to AMD.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
Yes, real benchmarks would be very useful. I don't expect to get them yet on ES silicon, but we could get lucky over the next couple of months.
Throughput, in terms of an apples-to-apples comparison running SPEC against an 8-core BW-E (or Xeon), is what we ultimately need. At release, we'll get all sorts of benches on various apps, games, etc. So, the waiting continues.
 