AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Page 39
Status
Not open for further replies.

lolfail9001

Golden Member
Sep 9, 2016
As I said
"instant access to any data it needs without the need to copy the data around a few times first"
Not instant as in 0ms time

Instant stands for 0 or close to 0 time, not for "without the need to copy the data around a few times first".

Not to mention that if you skip copying stuff into RAM and use XPoint as RAM... well, it's going to be sloooow.

As for eDRAM: the reason it worked is that it was faster than plain RAM access. So, as a cache, it could absorb some of the bandwidth demand (see AnandTech's article on Skull Canyon RAM scaling).
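A rough sketch of the bandwidth argument, assuming a simple model where cache hits never touch DRAM and every miss costs one DRAM transfer (write-backs and prefetching ignored). The 50 GB/s demand and the dual-channel DDR4-2133 setup are hypothetical numbers, not figures from the Skull Canyon article.

```python
def dram_traffic(demand_gbps: float, hit_rate: float) -> float:
    """DRAM bandwidth still needed when a cache (e.g. eDRAM) serves `hit_rate` of accesses.

    Simplified model (assumption): hits cost nothing at the DRAM interface,
    each miss costs one DRAM transfer; write-backs and prefetch are ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return demand_gbps * (1.0 - hit_rate)


# Hypothetical iGPU demand against a hypothetical dual-channel DDR4-2133 setup.
DEMAND = 50.0                # GB/s, made-up figure
DRAM_PEAK = 2 * 8 * 2.133    # channels * bytes per transfer * GT/s, about 34.1 GB/s

for hr in (0.0, 0.3, 0.5):
    need = dram_traffic(DEMAND, hr)
    print(f"hit rate {hr:.0%}: {need:.1f} GB/s from DRAM, fits in peak: {need <= DRAM_PEAK}")
```

With these made-up numbers, a 50% hit rate halves DRAM traffic to 25 GB/s, which fits under the ~34 GB/s peak, while no cache at all would not.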
 

DrMrLordX

Lifer
Apr 27, 2000
Memcopy is only part of the equation. Latency is still an issue. If all Intel needed to bump up iGPU performance was the assurance that there would be no copy events between system RAM and the framebuffer (or any caches), then a full implementation of SVM, baked into graphics APIs to eliminate the need for a framebuffer, would be preferable. Then you'd have zero copy events after copying all necessary data into system RAM.

And, as an aside, with modern iGPUs the idea that we still need a framebuffer is a bit silly. Gen9 and GCN 1.2 (and later) should support SVM or SVM-like behavior. Graphics APIs designed to accommodate dGPUs need to be reworked for integrated graphics.
 

cdimauro

Member
Sep 14, 2016
There is no indication that Zeppelin will be able to match, let alone beat, Broadwell-E. Based on just the figure released by AMD (40% higher IPC than Excavator), it should remain 10-20% behind in overall IPC.
Interesting. Is there any source for the IPC values of such CPUs?
This is quite likely to happen; however, not all "modern workloads" will use 256-bit instructions, and in 128-bit FP code BW's lead is at the very least debatable.
Well, looking at the Blender test, it seems that even in 128-bit FPU code (assuming that SSE code was used for it) Zen will not shine, despite its 4x units (compared to the 2x of Intel's CPUs).
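The "10-20% behind" estimate depends on where Excavator sits relative to Broadwell-E, which the post doesn't state. A quick sketch of the arithmetic, taking only AMD's +40% figure as given: it computes which Excavator/Broadwell-E IPC ratio each gap implies. No measured IPC values are assumed; the function names are just for illustration.

```python
ZEN_OVER_XV = 1.40  # AMD's stated "+40% IPC over Excavator"


def implied_xv_over_bw(zen_gap: float) -> float:
    """Excavator/Broadwell-E IPC ratio implied if Zen trails Broadwell-E by
    `zen_gap` (0.10 means 10% behind)."""
    return (1.0 - zen_gap) / ZEN_OVER_XV


def zen_over_bw(xv_over_bw: float) -> float:
    """Zen/Broadwell-E IPC ratio for an assumed Excavator/Broadwell-E ratio."""
    return ZEN_OVER_XV * xv_over_bw


for gap in (0.10, 0.20):
    print(f"Zen {gap:.0%} behind BW-E <=> Excavator at "
          f"{implied_xv_over_bw(gap):.1%} of BW-E IPC")
```

So the 10-20% estimate is equivalent to assuming Excavator delivers roughly 57-64% of Broadwell-E's IPC; a source for that ratio is what would settle the question.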
 

The Stilt

Golden Member
Dec 5, 2015
Interesting. Is there any source for the IPC values of such CPUs?

Well, looking at the Blender test, it seems that even in 128-bit FPU code (assuming that SSE code was used for it) Zen will not shine, despite its 4x units (compared to the 2x of Intel's CPUs).

You mean the IPC of Excavator?
You can see some figures from the Anandtech (Athlon X4 845) and Phoronix reviews.
 

VirtualLarry

No Lifer
Aug 25, 2001
I don't hate Intel at all. As a matter of fact, I have bounced around for the last 2 years, going from AMD 750K to 760K to Intel i7 920 to L5639 to X5650 to E5-2650 to FX-8350, then to E5-2670, and when that died I bought an AMD 5350 to hold me over while I am building my 6700K rig.

Wow, that's a lot of platforms to jump between in 2 years. Kind of like me, I guess. I like to experiment.
 
Mar 10, 2006
Thanks for the link. I'm not sure how confident I am in the source, so I guess for now we can only estimate a retail release anywhere between 3 and 5 months out.

My 2500K/motherboard has been acting up lately. I've replaced my primary SSD with a Samsung 480GB EVO and I'm still getting random shutdowns and freezes. I think after 5 years of running it's just ready to retire, so I'm looking to build a system with at minimum 6C/12T around next spring.

Sounds like the machine is crapping out. Hopefully it will last until Kaby and Zen release so you can choose from both companies' offerings.
 

KTE

Senior member
May 26, 2016
Design-wise, I would reckon the L2 caches on Zen are the most limiting factor for Fmax. If AMD modified Piledriver to have L2 characteristics similar to Zen's, the resulting part would have an Fmax of ~2.8GHz instead of the usual ~4.7GHz. Hopefully AMD has managed to improve their L2 caches in Zen, because they have been the first limiting factor since K7. Even the GCN GPUs suffer from this.
Are you saying that as an opinion, or based on usage/testing of Zen behind closed doors?

I fail to see clocks being hugely limited by a block of SRAM.

Sent from HTC 10
(Opinions are own)
 

VirtualLarry

No Lifer
Aug 25, 2001
I fail to see clocks being hugely limited by a block of SRAM.


I was watching a YouTube video about G3258 overclocking, and they indicated that the uncore multiplier (which controls the clock of the L3 cache, I believe) can be a limiting factor in overclocks, and that you can often clock the CPU cores higher if you lower the uncore multiplier.
 

The Stilt

Golden Member
Dec 5, 2015
Are you saying that as an opinion, or based on usage/testing of Zen behind closed doors?

I fail to see clocks being hugely limited by a block of SRAM.


Your post makes very little sense to me, in any aspect.

Why would it be necessary to test Zen behind closed doors to know the limitations of the previous designs from experience?
L1 & L2 caches, with few exceptions (e.g. Bobcat), run at the core frequency. Can the cache speed scale infinitely, or if there is a limit, what happens when it is reached?
 

KTE

Senior member
May 26, 2016
Your post makes very little sense to me, in any aspect.

Why would it be necessary to test Zen behind closed doors to know the limitations of the previous designs from experience?
L1 & L2 caches, with few exceptions (e.g. Bobcat), run at the core frequency. Can the cache speed scale infinitely, or if there is a limit, what happens when it is reached?
So the answer to my question is: it's your opinion and not inside knowledge. That's fine. I needed clarification, as I didn't know if you had picked up some cues from testing any Zen samples.

As for your question: sure, it can have a limit and can be a limiter. But I don't believe, as you stated you do, that L2 is what will limit Zen's frequency... not until ~4GHz, anyway. If the FO4 is similar, it is possible for Zen to hit XV (and even Deneb) speeds, by design. Also:

Firstly, SRAM in isolation is generally the easiest thing to clock high at low voltage, which is why every new process is shown off using it. Intel's 14nm bitcells were hitting 1.5GHz at 0.6V. However, the type of SRAM cell (4 MOSFET vs 6, 7 or 8) will make a difference.

Secondly, I don't know whether Zen will hit 3.2-3.8GHz within 9 months, and I'm really not sure if 3.2-3.4GHz parts will launch. Indications are negative, but they have delayed for what must be a substantial reason. I really don't know how good the process is, nor the design or the process learning curve/maturity level. Timing bugs and a lower-performing process are entirely possible at this stage of a new microarchitecture on a new process. So far, all indications point to low clocks, combined with the LPP historical data, but why that is, whether it can be improved with tuning and maturity, and how quickly, remains to be seen.

Agena clocked piss-poor, Brisbane-style. Deneb clocked awesomely, hitting 3.7GHz at lower power. Same low pipeline stage count, same FO4 design.

Lastly, yes, cache occupies a large chunk of the delay in the processor's cycle-limiting paths, which is why I don't disregard it as a factor. I don't, however, believe it will be the most critical factor at play.
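A minimal sketch of the cycle-limiting-path idea, assuming an idealized pipeline where the clock period equals the slowest stage's delay (no clock skew, latch overhead or guardband). The stage names and picosecond values are hypothetical, not Zen or Piledriver data.

```python
def fmax_ghz(stage_delays_ps: dict[str, float]) -> tuple[float, str]:
    """Return (max clock in GHz, limiting stage) for an idealized pipeline.

    Assumes the clock period equals the slowest stage's delay; ignores clock
    skew, latch overhead and voltage/temperature guardband. 1000 / ps = GHz.
    """
    if not stage_delays_ps:
        raise ValueError("need at least one stage")
    stage, worst_ps = max(stage_delays_ps.items(), key=lambda kv: kv[1])
    return 1000.0 / worst_ps, stage


# Hypothetical stage delays in picoseconds, for illustration only.
stages = {"fetch": 200.0, "decode": 190.0, "schedule": 210.0, "l2_path": 250.0}
f, limiter = fmax_ghz(stages)
print(f"{f:.2f} GHz, limited by {limiter}")  # 4.00 GHz, limited by l2_path

# Speeding up only the limiting stage moves the wall to the next-slowest one.
stages["l2_path"] = 205.0
f2, limiter2 = fmax_ghz(stages)
print(f"{f2:.2f} GHz, limited by {limiter2}")  # 4.76 GHz, limited by schedule
```

For scale, the ~4.7GHz and ~2.8GHz Fmax figures mentioned earlier in the thread correspond to clock periods of about 213 ps and 357 ps.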

 

KTE

Senior member
May 26, 2016
I was watching a YouTube video about G3258 overclocking, and they indicated that the uncore multiplier (which controls the clock of the L3 cache, I believe) can be a limiting factor in overclocks, and that you can often clock the CPU cores higher if you lower the uncore multiplier.
I can certainly understand timing-sync problems between coupled/decoupled blocks causing an upper threshold limit.

Perhaps not for a flagship HP processor within expected frequencies though.

 

bjt2

Senior member
Sep 11, 2016
Engineers tend to design balanced CPUs, to avoid wasting effort and transistors on a fast pipeline stage that ends up limited by a slower one.

Bulldozer is a fast design, with one 4-port scheduler for the integer side and one 4-port scheduler for the FPU side.
Intel's core has a potentially slower unified 8-port scheduler and indeed, even on a mature 14nm FinFET process, its CPUs reach lower clocks...

Zen has a similar 4-port scheduler for the FPU, but what have AMD's engineers done for the integer scheduler?
Zen has 4 ALUs and 2 AGUs, versus 2+2 on Bulldozer. If they had used a 6-port scheduler, this could have slowed down the CPU.
They could have used a 4-port scheduler for the ALUs and a 2-port scheduler for the AGUs.
Instead they used six 1-port schedulers, which is not the best solution in terms of IPC...

Why? Are they crazy? A 1-port scheduler can potentially be clocked higher than a 4-port scheduler.
The only reason I see is that they did this to clock Zen even higher... There is no other reason to sacrifice IPC than to clock the CPU higher...
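The IPC cost of per-port queues can be shown with a toy model (assumptions: all uops independent and ready, every ALU port can execute every uop, one issue per port per cycle, no latencies). A unified scheduler can send any ready uop to any free port, while port-bound queues are stuck with whatever binding was chosen at dispatch. The binding below is a hypothetical example.

```python
from collections import Counter
from math import ceil


def cycles_unified(num_ops: int, ports: int) -> int:
    """Independent ready uops, any uop may issue on any free port: `ports` per cycle."""
    return ceil(num_ops / ports)


def cycles_port_bound(bindings: list[int]) -> int:
    """Each uop was bound to one per-port queue at dispatch; each queue issues
    one uop per cycle, so the busiest queue sets the cycle count."""
    return max(Counter(bindings).values(), default=0)


# Hypothetical binding of 8 independent uops to 4 ALU queues (ALU0-3).
bindings = [0, 0, 0, 1, 1, 2, 3, 3]
u = cycles_unified(len(bindings), 4)   # 2 cycles
d = cycles_port_bound(bindings)        # 3 cycles (ALU0 holds 3 uops)
print(f"unified: {u} cycles, IPC {len(bindings) / u:.2f}; "
      f"port-bound: {d} cycles, IPC {len(bindings) / d:.2f}")
```

With a perfectly balanced binding both approaches take 2 cycles; the loss only appears when the dispatch-time choice turns out to be uneven, which is the IPC trade-off the post describes.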
 

Justinbaileyman

Golden Member
Aug 17, 2013
Wow, that's a lot of platforms to jump between in 2 years. Kind of like me, I guess. I like to experiment.

Yeah, I know it's a lot of jumping around, but it always turned out that something I was looking for was missing from those platforms. I mean every time, for 2 years straight. I did really like the X5650, but there was no USB 3.0 or SATA 3, and the E5-2670 was really awesome, but both motherboards I had had flaky SATA controllers that would consistently drop drives, and both the X5650 and the E5-2670 would boot into Windows super slowly compared to my old i7 920. Right now I am really into buying very cheap lower-end hardware and overclocking the snot out of it. Like, I just bought an AMD 5350, then upgraded to a 5370, and it overclocks like a beast. It does what I need it to for now, and no drives are being dropped, but it lacks the power to encode media. Hence the reason I bought the 6700K. I'm trying to get the Gigabyte SOC Force motherboard to pair with my 6700K, but they are never in stock.
 

swilli89

Golden Member
Mar 23, 2010
Sounds like the machine is crapping out. Hopefully it will last until Kaby and Zen release so you can choose from both companies' offerings.
And I hate it too; being the miser that I am, I was kind of hoping I'd get a decade out of this 2500K. Before this I got about 5 years out of my insanely overclockable E6600, man, what a legend of a chip. I think I took it from a stock 2.4GHz to 3.8GHz. Nearly a 60% increase!

If Intel decides to compete and offers an unlocked 6-core Kaby for $300, it will make my decision a lot harder...
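For the record, the E6600 overclock works out to slightly under 60%; a one-line check of the percentage arithmetic (the helper name is just for illustration):

```python
def oc_gain_percent(stock_ghz: float, oc_ghz: float) -> float:
    """Frequency increase of an overclock relative to stock, in percent."""
    if stock_ghz <= 0:
        raise ValueError("stock clock must be positive")
    return (oc_ghz / stock_ghz - 1.0) * 100.0


print(f"E6600 2.4 -> 3.8 GHz: +{oc_gain_percent(2.4, 3.8):.1f}%")  # +58.3%
```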
 

cdimauro

Member
Sep 14, 2016
You mean the IPC of Excavator?
You can see some figures from the Anandtech (Athlon X4 845) and Phoronix reviews.
For all the players: Excavator, Broadwell... and some other contenders (Skylake...). A nice article would be appreciated.
Intel's core has a potentially slower unified 8-port scheduler and indeed, even on a mature 14nm FinFET process, its CPUs reach lower clocks...
[...]
Zen has 4 ALUs and 2 AGUs, versus 2+2 on Bulldozer. If they had used a 6-port scheduler, this could have slowed down the CPU.
[...]
Instead they used six 1-port schedulers, which is not the best solution in terms of IPC...

Why? Are they crazy? A 1-port scheduler can potentially be clocked higher than a 4-port scheduler.
I'm not an expert in this field, so I'm asking if you can explain the difference. I suppose that dispatching a uop from an 8-port scheduler imposes clock limits similar to dispatching a uop to one of the 6 schedulers. Maybe you need a 3-input MUX in both cases? Is that correct or not?
 

cdimauro

Member
Sep 14, 2016
If Intel decides to compete and offers an unlocked 6-core Kaby for $300, it will make my decision a lot harder...
What does Intel's roadmap say about Kaby Lake? Are there any 6- or 8-core parts coming (for desktop or enthusiast)?
 

cbn

Lifer
Mar 27, 2009
There are 2x 10GbE on die, but honestly that is the first part I'd fuse off in a home CPU, to be able to ask more for the server ones.

Any links for that?

I do admit it seems very likely, as even the older AMD "Seattle" Cortex-A57 octo-core has 2x 10GbE... just wondering where the info came from.
 

Dresdenboy

Golden Member
Jul 28, 2003
citavia.blog.de
I'm not an expert in this field, so I'm asking if you can explain the difference. I suppose that dispatching a uop from an 8-port scheduler imposes clock limits similar to dispatching a uop to one of the 6 schedulers. Maybe you need a 3-input MUX in both cases? Is that correct or not?
It would take a lot of research for us forum members to find out the true details. Zen's 6-uop dispatch could actually be 4+2 (derived from 4 instructions with up to 2 memory operands). According to one patent filing, the schedulers might have one mechanism for scheduling ops with quickly available operands and another for ops whose operands need an additional cycle. This could be good for higher clock frequency at iso-voltage, or lower power at iso-clocks.
 

bjt2

Senior member
Sep 11, 2016
For all the players: Excavator, Broadwell... and some other contenders (Skylake...). A nice article would be appreciated.

I'm not an expert in this field, so I'm asking if you can explain the difference. I suppose that dispatching a uop from an 8-port scheduler imposes clock limits similar to dispatching a uop to one of the 6 schedulers. Maybe you need a 3-input MUX in both cases? Is that correct or not?

Correct. Roughly, the first problem is that an 8-port mux needs 3 select bits, instead of 2 for a 4-port mux and none for a 1-port "mux". AMD is an expert in 1-port "muxes", because K7-K10 had 3 separate 1-port schedulers, each with 1 ALU + 1 AGU coupled. So, roughly, the FO4 delay of this step grows as O(log2 n). Then there is the logic to issue the operations on the correct port: with 8 mixed ports, on Intel, this is a nightmare because of the different latencies of INT and FP instructions. One of the tasks of the scheduler is to issue each instruction to the correct port at the correct time, to avoid conflicts at the final register-write stage. Probably the port assignment was changed over the years partly to simplify this logic and squeeze out a few more MHz... The 4-port scheduler on previous AMD designs is simpler not only because it has 4 ports, but also because it is FPU-only or INT-only... Fewer combinations to manage, so the scheduler can be simpler. And with INT-only or FPU-only instructions, the latencies are more uniform: it is much simpler to predict the correct clock cycle at which to issue an instruction to avoid conflicts. So a 4-port unified scheduler is still more complex than a dedicated one. The 1-port schedulers on Zen don't have to mux anything. They only have to ensure that a ready instruction is issued at the correct time, possibly chosen among several.
I don't know why they didn't do this on the FPU too. Probably because the FPU uops have more uniform latencies, or they are very specialized (e.g. only fast uops on the FADD port and only slow instructions on the FMUL ports), so this is not an issue, or because the IPC loss would be too high... If I remember correctly, the FPU on AMD has always had a unified scheduler, even on K7-K10, so they may have optimized that scheduler over the years. This is clearly visible in the 32nm SOI Bulldozer, with a 5GHz top clock (albeit at 220W) and a higher overclocking wall than Intel on better processes...
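For the mux question, a small sketch of the select-width arithmetic: an n-input mux needs ceil(log2 n) select bits, which is also the depth of a balanced tree of 2:1 muxes, hence the rough O(log2 n) delay growth. Note that 6 inputs already need 3 bits, the same as 8. This only counts mux levels; it does not model the port-assignment and write-conflict logic, which the post identifies as the bigger problem.

```python
def mux_select_bits(inputs: int) -> int:
    """Select lines for a mux choosing among `inputs` sources.

    Equals ceil(log2(inputs)), computed exactly with integers; also the number
    of levels in a balanced tree of 2:1 muxes. A single input needs no mux (0).
    """
    if inputs < 1:
        raise ValueError("need at least one input")
    return (inputs - 1).bit_length()


for n in (1, 4, 6, 8):
    print(f"{n} ports -> {mux_select_bits(n)} select bits")
```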
 

bjt2

Senior member
Sep 11, 2016
It would take a lot of research for us forum members to find out the true details. Zen's 6-uop dispatch could actually be 4+2 (derived from 4 instructions with up to 2 memory operands). According to one patent filing, the schedulers might have one mechanism for scheduling ops with quickly available operands and another for ops whose operands need an additional cycle. This could be good for higher clock frequency at iso-voltage, or lower power at iso-clocks.
According to the press diagram, there are six 1-port schedulers, whose only job should be to ensure correct issue timing and to choose a good instruction to issue among the few in the queue... If it were a 4+2 scheme, even with these connections, they would have highlighted it in the diagram, I think... Instead they highlighted the 6x1 lanes scheme, even admitting that it can lead to a loss of IPC... I didn't listen to the Hot Chips talk, so I don't know if anyone asked, but I didn't find anything on a clock advantage of this scheme... Too bad no one asked... or reported it...
 

NostaSeronx

Diamond Member
Sep 18, 2011
Zen has a two level scheduler.

Unified 10 ports from the Micro Queue -> 6 ports to Int, 4 ports to FP. = Integer & Floating Point Level 2 Scheduler
6 non-unified ports via ALU0-3 Queues, AGU0-1 Queues. + Unified four ports via FP Scheduler. = Integer & Floating Point Level 1 Schedulers

ALU/AGU queues can be remapped/renamed (dependency-removal logic) through the Stack Memfile/Map-Retire stage. The micro-op queue, however, does not have that capability.
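A toy sketch of the two-level arrangement described above: a level-2 step steers uops from the micro-op queue into six per-port integer queues (ALU0-3, AGU0-1) or the unified FP scheduler, and a level-1 step lets each integer queue issue at most one uop per cycle while the FP scheduler picks up to four. The shortest-queue steering policy, the (name, kind) uop tuples and the "every uop is ready" simplification are assumptions for illustration; remapping/renaming is not modeled.

```python
from collections import deque

ALU_QUEUES = [f"ALU{i}" for i in range(4)]
AGU_QUEUES = ["AGU0", "AGU1"]


def dispatch(uops: list[tuple[str, str]], queues: dict[str, deque]) -> None:
    """Level 2 (assumed policy): route each (name, kind) uop to the shortest
    matching per-port integer queue, or to the single unified FP queue."""
    for name, kind in uops:
        if kind == "alu":
            target = min(ALU_QUEUES, key=lambda q: len(queues[q]))
        elif kind == "agu":
            target = min(AGU_QUEUES, key=lambda q: len(queues[q]))
        elif kind == "fp":
            target = "FP"
        else:
            raise ValueError(f"unknown uop kind: {kind}")
        queues[target].append(name)


def issue_one_cycle(queues: dict[str, deque], fp_ports: int = 4) -> list[str]:
    """Level 1: each integer queue issues at most one uop (a 1-port pick);
    the unified FP scheduler issues up to `fp_ports` uops (all assumed ready)."""
    issued = []
    for q in ALU_QUEUES + AGU_QUEUES:
        if queues[q]:
            issued.append(queues[q].popleft())
    for _ in range(min(fp_ports, len(queues["FP"]))):
        issued.append(queues["FP"].popleft())
    return issued


queues = {q: deque() for q in ALU_QUEUES + AGU_QUEUES + ["FP"]}
dispatch([("add", "alu"), ("ld", "agu"), ("mul", "alu"), ("fadd", "fp"), ("st", "agu")], queues)
print(issue_one_cycle(queues))
```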
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Zen has a two level scheduler.

Unified 10 ports from the Micro Queue -> 6 ports to Int, 4 ports to FP. = Integer & Floating Point Level 2 Scheduler
6 non-unified ports via ALU0-3 Queues, AGU0-1 Queues. + Unified four ports via FP Scheduler. = Integer & Floating Point Level 1 Schedulers

ALU/AGU queues can be remapped/renamed (dependency-removal logic) through the Stack Memfile/Map-Retire stage. The micro-op queue, however, does not have that capability.

Does this mean a higher clock? I mean, even compared with Bulldozer...
 

bjt2

Senior member
Sep 11, 2016
Quad-core Models = FX-8370E // Octo-core Models = Opteron 6386 SE. For the max expected clock rates, imho.

So 2.8 GHz for the 8-core? Then how come the Zen ES @3GHz didn't melt, but drew less power than a 3GHz-clocked BW-E?
 

Nothingness

Diamond Member
Jul 3, 2013
According to the press diagram, there are six 1-port schedulers, whose only job should be to ensure correct issue timing and to choose a good instruction to issue among the few in the queue... If it were a 4+2 scheme, even with these connections, they would have highlighted it in the diagram, I think... Instead they highlighted the 6x1 lanes scheme, even admitting that it can lead to a loss of IPC... I didn't listen to the Hot Chips talk, so I don't know if anyone asked, but I didn't find anything on a clock advantage of this scheme... Too bad no one asked... or reported it...
It's extremely difficult to select multiple uops from a single queue in a single cycle at a high (let's say > 2 GHz) frequency.

As far as I know, only Intel uses a unified scheduler. I guess that part of their design is tuned at the transistor level. But I wonder whether they still use a fully unified scheduler, or whether the queue is internally split.
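To illustrate why multi-uop selection from one queue is hard, here is a sketch of oldest-first selection written as chained find-first passes: picking k uops from one queue means each pass depends on what the earlier passes took, while a 1-port queue needs only a single find-first. Real select logic uses parallel priority/prefix circuits rather than a loop, so this shows the dependency, not an actual circuit; the ready vector is a made-up example.

```python
def pick_oldest_ready(ready: list[bool], k: int) -> list[int]:
    """Select up to k ready entries, oldest (lowest index) first.

    Written as k chained find-first passes: pass j must know which entries
    passes 0..j-1 already took, the serial dependency that makes wide
    single-queue picking hard to fit into one short cycle.
    """
    taken: list[int] = []
    mask = list(ready)
    for _ in range(k):
        idx = next((i for i, r in enumerate(mask) if r), None)  # one find-first
        if idx is None:
            break
        taken.append(idx)
        mask[idx] = False  # later passes depend on this update
    return taken


ready = [False, True, True, False, True, True]
print(pick_oldest_ready(ready, 1))  # [1]  (all a 1-port queue needs)
print(pick_oldest_ready(ready, 4))  # [1, 2, 4, 5]
```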
 