AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Status
Not open for further replies.

bjt2

Senior member
Sep 11, 2016
784
180
86
Since the Blender test shows that Zen's mean MT IPC is the same as Broadwell-E's, there are 2 possibilities:

1) AMD's SMT is better than Intel's, so in ST Zen will be slightly inferior.
2) AMD's SMT is worse than Intel's, so in ST Zen will be slightly better.

This is in Blender.
Pick your choice...

But please note that, at least in Blender, SMT and ST IPC can't both be worse than Intel's...
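The trade-off can be sketched with hypothetical numbers (the IPC figures and the 1.25 SMT scaling factor below are illustrative assumptions, not measurements):

```python
# Model: aggregate MT IPC of one SMT core = ST IPC x SMT scaling factor.
# If two designs land on the same MT IPC, a lower ST IPC on one side
# forces a higher SMT scaling factor on that side -- both can't be lower.

def mt_ipc(st_ipc: float, smt_scaling: float) -> float:
    """Aggregate IPC of a core running two SMT threads."""
    return st_ipc * smt_scaling

intel_mt = mt_ipc(st_ipc=1.00, smt_scaling=1.25)         # baseline: 1.25
zen_mt   = mt_ipc(st_ipc=0.95, smt_scaling=1.25 / 0.95)  # same 1.25

assert abs(intel_mt - zen_mt) < 1e-9  # equal MT IPC by construction
assert 1.25 / 0.95 > 1.25             # lower ST implies better SMT scaling
```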

Anyway, Zen was an early ES, with 2 memory channels vs. 4, and probably with the RAM at a low clock, so the final product can be better (at least in clock frequencies)...
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
In volume notebooks, AMD will have to deal with CNL-U...at least a full generation behind on process technology again.
It would seem more than a generation ahead given how super aggressive Intel is on 10nm power consumption. I wonder if Apple has had any influence there. Between the modem pilot experiment that Apple did, which seems to have been successful even if the modem itself sucks, and the ARM Artisan IP announcement at IDF, a transition to Intel 10nm in 2018 would be a possibility. I think the battle of Intel 10nm vs. TSMC 7nm for the A12 has not yet been fought.
 
Mar 10, 2006
11,715
2,012
126
Since the Blender test shows that Zen's mean MT IPC is the same as Broadwell-E's, there are 2 possibilities:

1) AMD's SMT is better than Intel's, so in ST Zen will be slightly inferior.
2) AMD's SMT is worse than Intel's, so in ST Zen will be slightly better.

This is in Blender.
Pick your choice...

But please note that, at least in Blender, SMT and ST IPC can't both be worse than Intel's...

Anyway, Zen was an early ES, with 2 memory channels vs. 4, and probably with the RAM at a low clock, so the final product can be better (at least in clock frequencies)...

In the specific Blender test shown by AMD, with unspecified hardware and software configurations, just before AMD announced a follow-on stock offering to pay off debt.
 

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
It would seem more than a generation ahead given how super aggressive Intel is on 10nm power consumption.

Intel has said so little about 10nm that such a claim seems premature, which makes comparing Cannonlake to Zen at this point fruitless.
 

blublub

Member
Jul 19, 2016
135
61
101
For AMD to survive 2017, all ZEN has to do is be a lot better than XV, and all signs point in the direction that it's going to be just that.
If ZEN+ doesn't narrow the gap to Intel, that would be worrisome, as it would indicate that the design can't be improved much.

Although I have to admit that if ZEN benchmarks fall below Haswell, the stock is gonna tank, since AMD's Blender test suggests it can keep up.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
In volume notebooks, AMD will have to deal with CNL-U...at least a full generation behind on process technology again.

In servers, Zen needs to go up against Skylake-EP, which will certainly be a formidable architecture.

Gaining share against Intel is not going to be easy.

Certainly not. A lot depends on how well Intel can get their 10nm process yielding.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Intel has said so little about 10nm that such a claim seems premature, which makes comparing Cannonlake to Zen at this point fruitless.
They gave a presentation at IDF with a slide showing gate delay and energy. The energy decrease compared to other nodes, even FinFET ones, was so large that it can't just be coincidence. Bohr explicitly said in the webcast that the focus was on energy, not performance.
 

cdimauro

Member
Sep 14, 2016
163
14
61
I remember your article on the Python interpreter... In this case it's true that it's a big switch, but as with common CPU instructions, which are encoded with shorter codes or decoded into one uop (the famous 96% of instructions decoded into 1 uop), the same can be said for emulators. If the uop cache is big enough, you will find in it the most used code snippets, corresponding to the most used emulated instructions... I bet that more than 90% of emulated code can be reduced to a few dozen instructions, and the switch branches that handle them can safely be stored in a 1K-2K uop cache... When a less used opcode must be emulated, the new snippet kicks out a less used bunch of instructions... But the majority would stay in cache... This is, in essence, the task of a cache...
Yes, but the micro-op cache is very small, at least compared to the regular L1C. And we don't know how it works. For example, it might work well only for loops, and have scarce or no associativity at all (e.g., only a few "fragments" are cached).
Long story short: with fragmented, branch-heavy code its contribution can be small or even nil.

Regarding emulators (and VMs too), it's true that some opcodes/instructions are more common, but there are still too many of them, containing too many instructions, to keep all (or even a good part) of them in a uop cache.

I know (C)Python's VM for sure: it executes A LOT of instructions for a very simple operation, like the very common "INT + INT". If you take a look at the executed code, there are some branches and several instructions, which will be split into many uops. Repeat the same for some other common operations, and you can easily see that even the L1C isn't enough to keep the average working set.
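As a quick illustration using CPython's own dis module (exact opcodes vary by interpreter version), even a one-line integer addition compiles to several bytecodes, each of which costs a full trip through the interpreter's dispatch loop in C:

```python
import dis

def add(a, b):
    return a + b

# Disassemble: expect loads of the two locals, a binary add, and a return.
dis.dis(add)

ops = [ins.opname for ins in dis.Bytecode(add)]
print(len(ops), "bytecodes for a single addition")
```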
My own opinion is that it's looking to end up 20-25% slower than Intel's SKL "all around". I don't care to delve into the ST/MT depths just yet.

Part of that will be SKL's clockspeed advantage, which I'm sure of.

IPC wise, I couldn't put a figure on it.

This is the same difference the BD 8150 had against the SNB 2600K, and the top models were even further ahead.

AMD typically loses a lot of revenue due to its repeated quarterly delays. Such delays buy a lot of time for the competitor.

They fail miserably in ambush or sudden counter fire.

Sent from HTC 10
(Opinions are own)
If the very last information reported above is true, the situation is much worse...
 

bjt2

Senior member
Sep 11, 2016
784
180
86
In the specific Blender test shown by AMD, with unspecified hardware and software configurations, just before AMD announced a follow-on stock offering to pay off debt.

So you don't buy that Zen has similar IPC to BW-E in Blender? It seems they used the same official version downloadable from the site...

Yes, but the micro-op cache is very small, at least compared to the regular L1C. And we don't know how it works. For example, it might work well only for loops, and have scarce or no associativity at all (e.g., only a few "fragments" are cached).
Long story short: with fragmented, branch-heavy code its contribution can be small or even nil.

Regarding emulators (and VMs too), it's true that some opcodes/instructions are more common, but there are still too many of them, containing too many instructions, to keep all (or even a good part) of them in a uop cache.

I know (C)Python's VM for sure: it executes A LOT of instructions for a very simple operation, like the very common "INT + INT". If you take a look at the executed code, there are some branches and several instructions, which will be split into many uops. Repeat the same for some other common operations, and you can easily see that even the L1C isn't enough to keep the average working set.

If the very last information reported above is true, the situation is much worse...

In this case the 4-way decoder could give a steady flow of 2 instructions per thread per clock... I imagine that an interpreter doesn't have a high IPC anyway, probably below 1, certainly below 2... So 4 decoders for two threads, plus something in the uop cache, should be sufficient... Anyway, on Intel CPUs the uop cache is 1.5K uops if I remember correctly, not so small... I hope that on Zen it's at least 1K uops...
I remember the fact that there are apparently simple instructions that are translated into a very long chain in the Python interpreter, but I hope that there are also simpler cases...
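A back-of-envelope check of that decoder math (the figures are the post's own assumptions, not measurements):

```python
# Assumed figures from the discussion: a 4-wide decoder shared between
# 2 SMT threads, feeding an interpreter whose IPC is around 1.
decode_width = 4                      # instructions decoded per clock
threads = 2                           # SMT threads sharing the decoder
per_thread = decode_width / threads   # steady-state share per thread

assert per_thread == 2.0              # 2 instructions/thread/clock
interpreter_ipc = 1.0                 # assumed dispatch-bound interpreter IPC
assert per_thread >= interpreter_ipc  # decode bandwidth wouldn't be the limit
print(f"{per_thread:.0f} instr/thread/clock vs. interpreter IPC ~{interpreter_ipc:.0f}")
```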
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
In the specific Blender test shown by AMD, with unspecified hardware and software configurations, just before AMD announced a follow-on stock offering to pay off debt.
And the big institutionalized professional investors didn't even see it, and still haven't!
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Does it also bug you when Intel relabels i3's as i5's and i7's in laptops?
No, because i3's did not have turbo. If the laptop 2c/4t chips had turbo, then they were different from i3's. As far as I know, the mobile i5 and i7 chips all had turbo boost, differentiating them from i3 chips.

It may be that the new Kaby Lake i3's have turbo, though.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Besides AMD has already demoed Zeppelin with SMT enabled (Blender).
Good point. OTOH this was the 8C variant, while Naples is a different story. But so is the type of SMT problem. It might not be a bug per se (non-functional SMT), but some configuration/OS issue resulting in lower performance with SMT enabled. There is some adaptive, application-aware power management, which might be going wrong here.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
I read that Zen overclocks to about 4.2GHz. So at least with Zen, people will stop complaining about Intel's processors not overclocking high enough, since Zen is even worse. That same dude says it matches Haswell performance.

Don't know if this was already posted here.

That Cinebench R15 MT score:
6900K @ 5.1GHz: ~2,100
8C/16T ZEN @ 5.1GHz: 2,000 (workaround disabled)

https://webcache.googleusercontent....20/amd-zen-141214/+&cd=2&hl=en&ct=clnk&gl=eng
https://webcache.googleusercontent....41214/index3.html+&cd=3&hl=eng&ct=clnk&gl=eng
 

KTE

Senior member
May 26, 2016
478
130
76
I don't think SMT is buggy at this stage per se, but maybe a multi-socket implementation needs work.

I'd expect AMD's SMT to be pretty darn good, given the much lower IPC of their previous uarchs in general.

It's their ST (w/o SMT), MHz, and power I'm worried about. The first because they've left it too late; 20-40% simply wouldn't be enough to compete above the cheap value segment in 2017-2018. 40-60% would be excellent and would make up a lot of lost turf, but would need improving on ASAP to stay competitive.

Secondly, 95W is fine, but 3.1GHz @ 95W and then 3.3GHz @ 125W isn't.

They will pit n+2 cores against Intel's, with SMT, so I'm not worried about the MT perf.

Sent from HTC 10
(Opinions are own)
 
Mar 10, 2006
11,715
2,012
126

piesquared

Golden Member
Oct 16, 2006
1,651
473
136
It has been known for several months that desktop comes first. It sounds a lot like the A64 and FX-51 coming to desktop first, with server to follow. Although the FX-51 with Socket 940 required ECC RAM, whereas Summit Ridge and AM4 don't appear to require it.
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,713
142
106
Thanks, I guess I'll be waiting a while, unless I see some good deals on Xeon parts for the holidays.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,814
4,108
136
It has been known for several months that desktop comes first. It sounds a lot like the A64 and FX-51 coming to desktop first, with server to follow. Although the FX-51 with Socket 940 required ECC RAM, whereas Summit Ridge and AM4 don't appear to require it.

Except that Opteron came before the A64 by a good six months or so. This time around, though, AMD has said desktop will come first.
 

cdimauro

Member
Sep 14, 2016
163
14
61
So you don't buy that Zen has similar IPC to BW-E in Blender? It seems they used the same official version downloadable from the site...
Which doesn't use AVX/AVX2...
In this case the 4-way decoder could give a steady flow of 2 instructions per thread per clock... I imagine that an interpreter doesn't have a high IPC anyway, probably below 1, certainly below 2...
It depends on the emulated code/instructions, but I think it's reasonable to expect an IPC close to 1.
So 4 decoders for two threads, plus something in the uop cache, should be sufficient... Anyway, on Intel CPUs the uop cache is 1.5K uops if I remember correctly, not so small... I hope that on Zen it's at least 1K uops...
It doesn't change the picture: they are not enough; not even the L1C is enough. And we don't know the policies used to cache the uops.
I remember the fact that there are apparently simple instructions that are translated into a very long chain in the Python interpreter, but I hope that there are also simpler cases...
Sure: one of the most common cases (LOAD_FAST) is very fast and needs only a few instructions. But "few" still means around ten if you follow the path from the bytecode fetch to the actual execution and the jump back to the fetch section, and consider that 3 conditional jumps are executed, plus an unconditional one (at the very end).

Which is a normal scenario for an emulator that works in the usual, "interpretative", way.
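A hypothetical miniature of that interpretive pattern (toy opcodes, not CPython's real ones): every emulated operation pays the fetch/dispatch/jump-back overhead described above before any real work happens:

```python
# Toy stack-machine interpreter: fetch an opcode, branch on it in a
# dispatch chain, execute the handler, then jump back to the fetch.
LOAD_CONST, ADD, RETURN = 0, 1, 2

def run(code, consts):
    stack, pc = [], 0
    while True:                          # the "jump back to the fetch" loop
        op = code[pc]; pc += 1           # bytecode fetch
        if op == LOAD_CONST:             # dispatch: one branch per opcode
            stack.append(consts[code[pc]]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == RETURN:
            return stack.pop()

# "INT + INT" costs four dispatched handlers here, each with its own
# branches, just to perform a single native addition:
print(run([LOAD_CONST, 0, LOAD_CONST, 1, ADD, RETURN], [2, 3]))  # -> 5
```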

When a JIT is involved it's quite different, because most of the overhead / non-linear code consists of software "guards" (checking types, or checking whether some interrupt/signal happened) and "loop guards" (to avoid being stuck indefinitely in a loop).
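A sketch of those guards (the function names and the budget threshold are invented for illustration): the JIT-style fast path runs straight-line code, but only after cheap checks that its assumptions still hold:

```python
# Type guard: the specialized path is only valid for plain ints;
# anything else "deoptimizes" back to a generic fallback path.
def jitted_add(a, b, fallback):
    if type(a) is not int or type(b) is not int:
        return fallback(a, b)            # guard failed: deoptimize
    return a + b                         # straight-line fast path

# Loop guard: bail out rather than stay stuck indefinitely in a loop.
def guarded_sum(n, budget=1_000_000):
    total = 0
    for i in range(n):
        if i >= budget:
            raise RuntimeError("loop guard tripped")
        total = jitted_add(total, i, lambda x, y: x + y)
    return total

print(guarded_sum(10))  # -> 45
```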
 