Samsung outs Exynos 9 Series 9810

Page 8 - AnandTech community

Andrei.

Senior member
Jan 26, 2015
Yeah, claiming "better than ZEN IPC 100%" is jumping the gun "a bit". At least a while ago Samsung invested heavy resources into optimizing for benchmarks.
I don't understand what the fuss is about admitting that it has higher IPC. It's a much lower-clocked architecture. Apple has the highest IPC in the industry right now.
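A rough way to frame the IPC point: single-threaded performance is approximately IPC multiplied by clock speed, so dividing a single-thread score by the clock gives a crude per-clock figure. The sketch below assumes that linear model; the core names and scores are hypothetical placeholders, not results for any real chip.

```python
def per_clock(score: float, ghz: float) -> float:
    """Crude IPC proxy: single-thread score divided by clock (GHz).

    Assumes performance scales linearly with frequency, which real
    cores only approximate.
    """
    return score / ghz


# Hypothetical placeholder numbers, NOT real benchmark results.
cores = {
    "low_clock_wide_core": (4000, 2.9),
    "high_clock_core": (4800, 4.5),
}

for name, (score, ghz) in cores.items():
    print(f"{name}: {per_clock(score, ghz):.0f} points/GHz")
```

Here the lower-clocked core posts the lower absolute score but the higher per-clock figure. Real cores do not scale linearly with frequency (memory latency stays roughly fixed in nanoseconds), so treat this as a first-order comparison only.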

I still remember an A9 Samsung S3 outpacing a Krait Nexus 4 in many popular CPU benchmarks in phone reviews (while Krait essentially had 40%+ more IPC and similar clocks).
Err no. Krait had lower IPC than an A9. The 4412 was the better SoC that generation.
 

Gideon

Golden Member
Nov 27, 2007
Yeah, my bad, I was confusing it with the A8, I guess (though Krait might have had better FP performance). I just remember seeing benches where a Samsung phone that definitely had an old in-order architecture soundly beat the Nexus 4 left and right (except in the GPU benches).

And I wouldn't consider 2.9 GHz to be "low". Well, I guess then the x86 chipmakers should really be ashamed of themselves for delivering such uncompetitive designs (especially Intel, for all those years!), while mobile chipmakers manage to extract so much extra performance out of every generation.
 

CatMerc

Golden Member
Jul 16, 2016
Zen and Skylake, aside from clocking higher, also have to deal with workloads that no smartphone is ever expected to do. SPECInt is a better representative of desktop (yes, not just workstation) performance than Geekbench, as the old and crusty Windows software environment is quite different from Android or iOS, and in these workloads you see just how big the gap is. At 3.2 GHz, EPYC is easily 3x the single-threaded performance of Exynos 8895, and with SMT that's 5x per core.

Not to take anything away from Apple and Samsung, but this is apples to oranges (heh) to an extreme degree. I don't think it's fruitful to make IPC comparisons of mobile SoCs vs full-blown x86 behemoths in Geekbench. It doesn't really tell you anything of value. The x86 designs aren't made to scale down to phones, and the ARM designs aren't made to scale up to servers.

It's alright to make the comparison in Geekbench, but it needs to be understood and not just looked at as is. Otherwise you draw conclusions that are based on false premises.
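Taking the 3x and 5x estimates above at face value, the implied SMT uplift follows directly. A quick check of that arithmetic; the function name is illustrative, and the inputs are the post's estimates, not measurements:

```python
def implied_smt_gain(single_thread_ratio: float, per_core_ratio: float) -> float:
    """Extra throughput SMT must add on top of single-thread speed
    to turn single_thread_ratio into per_core_ratio."""
    return per_core_ratio / single_thread_ratio


# Estimates quoted in the post: 3x single-threaded, 5x per core with SMT.
gain = implied_smt_gain(3.0, 5.0)
print(f"implied SMT throughput uplift: {gain:.2f}x (+{(gain - 1) * 100:.0f}%)")
```

Whether a ~67% SMT uplift is realistic depends heavily on the workload.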
 

asendra

Member
Nov 4, 2012
Zen and Skylake, aside from clocking higher, also have to deal with workloads that no smartphone is ever expected to do. SPECInt is a better representative of desktop (yes, not just workstation) performance than Geekbench, as the old and crusty Windows software environment is quite different from Android or iOS, and in these workloads you see just how big the gap is. At 3.2 GHz, EPYC is easily 3x the single-threaded performance of Exynos 8895, and with SMT that's 5x per core.

Not to take anything away from Apple and Samsung, but this is apples to oranges (heh) to an extreme degree. I don't think it's fruitful to make IPC comparisons of mobile SoCs vs full-blown x86 behemoths in Geekbench. It doesn't really tell you anything of value. The x86 designs aren't made to scale down to phones, and the ARM designs aren't made to scale up to servers.

It's alright to make the comparison in Geekbench, but it needs to be understood and not just looked at as is. Otherwise you draw conclusions that are based on false premises.

https://www.anandtech.com/show/9766/the-apple-ipad-pro-review/4

A9X wasn't 3x to 5x slower than Skylake on SPEC06, and that was two SoC generations ago for Apple.
Also, I would say Apple has advanced its CPU performance quite a bit more than Intel during this time.

But yes, in general there's little to gain by comparing such different designs. I only find it interesting in Apple's case due to the theoretical overlap between iPad Pros and MacBooks, which share very similar TDPs and underlying OSes, which might allow Apple to ditch Intel if they wanted to.
 

CatMerc

Golden Member
Jul 16, 2016
https://www.anandtech.com/show/9766/the-apple-ipad-pro-review/4

A9X wasn't 3x to 5x slower than Skylake on SPEC06, and that was two SoC generations ago for Apple.
Also, I would say Apple has advanced its CPU performance quite a bit more than Intel during this time.

But yes, in general there's little to gain by comparing such different designs. I only find it interesting in Apple's case due to the theoretical overlap between iPad Pros and MacBooks, which share very similar TDPs and underlying OSes, which might allow Apple to ditch Intel if they wanted to.
The CPUs here have an order of magnitude less power available to them than the one I was using. That, and the A9X is still faster in single core than the SoCs I mentioned, so some of the gap closes there.

So in the end we are both getting the same comparative results.
 

Nothingness

Platinum Member
Jul 3, 2013
A9X wasn't 3x to 5x slower than Skylake on SPEC06, and that was two SoC generations ago for Apple.
Also, I would say Apple has advanced its CPU performance quite a bit more than Intel during this time.
That comparison was heavily biased due to the use of icc in 32-bit mode for x86[*]. It's pointless to compare x86 vs ARM chips.

[*] Last time I checked icc vs gcc on an i3770, the geomean of the int part was 45% better for icc. icc is a SPEC compiler.
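SPEC CPU reports its integer and floating-point scores as the geometric mean of per-benchmark ratios, which is why a compiler that wins big on a few benchmarks moves the headline number. A minimal sketch of that calculation; the per-benchmark ratios are hypothetical, not the i3770 results mentioned above:

```python
import math


def geomean(values):
    """Geometric mean: exp of the mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark SPEC ratios for one CPU built with two compilers.
icc = [30.0, 25.0, 40.0, 20.0]
gcc = [20.0, 18.0, 28.0, 14.0]

speedup = geomean(icc) / geomean(gcc)
print(f"compiler effect on the geomean: {speedup:.2f}x")
```

Over the same benchmark set, the ratio of the two geomeans equals the geomean of the per-benchmark speedups, so the compiler effect can be read off either way.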
 

Thala

Golden Member
Nov 12, 2014
That comparison was heavily biased due to the use of icc in 32-bit mode for x86[*]. It's pointless to compare x86 vs ARM chips.

[*] Last time I checked icc vs gcc on an i3770, the geomean of the int part was 45% better for icc. icc is a SPEC compiler.

Indeed. Contrary to popular belief, it's only the compiler that matters for low-level benchmarks, not the OS. Likewise, even when using SPEC to compare ARM vs. x86, the compilers used should be the same (either LLVM or GCC).
 

Thala

Golden Member
Nov 12, 2014
Jim said that for the same number of transistors we can have about a 10% bigger OoOE engine with ARM than with x86; Michael Clark said we can deliver Zen-level performance regardless of ISA.

Except of course that only getting Zen performance would not be an achievement for any ARM architecture of similar size and power.

If you search RWT you will find the ISA wars covered very well. I would summarize the issue as: at 4-wide decode, x86 spends more transistors on the front end but it doesn't cost you power, and uop caches help with that limit and save power; beyond 4-wide it's a big problem. The ARM ISA has some nicer load operations.

At that point you're done; everything else, such as weak vs. strong memory ordering, is just a different trade-off for different workloads.

There is so much wrong with the x86 ISA that I don't even know where to start. But claiming that a uop cache saves power is... interesting. In addition, I don't know of any workload where a strong memory ordering model gives you an advantage. On the contrary, implementing sequential consistency costs you gates and power in addition to limiting your performance. Weak memory ordering is a very natural fit for an OoO architecture.
The issue is that, due to backwards compatibility, x86 never made the jump to weak ordering. Back in the seventies sequential consistency came naturally. Today, however, when you have several transactions in flight at any given time, it is a challenge to make them observable in program order without barriers.
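The message-passing litmus test is the classic way to see what "observable in program order without barriers" means. The sketch below is a toy enumerator, not a faithful model of any real ISA: the "SC" mode runs each thread in program order, while the "weak" mode (an assumption for illustration) lets a thread's accesses between fences take effect in any order. For reference, x86 is usually described as TSO rather than full sequential consistency; TSO forbids the stale outcome in this particular test just as SC does, whereas on ARMv8 real code would rely on barriers such as DMB or on load-acquire/store-release instructions.

```python
from itertools import permutations

# Toy op notation (hypothetical): ("st", var, value), ("ld", var, reg), ("fence",).


def thread_orders(ops, relaxed):
    """Yield the orders in which one thread's ops may take effect.

    SC: program order only (fences are no-ops).
    Toy weak model: ops between fences may take effect in any order,
    but never cross a fence.
    """
    if not relaxed:
        yield [op for op in ops if op[0] != "fence"]
        return
    segments = [[]]
    for op in ops:
        if op[0] == "fence":
            segments.append([])
        else:
            segments[-1].append(op)

    def build(i):
        if i == len(segments):
            yield []
            return
        for perm in permutations(segments[i]):
            for rest in build(i + 1):
                yield list(perm) + rest

    yield from build(0)


def interleavings(a, b):
    """All merges of two op lists that keep each list's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(t0, t1, relaxed):
    """Set of (r1, r2) results reachable for two threads; memory starts at 0."""
    seen = set()
    for o0 in thread_orders(t0, relaxed):
        for o1 in thread_orders(t1, relaxed):
            for sched in interleavings(o0, o1):
                mem, regs = {}, {}
                for op in sched:
                    if op[0] == "st":
                        mem[op[1]] = op[2]
                    else:  # load
                        regs[op[2]] = mem.get(op[1], 0)
                seen.add((regs["r1"], regs["r2"]))
    return seen


# Message passing: the writer publishes data, then sets a flag.
writer = [("st", "data", 1), ("st", "flag", 1)]
reader = [("ld", "flag", "r1"), ("ld", "data", "r2")]
writer_f = [("st", "data", 1), ("fence",), ("st", "flag", 1)]
reader_f = [("ld", "flag", "r1"), ("fence",), ("ld", "data", "r2")]

stale = (1, 0)  # saw the flag but not the data
print("SC allows stale read:   ", stale in outcomes(writer, reader, False))
print("weak allows stale read: ", stale in outcomes(writer, reader, True))
print("weak + fences allows it:", stale in outcomes(writer_f, reader_f, True))
```

The fences restore the SC result here, which is exactly the barrier cost the weak-ordering side accepts in exchange for freer reordering elsewhere.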
 

CatMerc

Golden Member
Jul 16, 2016
Except of course that only getting Zen performance would not be an achievement for any ARM architecture of similar size and power.



There is so much wrong with the x86 ISA that I don't even know where to start. But claiming that a uop cache saves power is... interesting. In addition, I don't know of any workload where a strong memory ordering model gives you an advantage. On the contrary, implementing sequential consistency costs you gates and power in addition to limiting your performance. Weak memory ordering is a very natural fit for an OoO architecture.
The issue is that, due to backwards compatibility, x86 never made the jump to weak ordering. Back in the seventies sequential consistency came naturally. Today, however, when you have several transactions in flight at any given time, it is a challenge to make them observable in program order without barriers.
A uOp cache saves power on x86 vs not having one. Of course, not having the power-hungry decoder saves more, but x86 is x86.
 

itsmydamnation

Platinum Member
Feb 6, 2011
Except of course that only getting Zen performance would not be an achievement for any ARM architecture of similar size and power.



There is so much wrong with the x86 ISA that I don't even know where to start. But claiming that a uop cache saves power is... interesting. In addition, I don't know of any workload where a strong memory ordering model gives you an advantage. On the contrary, implementing sequential consistency costs you gates and power in addition to limiting your performance. Weak memory ordering is a very natural fit for an OoO architecture.
The issue is that, due to backwards compatibility, x86 never made the jump to weak ordering. Back in the seventies sequential consistency came naturally. Today, however, when you have several transactions in flight at any given time, it is a challenge to make them observable in program order without barriers.

Both Intel and AMD claim the uop cache saves power. Does ARM claim loop caches save power? (I would assume they do.) Any time you can power down your front end is a good thing.

I don't really have time to write a full reply myself, so I'll just link this:
https://www.realworldtech.com/forum/?threadid=131745&curpostid=131806

So how strong is the memory ordering in the M3?
 

eastofeastside

Junior Member
Nov 19, 2011
Great Zen vs M3 discussion.

My interest in asking is in the potential for ARM to make an impact on x86 in low- to mid-range Windows 10 laptops and Chromebooks.

And secondly, in the possibility of an ARM-based PS5/Xbox next-gen console CPU.

I specifically wanted to know if a big ARM core could have an advantage over mobile Ryzen, i3 and i5. Ryzen is not a pure mobile core like M3; perhaps K12 was supposed to address the mobile low-power part of AMD's strategy.

ARM's next-gen Ares core will launch this summer at Computex. I'm excited for next-gen ARM cores to expand beyond phones and tablets, especially when they hit 7nm.
 

Thala

Golden Member
Nov 12, 2014
Both Intel and AMD claim the uop cache saves power. Does ARM claim loop caches save power? (I would assume they do.) Any time you can power down your front end is a good thing.

Yes, comparing x86 to x86, a uop cache saves power because you don't have to run the full decode every time. It is not about powering down, though, as the decoders do not have separate/split power domains; it is just that less activity in the decoders saves power.
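To make "less activity in the decoders" concrete, here is a toy model of a uop cache: a small LRU cache keyed by fetch-block address, where only misses go through the full decoders. The capacity, addresses and trace are hypothetical, and real uop caches are set-associative with more complex fill and eviction rules.

```python
from collections import OrderedDict


def decoder_activations(trace, capacity):
    """Count fetch blocks that must go through the full decoders.

    Toy model: a fully associative LRU uop cache holding `capacity`
    decoded fetch blocks. A hit is served from the cache; a miss runs
    the decoders and fills the cache.
    """
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # refresh LRU position
        else:
            misses += 1  # full decode needed
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


# Hypothetical trace: a 4-block loop body executed 100 times.
loop = [0x100, 0x120, 0x140, 0x160]
trace = loop * 100

print("no uop cache:     ", len(trace), "decodes")
print("8-entry uop cache:", decoder_activations(trace, 8), "decodes")
print("2-entry uop cache:", decoder_activations(trace, 2), "decodes")
```

The 2-entry case thrashes and decodes every block, which is the flip side raised later in the thread: when the working set misses in the uop cache, you pay for the full decoders anyway.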

I don't really have time to write a full reply myself, so I'll just link this:
https://www.realworldtech.com/forum/?threadid=131745&curpostid=131806

I just happen to disagree with Linus. It seems he is no CPU architect, and his software arguments are blown out of proportion. Barriers are typically not needed in user-level code but are hidden within the OS. The background is that two threads/contexts only have to reason about memory ordering at the synchronization points, unless you implement synchronization in application-level code, which would be bad practice anyway.
Regarding verification of the OS itself, indeed you might end up with more barriers than needed on a particular architecture, but correctness is pretty much decidable at this point.
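The argument is that application code only reasons about ordering at synchronization points, with any barriers living inside the lock or OS primitive. A minimal sketch of that discipline using Python's standard threading.Lock; the Counter class and worker function are illustrative names. Python hides hardware memory ordering entirely, so this shows the programming model, not the barriers themselves.

```python
import threading


class Counter:
    """Shared counter whose only synchronization point is a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n):
        # Acquire/release delimit the critical section; any memory
        # barriers a given CPU needs are the lock implementation's job.
        with self._lock:
            self.value += n


def worker(counter, iterations):
    for _ in range(iterations):
        counter.add(1)


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # join is also a synchronization point
print(counter.value)
```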
 

CatMerc

Golden Member
Jul 16, 2016
Great Zen vs M3 discussion.

My interest in asking is in the potential for ARM to make an impact on x86 in low- to mid-range Windows 10 laptops and Chromebooks.

And secondly, in the possibility of an ARM-based PS5/Xbox next-gen console CPU.

I specifically wanted to know if a big ARM core could have an advantage over mobile Ryzen, i3 and i5. Ryzen is not a pure mobile core like M3; perhaps K12 was supposed to address the mobile low-power part of AMD's strategy.

ARM's next-gen Ares core will launch this summer at Computex. I'm excited for next-gen ARM cores to expand beyond phones and tablets, especially when they hit 7nm.
There wouldn't be much benefit. There's a reason K12 was shelved, despite already having running engineering samples.

The reality is that x86 and ARM these days barely have any difference. You are looking at maybe a mm^2 of savings in chip size and maybe 10% higher efficiency. And with every subsequent node this drops lower and lower, as the x86-specific parts of the chip become relatively smaller and less power hungry, since they don't become more complex.
 

DeletedMember377562

Lol, and this is because you say so? Provided evidence is no prerequisite/necessary condition for the truth of a statement. Your statement is irrational.

Actually, provided evidence is exactly a prerequisite for the truth of a statement. My statement is completely rational, unlike yours. You're making predictions about the performance of a future architecture based on evidence you say you saw yourself, personally, but refuse to share with us. If you can't see the issue here, then you have some serious issues.

I still think at 2.9 GHz with this IPC on the final product. Anyway, it's still a dead end: being tied to buying a Samsung or an Apple phone to get this kind of performance is far from what we want.

Doesn't matter. According to Thala, it's completely fine to have a 60-75% single-core and ~15% multi-core disadvantage, as long as we have a 10% GPU advantage on the SD845...
 

eastofeastside

Junior Member
Nov 19, 2011
There wouldn't be much benefit. There's a reason K12 was shelved, despite already having running engineering samples.

The reality is that x86 and ARM these days barely have any difference. You are looking at maybe a mm^2 of savings in chip size and maybe 10% higher efficiency. And with every subsequent node this drops lower and lower, as the x86-specific parts of the chip become relatively smaller and less power hungry, since they don't become more complex.

I appreciate what you are saying about ARM versus x86 efficiency differences.

I still have another question from a different perspective, though. Ryzen is a server/desktop core from which the lower-grade mobile versions are binned. Taking Jaguar and Atom as examples of cores architected specifically for low power, isn't there a significant TDP/mm^2 advantage to using a dedicated low-power chip for a low-power application, versus using a bigger mobile variant of a desktop-grade core clocked down for low-power use?

Seeing as Jaguar is dead and Atom isn't an option for a console, would an ARM core made specifically for the thermal range of a console have a significant mm^2/TDP advantage over adapting Ryzen for a console application?

Maybe K12 was supposed to be the lower power solution for AMD in the place of Jaguar. I hope it pops back on the radar before too long.
 

Thala

Golden Member
Nov 12, 2014
The reality is that x86 and ARM these days barely have any difference. You are looking at maybe a mm^2 of savings in chip size and maybe 10% higher efficiency. And with every subsequent node this drops lower and lower, as the x86-specific parts of the chip become relatively smaller and less power hungry, since they don't become more complex.

Where are these numbers coming from? From my experience with both architectures, the efficiency gap is much larger. Similarly with node shrinks: what I am seeing is that the gap is not closing, due to the higher impact of leakage. Case in point, a uop cache helps save dynamic power, but you get increased leakage. x86 will never become an efficient architecture due to inherent flaws, which can only be worked around at ever-increasing cost. On a very abstract level the latest ARM and x86 architectures might look similar, but they look very different when you look into the actual microarchitecture.
 

CatMerc

Golden Member
Jul 16, 2016
Where are these numbers coming from? From my experience with both architectures, the efficiency gap is much larger. Similarly with node shrinks: what I am seeing is that the gap is not closing, due to the higher impact of leakage. Case in point, a uop cache helps save dynamic power, but you get increased leakage. x86 will never become an efficient architecture due to inherent flaws, which can only be worked around at ever-increasing cost. On a very abstract level the latest ARM and x86 architectures might look similar, but they look very different when you look into the actual microarchitecture.
The decode part is absolutely tiny, and it doesn't get any more complex with a node shrink or architectural update. It just shrinks, so the relative area it takes shrinks with each generation.
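The "relative area shrinks" argument rests on the decoder staying a roughly fixed block while the rest of the core grows; a uniform node shrink on its own scales both parts equally and leaves the ratio unchanged. A toy calculation under that assumption, with a hypothetical starting split and growth rate:

```python
def decoder_share(decoder, rest, growth, generations):
    """Decoder fraction of core area for each generation.

    Assumes (hypothetically) the decoder's transistor count stays fixed
    while the rest of the core grows by `growth` per generation. A uniform
    node shrink scales both terms equally, so it cancels out of the ratio.
    """
    shares = []
    for _ in range(generations + 1):
        shares.append(decoder / (decoder + rest))
        rest *= growth
    return shares


# Hypothetical split (arbitrary units) and 30% growth of the rest per generation.
for gen, share in enumerate(decoder_share(decoder=1.0, rest=7.0, growth=1.3, generations=3)):
    print(f"gen {gen}: decoder share {share:.1%}")
```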

As for power efficiency, this guy knows things about K12 that aren't public.
Considering K12 is as close as we have to an x86 design converted to ARM (or vice versa), it speaks volumes about what the difference is these days. The differences you're talking about are true for simple x86 designs, but Intel and AMD, with their decades of iteration on x86, managed to close the gap to the point where it really doesn't matter. The uOp cache was the final nail in the coffin for the ARM vs x86 difference. FinFETs dropped leakage a LOT too.

ARM will still be best for small cores, but x86 isn't going anywhere for tablet and higher performance.

I appreciate what you are saying about ARM versus x86 efficiency differences.

I still have another question from a different perspective, though. Ryzen is a server/desktop core from which the lower-grade mobile versions are binned. Taking Jaguar and Atom as examples of cores architected specifically for low power, isn't there a significant TDP/mm^2 advantage to using a dedicated low-power chip for a low-power application, versus using a bigger mobile variant of a desktop-grade core clocked down for low-power use?

Seeing as Jaguar is dead and Atom isn't an option for a console, would an ARM core made specifically for the thermal range of a console have a significant mm^2/TDP advantage over adapting Ryzen for a console application?

Maybe K12 was supposed to be the lower power solution for AMD in the place of Jaguar. I hope it pops back on the radar before too long.
The main objective of Jaguar and Atom was being small and cheap, not just power efficient. Zen and modern Intel cores can scale down in power well enough that they're often more efficient than the small cores. The problem is they're bigger, and therefore costlier to make. It's actually the main reason why Apple gets so far ahead in both performance and efficiency. A big core isn't necessarily less efficient; in fact, with the right engineering it can be even more efficient. But it will cost more.

For AMD, they'd rather eat the per-chip costs of having a bigger core than the costs of designing a new small core just for consoles.
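The claim that a big core can be more efficient comes down to energy per task being average power multiplied by run time: a faster core can draw more power and still finish the same work using less energy. A sketch with hypothetical power and timing numbers, ignoring idle and platform power:

```python
def energy_joules(avg_power_w, runtime_s):
    """Energy for one task: average power (W) multiplied by run time (s)."""
    return avg_power_w * runtime_s


# Hypothetical cores running the same fixed task.
small_core = energy_joules(avg_power_w=1.0, runtime_s=4.0)  # slow, frugal
big_core = energy_joules(avg_power_w=2.5, runtime_s=1.5)    # fast, hungrier

print(f"small core: {small_core:.2f} J, big core: {big_core:.2f} J")
```

Whether the big core wins depends on how much faster it finishes relative to how much more power it draws; the area (and therefore cost) penalty is what the post says AMD chooses to absorb.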
 

Thala

Golden Member
Nov 12, 2014
Doesn't matter. According to Thala, it's completely fine to have a 60-75% single-core and ~15% multi-core disadvantage, as long as we have a 10% GPU advantage on the SD845...

Do not put words in my mouth! I never made a general statement like this.

What I said was:
Personally, I would value better GPU performance more highly than better single-core performance.
 

CatMerc

Golden Member
Jul 16, 2016
Do not put words in my mouth! I never made a general statement like this.

What I said was:
While I disagree about the GPU notion, I do agree that we should be more careful about interpretation of comments. It just degrades discussion quality otherwise. +1
 

Thala

Golden Member
Nov 12, 2014
The decode part is absolutely tiny, and it doesn't get any more complex with a node shrink or architectural update. It just shrinks, so the relative area it takes shrinks with each generation.

First, if you go wider you eventually need to add decoders, so you have to scale the front end along with the back end. Second, the decoders are monsters compared to ARM decoders, particularly if you include the uop cache. Missing in the uop cache costs you additional delays, making the pipeline front-end-bound much more often than on ARM. There are lots of other issues with x86 on top of that, like the memory model (which also impacts the cache coherency implementation), the small architectural register set, memory operands, atomic operations, descriptor tables, segmentation, etc. Many things that were okay-ish in the seventies are still found today only in x86.


OK. Rumored power numbers while running an unknown use case on an unreleased architecture? Doesn't sound particularly credible. From my knowledge, K12 was never really finished.

FinFETs dropped leakage a LOT too.
Yes, FinFETs have lower leakage compared to a planar process of similar size. However, the truth is that leakage went up from TSMC 28nm planar to Intel 14nm FinFET. It would have increased more going down to a hypothetical planar 14nm process, but that was not my point.

ARM will still be best for small cores, but x86 isn't going anywhere for tablet and higher performance.

My point is that in the not-too-distant future we will see ARM cores that are faster than anything x86 at lower power. Whether they easily penetrate the Windows desktop market is a different question; I assume it also depends on how well Microsoft plays its cards with Windows on ARM.
 

Nothingness

Platinum Member
Jul 3, 2013
There are lots of other issues with x86 on top of that, like the memory model (which also impacts the cache coherency implementation), the small architectural register set, memory operands, atomic operations, descriptor tables, segmentation, etc. Many things that were okay-ish in the seventies are still found today only in x86.
As a side note, I bet we'll see more ARM cores do "magic" D-to-I cache snooping. JIT has become too prevalent to ignore the cost of explicit cache maintenance.
 

eastofeastside

Junior Member
Nov 19, 2011
The main objective of Jaguar and Atom was being small and cheap, not just power efficient. Zen and modern Intel cores can scale down in power well enough that they're often more efficient than the small cores. The problem is they're bigger, and therefore costlier to make. It's actually the main reason why Apple gets so far ahead in both performance and efficiency. A big core isn't necessarily less efficient; in fact, with the right engineering it can be even more efficient. But it will cost more.

For AMD, they'd rather eat the per-chip costs of having a bigger core than the costs of designing a new small core just for consoles.

K12? Was K12 supposed to be the "new small core"? It was obviously designed to a high level before being shelved. Is expecting K12 or a future variant in consoles, low power Windows laptops, or Chromebooks a stretch?
 

CatMerc

Golden Member
Jul 16, 2016
K12? Was K12 supposed to be the "new small core"? It was obviously designed to a high level before being shelved. Is expecting K12 or a future variant in consoles, low power Windows laptops, or Chromebooks a stretch?
I never said K12 was supposed to be a small core. As for using it in consoles, it would make backwards compat harder, which was one of the reasons both Sony and MS moved to x86: when a new gen arrives, backwards compat will be far easier. Especially for MS, who is moving away from the traditional generations model and towards something like the smartphone model.

Low-power Windows laptops, maybe, but that would assume there's a benefit over normal Zen. Chromebooks, maybe.

AMD shelved it as a product; it will only make an appearance if a customer orders it as a semi-custom product, per Lisa's words.

Second, the decoders are monsters compared to ARM decoders, particularly if you include the uop cache.
https://en.wikichip.org/w/images/0/0e/amd_zen_core_die.png
Out of a 7 mm^2 Zen core, the decode area (uOp cache included) is 0.865 mm^2. And out of the 213 mm^2 final 8-core Zeppelin die, the total cost is 6.92 mm^2 (8 x 0.865).
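The area figures in this post can be checked directly. The constants below are the numbers quoted in the post; the code only does the arithmetic, showing the decode block is roughly an eighth of a core but only a few percent of the die.

```python
CORE_MM2 = 7.0      # Zen core area, as quoted in the post
DECODE_MM2 = 0.865  # decode + uOp cache area per core, as quoted
DIE_MM2 = 213.0     # Zeppelin die area, as quoted
CORES = 8

per_die = DECODE_MM2 * CORES
print(f"decode per die:    {per_die:.2f} mm^2")
print(f"share of one core: {DECODE_MM2 / CORE_MM2:.1%}")
print(f"share of the die:  {per_die / DIE_MM2:.1%}")
```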

From my knowledge K12 was never really finished.
K12 got to the running engineering sample stage. It just never turned into a product since AMD didn't see the value.

My point is that in the not-too-distant future we will see ARM cores that are faster than anything x86 at lower power.
I completely disagree. Soon enough they will all hit the same ILP-extraction dead end that Intel (and soon AMD) hit. It's easier to follow than to blaze a trail, so don't expect the meteoric performance jumps we see right now to continue for long. There isn't much more that can be improved hardware-wise for absolute performance, not without blowing up power budgets to the point where just adding more cores is far more efficient.

And even though it appears like Apple and soon Samsung are gaining on Intel and AMD, remember that the two have been stuck on 14nm for years now, while Apple and Samsung are getting the benefits of a node shrink. Once Intel and AMD move to 7nm (Well, 10nm for Intel), the bar will be set higher for ARM designs to beat.
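One common way to frame the "just adding more cores is far more efficient" argument is Pollack's rule, an empirical rule of thumb (not something stated in this thread) that single-core performance grows roughly with the square root of core area or complexity. Under that assumption, with a hypothetical 2x area budget:

```python
import math


def pollack_perf(area_ratio):
    """Pollack's rule of thumb: single-core performance ~ sqrt(core area)."""
    return math.sqrt(area_ratio)


budget = 2.0  # hypothetical: twice the area of a baseline core
one_big_core = pollack_perf(budget)      # single-thread gain from one bigger core
two_small_cores = 2 * pollack_perf(1.0)  # throughput ceiling on perfectly parallel work

print(f"one 2x-area core: {one_big_core:.2f}x, two baseline cores: {two_small_cores:.2f}x")
```

The catch is that the two-core figure only applies to work that parallelizes, which is why single-thread gains still matter in this debate.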
 