Core M vs. A8X in Geekbench 3

Page 3 - AnandTech community

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
It's pretty pointless trying to count transistors when such large chunks of the Ax designs have nothing to do with the CPU/GPU.

I wonder which vendor we'd trust to build a Core M notebook whose limitations are intrinsic to the chip rather than the device? Apple perhaps, but if they were going to use Core M rather than some 'normal' ULV chip, you'd think they would have announced it at that last event.
 

North01

Member
Dec 18, 2013
88
1
66
First review shows severe throttling.

http://www.ultrabookreview.com/5486-lenovo-yoga-3-pro-review/

E.g., stressing both the CPU and GPU reduced the CPU frequency to 500MHz and the GPU to 150MHz.

From the article:
So it looks to me that Lenovo are capping down the performance once a certain temperature is met. On top of that, analyzing the HWInfo logs, it looks to me that Lenovo have set the TDP limit at only 3.5W for this test unit (with occasional spikes at up to 12W – LP3), below the nominal TDP of 4.5 W. If I’m not wrong, manufacturers are allowed to set their own Core Package Power. And if that’s the case, final releases might get faster, and at the same time there’s a fair chance we’ll see more powerful Core M devices in the future. Upping the Core’s allowed wattage should yield significantly better performance.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
You know, Lenovo probably would have been okay if they had used a circular heatsink design like the Surface Pro 3. So I'd chalk it up to design.

Oh, and that 3.5W thing sounds ridiculous. Why would they do that?
 

dahorns

Senior member
Sep 13, 2013
550
83
91
You know, Lenovo probably would have been okay if they had used a circular heatsink design like the Surface Pro 3. So I'd chalk it up to design.

Oh, and that 3.5W thing sounds ridiculous. Why would they do that?

It is especially ridiculous considering how much more powerful the Yoga 2 Pro is.

Keep in mind that the only testing/reviews we've seen have been from journalist samples. This may be something that Lenovo can address in firmware updates/final consumer products.

From what I've read, it sounds like there is still some thermal headroom available in the device. Can they just push an update to change the settings to 4.5W?
 

Zink

Senior member
Sep 24, 2009
209
0
0
A 3.5W average seems low, but spikes to 12W are likely higher than an iPad ever hits, hence the active cooling. In AnandTech's iPad Air testing, peak platform power went as high as 12W on the Air, up from 4W with the CPU idle, so that is about 8W peak for the CPU+platform.

They found that the system throttled down to about 10W total for the platform after about a minute. That is 6W higher than with the CPU idle implying that they are managing to cool 5-6W from the SOC. The passive cooling ability of a tablet decreases with thickness according to Intel so it will be interesting to see how the Air 2 throttles.

http://www.anandtech.com/show/7460/apple-ipad-air-review/3
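
The subtraction behind those numbers, as a tiny C sketch. The wattages are the ones quoted in this post from the AnandTech iPad Air review; the macro names and the function name load_delta_w are made up for illustration.

```c
/* Platform power figures quoted in the post, in watts. */
#define IPAD_AIR_IDLE_W       4.0   /* platform power with the CPU idle   */
#define IPAD_AIR_PEAK_W      12.0   /* peak platform power under load     */
#define IPAD_AIR_SUSTAINED_W 10.0   /* after throttling, about one minute */

/* Power attributable to the load: platform power under load
   minus platform power with the CPU idle. */
double load_delta_w(double loaded_w, double idle_w)
{
    return loaded_w - idle_w;
}
```

load_delta_w(IPAD_AIR_PEAK_W, IPAD_AIR_IDLE_W) gives the 8W peak figure and load_delta_w(IPAD_AIR_SUSTAINED_W, IPAD_AIR_IDLE_W) gives the 6W sustained figure. Note these are platform deltas, not package power, so they are not directly comparable to the 3.5W package limit reported for the Yoga 3 Pro.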
 

dahorns

Senior member
Sep 13, 2013
550
83
91
What else would you expect when trying to feed 24EUs worth of graphics performance and 2 desktop class CPU cores simultaneously in a 3.5W power envelope? Intel did a tremendous job with Broadwell, but you can't expect much more than how a 7W Haswell performed. I don't really understand why Lenovo didn't set the TDP at 6.5W, though.

Honestly, what is the point of using the 5Y70 chip at 3.5W? Why wouldn't they just use the 5Y10?
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Why would three cores be unlikely? The XBox 360 had three cores, and it worked just fine.

1. Xbox 360 is a rare chip where you see an odd core count. In traditional desktop, server and mobile devices a fully enabled chip does not have an odd core count.

2. The 3 billion transistor count is 50% more than the A8 SoC. That's a lot of transistors and can easily accommodate 2 more CPU cores.

Anyway, I am guessing a quad-core improved Cyclone at 1GHz base (4 threads) and 1.5GHz turbo (1/2 threads). Let's see how it turns out.
 

Mopetar

Diamond Member
Jan 31, 2011
8,008
6,454
136
1. Xbox 360 is a rare chip where you see an odd core count. In traditional desktop, server and mobile devices a fully enabled chip does not have an odd core count.

It's not an actual rule. Also AMD sold 3 core CPUs (1 of 4 cores disabled) for a while, so it's not exactly unheard of. Typically powers of 2 are used since things tend to work out better that way with computers, but again it's not a hard and fast rule, just a custom.

2. The 3 billion transistor count is 50% more than the A8 SoC. That's a lot of transistors and can easily accommodate 2 more CPU cores.

You're leaving out the likely increase in GPU cores/clusters (rumored to be 6 instead of the 4 found in the A8) as well as any additional cache that was included.

It seems unlikely that the iPad needs 4 cores. It doesn't have real multi-tasking so there aren't a lot of uses for 3 cores, let alone 4. My guess is that they're working on something that allows multiple apps side-by-side and they couldn't get the software done in time, but had already committed to having the hardware to support the feature.

I suspect that the third core won't see a lot of actual use for most apps and will be off most of the time.
 

UnmskUnderflow

Junior Member
Sep 24, 2014
7
0
0

@jfpoole

Can I nitpick a bit? The GEMM kernel

for (unsigned i = 0; i < N; i ++) {
  for (unsigned j = 0; j < N; j ++) {
    for (unsigned k = 0; k < N; k ++) {
      C[i][j] += A[i][k] * B[j][k];
    }
  }
}


While functionally correct, this violates the two golden rules of GEMM:
1.) Don't thrash the load/store unit (LS): this does 2x loads per mul/accumulate
2.) Don't have dependent mul/accumulate

In a small enough memory footprint, this could give undue advantage to either LS's that can handle enough outstanding scatter/gather transactions or to FPs that have a "cascading" FMA (ARM has lots, x86 none). This may have undue side effects in perf comparisons.

IBM and Intel have spent decades optimizing their BLAS to avoid this style and maximize the HPC bandwidth. May I ask why GB chose this implementation? Again, not functionally wrong...just surprising that each final matrix element is fully solved before moving columns/rows.

BTW, I appreciate what Primate Labs is doing to develop something that can be run on multi-ISAs and platforms. Hope you're open to this feedback.
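
To make the two rules concrete, here is a minimal C sketch, not Geekbench's actual code. It assumes square row-major float matrices stored in flat arrays, with B indexed [j][k] as in the quoted kernel; the names gemm_naive and gemm_4acc are made up for illustration. The first function keeps one accumulator per output element; the second computes four output elements at once, so each A load is reused four times and the four multiply-accumulate chains are independent of each other.

```c
#include <stddef.h>

/* Naive form: one accumulator per C element, so every multiply-add
   depends on the previous one, with 2 loads (A and B) per multiply-add. */
void gemm_naive(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[j * n + k];
}

/* Sketch: four C elements per pass. One A load feeds four multiply-adds,
   and c0..c3 are four independent dependency chains. */
void gemm_4acc(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        for (; j + 4 <= n; j += 4) {
            float c0 = C[i * n + j + 0];
            float c1 = C[i * n + j + 1];
            float c2 = C[i * n + j + 2];
            float c3 = C[i * n + j + 3];
            for (size_t k = 0; k < n; k++) {
                float a = A[i * n + k];
                c0 += a * B[(j + 0) * n + k];
                c1 += a * B[(j + 1) * n + k];
                c2 += a * B[(j + 2) * n + k];
                c3 += a * B[(j + 3) * n + k];
            }
            C[i * n + j + 0] = c0;
            C[i * n + j + 1] = c1;
            C[i * n + j + 2] = c2;
            C[i * n + j + 3] = c3;
        }
        /* Remainder columns when n is not a multiple of 4. */
        for (; j < n; j++) {
            float c = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                c += A[i * n + k] * B[j * n + k];
            C[i * n + j] = c;
        }
    }
}
```

Both functions add into C, so zero it first. In the inner loop the second version does 5 loads per 4 multiply-accumulates (1.25 each) instead of 2 each. A real BLAS goes much further, with blocking for cache and registers, but this is enough to show why the naive form can flatter some load/store units and FMA pipelines.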
 

thepath

Junior Member
Mar 9, 2013
10
0
16
The 5Y10 is not the best Core M part, and it does not even support Hyper-Threading.

The Core M 5Y70, on the other hand, should blow the Apple A8X out of the water.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
1. Xbox 360 is a rare chip where you see an odd core count. In traditional desktop, server and mobile devices a fully enabled chip does not have an odd core count.

2. The 3 billion transistor count is 50% more than the A8 SoC. That's a lot of transistors and can easily accommodate 2 more CPU cores.

Anyway, I am guessing a quad-core improved Cyclone at 1GHz base (4 threads) and 1.5GHz turbo (1/2 threads). Let's see how it turns out.

How about a 15C Ivy Bridge-EX?

I suspect that the third core won't see a lot of actual use for most apps and will be off most of the time.

I think if they really want to compete with Intel they need a form of SMT. Of course, adding SMT to a core is a hell of a lot more complicated than adding another core, not that adding another core is trivial. It's not a copy and paste.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
A 3.5W average seems low, but spikes to 12W are likely higher than an iPad ever hits, hence the active cooling. In AnandTech's iPad Air testing, peak platform power went as high as 12W on the Air, up from 4W with the CPU idle, so that is about 8W peak for the CPU+platform.

They found that the system throttled down to about 10W total for the platform after about a minute. That is 6W higher than with the CPU idle implying that they are managing to cool 5-6W from the SOC. The passive cooling ability of a tablet decreases with thickness according to Intel so it will be interesting to see how the Air 2 throttles.

http://www.anandtech.com/show/7460/apple-ipad-air-review/3
I have little doubt that the A8X has a higher peak power draw. The 20nm process is not enough to outweigh the 50% increase in draw from adding 50% more cores. And if the frequency is higher, then that only adds more current.

Of course, we're talking about the worst case scenario, which is an entirely academic exercise and has no significant real world application. And since the A8X doesn't have any direct competition from Core M, it's doubly meaningless except for satisfying one's curiosity.
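
For reference, the scaling argument above follows from the usual first-order model of CMOS dynamic power. This is a textbook approximation, not anything measured on the A8X: $N$ is the number of active cores, $\alpha$ the activity factor, $C$ the switched capacitance per core, $V$ the supply voltage and $f$ the clock frequency. Leakage and everything outside the CPU cores are ignored.

```latex
P_{\text{dyn}} \approx N \,\alpha\, C\, V^{2} f
\qquad\Rightarrow\qquad
\frac{P_{\text{dyn}}(N=3)}{P_{\text{dyn}}(N=2)} = \frac{3}{2} = 1.5
\quad \text{(same } \alpha,\ C,\ V,\ f \text{ per core)}
```

Under those assumptions, going from 2 to 3 cores at the same voltage and clock raises peak dynamic power by 50%. A process shrink lowers $C$ and usually $V$, which is the offset being weighed here, while a higher $f$ generally needs a higher $V$, so power grows faster than linearly with frequency.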
 

Morgoth780

Member
Jul 3, 2014
67
2
71
Pretty impressive.

From what I've seen, the A8X is on 20nm right? What process is the Denver Tegra K1 on? Because if it's on a larger process that makes it very impressive in terms of CPU performance.
 

Khato

Golden Member
Jul 15, 2001
1,225
280
136
SHA and memory speed seems to be the A8X main benefits.

But again, the cross platform issue.

Is the Asus one even using dual channel?

Coming back to this after actually scrolling all the way down on the A8X Geekbench score and noticing the memory results... I'm quite certain that the Asus T300FA used for comparison is single-channel. The reason is that if you look at the results for a known single-channel Haswell device - http://browser.primatelabs.com/geekbench3/575799 - the memory bandwidth scores match up pretty much perfectly.
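
As a rough illustration of why channel count matters so much for the memory scores, here is a small C sketch of theoretical peak bandwidth. The function name peak_bandwidth_gbs is made up, and the 64-bit, 1600 MT/s figures in the usage note are a hypothetical configuration for illustration only; the actual memory setup of the T300FA is not stated in this thread.

```c
/* Theoretical peak memory bandwidth in GB/s:
   channels * bus width in bytes * mega-transfers per second / 1000.
   Real benchmark results land below this ceiling. */
double peak_bandwidth_gbs(unsigned channels,
                          unsigned bus_width_bits,
                          double mega_transfers_per_s)
{
    return channels * (bus_width_bits / 8.0) * mega_transfers_per_s / 1000.0;
}
```

With the hypothetical numbers, peak_bandwidth_gbs(1, 64, 1600.0) is 12.8 GB/s and peak_bandwidth_gbs(2, 64, 1600.0) is 25.6 GB/s, so a single-channel device starts with half the ceiling before the CPU is even a factor.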

Also, it's primarily SHA1 where A8X benefits. The SHA2 results are more in-line since I believe that ARMv8 only has instructions for the SHA256 hash function? So the other 5 SHA2 hash functions don't benefit from acceleration. Can easily see how much of an advantage the compiler using the ARMv8 SHA crypto instructions yields by comparing the 32 vs 64 bit results.
 

cytg111

Lifer
Mar 17, 2008
23,546
13,113
136
Huh. If it's really that comparable across platforms then an A8X at 2.0GHz would be a beast. I could actually see them doing that. A 2.6-3GHz modified A9 or A9X (so next gen) in a MacBook Air.

And Windows RT becoming a relevant alternative.
 

MisterLilBig

Senior member
Apr 15, 2014
291
0
76
So it seems like a 20nm 3 or 4 core Denver chip would destroy the A8X.

Denver looks promising but we still don't have scores for the actual Denver K1 in a product, right? The scores I have seen are at 2.5GHz, and that's not in any product, to my knowledge.


Are there any real 'problems' to designing a tri-core chip?

Nintendo also went for a tri-core on the Wii U. And looking at the score of the Denver TK1 at 2.5GHz, NV should have gone with a tri-core also, IMHO.
 

asendra

Member
Nov 4, 2012
156
12
81
I thought it was on 28nm but I wasn't sure.

So it seems like a 20nm 3 or 4 core Denver chip would destroy the A8X.

The thing is, you don't get extra points for releasing a chip on an older node. So, if and when Nvidia releases a 20nm Denver, it will be compared to whatever chips get released in that timeframe, more than likely an A9 and an A9X at 14nm (link)

Also, the A8X design is a scaled-up version of a phone chip, one that performs really well too, so they have already demonstrated its versatility. I have yet to see a scaled-down version of Denver that fits in a phone design while still maintaining its advantages.
 

ams23

Senior member
Feb 18, 2013
907
0
0
Well you have to remember that Apple has full control over the hardware and software stack for quicker time to market, and has a huge normalized SoC die size and transistor budget too. So for Tegra K1 Denver to compete with A8X (if not exceed A8X in some areas) in single-threaded CPU performance and GPU performance (with a more advanced and forward-looking GPU feature set too) is a great accomplishment. That said, NVIDIA, Qualcomm, Intel, etc. are only indirect competitors to Apple, because they provide SoC hardware for open ecosystems such as Android and/or Windows tablets and phones.

Note that with a 28nm fab. process, it was not really feasible to add more than two Denver CPU cores without blowing past SoC die size and transistor budget. And one has to think that the vast majority of mobile apps cannot take good advantage of more than two CPU cores in the first place (literally every single iOS product other than iPad Air 2 has no more than two CPU cores).

A8X is most certainly not a scaled up version of a phone chip. The GX6650 6-cluster GPU is entirely new and has not and will not be seen in a phone anytime soon (if at all), and a 3-core enhanced Cyclone CPU has not and will not be seen in a phone anytime soon (if at all) either (not to mention the fact that the A8X memory interface is wider and more power hungry than A8 too). On the other hand, Tegra K1 Denver is expected to make its way into high end smartphone(s) and is said to be less power hungry and more power efficient than the Tegra K1 Cortex variant.
 

jfpoole

Member
Jul 11, 2013
43
0
66
And, as you say, the effect of such will typically carry over to the non-x86 ISAs in terms of missing out on optimizations. (After all, I'd imagine much of that logic is more in the 'front end' of the compiler which wouldn't necessarily be ISA specific?)

That's our expectation as well. This approach won't help with bugs in the "backend" of the compiler (e.g., code generation) for architectures that aren't part of the automated system. Our goal for v4 is to have the automated system cover all of the architectures we ship so that we won't have to fall back on manual validation.

Anyway, with respect to the validity of the subject of this thread - comparing x86 to ARM by means of Geekbench - what manner of checks are performed on the non-x86 versions of Geekbench to ensure that the compiled code is similarly optimized? Or put another way, to ensure that the results for such comparisons are based on the architecture rather than the compiler? Because when I see architectures of different ISAs with similar execution resources producing markedly different results my first thought is that it's due to how the algorithm ends up being compiled to each ISA.

For Geekbench 3 all of the cross-ISA checks we performed were manual. We examined the results for "similar" processors with different instruction sets. We examined the generated code for the smaller kernels. Even enabling and disabling compiler optimizations and observing how that affected different architectures (in particular vectorization) was useful in tracking down issues.
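
As an illustration of that kind of manual check, here is a minimal C sketch. It is not Geekbench code: the saxpy kernel and the file name kernel.c are just stand-ins for a small kernel whose generated assembly is easy to read. The flags in the comment are the standard GCC and Clang switches for turning auto-vectorization off.

```c
#include <stddef.h>

/* A small kernel that compilers will normally auto-vectorize at -O3. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/*
 * Comparing generated code with and without vectorization,
 * assuming this file is saved as kernel.c:
 *
 *   cc -O3 -S kernel.c -o kernel_vec.s
 *   cc -O3 -fno-tree-vectorize -S kernel.c -o kernel_novec.s   (GCC)
 *   cc -O3 -fno-vectorize -S kernel.c -o kernel_novec.s        (Clang)
 *
 * Diffing the two .s files per target shows whether the compiler emitted
 * SIMD instructions (e.g. NEON on ARM, SSE/AVX on x86) for the loop.
 */
```

Running the same comparison on each target ISA, and timing both builds, gives a quick read on whether a score gap comes from the hardware or from one backend failing to vectorize a loop that the other handles.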

If you know of cases where architectures with different ISAs but similar execution resources produced unexpected results with Geekbench 3, please let me know as we'd like to investigate further. While we won't be able to fix Geekbench 3, anything we find will help contribute to the v4 development process.
 

jfpoole

Member
Jul 11, 2013
43
0
66
Also, it's primarily SHA1 where A8X benefits. The SHA2 results are more in-line since I believe that ARMv8 only has instructions for the SHA256 hash function? So the other 5 SHA2 hash functions don't benefit from acceleration. Can easily see how much of an advantage the compiler using the ARMv8 SHA crypto instructions yields by comparing the 32 vs 64 bit results.

ARMv8 has instructions for SHA-1 and SHA-2 (SHA-256). Geekbench 3 only uses the ARMv8 instructions for the SHA-1 test (SHA-2 is a straightforward-ish software implementation). IIRC ARMv8 SHA-1 is 4x faster than C++ SHA-1 on the A8.

Happy to post 32-bit results for the A7, A8, and A8X if folks would find that useful?
 