AnTuTu and Intel

jfpoole · Jul 12, 2013

Exophase said:
Thank you for the additional information on this.

Using uninitialized data would certainly explain some presence of denorms. But Jon Tyler's claim of 100% of the data set consisting of denorms seems very bizarre. If we're talking about single-precision floats the exponent field has to be 0 for a denorm and this doesn't include zero and negative zero. So given a uniform distribution there'd be a less than 1/256 (0.4%) chance of an uninitialized 32-bit floating being a denorm. Uninitialized data doesn't tend to really follow uniform distributions, it may well have a lot of stuff that looks like denorms due to containing 32-bit integers that aren't large enough (> 0x7FFFFF) to trip out of denorms. But they could also have a lot of zeroes, negative numbers, < 32-bit int datatypes, strings, pointers, code, etc etc that would tend to sway away from this. If you just so happen to have really bad luck in what sort of stuff you were doing beforehand it could have ended up with a dataset that's fully denormal, but I still find this pretty bizarre. It'd be good to get some measurements on this.

I do have to ask, is Geekbench doing anything to verify that the results of the benches are correct, like a checksum? IMO this is an important step in benchmarks, you want to at least make sure that the compiled code is doing what it's supposed to and not cutting corners that result in the wrong answers.

I'm not sure why Jon Tyler claimed 100% of the data set contained denormals. Given the nature of the bug between 30% and 40% of the operations were at risk of using denormals. I don't know how many of these operations actually used denormals, though; I'll have to write a test to check the actual data being referenced.

Geekbench 2 did not verify that the results of the workloads were correct (results were verified during development on the platform the developer was working on, but there were no automated checks in place that would run on all platforms).

Geekbench 3, on the other hand, will have extensive build-time tests that verify the workloads are operating as expected. Geekbench 3 will also have lightweight run-time tests as well, but we're putting most of our validation effort on tests that we run internally so that we can keep the Geekbench run-time as low as possible.

Concillian · Jul 12, 2013

VirtualLarry said:
Didn't someone say that there were "lies, damn lies, and benchmarks"?

And then there are phone benchmarks where the thermal allowance starts high... then slowly degrades as the chassis heats up.

Phone benchmarks are basically completely useless not that phone SoCs have gotten bigger and bigger gaps between turbo speeds and realistic steady state speeds. Nothing more frustrating than playing a game on your phone and performance starts fine, then 10 minutes in things are turning into a slide-show as speeds start throttling.

mrmt · Jul 12, 2013

Exophase said:
Have they though? You see they now put a disclaimer everywhere that says that their compilers are optimized for their processors (http://software.intel.com/sites/default/files/m/0/1/3/opt-notice-en_080411.gif)

As of September 2010 at least they didn't drop the biased dispatcher, see the post on 2010-09-22 here:

Yes, that's what I said in the post. The FTC wanted, among other things, Intel to drop the biased dispatcher but could not bend them on the issue even in the settlement.

PPB · Jul 12, 2013

Exophase said:
Have they though? You see they now put a disclaimer everywhere that says that their compilers are optimized for their processors (http://software.intel.com/sites/default/files/m/0/1/3/opt-notice-en_080411.gif)

I think so far they didnt, the only thing we got out of the settlement is a tiny sized paragraph text in ark's website, saying that their compiler may or may not run the most optimized code for other vendor's processors.

Exophase · Jul 12, 2013

mrmt said:
Yes, that's what I said in the post. The FTC wanted, among other things, Intel to drop the biased dispatcher but could not bend them on the issue even in the settlement.

Well I guess Intel didn't really have to do anything then

I really don't get the lengths they're going through in avoiding this. I don't know if they just don't like changing their approach on anything because it looks like they're admitting they're wrong or what. Their CPUs (Core series but I'm sure Silvermont, even Saltwell) are plenty competitive on their own merits and they don't need to resort to compiler games to be in a good position. This just damages their credibility.

Exophase · Jul 12, 2013

jfpoole said:
I'm not sure why Jon Tyler claimed 100% of the data set contained denormals. Given the nature of the bug between 30% and 40% of the operations were at risk of using denormals. I don't know how many of these operations actually used denormals, though; I'll have to write a test to check the actual data being referenced.

Geekbench 2 did not verify that the results of the workloads were correct (results were verified during development on the platform the developer was working on, but there were no automated checks in place that would run on all platforms).

Geekbench 3, on the other hand, will have extensive build-time tests that verify the workloads are operating as expected. Geekbench 3 will also have lightweight run-time tests as well, but we're putting most of our validation effort on tests that we run internally so that we can keep the Geekbench run-time as low as possible.

Thanks for the update. I'm looking forward to Geekbench 3.

jfpoole · Jul 12, 2013

Exophase said:
Thanks for the update. I'm looking forward to Geekbench 3.

If you (or anyone else) would like to try out a beta build of Geekbench 3 send me an email (john at primatelabs dot com) and I'll set you up. We're about a week away from having something ready for external testing.

Mondozei · Jul 13, 2013

CTho9305 said:
Great post, Exophase. A friend who works on an ARM design has been ranting for a while about Antutu and Geekbench and the quality of code currently coming out of JITs on ARM... I really hate the cross-ISA situation in terms of benchmarking. The worst part is that generally-credible reviewers don't caveat their articles enough, so people actually give credit to these results. It's worse than the 80s IPC comparisons across RISC/CISC because the macroscopic workload characteristics aren't even the same.

That was very interesting.

Now please explain to me like I was a 5 year old.

monstercameron · Jul 13, 2013

it seems antutu put out an emergency update http://thetechjournal.com/electroni...eats-intel-atom-after-benchmark-revised.xhtml

mrmt · Jul 13, 2013

Exophase said:
I really don't get the lengths they're going through in avoiding this. I don't know if they just don't like changing their approach on anything because it looks like they're admitting they're wrong or what. Their CPUs (Core series but I'm sure Silvermont, even Saltwell) are plenty competitive on their own merits and they don't need to resort to compiler games to be in a good position. This just damages their credibility.

From what I get they don't want the compiler as reference for anything, they want ICC to be the compiler on the market, but one that is only useful to Intel customers. And for most customers, this is a non issue as AMD market share is too small to bother. My company, for example, has over 400.000 PCs and servers scattered around multiple sites, not a single one is AMD.

StrangerGuy · Jul 13, 2013

monstercameron said:
it seems antutu put out an emergency update http://thetechjournal.com/electroni...eats-intel-atom-after-benchmark-revised.xhtml

They used to cheat in Quake 3 in the P4 days too, Q3 ran optimized codepath on the P4 but non-optimized for Athlon XP. With a modified dll the Athlon XP actually beat the P4 without any loss of image quality. Xbit had an article about it but I'm not sure whether it is still there. Not like it actually matters in practical terms since the framerates for either side are already in the hundreds.

sontin · Jul 13, 2013

monstercameron said:
it seems antutu put out an emergency update http://thetechjournal.com/electroni...eats-intel-atom-after-benchmark-revised.xhtml

So CloverTrail+ with a 1,8GHz 2C/4T Atom scores around 4800points in the CPU test. My 1,3GHZ Tegra 3 (A9) gets 6350points or 32% more.

Using the new numbers i don't think that Silvermont will be faster than A15.

krumme · Jul 13, 2013

monstercameron said:
it seems antutu put out an emergency update http://thetechjournal.com/electroni...eats-intel-atom-after-benchmark-revised.xhtml

Well that didnt last long. Lol
Welcome to the world with competition Intel !

nonworkingrich · Jul 13, 2013

Hopefully this will lead to other companies thinking twice before ruining their credibility for a fistful of dollars.

galego · Jul 13, 2013

monstercameron said:
it seems antutu put out an emergency update http://thetechjournal.com/electroni...eats-intel-atom-after-benchmark-revised.xhtml

Thanks by the link. An excerpt:

But a report by EE Times identified that the testing process has some discrepancies. Technical consulting firm BDTI pointed out that Intel's Atom processor wasnt executing all the instructions to be run during the RAM test. This artificially improved that results in favor of Intel Atom.

After receiving the criticism, AnTuTu revised its benchmarking tool for Android. The revised tests of the Atom Z2580 processor dropped overall scores by 20 percent for the processor.

This brings the ARM-based Samsung Exonys 5 Octa processor on top of Intel Atom Z2580 in terms of performance. Seems, Intel has a lot more to do to touch ARM-based processors in performance.

:whiste:

blackened23 · Jul 13, 2013

I can't wait for the product to be released to put these arguments to rest. Hopefully Intel will give good reason for the naysayers to initiate radio silence (as they have normally done in recent years)

parvadomus · Jul 13, 2013

Intel17 said:
Because ARM chips win Geekbench despite evidence that Geekbench is intentionally crippled on Intel processors. From a recent interview with Silvermont's lead architect,

Now we can safely say that Antutu is intentionally crippled on Intel processors too. :whiste:

StrangerGuy · Jul 13, 2013

parvadomus said:
Now we can safely say that Antutu is intentionally crippled on Intel processors too. :whiste:

Let's spin it this way: Intel didn't actually care about the mobile market, right? So who cares if they can't win one lousy benchmark.

In other news: PC shipments drop another 11% in Q2 2013. Looks somebody is quaking in their 22nm boots already...

http://www.dailytech.com/PC+Shipmen...enovo+Beats+HP+as+Top+Vendor/article31948.htm

AnandThenMan · Jul 13, 2013

I think it's safe to say the credibility of this benchmark is zero.

krumme · Jul 13, 2013

StrangerGuy said:
Let's spin it this way: Intel didn't actually care about the mobile market, right? So who cares if they can't win one lousy benchmark.

In other news: PC shipments drop another 11% in Q2 2013. Looks somebody is quaking in their 22nm boots already...

http://www.dailytech.com/PC+Shipmen...enovo+Beats+HP+as+Top+Vendor/article31948.htm

The pc market is in decline. The highend phone market looking to slow down in increase or even stagnate.

What is the perfect response?

Another fab and antutu on tab

(Edit: well "tap" not "tab" lol, beer on tab gives the wrong interpretation)

CTho9305 · Jul 13, 2013

Mondozei said:
CTho9305 said:

Great post, Exophase. A friend who works on an ARM design has been ranting for a while about Antutu and Geekbench and the quality of code currently coming out of JITs on ARM... I really hate the cross-ISA situation in terms of benchmarking. The worst part is that generally-credible reviewers don't caveat their articles enough, so people actually give credit to these results. It's worse than the 80s IPC comparisons across RISC/CISC because the macroscopic workload characteristics aren't even the same.

Click to expand...

That was very interesting.

Now please explain to me like I was a 5 year old.

Someone is trying to improve the performance of a future CPU. CPUs are very complicated, and you have to pick and choose which things you make fast (if you try making everything fast, your CPU will be too expensive and use batteries too quickly). To pick the right parts to make fast, you need to look at many programs and see what they want most from the CPU. Benchmarks are special programs that act like many different programs, so instead of looking at 100 different programs, people just look at a few benchmarks.

It turns out that the benchmarks people use now are terrible. They do a very bad job of measuring what they're supposed to measure; they don't act like the programs they're supposed to act like. It's even worse because they don't measure the same thing on ARM CPUs as they do on x86 CPUs. One of the CPU makers may even be deliberately cheating in how they handle the benchmark programs. That means when somebody uses them to compare an x86 CPU like Jaguar to an ARM CPU like A15, the winner of the benchmark might not run real programs faster. If somebody who cares about running real programs well uses e.g. Anandtech's reviews to help pick a phone, the bad benchmarks might trick them into buying a slower phone.

StrangerGuy · Jul 14, 2013

krumme said:
The pc market is in decline. The highend phone market looking to slow down in increase or even stagnate.

What is the perfect response?

Another fab and antutu on tab

(Edit: well "tap" not "tab" lol, beer on tab gives the wrong interpretation)

Taiwanese makers can make cheaper and better mobos? We can always go back to sell x86 chips.

Consumer SSDs not profitable? Nah, We can always go back to sell x86 chips.

Atheros & Co can make cheaper, good enough Wi-Fi adapters? We can always go back to sell x86 chips.

Nobody except Apple wants Thunderbolt? Nah, we can always go back to sell x86 chips.

Netbooks and consoles going AMD? Nah, We can always go back to sell x86 chips.

Market doesn't want x86 chips anymore? Nah, we can always go back to sell x86 chips. Oh wait-well, we can always sell cable TV, lols.

krumme · Jul 14, 2013

That is cable tv with build in Intel experience index, supplied by ms

R0H1T · Jul 14, 2013

AnandThenMan said:
I think it's safe to say the credibility of this benchmark is zero.

I think the same goes for anything related to Intel as in their OTP claims & ridiculous predictions, like SDP

CHADBOGA · Jul 14, 2013

R0H1T said:
I think the same goes for anything related to Intel as in their ridiculous claims & hyperbolic predictions, like SDP

Never forget, Intel's SDP > AMD's ACP.

AnTuTu and Intel

Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Junior Member

Golden Member

Diamond Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Elite Member

Diamond Member

Diamond Member

Platinum Member

Platinum Member