AnTuTu and Intel

Page 6

PPB

Golden Member
Jul 5, 2013
Funny, because my MediaTek quad-core A7 phone feels really fast. If ~20% slower than a Galaxy S3 counts as slow, that's so far off the mark it's ridiculous.

I'm on an OMAP A9 single-core 800MHz phone. The only thing that feels sluggish is WhatsApp scrolling. TBH I think anything more than a dual-core A15 would be overkill in a phone.


But hey, MOAR COARS always good, right?
 

monstercameron

Diamond Member
Feb 12, 2013
I'm on an OMAP A9 single-core 800MHz phone. The only thing that feels sluggish is WhatsApp scrolling. TBH I think anything more than a dual-core A15 would be overkill in a phone.


But hey, MOAR COARS always good, right?

You talking to the CPU guys or the GPU guys?
 

PPB

Golden Member
Jul 5, 2013
No, I'm talking about those phones with 4 A15 + 4 A7 cores. I think even a 1+1 big.LITTLE configuration would be OK for most of our smartphone needs.
 
Mar 10, 2006
No, I'm talking about those phones with 4 A15 + 4 A7 cores. I think even a 1+1 big.LITTLE configuration would be OK for most of our smartphone needs.

How about 4 power efficient Krait cores? Qualcomm has run circles around ARM's own designs with its Krait.
 

Exophase

Diamond Member
Apr 19, 2012
I'm on an OMAP A9 single-core 800MHz phone. The only thing that feels sluggish is WhatsApp scrolling. TBH I think anything more than a dual-core A15 would be overkill in a phone.


But hey, MOAR COARS always good, right?

Single core Cortex-A9 OMAP @ 800MHz doesn't exist. Did you mean Cortex-A8?
 

PPB

Golden Member
Jul 5, 2013
Single core Cortex-A9 OMAP @ 800MHz doesn't exist. Did you mean Cortex-A8?

Thanks for the correction. It's the TI OMAP3610 at 800MHz, which uses a Cortex-A8 rather than an A9.

Seems I've settled for two years with even less performance than I thought :biggrin:
 

Nothingness

Platinum Member
Jul 3, 2013
How about 4 power efficient Krait cores? Qualcomm has run circles around ARM's own designs with its Krait.
Yeah, especially in the MSM82x. Funny how ARM haters will never understand this and will always think extreme performance is paramount.

EDIT: or Intel fanboys, I don't know.
 

PPB

Golden Member
Jul 5, 2013
Yep, dual-core Krait variants seem to be the best compromise between performance and battery life. Seems my next smartphone will be an S400-based one.
 

krumme

Diamond Member
Oct 9, 2009
Not quite. The A7 is really slow. About on par with an A8. What you might not notice is dual core A9. The performance difference between my Barnes & Noble Nook (Cortex A8) and Nook Tablet (dual-core Cortex A9) was night and day.

You might be right; we don't have anything slower than a dual-core A9 in the house, all running Android 4.1 and up. For me, performance feels the same on all of them. But even so, it just shows that the importance of SoC power is overrated. CPU power is solid, and I still don't get what games people play that use all that GPU power.
 

beginner99

Diamond Member
Jun 2, 2009
I have another possible explanation: the RAZR i used an Intel-designed platform, both HW and SW, and was hence highly tuned.

Link to this? AFAIK it's a Google-controlled phone, since Motorola = Google. Of course, Google is also interested in showing that Android runs great on x86.
 

Exophase

Diamond Member
Apr 19, 2012
Link to this? AFAIK it's a Google-controlled phone, since Motorola = Google. Of course, Google is also interested in showing that Android runs great on x86.

When was the Google acquisition completed, May 2012? Although RAZR i wasn't released until September 2012 it was announced that Motorola was going to release a Medfield phone way back in January 2012. In fact it was said that Intel and Motorola entered a multi-year contract. Somehow I don't think Google had a huge influence on either the business or engineering end of this.

It'd be kind of weird if the RAZR i were largely an Intel design, though; last I checked, it seemed to have a lot in common with the RAZR M. But most of the other Medfield phones were Intel's reference platform.
 

Idontcare

Elite Member
Oct 10, 1999
I have another possible explanation: the RAZR i used an Intel-designed platform, both HW and SW, and was hence highly tuned.

Personally, I'd rather buy a highly tuned product than one that just kind of muddled along through a "the status quo is OK" RIM-style product pipeline.

One is the product of a hungry competitor, the other is relying on the assumption of having a captured audience that won't wander.
 

beginner99

Diamond Member
Jun 2, 2009
When was the Google acquisition completed, May 2012? Although RAZR i wasn't released until September 2012 it was announced that Motorola was going to release a Medfield phone way back in January 2012. In fact it was said that Intel and Motorola entered a multi-year contract. Somehow I don't think Google had a huge influence on either the business or engineering end of this.

It'd be kind of weird if the RAZR i were largely an Intel design, though; last I checked, it seemed to have a lot in common with the RAZR M. But most of the other Medfield phones were Intel's reference platform.

OK. Did not know that.

Yes, the RAZR M and RAZR i have the exact same body/chassis. Since I own one, I can say it feels very, very solid and well built.
 

mrob27

Member
Aug 14, 2012
www.mrob.com
John from Primate Labs here (the company behind Geekbench).
[...]

We only found out about this issue a couple of months ago. Given that Geekbench 3 will be out in August, and fixing the issue in Geekbench 2 would break the ability to compare Geekbench 2 scores, we made the call not to fix the issue in Geekbench 2.

If you've got any questions about this (or about anything Geekbench) please let me know and I'd be happy to answer them. [...]

Geekbench needs to measure multi-threaded memory bandwidth (in both the "Memory Performance" and "Stream Performance" sections). I have routinely ignored the overall Geekbench score and looked only at the first two sections (integer and floating-point) because of this problem.

Since Nehalem and Athlon 64, multi-socket workstations have had more memory bandwidth than what a single thread can utilize or measure. In more recent generations the same is true even for single-socket systems.

This matters because it is common for multiple threads to be active at the same time. In normal use scenarios (e.g. browsing during an anti-virus scan), there are multiple separate tasks which share no memory between them so there is no issue of data being transferred from one thread's L1/L2 cache to another.

To measure multi-threaded memory bandwidth, create two or more threads and have each one perform the same test as your existing single-threaded test. Each thread should use its own separate buffer(s), and steps should be taken to ensure that all tests are running during the entire period of time that the measurement is being taken (this might mean that some threads end up doing "more work" while they're waiting for the measurement interval to elapse).
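A minimal sketch of that approach in C with POSIX threads, for illustration only (it is not Geekbench's code). The function name `measure_copy_bandwidth`, the `memcpy` kernel and the fixed-duration window are assumptions. Each worker allocates and touches its own pair of buffers, all threads meet at a barrier, and then every worker keeps copying until the main thread raises a stop flag, so no thread sits idle during the timed interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 64

/* Per-thread state: each worker owns separate buffers (nothing is shared). */
typedef struct {
    size_t bytes;              /* size of each private buffer */
    char *src, *dst;           /* allocated by the worker itself */
    unsigned long long copied; /* bytes copied during the window */
    pthread_barrier_t *ready;
} worker_t;

static atomic_int stop_flag;

static void *worker(void *arg) {
    worker_t *w = arg;
    w->copied = 0;
    w->src = malloc(w->bytes);
    w->dst = malloc(w->bytes);
    if (w->src && w->dst) {
        memset(w->src, 1, w->bytes); /* touch pages before timing starts */
        memset(w->dst, 0, w->bytes);
    }
    pthread_barrier_wait(w->ready);  /* always reached, even on failure */
    if (!w->src || !w->dst)
        return NULL;
    /* Keep copying until told to stop, so every thread spans the window. */
    while (!atomic_load(&stop_flag)) {
        memcpy(w->dst, w->src, w->bytes);
        w->copied += w->bytes;
    }
    return NULL;
}

/* Hypothetical helper: aggregate bytes copied per second across all
   threads, or -1.0 if any buffer allocation failed. */
double measure_copy_bandwidth(int nthreads, size_t bytes, double seconds) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS && bytes > 0 && seconds > 0);
    pthread_t tid[MAX_THREADS];
    worker_t w[MAX_THREADS];
    pthread_barrier_t ready;
    pthread_barrier_init(&ready, NULL, (unsigned)nthreads + 1);
    atomic_store(&stop_flag, 0);

    for (int i = 0; i < nthreads; i++) {
        w[i].bytes = bytes;
        w[i].ready = &ready;
        int rc = pthread_create(&tid[i], NULL, worker, &w[i]);
        assert(rc == 0);
        (void)rc;
    }
    pthread_barrier_wait(&ready);    /* all buffers allocated and touched */

    struct timespec t0, t1, nap;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    nap.tv_sec = (time_t)seconds;
    nap.tv_nsec = (long)((seconds - (double)nap.tv_sec) * 1e9);
    nanosleep(&nap, NULL);
    atomic_store(&stop_flag, 1);

    unsigned long long total = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);  /* elapsed time covers the last copies */
        total += w[i].copied;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_barrier_destroy(&ready);

    int ok = 1;
    for (int i = 0; i < nthreads; i++) {
        if (!w[i].src || !w[i].dst) ok = 0;
        free(w[i].src);
        free(w[i].dst);
    }
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return ok ? (double)total / elapsed : -1.0;
}
```

Comparing `measure_copy_bandwidth(1, ...)` with `measure_copy_bandwidth(4, ...)` on the same machine would show whether a single thread can saturate the memory system. For the result to reflect DRAM rather than cache bandwidth, each buffer should be well beyond the last-level cache size.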
 

Concillian

Diamond Member
May 26, 2004
Yep, dual-core Krait variants seem to be the best compromise between performance and battery life. Seems my next smartphone will be an S400-based one.

I've come to this conclusion as well. I'd rather have battery life than a bunch more cores.
 

Nothingness

Platinum Member
Jul 3, 2013
I had some free time so I decided to play with icc 13.

First a handy site: http://gcc.godbolt.org/. This allows you to enter code and see the resulting assembly after compilation with gcc, icc, clang and gcc-arm.

First version of the code:
Code:
void r(unsigned *bitmap)
{
  unsigned baddr = 0;
  unsigned nb = 32;

  while (nb--) {
    bitmap[baddr >> 5] |= 1 << (baddr & 0x1f);
    baddr++;
  }
}
icc -O3 -m64
Code:
L__routine_start__Z1rPj_0:
r(unsigned int*):
        xorl      %ecx, %ecx                                    #3.18
        movl      $31, %eax                                     #6.10
..B1.2:                         # Preds ..B1.2 ..B1.1
        movl      %ecx, %edx                                    #7.21
        movl      $1, %esi                                      #7.41
        shrl      $5, %edx                                      #7.21
        decl      %eax                                          #6.10
        shll      %cl, %esi                                     #7.41
        incl      %ecx                                          #8.5
        orl       %esi, (%rdi,%rdx,4)                           #7.5
        cmpl      $-1, %eax                                     #6.10
        jne       ..B1.2        # Prob 82%                      #6.10
        ret                                                     #10.1
icc -O3 -m32
Code:
L__routine_start__Z1rPj_0:
r(unsigned int*):
        pushl     %esi                                          #2.1
        pushl     %edi                                          #2.1
        pushl     %esi                                          #2.1
        lea       32, %ecx                                      #
        xorl      %edx, %edx                                    #
        movl      16(%esp), %eax                                #
        movl      %edx, %esi                                    #
        andl      $31, %edx                                     #
        shrl      $5, %esi                                      #
        lea       (%eax,%esi,4), %eax                           #
        movl      %ecx, %esi                                    #
        addl      %edx, %ecx                                    #
        cmpl      $32, %ecx                                     #
        jbe       ..L10         # Prob 50%                      #
        movl      %ecx, %esi                                    #
        movl      %edx, %ecx                                    #
        movl      $-1, %edi                                     #
        shll      %cl, %edi                                     #
        orl       %edi, (%eax)                                  #
        subl      $32, %esi                                     #
        addl      $4, %eax                                      #
        movl      $-1, %edi                                     #
        cmpl      $32, %esi                                     #
        jbe       ..L11         # Prob 50%                      #
..L12:                                                          #
        movl      %edi, (%eax)                                  #
        addl      $4, %eax                                      #
        subl      $32, %esi                                     #
        cmpl      $32, %esi                                     #
        ja        ..L12         # Prob 50%                      #
..L11:                                                          #
        movl      $32, %ecx                                     #
        subl      %esi, %ecx                                    #
        shrl      %cl, %edi                                     #
        orl       %edi, (%eax)                                  #
        jmp       ..L13         # Prob 100%                     #
..L10:                                                          #
        movl      $-1, %edi                                     #
        movl      $32, %ecx                                     #
        subl      %esi, %ecx                                    #
        shrl      %cl, %edi                                     #
        movl      %edx, %ecx                                    #
        shll      %cl, %edi                                     #
        orl       %edi, (%eax)                                  #
..L13:                                                          #
        popl      %ecx                                          #10.1
        popl      %edi                                          #10.1
        popl      %esi                                          #10.1
        ret                                                     #10.1
This is hilarious: the compiler is able to use the "set all bits to 1" trick, but can't see that the loop count is constant!
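A small C check of why folding the first loop is legal. The names `r_loop` and `r_folded` are hypothetical, and `uint32_t` is assumed to match the post's 32-bit `unsigned`. For `baddr` in 0..31, `baddr >> 5` is always 0, so all 32 iterations hit word 0 and the loop is equivalent to a single OR with all ones. The sketch writes `1u <<` instead of `1 <<`: with a plain `int`, `1 << 31` overflows, which is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* The loop from the post, with 1u so that the final shift by 31 is
   well-defined. */
void r_loop(uint32_t *bitmap)
{
  uint32_t baddr = 0;
  uint32_t nb = 32;

  while (nb--) {
    bitmap[baddr >> 5] |= 1u << (baddr & 0x1f);  /* word 0, bit baddr */
    baddr++;
  }
}

/* What the loop reduces to: bits 0..31 of word 0 become 1. */
void r_folded(uint32_t *bitmap)
{
  bitmap[0] |= 0xFFFFFFFFu;
}
```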

So I decided to change the function to the equivalent code using bytes instead of ints:
Code:
void r(unsigned char *bitmap)
{
  unsigned baddr = 0;
  unsigned nb = 32;

  while (nb--) {
    bitmap[baddr >> 3] |= 1 << (baddr & 0x7);
    baddr++;
  }
}
icc -O3 -m64
Code:
L__routine_start__Z1rPh_0:
r(unsigned char*):
        xorl      %edx, %edx                                    #3.18
        movl      $31, %eax                                     #6.10
..B1.2:                         # Preds ..B1.2 ..B1.1
        movl      %edx, %esi                                    #7.21
        movl      %edx, %ecx                                    #7.41
        shrl      $3, %esi                                      #7.21
        andl      $7, %ecx                                      #7.41
        movl      $1, %r8d                                      #7.41
        shll      %cl, %r8d                                     #7.41
        decl      %eax                                          #6.10
        incl      %edx                                          #8.5
        orb       %r8b, (%rsi,%rdi)                             #7.5
        cmpl      $-1, %eax                                     #6.10
        jne       ..B1.2        # Prob 82%                      #6.10
        ret                                                     #10.1
icc -O3 -m32
Code:
L__routine_start__Z1rPh_0:
r(unsigned char*):
        pushl     %esi                                          #2.1
        pushl     %edi                                          #2.1
        pushl     %ebx                                          #2.1
        xorl      %edx, %edx                                    #
        movl      16(%esp), %ecx                                #1.6
        movl      $31, %eax                                     #
        movl      %ecx, %esi                                    #
..B1.2:                         # Preds ..B1.2 ..B1.1
        movl      %edx, %edi                                    #7.21
        movl      %edx, %ecx                                    #7.41
        shrl      $3, %edi                                      #7.21
        andl      $7, %ecx                                      #7.41
        movl      $1, %ebx                                      #7.41
        decl      %eax                                          #6.10
        shll      %cl, %ebx                                     #7.41
        incl      %edx                                          #8.5
        orb       %bl, (%esi,%edi)                              #7.5
        cmpl      $-1, %eax                                     #6.10
        jne       ..B1.2        # Prob 82%                      #6.10
        popl      %ebx                                          #10.1
        popl      %edi                                          #10.1
        popl      %esi                                          #10.1
        ret                                                     #10.1
And now in 32-bit mode, the "set all bits to 1" trick simply disappeared, even though it is still applicable at the byte level (the same applies to 16-bit and 64-bit arrays).
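The byte version reduces in the same way: addresses 0..31 cover bytes 0..3 completely, so the loop is equivalent to setting those four bytes to 0xFF. A sketch with hypothetical names `r_bytes_loop` and `r_bytes_folded`:

```c
#include <assert.h>
#include <string.h>

/* Byte-level loop from the post. The largest shift here is 7, so plain 1
   would also be fine; 1u is used for consistency. */
void r_bytes_loop(unsigned char *bitmap)
{
  unsigned baddr = 0;
  unsigned nb = 32;

  while (nb--) {
    bitmap[baddr >> 3] |= (unsigned char)(1u << (baddr & 0x7));
    baddr++;
  }
}

/* Equivalent straight-line form: bytes 0..3 become 0xFF, the rest are
   untouched. */
void r_bytes_folded(unsigned char *bitmap)
{
  memset(bitmap, 0xFF, 4);
}
```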

One can also play with decreasing baddr (from 31) instead of increasing; again the optimization disappears.

One can also pass two arrays and do the OR assignment on both; again the optimization disappears.
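The two variations above are not shown as code in the post, so the functions below are hypothetical reconstructions (`r_down` counts `baddr` down from 31, `r_two` ORs the same bits into two arrays). Semantically both still amount to setting the first four bytes of each array to 0xFF, which the check confirms.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reconstruction of the "count down from 31" variant. */
void r_down(unsigned char *bitmap)
{
  for (int baddr = 31; baddr >= 0; baddr--)
    bitmap[baddr >> 3] |= (unsigned char)(1u << (baddr & 0x7));
}

/* Hypothetical reconstruction of the "two arrays" variant. */
void r_two(unsigned char *a, unsigned char *b)
{
  for (unsigned baddr = 0; baddr < 32; baddr++) {
    a[baddr >> 3] |= (unsigned char)(1u << (baddr & 0x7));
    b[baddr >> 3] |= (unsigned char)(1u << (baddr & 0x7));
  }
}
```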

I know some will still deny it, but all my doubts have vanished: icc is definitely cheating.
 

jfpoole

Member
Jul 11, 2013
Geekbench needs to measure multi-threaded memory bandwidth (in both the "Memory Performance" and "Stream Performance" sections).

All of the memory workloads will be multi-threaded in v3. The biggest difference is on multi-socket systems, but we've even seen a boost on some Android and iOS handsets.
 

ashetos

Senior member
Jul 23, 2013
I had some free time so I decided to play with icc 13.

First a handy site: http://gcc.godbolt.org/. This allows you to enter code and see the resulting assembly after compilation with gcc, icc, clang and gcc-arm.

First version of the code:
Code:
void r(unsigned *bitmap)
{
  unsigned baddr = 0;
  unsigned nb = 32;

  while (nb--) {
    bitmap[baddr >> 5] |= 1 << (baddr & 0x1f);
    baddr++;
  }
}
This is hilarious: the compiler is able to use the "set all bits to 1" trick, but can't see that the loop count is constant!

So I decided to change the function to the equivalent code using bytes instead of ints:
Code:
void r(unsigned char *bitmap)
{
  unsigned baddr = 0;
  unsigned nb = 32;

  while (nb--) {
    bitmap[baddr >> 3] |= 1 << (baddr & 0x7);
    baddr++;
  }
}
And now in 32-bit mode, the "set all bits to 1" trick simply disappeared, even though it is still applicable at the byte level (the same applies to 16-bit and 64-bit arrays).

One can also play with decreasing baddr (from 31) instead of increasing; again the optimization disappears.

One can also pass two arrays and do the OR assignment on both; again the optimization disappears.

I know some will still deny it, but all my doubts have vanished: icc is definitely cheating.

In the first case we have the expression:
1 << (baddr & 0x1f)
which sets all 32 bits of the bitmap for baddr (31...0)

In the second case we have the expression:
1 << (baddr & 0x7)
which only works correctly for baddr (7...0).
For all baddr values greater than 7, the result is always 0x01, which means not all 32 bits of the bitmap are set to 1.

These pieces of code are already for decreasing baddr. If we are to conclude Intel is cheating, I think you should post different variations of the code so that there is no doubt.
 

Nothingness

Platinum Member
Jul 3, 2013
In the first case we have the expression:
1 << (baddr & 0x1f)
which sets all 32 bits of the bitmap for baddr (31...0)

In the second case we have the expression:
1 << (baddr & 0x7)
which only works correctly for baddr (7...0).
For all baddr values greater than 7, the result is always 0x01, which means not all 32 bits of the bitmap are set to 1.

These pieces of code are already for decreasing baddr. If we are to conclude Intel is cheating, I think you should post different variations of the code so that there is no doubt.
Oh, I tried a for loop going up or down, and also 8 bits for the char case. In all of these cases the optimization disappears. On my side, I'm 99.9% confident Intel cheated.

EDIT: You got it wrong for the & 7 case
 

Schmide

Diamond Member
Mar 7, 2002
In the first case we have the expression:
1 << (baddr & 0x1f)
which sets all 32 bits of the bitmap for baddr (31...0)

In the second case we have the expression:
1 << (baddr & 0x7)
which only works correctly for baddr (7...0).
For all baddr values greater than 7, the result is always 0x01, which means not all 32 bits of the bitmap are set to 1.

These pieces of code are already for decreasing baddr. If we are to conclude Intel is cheating, I think you should post different variations of the code so that there is no doubt.

Those are masks to extract the bit address, not to set bits. Masking with 0x07 (00000111 in binary) keeps the low 3 bits of the address, which gives the 0-7 bit position within the byte. The 1 is then left-shifted by that amount and ORed into the byte.

Think about it: 10 is greater than 7 and corresponds to byte 1 (10 >> 3) and bit 2 (10 & 7).
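The same decomposition as a C helper (the name `set_bit` is hypothetical): the shift picks the byte, the mask picks the bit inside that byte, so address 10 sets bit 2 of byte 1.

```c
#include <assert.h>

/* Split a bit address into (byte index, bit index) and set that bit. */
void set_bit(unsigned char *bitmap, unsigned baddr)
{
  unsigned byte = baddr >> 3;   /* which byte: baddr / 8 */
  unsigned bit  = baddr & 0x7;  /* which bit inside that byte: 0..7 */
  bitmap[byte] |= (unsigned char)(1u << bit);
}
```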
 

ashetos

Senior member
Jul 23, 2013
Those are masks to extract the bit address, not to set bits. Masking with 0x07 (00000111 in binary) keeps the low 3 bits of the address, which gives the 0-7 bit position within the byte. The 1 is then left-shifted by that amount and ORed into the byte.

Think about it: 10 is greater than 7 and corresponds to byte 1 (10 >> 3) and bit 2 (10 & 7).
They are masks, and for baddr (8...31) the mask is always 0 (baddr & 0x07), which always results in setting only the first bit of those bytes, which as I said is 0x01. This means one byte is full of ones and the remaining three bytes are only 0x01. Right?
 

dastral

Member
May 22, 2012
Single core Cortex-A9 OMAP @ 800MHz doesn't exist. Did you mean Cortex-A8?

Do we have a reliable website with ARM benchmarks?
I have a Galaxy Tab 7 (the first one) and I'm looking for a 6 to 7" phone/tablet.

I've seen a few, from €200 (ASUS Fonepad) to €500 (Samsung Galaxy Mega).
I just need something that can play twitch.tv streams at 480p (or higher), something the first Tab 7 can't do anymore... (don't ask me why).

I assume it's mostly CPU performance, but... I can't find a shop with open Wi-Fi that will let me test the twitch.tv APK before buying.
 

Schmide

Diamond Member
Mar 7, 2002
They are masks, and for baddr (8...31) the mask is always 0 (baddr & 0x07), which always results in setting only the first bit of those bytes, which as I said is 0x01. This means one byte is full of ones and the remaining three bytes are only 0x01. Right?


A walkthrough

The first 11 addresses (0 through 10):

Code:
#      byte address              bit address             first 2 bytes in bits
          (# >> 3)                (# & 0x07)             (byte 1, byte 0)

0             0                         0                 0000 0000 0000 0001
1             0                         1                 0000 0000 0000 0011
2             0                         2                 0000 0000 0000 0111
3             0                         3                 0000 0000 0000 1111
4             0                         4                 0000 0000 0001 1111
5             0                         5                 0000 0000 0011 1111
6             0                         6                 0000 0000 0111 1111
7             0                         7                 0000 0000 1111 1111
8             1                         0                 0000 0001 1111 1111
9             1                         1                 0000 0011 1111 1111
10            1                         2                 0000 0111 1111 1111
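The table can be regenerated with a short C sketch. `first_two_bytes_after` is a hypothetical helper that runs the byte-level loop for addresses 0 through `last` and packs bytes 1 and 0 into one 16-bit value, byte 1 in the high half as in the last column above.

```c
#include <assert.h>
#include <stdint.h>

/* Run the byte-level loop for addresses 0..last and return bytes 1 and 0
   as one 16-bit value (byte 1 in the high 8 bits). */
uint16_t first_two_bytes_after(unsigned last)
{
  unsigned char bitmap[2] = {0, 0};
  assert(last < 16);  /* only two bytes are tracked */
  for (unsigned baddr = 0; baddr <= last; baddr++)
    bitmap[baddr >> 3] |= (unsigned char)(1u << (baddr & 0x7));
  return (uint16_t)((bitmap[1] << 8) | bitmap[0]);
}
```

Row 10 of the table (0000 0111 1111 1111) corresponds to `first_two_bytes_after(10) == 0x07FF`.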
 

Nothingness

Platinum Member
Jul 3, 2013
baddr & 7 is the same as baddr % 8 (modulo) for unsigned values like baddr, if you're more comfortable with math notation.
This means that for b in [0..7], b & 7 = b; for b in [8..15], b & 7 = b - 8; etc.
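A quick brute-force check of that identity in C (the function name `mask_matches_mod8` is hypothetical). It is only claimed for unsigned operands: for a negative `int`, C's `%` can return a negative remainder (for example `-1 % 8` is `-1`), while the mask always yields 0..7.

```c
#include <assert.h>

/* Returns 1 if b & 7 == b % 8 for every unsigned b in [0, limit). */
int mask_matches_mod8(unsigned limit)
{
  for (unsigned b = 0; b < limit; b++)
    if ((b & 7u) != b % 8u)
      return 0;
  return 1;
}
```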
 