Apple A9X Geekbench

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
SPARC does indeed have SIMD, if that's what you are implying (not sure; you may just be referring to Oracle's chips). You don't need 256-bit-wide SIMD to be "robust". Power and SPARC have robust SIMD support for their target markets. If ARM wants to invade x86 territory, it will have to reach parity with x86 instruction-wise.

Okay but you're not actually saying anything specific here. How are these other ISAs more robust than NEON?

I looked around and the only SIMD I could find for SPARC is VIS which was an old extension and it only supports datatypes like 4x16-bit or 4x8-bit.

IPC is instructions per clock, which is the same thing as perf/MHz, is it not? Unless you mean SIMD doesn't increase IPC but rather increases performance, all things considered. However, in an ideal CPU I'm sure SIMD would actually increase IPC, because it means executing more instructions in parallel, but that's an ideal scenario.

No, IPC and perf/MHz are not the same thing. That's only true if the instructions are the same. Changing SIMD means changing the instructions used. If I go from executing on average two 128-bit SIMD instructions per cycle to two 256-bit SIMD instructions per cycle, the number of instructions per cycle hasn't gone up.
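
To put numbers on that distinction, here is a minimal sketch. The core model (two SIMD instructions issued per cycle, 32-bit lanes) is a made-up assumption for illustration, not a measurement of any real CPU, and the function names are hypothetical.

```python
# Hypothetical model: a core that retires a fixed number of SIMD
# instructions per cycle. All figures are illustrative assumptions.

def ipc(instructions: int, cycles: int) -> float:
    """Instructions per clock: counts instructions, not the work they do."""
    return instructions / cycles

def elements_per_cycle(simd_per_cycle: int, vector_bits: int, lane_bits: int = 32) -> int:
    """Work per clock (perf/MHz): data elements processed each cycle."""
    return simd_per_cycle * (vector_bits // lane_bits)

narrow = elements_per_cycle(2, 128)  # two 128-bit instructions per cycle
wide = elements_per_cycle(2, 256)    # two 256-bit instructions per cycle

print(ipc(2, 1), narrow)  # 2.0 8
print(ipc(2, 1), wide)    # 2.0 16
```

In this toy model IPC stays at 2.0 in both cases while the work done per cycle doubles, which is the gap between IPC and perf/MHz.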
 

Hi-Fi Man

Senior member
Oct 19, 2013
601
120
106
IPC stands for instructions per clock not work per cycle. But if you use one instruction to do work on four pieces of data simultaneously instead of one, then you're doing more work even if your instruction takes two cycles instead of one.

This is true, but in my original statement I didn't say it increased IPC; my later comments are really referring to "PPC" (performance per clock).
 

Hi-Fi Man

Okay but you're not actually saying anything specific here. How are these other ISAs more robust than NEON?

I looked around and the only SIMD I could find for SPARC is VIS which was an old extension and it only supports datatypes like 4x16-bit or 4x8-bit

These other ISAs are more robust because they support more instructions/operations. SPARC not only has VIS but some decryption specific instructions (can't recall if they are SIMD or not). SPARC isn't too relevant to the conversation even if I barely mentioned it, so I don't see a reason to focus on it.

EDIT: Looks like Fujitsu has developed "SPARC64 X+/X SIMD" in addition to VIS. Just quickly browsing a whitepaper and it appears FMA, decimal FP and encryption/decryption are features.
 

Exophase

These other ISAs are more robust because they support more instructions/operations. SPARC not only has VIS but some decryption specific instructions (can't recall if they are SIMD or not). SPARC isn't too relevant to the conversation even if I barely mentioned it, so I don't see a reason to focus on it.

EDIT: Looks like Fujitsu has developed "SPARC64 X+/X SIMD" in addition to VIS. Just quickly browsing a whitepaper and it appears FMA, decimal FP and encryption/decryption are features.

Even in this case, the decimal FP and crypto stuff are not related to SIMD. ARM has FMA and crypto, and no one is seriously going to make a point about a lack of decimal FP hurting performance outside of very niche applications (hence why x86 doesn't have it either).

I don't really think you barely mentioned SPARC, this statement is kind of significant:

"Because it's still a phone/tablet design with all the limitations of the ARM ISA (lack of robust SIMD instructions hurts IPC and performance in comparison to x86, Power or SPARC)."

But I think you never had actual examples in mind when you said that the ARM ISA had limitations vs those other ISAs except 256-bit SIMD in AVX/2. And where this is concerned, I can look at numerous CPUs from Intel intended for desktops/laptops that also don't have AVX+, albeit not ones intended for the high end space. I don't think Apple really needs 256-bit SIMD in all of their laptop segments, it's not this pervasive thing that makes a huge difference across the board (I doubt most software is even compiled for it)
 

Hi-Fi Man

Even in this case the decimal FP and crypto stuff are not related to SIMD, ARM has FMA and crypto, and no one is seriously going to make a point about a lack of decimal FP hurting performance outside of very niche applications (hence why x86 doesn't have it either)

I don't really think you barely mentioned SPARC, this statement is kind of significant:

"Because it's still a phone/tablet design with all the limitations of the ARM ISA (lack of robust SIMD instructions hurts IPC and performance in comparison to x86, Power or SPARC)."

But I think you never had actual examples in mind when you said that the ARM ISA had limitations vs those other ISAs except 256-bit SIMD in AVX/2. And where this is concerned, I can look at numerous CPUs from Intel intended for desktops/laptops that also don't have AVX+, albeit not ones intended for the high end space. I don't think Apple really needs 256-bit SIMD in all of their laptop segments, it's not this pervasive thing that makes a huge difference across the board (I doubt most software is even compiled for it)

The crypto is related to SIMD because the entire whitepaper was about the additional SIMD instructions being added, which include the crypto. ARM NEON crypto only accelerates SHA and AES, I believe; SPARC X+/X accelerates AES, DES, 3DES, RSA, SHA and DSA. In one of my earlier posts I indicated that current NEON seemed to have parity with SSE3 plus FMA and AES-NI, so no, NEON isn't just missing an AVX equivalent; there are also SSSE3, SSE4.1 and SSE4.2.

Apple makes expensive laptops and desktops used for creation. A lot of these creation-type applications (DAW, photo editing, video editing, etc.) use these SIMD instructions. In fact, I remember setting up a hackintosh years back and having to use a custom kernel to emulate SSSE3, SSE4.1 and SSE4.2 on my AMD CPU; otherwise a lot of applications would crash because they relied on these instructions.

Fujitsu Whitepaper: http://www.fujitsu.com/global/docum.../sparc/downloads/documents/isv-swoc-wp-v1.pdf
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
The crypto is related to SIMD because the entire whitepaper was about the additional SIMD instructions being added which includes the crypto.

You mix things up here. The crypto support instructions in SPARC have nothing to do with SIMD.

SSSE3, SSE4.1 and SSE4.2 on my AMD CPU otherwise a lot applications would crash because they relied on these instructions.

You still haven't told us why SSE4.2 is more robust than NEON. In addition, I would like to hear why you think creation-type applications would be at a disadvantage with NEON.
I have programmed both SSE and NEON myself and I can tell you that there is no significant difference, in either performance or robustness. (Aside from the fact that there is no single NEON implementation with fixed performance. Typically the NEON implementation on the smaller cores, like Cortex-A5, is inherently slower from an IPC point of view.)
 

Exophase

The crypto is related to SIMD because the entire whitepaper was about the additional SIMD instructions being added which includes the crypto. ARM NEON crypto only accelerates SHA and AES I believe, SPARC X+/X accelerates AES, DES, 3DES, RSA, SHA and DSA.

You're reading the same paper I did, and I'm pretty sure the crypto instructions are not related to SIMD here. It uses an entirely decoupled crypto engine which isn't really tied into the ISA. Not only can any other SoC vendor implement this, but they actually have been doing so for mobile platforms for a while now, including Apple.

In one of my earlier posts I indicated that current NEON seemed to have parity with SSE3 with FMA and AES-NI so no NEON isn't just missing an AVX equivalent there's SSSE3, SSE4.1 and SSE4.2.

I don't know where you got this statement about SSE3 parity or what it's supposed to mean. Many of the instructions added in SSSE3 and SSE4.1 do actually have AArch32 NEON equivalents:

PABSB/W/D are equivalent to VABS
PALIGNR is equivalent to VEXT
PSHUFB is basically equivalent to VTBX
PMULHRSW is basically equivalent to VQRDMULH.s16
PHADDW/D are equivalent to VPADD
PHADDSW can be synthesized with VPADDL and VQMOVN
MPSADBW has similar functionality to VABA/VABAL with some permutations (it's more horizontal)
PMULDQ is like VMULL.s32 with a different ordering
PMULLD is equivalent to VMUL.i32
The BLEND instructions have the same functionality as VBIT/VBIF/VBSL albeit with an immediate form
The PMIN/PMAX instructions are equivalent to VMIN/VMAX
The ROUND instructions are partially the same as VCVT, depending on the round modes needed
INSERT and EXTRACT are equivalent to various VMOV instructions
The sign/zero extension instructions are partially supported by VMOVL instructions (when performing a single doubling of width)
PACKUSDW is equivalent to VQMOVUN.s32

So that really just leaves a small number of instructions that don't really have equivalents (PSIGN, PMADDUBSW, PHSUB, PHMINPOSUW, DPPS/DPPD, PCMPEQQ, MOVNTDQA) and AFAIK AArch64 has packed 64-bit compares.
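
As a worked check of one pair from the list, here is a scalar Python model of a single 16-bit lane of PMULHRSW next to NEON's rounding, doubling multiply-high (VQRDMULH.s16, the rounding sibling of VQDMULH). The function names are made up for this sketch and the arithmetic follows my reading of the two instruction definitions, so treat it as an illustration rather than a reference.

```python
def to_int16(x: int) -> int:
    """Wrap an integer to a signed 16-bit value (two's complement)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pmulhrsw_lane(a: int, b: int) -> int:
    """One lane of x86 PMULHRSW: (((a*b) >> 14) + 1) >> 1; the result wraps."""
    return to_int16((((a * b) >> 14) + 1) >> 1)

def vqrdmulh_s16_lane(a: int, b: int) -> int:
    """One lane of NEON VQRDMULH.s16: (2*a*b + 0x8000) >> 16; the result saturates."""
    r = (2 * a * b + 0x8000) >> 16
    return max(-32768, min(32767, r))

# Compare the two models over a handful of signed 16-bit inputs, edges included.
samples = [-32768, -32767, -12345, -1, 0, 1, 2, 255, 12345, 32767]
mismatches = [(a, b) for a in samples for b in samples
              if pmulhrsw_lane(a, b) != vqrdmulh_s16_lane(a, b)]
print(mismatches)  # [(-32768, -32768)]
```

In this model the two only disagree when both inputs are -32768, where the NEON result saturates to 32767 and the x86 result wraps to -32768.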

SSE4.2 only adds a handful of instructions, mainly CRC32, some fixed-length string instructions, and PCMPGTQ (which, again, should be supported by AArch64).

Meanwhile I can list various instructions in NEON that are not present in SSE4.2 ISA CPUs.

It's questionable how much either set of missing instructions are "needed" for good performance, since a lot of them are very niche and were added to satisfy a small number of developers for some fixed task.

Frankly I think you should just stop commenting on it, it's clear you aren't actually that familiar with either ISA and made statements about them that you are trying to justify after the fact.

Apple makes expensive laptops and desktops used for creation. A lot of these creation type applications (DAW, photo editing, video editing, etc.) use these SIMD instructions. In fact I remember setting up a hackintosh years back and having to use a custom kernel to emulate SSSE3, SSE4.1 and SSE4.2 on my AMD CPU otherwise a lot applications would crash because they relied on these instructions.

Apple has different brands of laptops, not all of them are commonly heavily used for content creation, particularly lower end MBAs. Apple doesn't have to move all or nothing at once.
 

Hi-Fi Man

You're reading the same paper I did and I'm pretty sure the crypto instructions are not related to SIMD here. It uses an entirely decoupled crypto engine which isn't really tied into the ISA. Not only can any other SoC vendors implement this but they actually have been for mobile platforms for a while now, including Apple.




Apple has different brands of laptops, not all of them are commonly heavily used for content creation, particularly lower end MBAs. Apple doesn't have to move all or nothing at once.

I don't think Apple wants to split their OS X ecosystem into two different uarchs; that would be a nightmare to manage. I'm sure most Mac buyers would want a unified platform that can run Photoshop on their Mac Pro and also on their MacBook Air without having to deal with compatibility or performance issues. Apple is a very monolithic company, so I don't see them doing that.

Again, if you read my post, it wasn't stated as a fact that NEON was at parity with SSE3; it was something to take with a grain of salt, based on what I've heard from others who have had to port code over. I'm not a journalist and this is just a forum, so relax. While it may be true that there are equivalents, it's possible that these equivalents are more costly in terms of performance, and your list shows there are still some instructions that don't have ARM equivalents. Right now ARM doesn't have anything equivalent to AVX. If you believe ARM has equivalent instructions to SSSE3 and SSE4.x, fine, I won't argue with you there. But my original point still stands: in order for Apple to transition to ARM they will need something that can do 256-bit SIMD, because there are applications out there that use it, and no matter how niche these applications are, users are going to expect near 100% compatibility. So I don't understand why you are going full crusade mode here, but whatever.
 

Hi-Fi Man

You still fail to teach us, why SSE4.2 is more robust than NEON. In addition i would like to hear, why you think that creation type applications would be at disadvantage with NEON.
I programmed SSE and NEON myself and i can tell you that there is no significant difference, neither from performance nor from robustness. (Aside from the fact that there is no NEON Implementation with fixed performance. Typically the NEON implementation on the smaller cores like Cortex A5 is inherently slower from IPC point of view)

Robust is just a term used in this case to describe that x86 has instructions that ARM doesn't. It's that simple. Content creation applications would be at a disadvantage because, like I said earlier, some of them use instructions not provided by ARM.
 

SarahKerrigan

Senior member
Oct 12, 2014
602
1,467
136
You're reading the same paper I did and I'm pretty sure the crypto instructions are not related to SIMD here. It uses an entirely decoupled crypto engine which isn't really tied into the ISA. Not only can any other SoC vendors implement this but they actually have been for mobile platforms for a while now, including Apple.

Fujitsu SPARC (K and up, including the X/X+) has HPC-ACE, which, if I recall, is broadly comparable to AVX plus a few other oddities. The SPARC64 XIfx supercompute chip has a new version of HPC-ACE (I'm not sure of its formal name) which has additional features and will probably be included in the next commercial SPARC64 as well.
 

Exophase

I don't think Apple wants to split their OS X ecosystem into two different uarchs, that would be a nightmare to manage. I'm sure most Mac buyers would want a unified platform that can run photoshop on their Mac Pro and also on their MacBook Air without having to deal with compatibility or performance issues. Apple is a very monolithic company so I don't see them doing that.

It'd be better for traditionally MacOS apps to end up on an ARM-based MacOS product than ported to an iOS product, but that's not stopping products like iPad Pro.

The legacy challenge is really in getting old apps ported, not getting new ones to support both platforms properly. Switching everything to their SoCs isn't really going to be less painful than switching just some things.


Again if you read my post it wasn't stated as a fact that NEON was at parity with SSE3 but merely as a grain of salt of what I've heard from others who have had to port code over, I'm not a journalist and this is just a forum so relax.

It sounds like you don't like being told you're wrong. Why is it so hard to just admit that? The way you shift around what you were clearly saying is kind of amazing.

these equivalents are more costly in terms of performance

And you keep doubling down with more statements like this. You don't know what you're talking about so just stop, please?

and at the same time your list shows there are still some instructions that don't have ARM equivalents.

Pretty much every instruction set has unique instructions, I told you I can name a bunch in NEON that weren't part of SSE4.2 so I guess by your notation that makes both more robust than the other.

in order for Apple to transition to ARM they will need something that can do 256 bit SIMD because there are applications out there that use it and no matter how niche these applications are users are going to expect near 100% compatibility. So I don't understand why you are going full crusade mode here but whatever.

I'm not going "full crusade", I'm just actually supporting my rebuttals against you rather than just simply stating you're wrong.

It's your opinion that Apple needs 256-bit SIMD across their lineup.. I guess if it was such a big deal there must have been a lot of sources praising the first Haswell Macs for this big performance boost in really important applications, right?

Fujitsu SPARC (K and up, including the X/X+) has HPC-ACE, which if I recall is broadly comparable to AVX plus a few other oddities. The SPARC64 XIfx supercompute chip has a new version of HPC-ACE, which I'm not sure of the formal name of, which has additional features (and will probably by included in the next commercial SPARC64 as well.)

From what I can find on HPC-ACE, it's like an instruction prefix that enables more registers:

The number of floating-point registers (FPR) in SPARC-V9 is 32. This is not sufficient to fully exploit the performance of many applications. However, to increase the number of registers, the 32-bit fixed instruction length in the SPARC architecture falls short and is difficult to modify. To solve this problem, HPC-ACE has created a new prefix instruction called SXAR (Set eXtended Arithmetic Register). For up to two subsequent instructions, the SXAR instructions perform operations such as register address extension. SXAR has extended the register address with two additional bits, and the number of addressable floating-point registers (FPR) has been increased up to 128; four times as many as are defined in SPARC-V9 (see Figure 7-3). Compilers use this large-capacity register to perform, among other uses, software pipeline optimization to fully exploit instruction level parallelism in applications.

And allows a simple paired processing SIMD for FP:

The SIMD enhancement processes data using multiple pipelines in parallel, all with a single instruction. By adopting SIMD, HPC-ACE enables the use of two FMA (Floating-point Multiply and Add) execution units with a single arithmetic instruction.

http://www.oracle.com/us/products/s...arc/fujitsu-m10/m10architecturewp-1924309.pdf

This is not really comparable with AVX+.

On SPARC64 X+ specifically:

Single Instruction Multiple Data (SIMD) instructions: SIMD instructions are supported in the SPARC64 X/X+ processor. Up to eight 8-bit data can be compared at the same time. This function will accelerate searching large amounts of data, compressing/decompressing data, in-memory database operations, etc

Only 64-bit SIMD. http://www.oracle.com/us/products/s...ujitsu-m10/fujitsu-m10-4/m10-4faq-1924312.pdf
 

Hi-Fi Man

Honestly Exophase it just sounds like you want to be right all the time. I really don't understand what your problem is but filling the forum with posts like the one above and arguing every little point to death looks like a crusade to me. If I didn't like being wrong I would have completely disagreed with your rebuttals, which isn't the case. I keep stating this and maybe I should clarify that this is an opinion but I don't think Apple would switch to an ARM lineup yet because of the lack of 256 bit SIMD which would cause compatibility/performance issues for applications. That has been the base of my point the whole time so I'm not sure how you see shifting.

Exophase said:
Switching everything to their SoCs isn't really going to be less painful than switching just some things.

I'm not even sure what you are talking about here.

Exophase said:
Pretty much every instruction set has unique instructions, I told you I can name a bunch in NEON that weren't part of SSE4.2 so I guess by your notation that makes both more robust than the other.

Every ISA has strengths and weaknesses.

Exophase said:
It's your opinion that Apple needs 256-bit SIMD across their lineup.. I guess if it was such a big deal there must have been a lot of sources praising the first Haswell Macs for this big performance boost in really important applications, right?

AVX was introduced with Sandy Bridge. People were definitely happy with the jump from Westmere Mac Pros to Ivy Bridge-EP Mac Pros though. I never said across the entire lineup either.

Exophase said:
It'd be better for traditionally MacOS apps to end up on an ARM-based MacOS product than ported to an iOS product, but that's not stopping products like iPad Pro.

What does the iPad Pro have to do with OS X or what I said? It doesn't run OS X applications.

Exophase said:
And you keep doubling down with more statements like this. You don't know what you're talking about so just stop, please?

With hostile comments like these you can't make an effective argument and you just come off looking like a know it all douche. I don't see how you can come to that conclusion either.
 

SarahKerrigan


From what I can find on HPC-ACE, it's like an instruction prefix that enables more registers:



And allows a simple paired processing SIMD for FP:



http://www.oracle.com/us/products/s...arc/fujitsu-m10/m10architecturewp-1924309.pdf

This is not really comparable with AVX+.

On SPARC64 X+ specifically:



Only 64-bit SIMD. http://www.oracle.com/us/products/s...ujitsu-m10/fujitsu-m10-4/m10-4faq-1924312.pdf

My mistake - I was thinking of HPC-ACE2, which extends SIMD to 256-bit, and allows 8 SP or 4 DP ops per instruction.
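
A quick sanity check on those lane counts, as a sketch; the helper name is made up for illustration and the widths are the ones mentioned in this thread.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one SIMD register of the given width."""
    if vector_bits % element_bits:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits

print(lanes(256, 32))  # 8 single-precision values per 256-bit register
print(lanes(256, 64))  # 4 double-precision values per 256-bit register
print(lanes(64, 8))    # 8 bytes per 64-bit register, as in the SPARC64 X+ compare case
```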
 

Thala

With hostile comments like these you can't make an effective argument and you just come off looking like a know it all douche. I don't see how you can come to that conclusion either.

The issue, of course, is that you are completely clueless when it comes to the different flavors of SIMD instruction sets, yet you keep insisting on your original statements about lacking robustness and a disadvantage in content creation apps.
Your knowledge is hearsay at best. It is very apparent that you have not worked with either SIMD implementation, yet you keep insisting.
I think it was a civil request from Exophase, politely asking you to stop with this nonsense.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,417
136
Hi-Fi Man, I think you should maybe ask Exophase what software he has developed that heavily uses SIMD. The man has actual experience in optimising SIMD for performance.
 

Nothingness

Platinum Member
Jul 3, 2013
2,751
1,397
136
BTW the comment I got that ARM SIMD was better than Intel SIMD was coming from an FFmpeg guy. It might be interesting to use FFmpeg assembly code to compare both instruction sets.
 

teejee

Senior member
Jul 4, 2013
361
199
116
Static binary translation of an executable is a non-decidable problem without assistance (see this for instance), so I'm afraid it's not a solution.


...

thanks for the link, clearly some hurdles to overcome.
But I don't think it is common to have code and data in the same area. At least not on Windows: I have turned on DEP for all executables for many years and don't get any problems. And DEP has been around since XP SP2.
Maybe it is different on Mac though.

Anyway, I still think it is possible with translation at install time for almost all applications. And then have a JIT compiler as "compatibility mode".
But I'm basically guessing here...

And there is a strong business case to move to ARM. Apple sells about 20 million Macs a year. Let us assume an average price of $130 per Intel CPU and $30 for an ARM SoC (foundry cost). That is a difference of $2 billion per year (ballpark figure). So Apple can spend a huge amount of money on switching to ARM and still earn a lot of money.
It also gives them more control of timing and features.
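
Spelling out that ballpark arithmetic as a small sketch; all three inputs are the assumptions stated above, not confirmed figures.

```python
macs_per_year = 20_000_000   # assumed unit sales from the post
intel_cpu_cost = 130         # assumed average price per Intel CPU, USD
arm_soc_cost = 30            # assumed foundry cost per ARM SoC, USD

saving_per_unit = intel_cpu_cost - arm_soc_cost
annual_saving = macs_per_year * saving_per_unit
print(f"${annual_saving:,} per year")  # $2,000,000,000 per year
```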


 

Nothingness

thanks for the link, clearly some hurdles to overcome.
But I don't think it is common to have code and data in same area. At least not on Windows, I usually turn on DEP for all executables since many years I and don't get any problems. And DEP has been around since XP SP2
Maybe it is different on Mac though.
IIRC you can for instance find (const) strings in text segment on Linux.

Anyway, I still think it is possible with translation at install time for almost all applications. And then have a JIT compiler as "compatibility mode".
But I'm basically guessing here...
FX!32 (x86 on Alpha) was using an advanced dynamic recompilation engine that stored persistent data. As far as I know this technique hasn't been used again. Might be something to look for...
 

Hi-Fi Man

The issue of course is, that you are completely clueless when it comes to the different flavors of SIMD instruction sets, still you insisting on you original statements of lacking robustness and any disadvantage at content creation apps.
Your knowledge is hearsay at best. It gets very apparent that you did not work with either SIMD implementation - still you insisting.
I think it was civil request from Exophase to politely ask you to stop with this nonsense.

I wouldn't say clueless (haven't been fired yet!), because my original claims about AVX are backed up by fact and weren't really disputed. The only things I've insisted on are that I believe AVX is a clear advantage for x86 and something ARM should have an alternative to for Apple, and that OS X should be on one uarch, not two. Nothing else was ever stated as fact or insisted upon. A polite request does not involve insulting one's intelligence or acting superior.
 

Nothingness

I wouldn't say clueless (haven't been fired yet!) because my original claims were about AVX are backed up by fact and it wasn't something really argued. The only thing I've insisted was that I believe AVX was a clear advantage for x86 and something ARM should have an alternative to for Apple and that OS X should be on one uarch not two. Everything was never stated as a fact and never insisted.
Can you demonstrate that AVX/AVX2 is used so much in OS X and its applications that having something half as wide (and half as fast) would make it and its applications significantly slower?
 

Hi-Fi Man

Can you demonstrate that AVX/AVX2 is used so much in OS X and its applications that having something half as wide (and half as fast) would make it and its applications significantly slower?

Dolphin and PCSX2. They may be niche (maybe not, after all they are the most popular emus) but there is a significant boost in performance especially for Dolphin in using the extra width. PCSX2 mainly uses AVX/2 when software rendering which also gives a nice speedup.

I hear HandBrake also uses AVX and SSE4.2, but I'm not sure of its importance there.
 

Thanatosis

Member
Aug 16, 2015
102
0
0
Please more pages of Hi-Fi Man attempts to obfuscate or change the subject, so entertaining. So educating.

We should create a separate "Hi-Fi Man pwnage thread" where he can be humiliated without taking away from the real discussions about A9X here.





You do not get to insult other members. This is not P&N.

Stop it now.


esquared
Anandtech Forum Director
 

Nothingness

Dolphin and PCSX2. They may be niche (maybe not, after all they are the most popular emus) but there is a significant boost in performance especially for Dolphin in using the extra width. PCSX2 mainly uses AVX/2 when software rendering which also gives a nice speedup.
This PCSX2 forum post shows very limited speed up for AVX over SSE. It's between 0 and 20%.

I couldn't find any Dolphin number.

So yes, definitely nice to have, but not something that on its own would make 256-bit width mandatory.
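
For a rough feel of what a 0 to 20% gain implies, here is an Amdahl's-law style sketch. It assumes (my assumption, not something taken from the PCSX2 post) that the vectorized part of the run time scales perfectly with SIMD width and that nothing else changes; the function names are hypothetical.

```python
def overall_speedup(vector_fraction: float, width_ratio: float = 2.0) -> float:
    """Amdahl's law: only vector_fraction of the run time gets width_ratio faster."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / width_ratio)

def vector_fraction_for(speedup: float, width_ratio: float = 2.0) -> float:
    """Invert Amdahl's law: fraction of run time that must be vector code."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / width_ratio)

print(round(vector_fraction_for(1.2), 3))  # 0.333
print(round(overall_speedup(1 / 3), 3))    # 1.2
```

Under that idealized assumption, a 20% overall gain corresponds to about a third of the run time being spent in code that doubles in speed with the wider vectors.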
 

stingerman

Member
Feb 8, 2005
100
11
76
I would rather look to using the very efficient Metal GPU compute capabilities for vector processing. Metal now lives in both the iOS and OS X and will allow greater portability as well as longer legs as Apple continues to beef up the GPU resources.

I imagine internally Apple is recommending that their framework developers prefer Metal, since iOS and OS X have a vast common code base.
 