[Rumor, Tweaktown] AMD to launch next-gen Navi graphics cards at E3

Status
Not open for further replies.

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
ARM is not faster than x86. Geekbench 4 may execute the same code on ARM as on x86, but the instructions are 4 times bigger on x86. That is the very reason x86 has such a huge advantage over ARM in everything in the real world. There may be a 20% IPC (instructions per clock) advantage for the A76, but that is still just 25% of real-world x86 performance because of the robustness of x86 instructions compared to ARM instructions. That is why I have said that, at the ISA level, x86 is 4 times wider.

And that test in PCWorld is just a bloodbath. x86 is so much faster.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
When ARM can release an 8/16/32-core chip (a real 8-core, not 4 fast and 4 slow cores) at 5 GHz that can compete with an x86 chip at the same clock speed and core count in everyday scenarios, then, and only then, it might start to challenge x86. In reality, ARM cores are small and very low clocked, and to achieve that kind of parity the power-consumption advantage of ARM over x86, which happens to be its main USP, goes out of the window. Sure, you could probably create an 8-core ARM chip that runs at 5 GHz with the same IPC as x86, but then your TDP and power consumption are at best just the same as x86. So yes, ARM can sometimes match x86 IPC at low clocks, but at higher clocks and core counts, with complicated instruction sets, ARM would be no better and likely a lot worse.

https://www.pcworld.com/article/3323381/intel-vs-snapdragon-we-test-hps-envy-x2-with-both.html

2 cores vs 8

No contest really. ARM is suited to low-power mobile devices; when you need to get the big-boy stuff done, there's no comparison to a mature x86 architecture.

ARM does not release implementations; they provide the architecture. The number of cores and the clock speed are determined by the SoC manufacturer, not by ARM, and both are limited by the target TDP of the SoC. Second, most of the benchmarks in your link are running x86 code under emulation, which means they are totally worthless.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Why does everybody read something that I have not written here?

I'm reading what you did say previously:

Don't expect miracles for Nvidia GPUs on 7 nm process. They might not be more power efficient than AMD's GPUs.

That was a plain and clear statement: you said that Nvidia's upcoming 7nm GPUs might not be more efficient than AMD's 7nm GPUs. And I pointed out that would require an actual regression in Nvidia's efficiency with the node shrink, since it looks like Turing (on 12nm) already is more efficient than Navi. And an actual regression in perf/watt with a die shrink just won't happen. It contradicts everything we've ever seen in both the GPU and CPU markets.

Now, it's possible that the perf/watt gains for Ampere over Turing will be modest. I could see them getting 20-30% better performance at the same power consumption. Or maybe they'll really optimize for the process again like they did with Pascal, and/or improve the architecture, and get better performance plus lower power consumption. We can't necessarily bet on the latter, but nor can we rule it out. Nvidia, so far, has done quite an impressive job of generational gains ever since Fermi.
 
Reactions: beginner99

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
When ARM can release an 8/16/32-core chip (a real 8-core, not 4 fast and 4 slow cores) at 5 GHz that can compete with an x86 chip at the same clock speed and core count in everyday scenarios, then, and only then, it might start to challenge x86.

The truth is that only a relatively small minority of enthusiast systems run at or above 4 GHz. And the vast majority of PCs still have 4 or fewer cores.
 

ubern00b

Member
Jun 11, 2019
171
75
61
ARM does not release implementations; they provide the architecture. The number of cores and the clock speed are determined by the SoC manufacturer, not by ARM, and both are limited by the target TDP of the SoC. Second, most of the benchmarks in your link are running x86 code under emulation, which means they are totally worthless.
So where are the ARM versions of the benchmarks provided? Ohhhhh, there are none. It's a small-scale architecture that cannot cope with a superscalar architecture like x86 in everyday computing. Yes, at low TDP it's a decent architecture, suited to mobile computing (AKA phones/tablets), but it has nowhere near the compute power of mature x86 processors and will not be "taking over" any time soon in our lifetime, unless you think Intel and AMD have been wasting 65-125 W TDP on nothing for the last 15 years just so ARM can out-muscle them at a minuscule 15 W TDP. I mean... c'mon man?
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
I'm reading what you did say previously:



That was a plain and clear statement: you said that Nvidia's upcoming 7nm GPUs might not be more efficient than AMD's 7nm GPUs. And I pointed out that would require an actual regression in Nvidia's efficiency with the node shrink, since it looks like Turing (on 12nm) already is more efficient than Navi. And an actual regression in perf/watt with a die shrink just won't happen. It contradicts everything we've ever seen in both the GPU and CPU markets.

Now, it's possible that the perf/watt gains for Ampere over Turing will be modest. I could see them getting 20-30% better performance at the same power consumption. Or maybe they'll really optimize for the process again like they did with Pascal, and/or improve the architecture, and get better performance plus lower power consumption. We can't necessarily bet on the latter, but nor can we rule it out. Nvidia, so far, has done quite an impressive job of generational gains ever since Fermi.
How efficient is an Nvidia GPU with 2560 CUDA cores (ALUs), and how efficient is Navi with 2560 ALUs, that you claim it would require a regression in efficiency?
 

ubern00b

Member
Jun 11, 2019
171
75
61
The truth is that only a relatively small minority of enthusiast systems run at or above 4 GHz. And the vast majority of PCs still have 4 or fewer cores.
Show me any mainstream desktop system with fewer than 4 cores that outsells its counterparts in droves.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
So where are the ARM versions of the benchmarks provided? Ohhhhh, there are none. It's a small-scale architecture that cannot cope with a superscalar architecture like x86 in everyday computing. Yes, at low TDP it's a decent architecture, suited to mobile computing (AKA phones/tablets), but it has nowhere near the compute power of mature x86 processors and will not be "taking over" any time soon in our lifetime, unless you think Intel and AMD have been wasting 65-125 W TDP on nothing for the last 15 years just so ARM can out-muscle them at a minuscule 15 W TDP. I mean... c'mon man?

Just because a certain benchmark does not exist natively, you conclude that ARM must be slower? That's interesting reasoning, to say the least.
I am sure that you are able to look up Geekbench 4 numbers for the Cortex A76 on the web, but hey, why not rely on a totally worthless benchmark where ARM is emulating x86 code?

Just finished compiling the 7-zip release and ran the single-core benchmark on both AArch64 and x64, with max optimizations and no use of vector extensions on either architecture.
Cortex A73@2.4GHz: 2137 MIPS
Skylake 6700k@4GHz: 4210 MIPS

I do think that a factor of 2 is the best case for ARM; normally (running different code) Skylake is between 2.5 and 2.8 times faster. Geekbench should also indicate slightly north of 2.5.
Anyway, from these numbers you can easily estimate the performance level of the Cortex A76 (hint: you can take Andrei's Cortex A76/A77 review as a reference).
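
For anyone who wants to separate clock speed from per-clock throughput in the two 7-zip results above, here is a rough Python sketch. It uses only the two MIPS figures and clock speeds quoted in this post. The function name mips_per_ghz is made up for illustration, and dividing MIPS by GHz is a crude normalization that ignores memory latency and boost behaviour.

```python
# Figures quoted in the post: (MIPS, clock in GHz).
RESULTS = {
    "Cortex A73": (2137.0, 2.4),
    "Skylake 6700K": (4210.0, 4.0),
}


def mips_per_ghz(mips: float, ghz: float) -> float:
    """Crude per-clock throughput: benchmark score divided by clock speed."""
    return mips / ghz


a73_mips, a73_ghz = RESULTS["Cortex A73"]
skl_mips, skl_ghz = RESULTS["Skylake 6700K"]

# Ratio of the raw scores (includes the clock-speed difference).
raw_ratio = skl_mips / a73_mips

# Ratio after dividing each score by its clock speed.
per_clock_ratio = mips_per_ghz(skl_mips, skl_ghz) / mips_per_ghz(a73_mips, a73_ghz)

print(f"raw ratio:       {raw_ratio:.2f}")        # about 1.97
print(f"per-clock ratio: {per_clock_ratio:.2f}")  # about 1.18
```

So in this one test most of the factor of 2 comes from the clock difference, with a per-clock gap of roughly 18%.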

ps: If someone has a faster Windows ARM machine than a Cortex A73, please send me a PM and I will link the binaries for download.
pps: I notice we are quite a bit off topic...
 

ubern00b

Member
Jun 11, 2019
171
75
61
Just because a certain benchmark does not exist natively, you conclude that ARM must be slower? That's interesting reasoning, to say the least.
I am sure that you are able to look up Geekbench 4 numbers for the Cortex A76 on the web, but hey, why not rely on a totally worthless benchmark where ARM is emulating x86 code?

Just finished compiling the 7-zip release and ran the single-core benchmark on both AArch64 and x64, with max optimizations and no use of vector extensions on either architecture.
Cortex A73@2.4GHz: 2137 MIPS
Skylake 6700k@4GHz: 4210 MIPS

I do think that a factor of 2 is the best case for ARM; normally (running different code) Skylake is between 2.5 and 2.8 times faster. Geekbench should also indicate slightly north of 2.5.
Anyway, from these numbers you can easily estimate the performance level of the Cortex A76 (hint: you can take Andrei's Cortex A76/A77 review as a reference).

ps: If someone has a faster Windows ARM machine than a Cortex A73, please send me a PM and I will link the binaries for download.
pps: I notice we are quite a bit off topic...
I provided proof that it wasn't as fast or efficient in a like-for-like scenario; if you care to do the same, then by all means do. ARM is a small, efficient architecture that can compete with x86 at low power and clocks. If you're saying something different, then maybe you know better than the likes of Apple, Samsung, Qualcomm, etc. and can make this little low-powered chip beat out the likes of Intel and AMD, who have been in the x86 game for 40 years. Please do enlighten us with your superior intellect.
 

ubern00b

Member
Jun 11, 2019
171
75
61
Geekbench... Geekbench? Come on man, my 1st-gen Ryzen beats the crap out of your Cortex A76 in Geekbench, and it's well known for being inconsistent, to say the least. Oh, and my lowly 1st-gen Ryzen at 4 GHz scores more than your Skylake. Where's your 4 GHz ARM comparison? There is none, because it's an ultra-low-power CPU designed for mobiles that can probably run at 2.1 GHz at a push.
 

JasonLD

Senior member
Aug 22, 2017
486
447
136
ARM is not faster than x86. Geekbench 4 may execute the same code on ARM as on x86, but the instructions are 4 times bigger on x86. That is the very reason x86 has such a huge advantage over ARM in everything in the real world. There may be a 20% IPC (instructions per clock) advantage for the A76, but that is still just 25% of real-world x86 performance because of the robustness of x86 instructions compared to ARM instructions. That is why I have said that, at the ISA level, x86 is 4 times wider.

And that test in PCWorld is just a bloodbath. x86 is so much faster.

First of all, that PCWorld benchmark is running those apps under emulation, so I wouldn't say it is a good representation of ARM performance.

I am sure someone with better knowledge could explain this better, but x86 having 4 times bigger instructions doesn't mean it will perform 4 times better, because a traditional x86 instruction would take multiple cycles to execute. That is why modern x86 processors have used micro-ops for 20+ years to break those instructions down and work more like RISC processors.
I think CISC vs RISC is pretty much a moot point right now, since modern processors have multiple extensions for vector operations, so even ARM processors these days have micro-ops.
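
The point about instruction size versus speed can be put in numbers with the usual relation: execution time = instruction count x cycles per instruction / clock frequency. The Python sketch below uses entirely hypothetical figures; the instruction counts, CPI values and the 3 GHz clock are assumptions chosen for illustration, not measurements of any real CPU.

```python
def execution_time_ns(instructions: int, cycles_per_instruction: float, clock_ghz: float) -> float:
    """Execution time in nanoseconds: instructions * CPI / clock (GHz = cycles per ns)."""
    return instructions * cycles_per_instruction / clock_ghz


# HYPOTHETICAL: a program encoded as few "big" instructions that each take 4 cycles.
big_instructions = execution_time_ns(instructions=1_000, cycles_per_instruction=4.0, clock_ghz=3.0)

# HYPOTHETICAL: the same work encoded as 4x as many simple single-cycle instructions.
small_instructions = execution_time_ns(instructions=4_000, cycles_per_instruction=1.0, clock_ghz=3.0)

print(f"big instructions:   {big_instructions:.1f} ns")
print(f"small instructions: {small_instructions:.1f} ns")
```

With these made-up numbers both encodings take the same time, so instruction count on its own says nothing about speed.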

When Adobe releases the full version of Photoshop on iPad sometime later this year, perhaps we will see a better representation of real-world performance on those ARM processors.
 
Reactions: soresu

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Geekbench... Geekbench? Come on man, my 1st-gen Ryzen beats the crap out of your Cortex A76 in Geekbench, and it's well known for being inconsistent, to say the least. Oh, and my lowly 1st-gen Ryzen at 4 GHz scores more than your Skylake. Where's your 4 GHz ARM comparison? There is none, because it's an ultra-low-power CPU designed for mobiles that can probably run at 2.1 GHz at a push.

I would be willing to answer your question if I had the impression you were interested in learning something. Unfortunately this is not the case, so I am out of here.
Have fun!
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Just because a certain benchmark does not exist natively, you conclude that ARM must be slower? That's interesting reasoning, to say the least.
I am sure that you are able to look up Geekbench 4 numbers for the Cortex A76 on the web, but hey, why not rely on a totally worthless benchmark where ARM is emulating x86 code?

Just finished compiling the 7-zip release and ran the single-core benchmark on both AArch64 and x64, with max optimizations and no use of vector extensions on either architecture.
Cortex A73@2.4GHz: 2137 MIPS
Skylake 6700k@4GHz: 4210 MIPS

I do think that a factor of 2 is the best case for ARM; normally (running different code) Skylake is between 2.5 and 2.8 times faster. Geekbench should also indicate slightly north of 2.5.
Anyway, from these numbers you can easily estimate the performance level of the Cortex A76 (hint: you can take Andrei's Cortex A76/A77 review as a reference).

ps: If someone has a faster Windows ARM machine than a Cortex A73, please send me a PM and I will link the binaries for download.
pps: I notice we are quite a bit off topic...
You do realize that ARM is executing instructions that are 25% of x86 code?
First of all, that PCWorld benchmark is running those apps under emulation, so I wouldn't say it is a good representation of ARM performance.

I am sure someone with better knowledge could explain this better, but x86 having 4 times bigger instructions doesn't mean it will perform 4 times better, because a traditional x86 instruction would take multiple cycles to execute. That is why modern x86 processors have used micro-ops for 20+ years to break those instructions down and work more like RISC processors.
I think CISC vs RISC is pretty much a moot point right now, since modern processors have multiple extensions for vector operations, so even ARM processors these days have micro-ops.

When Adobe releases the full version of Photoshop on iPad sometime later this year, perhaps we will see a better representation of real-world performance on those ARM processors.
It is not a moot point. 4 times bigger instructions lead to the scenario you saw in the PCWorld tests.

And that is the exact reason why x86 has dominated EVERYTHING important for the past 30 years. If the A76 were as good as some say, we would be seeing HPC design wins with those CPUs. Who would not want free efficiency and performance, if they really are 20% faster in IPC than x86 while being, core for core, 5-10 times more power efficient?

But they are not as fast, because software is simply too complicated for ARM to be a viable option. Period.
I would be willing to answer your question if I had the impression you were interested in learning something. Unfortunately this is not the case, so I am out of here.
Have fun!
Your point is beyond ridiculous. Based on one completely useless benchmark, you claim that the A76 is a good CPU. What does Geekbench show? Only people's biases towards certain architectures.

Let's discuss this when we see anything meaningful running on the A76 as fast as, or faster than, on x86. Then we can talk.
 

JasonLD

Senior member
Aug 22, 2017
486
447
136
It is not a moot point. 4 times bigger instructions lead to the scenario you saw in the PCWorld tests.

Straight from the PCWorld article:
To make instructions for x86 work on ARM work, Microsoft and Qualcomm translate the binary instructions in real time. This translation eats performance.

Even the article gives the reason why the performance numbers are so low. 4 times bigger instructions don't mean anything, since the processor is not processing such an instruction in a single cycle as it is.

And that is the exact reason why x86 has dominated EVERYTHING important for the past 30 years. If the A76 were as good as some say, we would be seeing HPC design wins with those CPUs. Who would not want free efficiency and performance, if they really are 20% faster in IPC than x86 while being, core for core, 5-10 times more power efficient?

x86 was already dominating the desktop, so it had sustained revenue to fund the development and improvement of its server processors, something the RISC competitors of the 90s didn't have; they couldn't sustain the development effort to keep pace with x86 server processors. That is why x86 went on to completely dominate the server as well.
ARM is a little different. It already dominates mobile and other low-power devices, so it already has a sustainable revenue stream to keep funding further research and improving its architecture. ARM is already providing a solid roadmap for the next 3 years of its server reference designs, with rapid improvement in IPC.
I am aware that it is difficult to break into the HPC market with the established x86 ecosystem, but ARM can sustain the effort to deliver an improved architecture every year, as well as a better compiler set. I expect a stronger challenge from ARM in the server market in the next 2-3 years.

But they are not as fast, because software is simply too complicated for ARM to be a viable option. Period.

That is simply not true.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
In my simple mind, sure, IPC on ARM compared to IPC on x86/64, at same clock speeds, can be very close. But ARM requires more instructions for the same action than x86/64, and as a result, it seems to make sense to me that any piece of software complex enough to be useful would end up running slower on ARM, even at same clock speeds, because while the IPC is roughly the same, it requires more instructions to complete the same action.

Example from a StackOverflow post:

x86:
repe cmpsb /* repeat while equal compare string bytewise */

ARM:
top:
ldrb r2, [r0, #1]! /* pre-indexed with writeback: r0 = r0 + 1, then load the byte at r0 into r2 */
ldrb r3, [r1, #1]! /* pre-indexed with writeback: r1 = r1 + 1, then load the byte at r1 into r3 */
subs r2, r3, r2 /* r2 = r3 - r2, and set the condition flags */
beq top /* branch (jump) back to top if the result is zero */


So for that one action, x86 only runs 1 cycle, but ARM has to run 4 5 cycles. But I don't know ISAs well enough to know whether these two things are saying the same thing

And I'm not a microprocessor engineer, nor do I know if that's even the right term for "those people who design architectures and ISAs and cool stuff." This clearly has to be more complicated than that.
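
As a rough way to check whether the two snippets are "saying the same thing", here is a Python model of each. It assumes the standard behaviour of repe cmpsb (compare bytes while they are equal and the count register is non-zero) and models the ARM loop exactly as posted, ignoring its starting offset. The function names are invented for illustration.

```python
def repe_cmpsb(a: bytes, b: bytes, count: int) -> int:
    """Model of x86 `repe cmpsb`: compare up to `count` bytes, stopping
    after the first mismatch. Returns how many bytes were compared."""
    compared = 0
    while count > 0:              # the rep prefix stops when the counter hits zero
        equal = a[compared] == b[compared]
        compared += 1             # both pointers advance by one byte
        count -= 1
        if not equal:             # repe stops once the bytes differ
            break
    return compared


def arm_loop(a: bytes, b: bytes) -> int:
    """Model of the four-instruction ARM loop as posted. There is no counter,
    so it only stops on a mismatch; running off the end of the buffers is
    modelled here by Python raising IndexError."""
    compared = 0
    while True:
        diff = b[compared] - a[compared]   # subs r2, r3, r2
        compared += 1
        if diff != 0:                      # beq top is not taken
            return compared


print(repe_cmpsb(b"axcd", b"aycd", 4))  # stops after the second byte
print(arm_loop(b"axcd", b"aycd"))       # same stopping point
print(repe_cmpsb(b"abcd", b"abcd", 4))  # stops because the count ran out
```

The two agree when the inputs differ, but the ARM snippet as posted has no length limit, so a fair comparison would need at least one more instruction for the counter.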
 
Reactions: Glo.

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
In my simple mind, sure, IPC on ARM compared to IPC on x86/64, at same clock speeds, can be very close. But ARM requires more instructions for the same action than x86/64, and as a result, it seems to make sense to me that any piece of software complex enough to be useful would end up running slower on ARM, even at same clock speeds, because while the IPC is roughly the same, it requires more instructions to complete the same action.

Example from a StackOverflow post:

x86:
repe cmpsb /* repeat while equal compare string bytewise */

ARM:
ldrb r2, [r0, #1]! /* pre-indexed with writeback: r0 = r0 + 1, then load the byte at r0 into r2 */
ldrb r3, [r1, #1]! /* pre-indexed with writeback: r1 = r1 + 1, then load the byte at r1 into r3 */
subs r2, r3, r2 /* r2 = r3 - r2, and set the condition flags */
beq top /* branch (jump) back to top if the result is zero */


So for that one action, x86 only runs 1 cycle, but ARM has to run 4 cycles.

But I'm not a microprocessor engineer, nor do I know if that's even the right term for "those people who design architectures and ISAs and cool stuff." This clearly has to be more complicated than that.
Exactly. Thank you.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136

guachi

Senior member
Nov 16, 2010
761
415
136
AMD, overall. Which is why I'm surprised they don't want to start one in the consumer dGPU segment. Maybe they're afraid of knocking off NV for some reason?

Maybe they don't have the capacity? I'd like to think that AMD could handle a price war simply because of the smaller dies, but maybe not at first. I suspect supply will be constrained for the first two to three months, anyway. This can set them up well for price drops and supply increases come fall and into holiday season.

It's just speculation, of course. But I'd be willing to bet we see 5700XT cards at $350 five months from now.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
In my simple mind, sure, IPC on ARM compared to IPC on x86/64, at same clock speeds, can be very close. But ARM requires more instructions for the same action than x86/64, and as a result, it seems to make sense to me that any piece of software complex enough to be useful would end up running slower on ARM, even at same clock speeds, because while the IPC is roughly the same, it requires more instructions to complete the same action.

Example from a StackOverflow post:

x86:
repe cmpsb /* repeat while equal compare string bytewise */

ARM:
top:
ldrb r2, [r0, #1]! /* pre-indexed with writeback: r0 = r0 + 1, then load the byte at r0 into r2 */
ldrb r3, [r1, #1]! /* pre-indexed with writeback: r1 = r1 + 1, then load the byte at r1 into r3 */
subs r2, r3, r2 /* r2 = r3 - r2, and set the condition flags */
beq top /* branch (jump) back to top if the result is zero */


So for that one action, x86 only runs 1 cycle, but ARM has to run 4 5 cycles. But I don't know ISAs well enough to know whether these two things are saying the same thing

And I'm not a microprocessor engineer, nor do I know if that's even the right term for "those people who design architectures and ISAs and cool stuff." This clearly has to be more complicated than that.

You can't determine execution time on any modern CPU by how many opcodes are used. On a modern CPU, opcodes are broken down into micro-ops and pipelined, so multiple opcodes may be executed in parallel (or speculatively, in the case of branches, which we recently found out causes a lot of potential security issues). And on x86, some of the older and less frequently used opcodes take multiple micro-ops.

In your example above, the rep cmps instruction is an antiquated instruction that most compilers don't generate any more because it is slower on a modern x86 CPU than just using regular comparison operations in a loop. For instance, on Skylake, rep cmps consists of 8 or more micro-ops per iteration, and on Ryzen, it's 9 micro-ops per iteration.
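
To put those figures side by side, here is a small Python sketch that counts micro-ops for comparing n bytes. The 8 and 9 micro-ops per iteration are the numbers quoted above (8 is a lower bound for Skylake). The figure of 4 micro-ops per iteration for an explicit load/load/compare/branch loop is an assumption for illustration (one micro-op per instruction), not a measurement, and the function name is made up.

```python
# Micro-ops per iteration of rep cmps, as quoted in the post above.
REP_CMPS_UOPS = {"Skylake": 8, "Ryzen": 9}

# ASSUMPTION: an explicit four-instruction compare loop costs one
# micro-op per instruction. Real cores may fuse or split these.
EXPLICIT_LOOP_UOPS = 4


def uops_for_compare(uops_per_iteration: int, n_bytes: int) -> int:
    """Total micro-ops to compare n_bytes at one byte per iteration."""
    return uops_per_iteration * n_bytes


n = 1000
for cpu, uops in REP_CMPS_UOPS.items():
    print(f"{cpu} rep cmps: {uops_for_compare(uops, n)} micro-ops")
print(f"explicit loop (assumed): {uops_for_compare(EXPLICIT_LOOP_UOPS, n)} micro-ops")
```

Under that assumption the single "one opcode" instruction does at least twice the work per byte of the plain loop, which is the opposite of what counting opcodes suggests.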
 