Question Intel's new x86 instruction sets: APX and AVX10


soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
I know nothing, but this seems like a mess.
Makes me think that Intel should go to AMD and discuss developing a new architecture together, one with sane vector instructions.
They should.

But that would imply a desire to curate a healthy x86 ecosystem that can compete effectively with the growing threats of ARM64 and RISC-V, rather than making shallow attempts to one-up their singular rival in the x86 market with short-lived ISA extension boosts.

(or collaborating with the UHD Blu-ray consortium and CyberLink on a useless SGX-secured playback tie-in that only worked on a very limited set of Intel platforms... still salty about that mess years later 😒)

I think that the only time they didn't completely take AMD by surprise was with AVX, as AMD's own proposed SSE5 instruction set already covered much of it, and the final XOP extensions comprised the remainder of SSE5 that wasn't in AVX.

Obviously, being ahead of the game there unfortunately didn't really help, as SSE5/XOP was introduced with Bulldozer 😅
 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,049
136
Sir, I bet you composed this forum post on a ZX Spectrum, right? Not much else to say about the rest of your post.

I notice you didn't answer the question you quoted. Why does it matter for a brand-new x86 CPU to run Javascript faster than the already very fast speed it runs at today?

This is being written on a PC with a Skylake CPU that is still plenty fast for any web browsing or Javascript code. If you waved a magic wand and, starting tomorrow, it ran Javascript twice as fast, I doubt I would even notice.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
I notice you didn't answer the question you quoted. Why does it matter for a brand-new x86 CPU to run Javascript faster than the already very fast speed it runs at today?

This is being written on a PC with a Skylake CPU that is still plenty fast for any web browsing or Javascript code. If you waved a magic wand and, starting tomorrow, it ran Javascript twice as fast, I doubt I would even notice.
Running Javascript fast is probably the single most important improvement you could provide to the average user, given how much time is spent directly in internet browsers or in apps leveraging web technology (e.g. Electron). It certainly becomes more noticeable as a PC ages.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
And on the topic of ISA extensions in the context of the web, my go-to example of "RISC vs CISC is meaningless" is this beauty ARM added a few years back.
FJCVTZS - Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero
Yes, an instruction added explicitly for Javascript acceleration.

 

Tuna-Fish

Golden Member
Mar 4, 2011
1,422
1,759
136
Yes, an instruction added explicitly for Javascript acceleration.

This is not as silly as it sounds. Another way to describe that instruction is: "convert floats to int like x86 does". This was adopted as the way js does things pretty much by default, which meant that computers that didn't have that behavior ended up a lot slower than x86 at doing some reasonably common js idioms. When js runs half the world, fixing that performance deficit is just sane, and FJCVTZS is less embarrassing of a name than Fx86CVTZS.

(The difference in behavior is how the instruction operates when the result of converting a float into i32 doesn't fit into i32. x86 tosses the high bits and gives the result mod 2^32. ARM normally clamps the result to the highest representable i32 value. Both implementations are standards-conforming, probably because the people who wrote the standard couldn't imagine why it would ever matter: the situation is always a failure to do the conversion properly, and there are ways of signaling that, so why would the failed result matter?

But then someone made a programming language where all numbers are always floats, and you can do bitwise operations on the numbers, which only makes sense if you turn them into integers first. Because he was in a ridiculous hurry, he just used the built-in "convert float to int" instruction, probably never even considered what happens if the float is larger than what is representable by i32, and shipped it, because the language needed to be done in literally two weeks. And then the entire freaking world adopted it, and the behavior of the hardware it happened to be running on became the behavior that actually has to be matched to render websites properly.)
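
To make the behavior difference concrete, here's a minimal C sketch of the two conversions being described. The helper names (to_int32_wrap, to_int32_saturate) and the edge-case handling are illustrative assumptions, not lifted from any engine or hardware manual: the wrap version mirrors ECMAScript's ToInt32 (roughly what FJCVTZS provides in one instruction), while the saturate version mirrors an ordinary clamping float-to-int conversion.

Code:
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative emulation of ECMAScript ToInt32: truncate toward zero,
   reduce modulo 2^32, reinterpret as signed. */
static int32_t to_int32_wrap(double d)
{
    if (!isfinite(d))                          /* NaN and +/-Inf map to 0 in JS */
        return 0;
    double m = fmod(trunc(d), 4294967296.0);   /* reduce mod 2^32 */
    if (m < 0)
        m += 4294967296.0;
    return (int32_t)(uint32_t)m;               /* wrap into the signed range */
}

/* Illustrative saturating conversion, i.e. what a plain clamping
   float-to-int32 instruction hands back on overflow. */
static int32_t to_int32_saturate(double d)
{
    if (isnan(d))
        return 0;
    if (d >= 2147483647.0)
        return INT32_MAX;
    if (d <= -2147483648.0)
        return INT32_MIN;
    return (int32_t)d;
}

int main(void)
{
    double x = 4294967301.0;                         /* 2^32 + 5 */
    printf("wrap:     %d\n", to_int32_wrap(x));      /* prints 5 */
    printf("saturate: %d\n", to_int32_saturate(x));  /* prints 2147483647 */
    return 0;
}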
 

A///

Diamond Member
Feb 24, 2017
4,352
3,155
136
Lads, this isn't the place to whip it out and see who can piss the farthest. Cut this crap out.
 

SpudLobby

Senior member
May 18, 2022
961
655
106
Genuine Question: How does every topic on this forum seemingly always pivot to talking about how dead Intel is / will be?

This was ostensibly about ISA expansion at one point.
Lol, broadly I agree this can be annoying, and I am not really an Intel doomer writ large; most who are see the fabs as toast too, which I think is naive and untenable due to both market and geopolitical forces.

At any rate, Intel is one of the most historically significant tech firms, as is its decline, and it previously was the de facto center of general-purpose computing for civilization. I am not surprised Intel's state is a seemingly unavoidable topic when we discuss, e.g., major Intel ISA extensions that are intimately linked with recent Intel missteps, or that contend with the better ISA design choices of a notable competitor. How could that not come to mind?

This forum has never really been a purely technical one, and even RealWorldTech and similar sites aren't either, for quite obvious reasons; in fact, they're having conversations about the ISA, the future of the x86 market, future releases, etc.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,155
136
Given how badly optimised some JS heavy websites are I wouldn't bet on that.
Oh, do let Doug swish his hand around. Let the man have some peace and fun at his age. His post was tongue-in-cheek; he said Skylake. 10th gen is Skylake, also a hot POS, though it does work a bit faster than 6th-gen junk. 11th-gen desktop was junk maximo.
 

SpudLobby

Senior member
May 18, 2022
961
655
106
Frankly, anyone conflating the technical merits of an ISA with its success in the market has not learned from history. People have been complaining about x86 for literal decades at this point. That has not stopped it from being one of the most, if not the most, commercially successful ISAs around. We see some of this same discussion both for and against RISC-V today. The success or failure of an ISA is about the business model and performance of the companies behind it far more than any technical merit.
For what it's worth, I suspect the mild problems with x86 are historically contingent. Previously it was mostly a complaint about ROM sizes or whatever before chips got so small, but the cost of the Arm (64-bit) vs x86 choice is probably less trivial as, e.g., cores get wider. Compare designing with Arm v8+ in 2015 vs x86 in 2015, vs now. There's no way that delta hasn't increased in terms of tradeoffs (be it design overhead or area or whatever) for a high-performance CPU; GLC straight up added two cycles of latency for branch mispredictions, for example, which was primarily about decode as far as we can tell. People seem to miss this.

RISC-V is also a legitimate disaster for performance CPUs IMO, much more so than x86, and I've seen nothing but horror about it from engineers who would know; see for example Shac Ron's Twitter (I think he works at Nvidia and used to work at Apple).

Technical merit etcetera are the bigger drivers, I agree, within normal variation, but I suspect it's not a fixed point; it depends on goals and how much you have to work with. The GPR thing alone is a subtle admission they've changed their minds for future workloads to a degree, albeit at a cost, with the extra complexity thrown in on the frontend I believe (at some point I hope they just start from scratch).
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
This is not as silly as it sounds. Another way to describe that instruction is: "convert floats to int like x86 does". This was adopted as the way js does things pretty much by default, which meant that computers that didn't have that behavior ended up a lot slower than x86 at doing some reasonably common js idioms. When js runs half the world, fixing that performance deficit is just sane, and FJCVTZS is less embarrassing of a name than Fx86CVTZS.

(The difference in behavior is how the instruction operates when the result of converting a float into i32 doesn't fit into i32. x86 tosses the high bits and gives the result mod 2^32. ARM normally clamps the result to the highest representable i32 value. Both implementations are standards-conforming, probably because the people who wrote the standard couldn't imagine why it would ever matter: the situation is always a failure to do the conversion properly, and there are ways of signaling that, so why would the failed result matter?

But then someone made a programming language where all numbers are always floats, and you can do bitwise operations on the numbers, which only makes sense if you turn them into integers first. Because he was in a ridiculous hurry, he just used the built-in "convert float to int" instruction, probably never even considered what happens if the float is larger than what is representable by i32, and shipped it, because the language needed to be done in literally two weeks. And then the entire freaking world adopted it, and the behavior of the hardware it happened to be running on became the behavior that actually has to be matched to render websites properly.)
Oh, it's not at all silly. It makes perfect sense in a world where CPUs will spend much of their life executing Javascript. I just think it's a good way to illustrate how messy reality is, even for a "good" ISA like ARM v8/v9. People can spend years hemming and hawing over mistakes of the past (x86, Javascript, etc.), but often replacing them is more trouble than it's worth. Or, even more pragmatically, they can't be replaced. I think what Intel is doing is a good balance between maintaining legacy support and keeping the ISA moving forward.
 
Reactions: Tlh97

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
For what it's worth, I suspect the mild problems with x86 are historically contingent. Previously it was mostly a complaint about ROM sizes or whatever before chips got so small, but the cost of the Arm (64-bit) vs x86 choice is probably less trivial as, e.g., cores get wider. Compare designing with Arm v8+ in 2015 vs x86 in 2015, vs now. There's no way that delta hasn't increased in terms of tradeoffs (be it design overhead or area or whatever) for a high-performance CPU; GLC straight up added two cycles of latency for branch mispredictions, for example, which was primarily about decode as far as we can tell. People seem to miss this.

RISC-V is also a legitimate disaster for performance CPUs IMO, much more so than x86, and I've seen nothing but horror about it from engineers who would know; see for example Shac Ron's Twitter (I think he works at Nvidia and used to work at Apple).

Technical merit etcetera are the bigger drivers, I agree, within normal variation, but I suspect it's not a fixed point; it depends on goals and how much you have to work with. The GPR thing alone is a subtle admission they've changed their minds for future workloads to a degree, albeit at a cost, with the extra complexity thrown in on the frontend I believe (at some point I hope they just start from scratch).
Let me be clear. Certain aspects of the ISA do matter. You can bet your butt that it's taken a lot of engineering effort to deal with things like x86 variable length decode, for example. Where I take issue is people claiming that because an ISA has problems, it's a dead end that will inevitably be abandoned for something else. Seems like every decade or so, some flavor of this argument pops up again for x86, yet here we are. I don't think any of the major CPU ISAs today (x86, ARM, RISC-V) are so fundamentally broken that they must be abandoned. If there's money to be made in supporting an ISA, then there is money to pay engineers to find clever ways to work around its limitations. CPU architecture history is full of solutions to problems once claimed unsolvable.

I haven't heard of this Shac Ron guy before, but various people I've heard from at different companies have expressed similar reservations about RISC-V as an ISA. But that neither has stopped nor will stop its growth. Yes, high-performance uarchs are going to have to do a ton of op fusion and other tricks to get performance, but you can buy a lot of engineers for the cost of an ARM architecture license. The financial incentive exists, thus it will happen.
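
As a rough illustration of the kind of fusion work being talked about, here's a trivial C function with, in comments, the sort of RV64 instruction sequence it tends to compile to. The sequences and the fusion pair are assumptions based on commonly cited examples (the indexed-load idiom), not a description of any particular core or compiler.

Code:
#include <stdint.h>

/* RV64's base ISA has no scaled register+register addressing mode, so a
   simple indexed load becomes three instructions. A wide core can fuse
   some of them in the decoder, or lean on the Zba extension to claw the
   density back. */
int64_t load_elem(const int64_t *a, int64_t i)
{
    /* Base RV64I, roughly:
           slli t0, a1, 3       # t0 = i * 8
           add  t0, a0, t0      # t0 = a + i*8
           ld   a0, 0(t0)       # load a[i]
       A fusion-capable front end may treat "add rd,rs1,rs2; ld rd,0(rd)"
       as a single indexed load internally. With Zba it is two instructions:
           sh3add t0, a1, a0    # t0 = (i << 3) + a
           ld     a0, 0(t0)
    */
    return a[i];
}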
 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,049
136
Oh, do let Doug swish his hand around. Let the man have some peace and fun at his age. His post was tongue-in-cheek; he said Skylake. 10th gen is Skylake, also a hot POS, though it does work a bit faster than 6th-gen junk. 11th-gen desktop was junk maximo.

I'm curious, do you browse with an ad blocker or no?

I just happened to see a mention about JS performance on RWT and the poster admitted he browses without an ad blocker, and surmises that's the reason he finds JS performance to be a limiter more than the person he was replying to.

Every once in a great while I turn off my ad blocker just to see what the web looks like, and I'm appalled, and yes, my browser quickly grinds to a halt. So I guess, if you assume the average person doesn't have anyone around them kind enough to set them up with an ad blocker, they will need every performance advantage they can get.

And what's wrong with Skylake? I build a new PC every 5 years or so, and I'm planning to do so again this winter. I don't see the point in updating every couple of years; the performance advancements from year to year aren't big enough to be worth the hassle very often.
 
Reactions: Insert_Nickname

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
Let me be clear. Certain aspects of the ISA do matter. You can bet your butt that it's taken a lot of engineering effort to deal with things like x86 variable length decode, for example. Where I take issue is people claiming that because an ISA has problems, it's a dead end that will inevitably be abandoned for something else. Seems like every decade or so, some flavor of this argument pops up again for x86, yet here we are. I don't think any of the major CPU ISAs today (x86, ARM, RISC-V) are so fundamentally broken that they must be abandoned. If there's money to be made in supporting an ISA, then there is money to pay engineers to find clever ways to work around its limitations. CPU architecture history is full of solutions to problems once claimed unsolvable.
I'm very far from a CPU architect, but I think what you said makes a lot of sense and is honestly the truth regarding x86's longevity. At the end of the day, any "flaw" can be mitigated to some extent if enough engineering resources are thrown at it. Said differently, there are few to no perfect solutions in the world of engineering; every design has trade-offs, and it's up to the designers to come up with an approach that maximizes the benefits while minimizing the drawbacks.

I mean, look at the Porsche 911, which I argue is a good analogy here. The rear engine layout is widely accepted as being not ideal in today's era of sports car design, yet look at what Porsche has accomplished over the decades by just iteratively improving on the design with each generation. Today, it remains one of the most well-known and celebrated sports cars in history, even though some may argue that it is based on an inherently "flawed" design concept.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,223
136
ISA-wise, the first to die when RISC-V becomes generally available will be POWER and ARM.

Tenstorrent, Ventana, SiFive/StarFive, all of China (HiSilicon, T-Head, CAS), etc. are targeting HPC/XCC first, and any trickle-down to HEDT is going to slap POWER and ARM first.

RCS Blackbird >5000 USD
Avantek ARM Desktops >7000 USD

versus whatever replaces this in production:


Whereas the lower end is just ARM; the RK3588/Kompanio 1380 (the Genio 1200 will get hit harder than the 1380 part) will probably have to deal with:
TH1520 (4x C910(OpenT-head)) // now -> JH8100 (4 Dubhe(weak si5 670?) cores/2?-4? Merak(weak si5 470?) cores) // Q2 2024

The general consensus for x86's death rests on how much lower cost RISC-V will be for each market segment.

Every bit counts, and Intel isn't helping with x86 future-proofing.
APX reduces code density since REX2 is larger than REX, and AVX10 falls back on prefix-determined vector width, which for future code means low code density and high instruction count.

For example, Dubhe/Merak should be able to unroll VLA-like code to full width like the XuanTie 910/C910/C920 cores. Instead of a couple of 8x128-bit/4x256-bit non-dependent/parallel instructions, they can just launch one 1024-bit instruction for density.
I'm pretty confident that Dubhe is based on p670 and Merak is based on p470 now:


It is more likely that if Intel sees ISA death coming (Intel today: >$50B per year in x86 sales -> an IBM-like future: <$7B per year in x86 sales), they would just accept it and go all-in on RISC-V at IFS. [Someone tell SiFive to launch a P800 on 20A/18A with the level of board support that StarFive gets, yeesh~. The P550 dev board looks awful.]

More ISA specific:



RV64+C. If any of that software (liblz4, sqlite3, mfat, mxml, quake) were optimized for SIMD scaling, RV64V could be inserted to reduce instruction count: essentially unlimited SIMD scaling, where e.g. 16 RV64C instructions can be fused into one 1024-bit RV64V instruction. Instruction-count reduction -> higher code density for HPC/high-end workloads. REX2 (APX) and the enhanced EVEX (AVX10) appear to make the above worse.
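
For reference, here's roughly what vector-length-agnostic RVV code looks like from C, as a sketch using the RVV intrinsics (spellings follow the ratified RVV 1.0 intrinsics API in recent GCC/Clang; older toolchains use the same names without the __riscv_ prefix). The same loop strip-mines itself to whatever vector width the hardware implements, which is the "one wide instruction instead of many narrow ones" density argument above.

Code:
#include <stddef.h>
#include <stdint.h>
#include <riscv_vector.h>   /* requires an RVV-enabled toolchain */

/* Vector-length-agnostic element-wise add: the binary is identical
   whether the part implements a 128-bit or a 1024-bit vector unit. */
void vec_add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m8(n);           /* elements this pass */
        vint32m8_t va = __riscv_vle32_v_i32m8(a, vl);  /* load vl elements   */
        vint32m8_t vb = __riscv_vle32_v_i32m8(b, vl);
        __riscv_vse32_v_i32m8(dst, __riscv_vadd_vv_i32m8(va, vb, vl), vl);
        a += vl; b += vl; dst += vl; n -= vl;
    }
}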
 
Last edited:
Reactions: Grazick

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
I'm curious, do you browse with an ad blocker or no?

I just happened to see a mention about JS performance on RWT and the poster admitted he browses without an ad blocker, and surmises that's the reason he finds JS performance to be a limiter more than the person he was replying to.

Every once in a great while I turn off my ad blocker just to see what the web looks like, and I'm appalled, and yes, my browser quickly grinds to a halt. So I guess, if you assume the average person doesn't have anyone around them kind enough to set them up with an ad blocker, they will need every performance advantage they can get.
While I definitely agree that using ad blockers is the sane way and a healthy suggestion to everybody, this doesn't change the fact that the current internet without any ad blocker is the default, the way most of the global population are experiencing it.
 

coercitiv

Diamond Member
Jan 24, 2014
6,393
12,826
136
Every once in a great while I turn off my ad blocker just to see what the web looks like, and I'm appalled, and yes, my browser quickly grinds to a halt. So I guess, if you assume the average person doesn't have anyone around them kind enough to set them up with an ad blocker, they will need every performance advantage they can get.
Not on-topic, but it might as well be said for the sake of the discussion: no amount of CPU performance can fix the ad-related performance issues. The CPU does indeed experience more load, but the biggest problem by far is the delay generated by a cascade of requests negotiating what to display to the user. So "the halt" is not local most of the time; it's in the cloud.
 
Reactions: Thibsie

Panino Manino

Senior member
Jan 28, 2017
846
1,061
136
The current ones are perfectly sane. Technical debt accrues over time; that happens in any product unless you explicitly drop support for old features. The only real misstep is AVX-512 not aligning with Intel's actual plans.
Everyone still wants a lot of the "old" features, we just don't want the now very old features. So if they keep deprecating really old things as they add new things, I don't see what the deal is.

I also mean planning with AMD which instructions will be available on actual x86 products, preventing any fragmentation, so as not to let this become a disadvantage against any other architecture.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,101
136
I'm very far from a CPU architect, but I think what you said makes a lot of sense and is honestly the truth regarding x86's longevity. At the end of the day, any "flaw" can be mitigated to some extent if enough engineering resources are thrown at it. Said differently, there are few to no perfect solutions in the world of engineering; every design has trade-offs, and it's up to the designers to come up with an approach that maximizes the benefits while minimizing the drawbacks.

I mean, look at the Porsche 911, which I argue is a good analogy here. The rear engine layout is widely accepted as being not ideal in today's era of sports car design, yet look at what Porsche has accomplished over the decades by just iteratively improving on the design with each generation. Today, it remains one of the most well-known and celebrated sports cars in history, even though some may argue that it is based on an inherently "flawed" design concept.
I think it also helps to put real numbers to the problem. A typical high performance CPU core team is on the order of a few hundred people. Varies wildly company to company, and core to core, but let's just assume 500. Now let's say it takes 20% more labor to deal with x86, so ~100 incremental headcount (again, erring on the high side). And let's say each of those engineers costs a generous $1,000,000/year in total compensation, tools, etc.

With those very heavy-handed numbers, you're looking at $100M/year in x86 tax. Now, is the global market for x86 compatible software worth $100M/year? At least today, I think the answer is clearly "yes".

Now obviously, the reality is complicated by the fact that there are 5-6 serious x86 core teams scattered about, and my numbers are, as I said, gross approximations. But right now, I think x86 has enough momentum to be self-sustaining for at least a while yet. It's a much more interesting question when you look at something like POWER.
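
Just to put that arithmetic in one place (these constants are the rough assumptions from the paragraphs above, not real figures):

Code:
#include <stdio.h>

/* Back-of-envelope "x86 tax": ~500 engineers per high-performance core team,
   ~20% extra labor for x86 quirks, ~$1M per engineer-year all-in. */
int main(void)
{
    const double team_size       = 500.0;
    const double x86_overhead    = 0.20;   /* fraction of extra labor */
    const double cost_per_eng_yr = 1e6;    /* USD, fully loaded */

    double extra_engineers = team_size * x86_overhead;           /* ~100 */
    double annual_tax      = extra_engineers * cost_per_eng_yr;  /* ~$100M */

    printf("extra engineers: %.0f\n", extra_engineers);
    printf("annual x86 tax:  $%.0fM per core team\n", annual_tax / 1e6);
    return 0;
}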
 

SpudLobby

Senior member
May 18, 2022
961
655
106
I think it also helps to put real numbers to the problem. A typical high performance CPU core team is on the order of a few hundred people. Varies wildly company to company, and core to core, but let's just assume 500. Now let's say it takes 20% more labor to deal with x86, so ~100 incremental headcount (again, erring on the high side). And let's say each of those engineers costs a generous $1,000,000/year in total compensation, tools, etc.

With those very heavy-handed numbers, you're looking at $100M/year in x86 tax. Now, is the global market for x86 compatible software worth $100M/year? At least today, I think the answer is clearly "yes".

Now obviously, the reality is complicated by the fact that there are 5-6 serious x86 core teams scattered about, and my numbers are, as I said, gross approximations. But right now, I think x86 has enough momentum to be self-sustaining for at least a while yet. It's a much more interesting question when you look at something like POWER.
So FWIW, I fully agree about the business incentives re: x86 laid out earlier, but I think you're downplaying some of this. RISC-V, for example: it's really not something out of Reddit but a fairly common criticism that even academics have taken up, that at some point it severely hampers performant design. Frankly, I sort of hope they redo it at some point and just make it more like Arm v8/v9.

For really crappy small stuff it doesn't matter though yeah.
 

Thibsie

Senior member
Apr 25, 2017
811
887
136
And why does that matter, other than higher scores in the "how fast is your browser" thread?

Interpreted/JITted code is not used for anything performance critical, and that's where it will take years before APX has any impact. So great, if you buy an APX CPU your browser will run faster, but if a web site is slow on brand new Intel or AMD CPUs it is broken.

Getting APX to speed up a broken slow website is like having a sewer line full of roots and saying "hey no problem I got a new toilet with a stronger flush, that'll help push the turds through better!" Once the website is fixed or you have the roots augured out, neither APX nor your new toilet make any difference.
Why not? MMX accelerated the internet after all

😁
 

CakeMonster

Golden Member
Nov 22, 2012
1,428
535
136
Ooh, I remember badly wanting to upgrade from the P133 to P200MMX... No idea if it would have made a noticeable difference to teenage me at the time.
 

naukkis

Senior member
Jun 5, 2002
779
636
136
Intel designs their instruction sets backwards. They design their instruction sets to suit their hardware designs when it really should be the other way around. Doing things the Intel way just results in a divergent and messy ISA, which will lead to x86's death - which isn't a big deal anymore.
 

Jan Olšan

Senior member
Jan 12, 2017
312
402
136
I'm curious, do you browse with an ad blocker or no?

I just happened to see a mention about JS performance on RWT and the poster admitted he browses without an ad blocker, and surmises that's the reason he finds JS performance to be a limiter more than the person he was replying to.

Every once in a great while I turn off my ad blocker just to see what the web looks like, and I'm appalled, and yes, my browser quickly grinds to a halt. So I guess, if you assume the average person doesn't have anyone around them kind enough to set them up with an ad blocker, they will need every performance advantage they can get.

And what's wrong with Skylake? I build a new PC every 5 years or so, and I'm planning to do so again this winter. I don't see the point in updating every couple of years; the performance advancements from year to year aren't big enough to be worth the hassle very often.

I'm browsing without an ad blocker full time (obvious reasons: I want the sites that are paid by me seeing ads to get paid, and not to ride without paying the fare - spare me the cope about how that's totally not what ad blocking is, etc., pls.).
I don't find it problematic at all. I don't really see performance problems, and not even that many problems with the impact on data usage (loading speed). And note that I have a laughably bad connection over 802.11g. With a faulty USB dongle. Over several walls. It usually looks like it goes 10-12 Mbps, but it chokes/times out extremely easily under higher load.
Yet the internet is perfectly usable not just on my desktop (5950X, but running on slow power plans so it's usually just 1.7~4 GHz); it isn't even bad on a notebook with the worst CPU AMD currently/recently offered (Athlon Silver 3020e), although that machine can make better use of what the wi-fi offers (802.11n). Windows 10/11, current Firefox.

All of you have massively better connections so I bet you would have no problem browsing without adblock too.
 
Reactions: DAPUNISHER