AVX2 and FMA3 in games

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Simple question, but do any games use AVX2 and FMA3 yet? I know that some are using AVX, especially since the PS4 and Xbox One both support it. And several physics engines also use AVX.

I did a Google search, and the only thing I could find was that Serious Sam 3 might use AVX2, and even that was more speculative than anything.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,920
3,544
136
You have to remember that, generally speaking, both FMA and AVX for games are only going to be an incremental improvement over SSE. It would be interesting to see what PC devs actually target today. SSE4? SSSE3?
 

zlatan

Senior member
Mar 15, 2011
580
291
136
You have to remember that, generally speaking, both FMA and AVX for games are only going to be an incremental improvement over SSE. It would be interesting to see what PC devs actually target today. SSE4? SSSE3?
SSE4.2
The problem with AVX is that the majority of the best-selling processors don't support it; for example, every Celeron and Pentium. Even if it might be a useful way to improve application performance, the publishers don't finance the research. They want an alternative optimization strategy that improves performance for the Celeron/Pentium users as well.
The consoles might help for AVX, but AVX2 is still not a really useful option.
I personally think that an HSA runtime is the best way to get AVX2 and even AVX-512 support into applications. That platform is cheap, and we can target a lot of extensions/accelerators with the same codebase. I really think that SYCL 2.1 will also be a revolutionary step for programmers.
 
Last edited:

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
Game developers aren't going to want to lose sales to people who don't have AVX2 CPUs. This is one of the reasons why game engines aren't being made that fully take advantage of eight threads. They need the games to run well on an i3.

How willing developers are to jettison non-AVX customers, I don't know. Perhaps the solution is to ship both code paths: non-AVX for processors that lack it, AVX for those that have it. That's more work, but it's more reasonable than cutting out a substantial base.
 

TheELF

Diamond Member
Dec 22, 2012
4,026
753
126
Game developers aren't going to want to lose sales to people who don't have AVX2 CPUs. This is one of the reasons why game engines aren't being made that fully take advantage of eight threads. They need the games to run well on an i3.
Oh so that's the reason why all games of the last 2-3 years run so well on any CPU... (not)
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
Oh so that's the reason why all games of the last 2-3 years run so well on any CPU... (not)
Asking devs to make everything for the Anniversary Pentium is a bit of a stretch. As for 2-3 years of i5s and i7s, I fail to see the issue. Some of the slowest i3s may have difficulty but those with high enough clocks should still be viable even if they're two or three years old.

An i3 gained a lot of FPS in DX12 in Ashes, so it seems that developers are even targeting the i3 for DX12 titles.
 

NTMBK

Lifer
Nov 14, 2011
10,322
5,351
136
SSE4.2
The problem with AVX is that the majority of the best-selling processors don't support it; for example, every Celeron and Pentium. Even if it might be a useful way to improve application performance, the publishers don't finance the research. They want an alternative optimization strategy that improves performance for the Celeron/Pentium users as well.
The consoles might help for AVX, but AVX2 is still not a really useful option.
I personally think that an HSA runtime is the best way to get AVX2 and even AVX-512 support into applications. That platform is cheap, and we can target a lot of extensions/accelerators with the same codebase. I really think that SYCL 2.1 will also be a revolutionary step for programmers.

Even SSE4.2 is a bit risky. Phenom didn't have it, and there are a lot of those still out there.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
SSE4.2
The problem with AVX is that the majority of the best-selling processors don't support it; for example, every Celeron and Pentium. Even if it might be a useful way to improve application performance, the publishers don't finance the research. They want an alternative optimization strategy that improves performance for the Celeron/Pentium users as well.
The consoles might help for AVX, but AVX2 is still not a really useful option.
I personally think that an HSA runtime is the best way to get AVX2 and even AVX-512 support into applications. That platform is cheap, and we can target a lot of extensions/accelerators with the same codebase. I really think that SYCL 2.1 will also be a revolutionary step for programmers.

I'm no programmer, but I was under the impression that extensions such as AVX2 were backward compatible with older extensions. For example, a new CPU like Haswell or Skylake would run the fastest codepath with AVX2, while a CPU like Sandy Bridge would use the same codepath but with less throughput/performance due to lacking AVX2..

I really have to wonder, though, about some of the massive performance gains on the CPU side seen in recent games, such as Dying Light. They went from this at the game's launch:

[launch CPU benchmark chart not preserved]

To this 11 months later, a more-than-doubling of performance for many CPUs on that list:

[updated CPU benchmark chart not preserved]

And the game was definitely CPU limited when it first shipped, but now it performs very well. So I wonder: did they get these gains by exploiting more vectorization, or was it all due to better multithreading?

It seems more the latter, as CPUs with more threads/cores gained more performance.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Simple question, but do any games use AVX2 and FMA3 yet? I know that some are using AVX, especially since the PS4 and Xbox One both support it. And several physics engines also use AVX.

I did a Google search, and the only thing I could find was that Serious Sam 3 might use AVX2, and even that was more speculative than anything.

Yes they do. An "easy" way to tell is to watch when Haswell/Broadwell/Skylake enters the increased-voltage AVX2/FMA mode. But it's not something programmers are going to spell out for you in a list.
 

NTMBK

Lifer
Nov 14, 2011
10,322
5,351
136
Phenom is not a target now. It's too old.

Phenom II is still listed as the minimum for plenty of recent games. (Llano also lacked SSE4.2.) SSE4.1/2 isn't that essential, to be honest: the blend instructions are nice, but you can get the same effect with an (and)|(andnot) sequence.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
I'm no programmer, but I was under the impression that extensions such as AVX2 were backward compatible with older extensions. For example, a new CPU like Haswell or Skylake would run the fastest codepath with AVX2, while a CPU like Sandy Bridge would use the same codepath but with less throughput/performance due to lacking AVX2..
SIMD is not implemented well in x86. It forces you to deal with multithreading, prefetching, and small registers. MMX, SSE and AVX are very inefficient compared to other SIMD models. Also, every new SIMD register size will just make your original code outdated. The best way to do SIMD is to generate specialized code for each CPU model, and this will be relatively fast, but most publishers just don't finance it. For today it is much more logical to use one or more IRs, and from there you can compile the code to a lot of hardware targets. The real question is which IRs and compilation paths give the best overall results.

I really have to wonder, though, about some of the massive performance gains on the CPU side seen in recent games, such as Dying Light. They went from this at the game's launch:

[launch CPU benchmark chart not preserved]

To this 11 months later, a more-than-doubling of performance for many CPUs on that list:

[updated CPU benchmark chart not preserved]

And the game was definitely CPU limited when it first shipped, but now it performs very well. So I wonder: did they get these gains by exploiting more vectorization, or was it all due to better multithreading?

It seems more the latter, as CPUs with more threads/cores gained more performance.

This is pretty much normal when you build a new engine. The reason you won't see it in most games is that most publishers don't really finance backporting engine optimizations to already-released titles.
The primary optimization strategy for a new engine should focus on reverse engineering the graphics kernel drivers. Knowing where the stalls are in the code/drivers is a huge win, and you can optimize accordingly.
 
Last edited:

zlatan

Senior member
Mar 15, 2011
580
291
136
Phenom II is still listed as minimum for plenty of recent games. (Llano also lacked SSE4.2.) SSE4.1/2 isn't that essential, to be honest- the blend instructions are nice, but you can get the same effect with an (and)|(andnot) sequence.
That's why SSE2 is mainstream now, but SSE4.2 is a much more logical choice for current projects.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
I'm no programmer, but I was under the impression that extensions such as AVX2 were backward compatible with older extensions. For example, a new CPU like Haswell or Skylake would run the fastest codepath with AVX2, while a CPU like Sandy Bridge would use the same codepath but with less throughput/performance due to lacking AVX2..

That's not how it works. If a program is compiled to only use AVX2, it won't run on a processor without AVX2 support (e.g. on Linux it will die with an illegal-instruction signal). If it's compiled to only use AVX, then processors that support AVX2 will also be able to run it.

And then there's dispatching, where the code detects CPU feature flags at runtime and runs different code paths depending on what's found.
 
Mar 10, 2006
11,715
2,012
126
That's not how it works. If a program is compiled to only use AVX2, it won't run on a processor without AVX2 support (e.g. on Linux it will die with an illegal-instruction signal). If it's compiled to only use AVX, then processors that support AVX2 will also be able to run it.

And then there's dispatching, where the code detects CPU feature flags at runtime and runs different code paths depending on what's found.

I am truly and utterly amazed that Intel tries to segment its processors by disabling ISA features. This is the product of a marketing department that has no idea of the ramifications of its actions. They are literally hindering the already slow adoption of new ISA features that could make their processors run a hell of a lot faster.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I'm no programmer, but I was under the impression that extensions such as AVX2 were backward compatible with older extensions.

They are neither binary compatible, as the instruction encoding is different, nor logically/semantically compatible, as the instructions operate on different/wider data types.
 

lamedude

Golden Member
Jan 14, 2011
1,206
10
81
The Visual Studio 2013/15 C runtime will use FMA3 for some math functions.
Jaguar only supports AVX-128, so there's no use in hand-writing 256-bit vector code for BoneStation.
MS's compiler only supports SSE/SSE2/AVX/AVX2, and PhysX/Skyrim has shown us you have to be a wizard to change that setting (thankfully the default changed to SSE2 in VS2012).
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,475
1,975
136
I'm no programmer, but I was under the impression that extensions such as AVX2 were backward compatible with older extensions. For example, a new CPU like Haswell or Skylake would run the fastest codepath with AVX2, while a CPU like Sandy Bridge would use the same codepath but with less throughput/performance due to lacking AVX2..

This is not true at all. If you put an AVX2 instruction into a program and run it on an older CPU, the program crashes with an illegal-instruction exception.
 
Reactions: Arachnotronic

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
It's not an issue to have AVX2 support and still run on CPUs without AVX2: the compiler emits multiple code paths and picks one at runtime. This is essentially what all the "Intel compiler cheats" were about in the old days.
 

Nothingness

Diamond Member
Jul 3, 2013
3,063
2,042
136
It's not an issue to have AVX2 support and still run on CPUs without AVX2: the compiler emits multiple code paths and picks one at runtime. This is essentially what all the "Intel compiler cheats" were about in the old days.
That still means an increase in validation effort, as you have to ensure all paths are correct. And not everyone uses ICC; the vast majority of the industry relies on the MS compiler.
 