DrMrLordX
Lifer
- Apr 27, 2000
I'd love to see AVX512 go mainstream, but so little software supports AVX/AVX2 right now that it makes almost no sense to push it out to the consumer market.
Even supporting AVX/AVX2 isn't as simple as flipping a switch. Based on my very limited experience, you at least need blocks of instructions with 512 bits of operands (enough to fill two 256-bit registers) being acted on by the same operation. AVX and/or AVX2 may be flexible enough that the operations need not be uniform (FMA?), but frankly I don't know enough about the instruction set to say for sure. You can work with less than 512 bits of data, but you're basically leaving performance on the table: AVX/AVX2 completes the operation on everything loaded into the registers at once, so you may as well fill them up if you can.
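To make the "fill two 256-bit registers" idea concrete, here's a minimal C sketch using the AVX intrinsics from `<immintrin.h>`. It's my own illustration, not from any particular codebase, and `add_arrays`, `add_avx` and `add_scalar` are hypothetical names. Each `_mm256_add_ps` adds eight floats in one register to eight floats in another, and the scalar tail loop is where you pay for having less than a full register of data. It assumes an x86 CPU and GCC or Clang, which provide `__attribute__((target("avx")))` and `__builtin_cpu_supports`. On the FMA question: `_mm256_fmadd_ps(a, b, c)` computes `a*b + c` across three 256-bit registers, but it belongs to the separate FMA3 extension rather than AVX/AVX2 proper, so it isn't used here.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Adds a and b element-wise into out, eight floats (256 bits) per AVX
   instruction. The target attribute lets GCC/Clang emit AVX code for this
   one function without building the whole file with -mavx; the caller must
   make sure the CPU actually supports AVX. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    /* Main loop: load 256 bits from a and 256 bits from b (512 bits of
       operands in total) and add all eight lanes with one instruction. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    /* Tail: fewer than eight floats left, so finish them one at a time. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Plain scalar version, used when the CPU has no AVX. */
static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Picks the AVX path at run time if the CPU supports it. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a && b && out));
    if (__builtin_cpu_supports("avx"))
        add_avx(a, b, out, n);
    else
        add_scalar(a, b, out, n);
}
```

In practice GCC and Clang will often auto-vectorize a loop this simple at `-O3` when targeting AVX (e.g. with `-mavx2`), so hand-written intrinsics mostly matter once the loop stops being trivial.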
Anyway, if your code is too branchy, or if each step depends too heavily on the results of prior operations for the block to run in parallel, it can screw up AVX/AVX2 and every other sort of SIMD operation as well.
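A rough illustration of the difference, in plain C with no intrinsics (the function names are hypothetical): the first loop has fully independent iterations, the second has a loop-carried dependence, and the third branches per element. This is just a sketch of the general pattern, not a claim about what any specific compiler will do with each loop.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: out[i] depends only on a[i] and b[i], so the
   loop can be processed eight floats per AVX instruction. */
void add_independent(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Loop-carried dependence: each running total needs the previous one,
   which only exists after the prior iteration finishes, so the loop can't
   be split into 8-wide adds in the straightforward way. */
void running_sum(const float *b, float *out, size_t n)
{
    assert(n == 0 || (b && out));
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += b[i];
        out[i] = acc;
    }
}

/* Branchy loop: each element takes one of two paths. A simple case like
   this can still be vectorized by computing both results and selecting
   per lane with a compare mask, but long, divergent branches make that
   impractical. */
void scale_positive(const float *a, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > 0.0f)
            out[i] = a[i] * 2.0f;
        else
            out[i] = 0.0f;
    }
}
```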
AVX512 seems to extend the length of AVX registers to 512 bits (among other things), so you'd actually want/need 1024 bits of operands in parallel to take full advantage of that feature.
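Same idea scaled up to AVX512, as a sketch assuming an x86 CPU with AVX-512F and GCC or Clang (`add_arrays512`, `add_avx512` and `add_scalar` are hypothetical names). Each `_mm512_add_ps` takes two 512-bit registers, i.e. 1024 bits of operands. One of the "other things" AVX512 adds is a set of opmask registers, which the tail handling below uses so the last partial chunk doesn't need a scalar loop. On CPUs without AVX-512F it falls back to plain scalar code.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* 16 floats (512 bits) per instruction. Compiled for AVX-512F via the
   target attribute; only call it after checking CPU support. */
__attribute__((target("avx512f")))
static void add_avx512(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    /* Each iteration reads 512 bits from a and 512 bits from b:
       1024 bits of operands for a single add instruction. */
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    /* Tail: build a mask with one bit per remaining float (1..15 of them)
       and do one masked load/add/store. Masked-off lanes are neither read
       nor written. */
    if (i < n) {
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1u);
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        _mm512_mask_storeu_ps(out + i, m, _mm512_add_ps(va, vb));
    }
}

/* Plain scalar version, used when the CPU has no AVX-512F. */
static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Picks the AVX-512 path at run time if the CPU supports it. */
void add_arrays512(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a && b && out));
    if (__builtin_cpu_supports("avx512f"))
        add_avx512(a, b, out, n);
    else
        add_scalar(a, b, out, n);
}
```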