AtenRa
Lifer
- Feb 2, 2009
Intel's AVX implementation is much wider at 256 bits and can be done in one cycle; AMD has to fuse two 128-bit operations (?) and it takes multiple cycles (?) - resulting in Intel-like AVX workloads running just as fast on AMD with or without AMD's AVX implementation being used.
Bulldozer/Vishera can execute 1x 256-bit AVX op per cycle. The only limitation is that the FPU can accept ops from only one thread per cycle, but the thread can change each cycle. Once the FPU has received them, ops from multiple threads can be executed.
That is, it can execute 2x 128-bit ops per cycle but only a single 256-bit AVX op per cycle.
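To put rough numbers on that, here is a toy model in Python. It assumes the shared FPU has two 128-bit pipes and that a 256-bit AVX op occupies both of them for one cycle; the names `PIPES`, `PIPE_WIDTH`, and `cycles_needed` are made up for illustration. It only counts issue slots and ignores latency, scheduling, and the one-thread-per-cycle dispatch limit, so treat it as back-of-the-envelope arithmetic, not a description of the real hardware.

```python
import math

PIPES = 2         # assumption: two 128-bit FP pipes in the shared FPU
PIPE_WIDTH = 128  # bits handled by one pipe per cycle


def cycles_needed(ops_128: int, ops_256: int) -> int:
    """Minimum issue cycles in this toy model (no latency, no stalls)."""
    pipes_per_256 = 256 // PIPE_WIDTH            # a 256-bit op takes both pipes
    slots = ops_128 + ops_256 * pipes_per_256    # total pipe-slots required
    return math.ceil(slots / PIPES)


# Peak width per cycle is the same either way: 2 x 128 = 1 x 256 bits.
peak_bits_per_cycle = PIPES * PIPE_WIDTH
```

In this model, `cycles_needed(2, 0)` and `cycles_needed(0, 1)` both come out to one cycle, which is the point above: the peak is 256 bits per cycle whether it is issued as two 128-bit ops or one 256-bit op.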