Did some tests with the Bullet physics library on Steamroller, Excavator and Haswell.
I built the libraries and the test program with various settings (e.g. instruction sets up to SSE3 / SSE4.2 / AVX / AVX2, plus µarch-specific tuning).
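For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how such a build matrix could be generated. The exact flags used for the posted builds are not listed, so everything below is an assumption: the `-O2` level, the labels, and the choice of combinations are hypothetical. The `-m...` and `-march=` switches themselves are standard GCC options (bdver3 = Steamroller, bdver4 = Excavator).

```python
# Hypothetical reconstruction of a build matrix. The post does not list the
# exact flags, so every combination below is an assumption.

# GCC -march names for the three CPUs (bdver3 = Steamroller, bdver4 = Excavator).
MARCH = {"steamroller": "bdver3", "excavator": "bdver4", "haswell": "haswell"}

# Instruction-set levels expressed as plain -m switches (each implies the older ones).
ISA_LEVELS = {
    "sse3": ["-msse3"],
    "sse4.2": ["-msse4.2"],
    "avx": ["-mavx"],
    "avx+fma": ["-mavx", "-mfma"],
    "avx2+fma": ["-mavx2", "-mfma"],
}

# Steamroller has no AVX2, so those builds would not run on it.
NO_AVX2 = {"steamroller"}


def build_matrix(cpu, opt="-O2"):
    """Return {label: list of gcc flags} for one CPU."""
    matrix = {}
    for label, isa_flags in ISA_LEVELS.items():
        if "-mavx2" in isa_flags and cpu in NO_AVX2:
            continue
        # Generic build: ISA switches only, no -march / -mtune.
        matrix[label + "/generic"] = [opt] + isa_flags
    # Architecture-specific build: -march selects both the ISA and the tuning.
    matrix["march"] = [opt, "-march=" + MARCH[cpu]]
    return matrix


if __name__ == "__main__":
    for label, flags in build_matrix("haswell").items():
        print(label.ljust(18), "g++", " ".join(flags))
```

Running it prints one candidate g++ flag line per configuration, which makes it easy to see the "generic + ISA switches" builds next to the "march" build.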
Based on the results, Excavator appears to be the slowest of the three. Regardless of the compiler settings used, Steamroller is 9.2% faster than it on average. On both Steamroller and Excavator, non-architecture-specific settings combined with instructions up to AVX & FMA always produce the best results, even better than the specific "march" setting. It doesn't seem that much effort has gone into optimizing the GCC presets for AMD architectures. Hope this will change with Zen...
Haswell is nearly 61% faster than Steamroller when SSE3 is used and around 74% faster than Excavator when SSE3 or SSE4.2 is used.
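A quick note on how to read "X% faster" figures like these. The sketch below assumes the percentages come from run times (lower is better), which the post does not state explicitly. The function names and the timings are made up purely to illustrate the arithmetic and are not measured results.

```python
def percent_faster(slow_time, fast_time):
    """How much faster the quicker run is, in percent, given two run times."""
    return (slow_time / fast_time - 1.0) * 100.0


def percent_less_time(slow_time, fast_time):
    """The same gap expressed as time saved relative to the slower run."""
    return (1.0 - fast_time / slow_time) * 100.0


if __name__ == "__main__":
    # Made-up timings in seconds, chosen only to illustrate the arithmetic.
    slow, fast = 17.4, 10.0
    print(round(percent_faster(slow, fast), 1))
    print(round(percent_less_time(slow, fast), 1))
```

Under that assumption, "74% faster" means the slower CPU needs 1.74x the time, which is the same as the faster CPU finishing in roughly 43% less time.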
Bullet is used pretty widely in recent games, benchmarks and rendering applications (e.g. GTA V, 3DMark, Blender, Cinema 4D). That could partly explain why AMD CPUs do so badly in GTA V and 3DMark physics tests :\
I also noticed an interesting phenomenon with Haswell. When AVX or AVX2 and FMA are enabled simultaneously and no "march" or "mtune" parameter is given, one of the individual tests (136 ragdolls) slows down by +500%. However, as soon as a "march" parameter is given (with both AVX/2 and FMA obviously remaining active), the phenomenon ceases to exist.
I wouldn't expect GCC 5.3 to have such a bug since Haswell has been supported for a "while" now.
In case anyone wants to try themselves:
https://onedrive.live.com/redir?resid=8329B08E8413A80E!546&authkey=!AAnGDZ1Nv6fMfEw&ithint=file,7z
The benchmark itself is from the Bullet 2.82 build, while the libraries are from the newest build available on Git (2.83.xxx). They have been compiled with GCC 5.3 x86-64.
It is the same benchmark that OpenBenchmarking uses, but the build options differ. Also, it is single-threaded only.