And the part where you refute his statements with actual technical information...? Or is it one of those "you have to trust me on this" kind of posts?
In any case, it is just like his assertions: no actual proof.
No need to take anyone's word for it. AMD has publicly released a very generous amount of documentation on Bobcat and especially Jaguar, and at least a reasonable amount for Bulldozer and Piledriver. There are also sources like GCC that you can glean uarch info from.
Saying that BD and Bobcat share the same vector units is just profoundly inaccurate. Off the top of my head:
- Bobcat has two execution ports, Bulldozer has four
- Bobcat has only 64-bit FP and integer execution, Bulldozer has 128-bit
- Bobcat has FADD and FMUL pipes, Bulldozer has two FMA pipes
- Bobcat has 2-3 cycle latency FP operations, Bulldozer is 5-6 cycles (this means a really fundamentally different approach in how the pipeline is designed)
- Bobcat only supports up to SSSE3, while Bulldozer supports SSE4.x, AVX, FMA, and XOP and has the real execution to back it
- AFAIK Bobcat doesn't have SIMD move elimination, while Bulldozer does
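The ISA-support point in the list is the easiest one to check on real hardware rather than from a post. Here's a rough sketch, assuming a Linux box where /proc/cpuinfo exposes the usual flag names (`ssse3`, `sse4_1`, `sse4_2`, `avx`, `fma` for FMA3, `fma4`, `xop`). The function names are mine, made up for illustration, and this only reports what the CPU advertises; it says nothing about unit width or latency.

```python
# Map the extensions discussed above to their Linux /proc/cpuinfo flag names.
# Note: on Linux, "fma" means FMA3 and "fma4" means AMD's four-operand FMA.
EXTENSIONS = {
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
    "FMA3": "fma",
    "FMA4": "fma4",
    "XOP": "xop",
}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text):
    """Return {extension name: bool} for the extensions listed in EXTENSIONS."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in EXTENSIONS.items()}


def report_local_cpu(path="/proc/cpuinfo"):
    """Print a yes/no table for the local machine (Linux only; hypothetical helper)."""
    with open(path) as f:
        for name, present in supported_extensions(f.read()).items():
            print(f"{name:7s} {'yes' if present else 'no'}")
```

Calling `report_local_cpu()` on each machine and comparing the tables is enough to see the gap in advertised extensions; the execution-width and latency differences still need the optimization guides or Agner Fog's tables.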
Jaguar mainly changes things by widening the units to 128-bit, adding SSE4.x and AVX, tightening some timings and I think removing some microcodings. I'm not aware of any changes Piledriver makes to the vector part at all.
This is all backed by easily available documentation: look at slides, publications from HotChips, optimization guides, Agner Fog's manuals, GCC scheduler definitions, and so on.
Probably NostaSeronx saw something that he thinks supports his claim but that actually says something completely different (although how anyone could misconstrue something into this is a real headscratcher), or he saw something that isn't from a reliable source.