Exophase, the Atom chip shares some of the Integer SSE operations with the FP ports.
So do a lot of CPUs. There's a good reason for this: very little software uses both integer and floating point SIMD simultaneously. And on a dual-issue core, the bottlenecks would usually be elsewhere even if it did.
Also, the Linpack results show 0.4GFlops for the single core 1.6GHz Atom while a dual core E-350(1.6GHz) gets 2.4GFlops. Doubling cores on Atom should get 0.8GFlops, which is still only 1/3rd the result. Linpack is a good benchmark for isolating pure FPU power.
Atom has two major FP problems that reek of bugs, and Linpack is exposing at least one and possibly both of them:
1) x87 performance sucks; specifically, there's an inexplicable extra stall when you issue x87 instructions back to back
2) double precision SSE performance sucks; it works as if it's unpipelined
This was with old Atoms. It's possible Saltwell fixed either or both of these things.
In the real world almost no one will be using double precision on these CPUs and no one should be using x87. Suffice it to say that Linpack is a poor proxy for real world performance in this scenario.
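For reference, here's the arithmetic behind the Linpack comparison quoted above. This is only a minimal sketch, assuming (as the quoted post does) that a second Atom core would scale Linpack perfectly linearly; the variable names are just for illustration.

```python
# Linpack figures as quoted in the thread (GFlops).
atom_single_core = 0.4   # single-core 1.6GHz Atom
e350_dual_core = 2.4     # dual-core 1.6GHz E-350

# Assumption: perfect linear scaling from one Atom core to two.
atom_dual_core_est = atom_single_core * 2

# Fraction of the E-350 result the hypothetical dual-core Atom would reach.
ratio = atom_dual_core_est / e350_dual_core

print(f"Estimated dual-core Atom: {atom_dual_core_est:.1f} GFlops")
print(f"Fraction of E-350 result: {ratio:.2f}")
```

The ratio comes out to roughly a third, which matches the quoted number. It says nothing about why the gap exists, which is the point about x87 and double precision SSE above.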
70% is again for best case scenarios.
Not at all, you can find real world tests that show > 100% better IPC for Bobcat. I listed a few in the post I made earlier. There were also cases where it was < 50%.
Average gain of E-350 versus similar clocked Atom is only 10-20% better. Hyperthreading is best case, 35-40% faster (coincidentally in the application where the gap is greatest with Bobcat), while we see places where there's almost no gain at all (http://www.tomshardware.com/reviews/Intel-Atom-Efficient,1981-13.html).
Ironically, you yourself posted a test where it gained over 50%.
http://forums.anandtech.com/showthread.php?t=163947
And this is a real world test, nothing synthetic or especially bizarre. You've really got to be careful when talking about best case scenarios; it only takes one instance to disprove the claim.
It would be great if someone did an exhaustive set of comparisons for Atom like the one done here:
http://ixbtlabs.com/articles3/cpu/archspeed-2009-4-p1.html
All I can really say is that it's a good rule of thumb that where HT helps on a Nehalem, it'll help a lot more on Atom.
Here's one more figure, Geekbench on Medfield:
http://browser.primatelabs.com/geekbench2/1060511
This is useful because it's 1C/2T, and because most of the tests come in both single-threaded and multi-threaded versions. So it's pretty close to a measurement of HT vs no-HT. This is what it shows (multi-threaded gain over single-threaded):
Integer tests:
Blowfish: 64.3%
Text compress: 27.5%
Text decompress: 36.1%
Image compress: 43.4%
Image decompress: 48.5%
Lua: 38.2%
FP tests:
Mandelbrot: 88.9%
Dot product: 83.2%
LU decomposition: 1.3%
Primality test: 19.1%
Sharpen image: 68.7%
Blur image: 65.8%
Overall it's a much bigger win for FP than integer, probably because FP operations have higher latencies that are difficult to schedule around in Atom code, especially on 32-bit Atom with only 8 xmm registers. But even for integer, calling out a 40% best case is low-balling it.
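To put rough numbers on that, here's a quick sketch that summarizes the Geekbench gains listed above. The figures are copied straight from the list; the dictionary names and the choice of mean/median as the summary are my own, not anything Geekbench reports.

```python
from statistics import mean, median

# Multi-threaded gain over single-threaded, in percent, from the list above.
integer_gains = {
    "Blowfish": 64.3,
    "Text compress": 27.5,
    "Text decompress": 36.1,
    "Image compress": 43.4,
    "Image decompress": 48.5,
    "Lua": 38.2,
}
fp_gains = {
    "Mandelbrot": 88.9,
    "Dot product": 83.2,
    "LU decomposition": 1.3,
    "Primality test": 19.1,
    "Sharpen image": 68.7,
    "Blur image": 65.8,
}

def summarize(name, gains):
    """Print mean, median and best-case gain for one group of tests."""
    values = list(gains.values())
    best = max(gains, key=gains.get)
    print(f"{name}: mean {mean(values):.1f}%, median {median(values):.2f}%, "
          f"best {gains[best]:.1f}% ({best})")

summarize("Integer", integer_gains)
summarize("FP", fp_gains)

# How many integer tests beat the claimed 40% best case?
over_40 = [test for test, gain in integer_gains.items() if gain > 40.0]
print(f"Integer tests over 40%: {len(over_40)} of {len(integer_gains)}")
```

The FP mean gets dragged down by LU decomposition and the primality test, so the median is probably the fairer number there. Either way, half of the integer tests already exceed 40%.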