Oh god, not this myth again!
Llano used an improved version of the Phenom II core on GloFo 32nm. Trinity then used Piledriver on the same GloFo 32nm process, and it was an improvement over Llano in CPU performance.
At the end of the day, all that matters is performance per watt. Trinity had resonant clock meshing, which Llano did not, plus an improved GPU.
At 100W, Llano had 4 cores running at 3 GHz and 400 Radeon cores running at 600 MHz.
At 100W, Trinity had 2 modules (4 integer cores) running at 3.8 GHz and a GPU with 384 cores running at 800 MHz.
Comparing a 2.9 GHz Llano with a 3.8 GHz Trinity, Trinity has a 31% clock speed advantage.
http://www.anandtech.com/bench/Product/399?vs=675
AMD credited resonant clock meshing with roughly a 10% clock speed boost.
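Here's the clock-speed arithmetic as a quick Python sketch. It uses the nominal clocks quoted in this post (2.9 GHz Llano, 3.8 GHz Trinity, turbo ignored). It also treats AMD's ~10% resonant clock mesh figure as a straight 10% clock bump for a hypothetical Llano. That's an assumption for illustration, not a chip AMD shipped.

```python
def clock_advantage(fast_ghz: float, slow_ghz: float) -> float:
    """Percentage by which fast_ghz exceeds slow_ghz."""
    return (fast_ghz / slow_ghz - 1.0) * 100.0

LLANO_GHZ = 2.9    # Llano clock used in the comparison above
TRINITY_GHZ = 3.8  # Trinity clock used in the comparison above
MESH_BOOST = 1.10  # assumption: the ~10% resonant clock mesh gain applied as a clock multiplier

actual = clock_advantage(TRINITY_GHZ, LLANO_GHZ)

# Hypothetical Llano that also had resonant clock meshing
hypothetical_llano = LLANO_GHZ * MESH_BOOST
remaining = clock_advantage(TRINITY_GHZ, hypothetical_llano)

print(f"Trinity clock advantage: {actual:.1f}%")                 # 31.0%
print(f"Hypothetical mesh Llano: {hypothetical_llano:.2f} GHz")  # 3.19 GHz
print(f"Remaining Trinity advantage: {remaining:.1f}%")          # 19.1%
```

So even if Llano had gotten the mesh, Trinity would still keep roughly a 19% clock lead. That's a useful baseline to keep in mind when reading the benchmark deltas.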
Cherry picking a few benchmarks:
Trinity is 29% faster in SYSmark 2012 overall.
22% faster in Adobe CS4.
12.7% faster in DivX encode.
5% faster in x264 encode, 1st pass.
20% faster in x264 encode, 2nd pass.
12.5% faster in Windows Media Encoder 9.
9.5% faster in the 3ds Max CPU test.
20.5% faster in Cinebench R10 single thread.
8% faster in Cinebench R10 multithread.
4% faster in the POV-Ray SMP benchmark.
7.6% faster in Excel Monte Carlo.
<1% slower in Sorenson Squeeze.
23% faster in WinRAR.
21% faster in Cinebench 11.5 single thread.
5% slower in Cinebench 11.5 multithread.
5.5% slower in Dragon Age.
2% slower in Dawn of War 2.
6% faster in WoW.
22.5% faster in StarCraft 2.
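To check the list against a hypothetical resonant clock mesh Llano, here's a rough Python sketch that divides each Trinity result by a 1.10 clock factor. It assumes Llano's performance scales linearly with clock. That's optimistic for Llano, since memory bandwidth doesn't scale with core clock. Sorenson Squeeze is left out because the post only says it was under 1% slower.

```python
# Trinity's advantage over a 2.9 GHz Llano, in percent, from the list above
# (negative means Trinity was slower). Sorenson Squeeze is omitted.
RESULTS = {
    "SYSmark 2012": 29.0,
    "Adobe CS4": 22.0,
    "DivX encode": 12.7,
    "x264 pass 1": 5.0,
    "x264 pass 2": 20.0,
    "Windows Media Encoder 9": 12.5,
    "3ds Max CPU": 9.5,
    "Cinebench R10 single thread": 20.5,
    "Cinebench R10 multithread": 8.0,
    "POV-Ray SMP": 4.0,
    "Excel Monte Carlo": 7.6,
    "WinRAR": 23.0,
    "Cinebench 11.5 single thread": 21.0,
    "Cinebench 11.5 multithread": -5.0,
    "Dragon Age": -5.5,
    "Dawn of War 2": -2.0,
    "WoW": 6.0,
    "StarCraft 2": 22.5,
}

MESH_BOOST = 1.10  # assumption: hypothetical Llano clocked ~10% higher

def vs_mesh_llano(trinity_pct: float, boost: float = MESH_BOOST) -> float:
    """Trinity's projected % advantage over a Llano clocked `boost` times higher,
    assuming Llano's performance scales linearly with clock speed."""
    return ((1.0 + trinity_pct / 100.0) / boost - 1.0) * 100.0

projected = {name: vs_mesh_llano(pct) for name, pct in RESULTS.items()}
wins = sum(1 for v in projected.values() if v > 0)

# Print from largest remaining Trinity lead to largest deficit
for name, value in sorted(projected.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {value:+6.1f}%")
print(f"Trinity still ahead in {wins} of {len(projected)} tests")
```

Under the linear-scaling assumption this comes out as a 9 to 9 split. Trinity keeps its biggest leads in SYSmark, WinRAR and StarCraft 2, and its biggest deficits are in Dragon Age and Cinebench 11.5 multithread. Because linear scaling is generous to Llano, that split is probably closer to a floor for Trinity than a prediction.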
My conclusions:
A hypothetical Llano with resonant clock meshing would have traded blows with Trinity, but Trinity would generally have been faster.
Trinity's single-threaded performance is way higher than Llano's.
Trinity's multithreaded performance is only roughly on par with Llano's despite the clock advantage, which shows the weakness of the module design compared to dedicated cores. This is especially so in apps that stress the FPU, and more so in legacy apps that don't take advantage of the improved FPU.
And some observations:
The Stars core was designed around an L3 cache for sharing data between threads, and Llano has no L3. The Bulldozer/Piledriver module shares an L2 cache between its two integer cores, and its FPU is more powerful to make up for there being only one per module.
At low clock speeds, Kabini, with its shared L2 cache, would probably destroy Trinity in multithreaded benchmarks.
Trinity has a larger die than Llano, so it is slightly more expensive to make.
Compared to Phenom II:
It's hard to say how much power the iGPU consumes. However, I feel that a 65W Phenom II is a reasonable comparison to a 100W Llano.
That gets you the 2.6 GHz Phenom II X4 910 versus a 2.9 GHz (or up to 3 GHz) Llano.
http://www.anandtech.com/bench/Product/399?vs=85
In comparison, Llano is almost always faster; only in the occasional multithreaded benchmark does Phenom II match or exceed Llano's performance.
Now, Kabini compared to Trinity:
http://www.anandtech.com/show/6974/amd-kabini-review/5
Trinity here has a 53% clock speed advantage. The top-end 25W Kabini would match the clock speed of a 25W Trinity.
PCMark 7:
Trinity is only 43.5% faster.
Cinebench R11.5 single thread:
Trinity is 79% faster.
Cinebench R11.5 multithread:
Trinity is only 36% faster than Kabini.
x264 pass 1:
Trinity is 62% faster.
x264 pass 2:
Trinity is 48% faster.
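Trinity has a 53% clock lead in this review, so dividing each result by 1.53 gives a rough clock-for-clock comparison. Here's a quick Python sketch of that. It assumes both chips scale linearly with clock and ignores turbo and power limits, so treat the numbers as ballpark only.

```python
CLOCK_RATIO = 1.53  # Trinity's 53% clock speed advantage over Kabini in this review

# Trinity's lead over Kabini in percent, from the results above
RESULTS = {
    "PCMark 7": 43.5,
    "Cinebench R11.5 single thread": 79.0,
    "Cinebench R11.5 multithread": 36.0,
    "x264 pass 1": 62.0,
    "x264 pass 2": 48.0,
}

def per_clock_lead(lead_pct: float, clock_ratio: float = CLOCK_RATIO) -> float:
    """Trinity's lead in percent after normalising away its clock advantage
    (negative means Kabini is ahead clock for clock)."""
    return ((1.0 + lead_pct / 100.0) / clock_ratio - 1.0) * 100.0

per_clock = {name: per_clock_lead(pct) for name, pct in RESULTS.items()}
for name, value in per_clock.items():
    print(f"{name:32s} {value:+6.1f}% per clock")
```

Clock for clock, that works out to Trinity about 17% ahead in Cinebench single thread and Kabini about 11% ahead in Cinebench multithread. Kabini also comes out slightly ahead in PCMark 7 and x264 pass 2.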
Conclusion:
Kabini compares favorably to Trinity in multithreaded performance, which was already Trinity's Achilles' heel. But in single-threaded performance, the Piledriver core seems to easily have the best performance per watt of any AMD 32nm architecture.