Originally posted by: Viditor
Well, as dexvx points out, it is just speculation at the moment and we should keep that in mind. But the core improvements of note for K8L (grabbed hastily from Wiki) are:
More aggressive prefetching (16 bytes to 32 bytes)
Out of order loads
128 bit wide Floating point units
Larger Out of Order (OoO) buffers
Greater number of entries in Branch Target Buffer
Probable new additions to micro-ops ROM
Add to that the reduced latency of HT and ODMC, and it seems to me that K8L will be crowned the new champ next summer...
The 128-bit wide FPU is interesting. If it can execute 128-bit SSE2 instructions in a single pass, it'll have SSE2 performance on par with Core 2. Everything else seems more or less like tweaks that shouldn't gain any noticeable performance beyond the margin of error. But those descriptions are just too vague to judge.
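To put the single-pass point in rough numbers, here is a minimal sketch. It assumes a packed operation has to be split into as many passes as it takes to cover the vector with the FPU's datapath, and it ignores scheduling, latency, and everything else. The function name and the 64-bit "before" width are my own illustration, not figures from any spec sheet.

```python
def passes_per_op(vector_bits: int, fpu_bits: int) -> int:
    """Passes needed to push one packed operation through the FPU.

    Assumption: the op is split into ceil(vector_bits / fpu_bits) chunks.
    """
    return -(-vector_bits // fpu_bits)  # ceiling division on integers


# Hypothetical comparison: a 64-bit wide FPU vs. a 128-bit wide FPU
# handling a 128-bit SSE2 operation.
narrow_passes = passes_per_op(128, 64)
wide_passes = passes_per_op(128, 128)

# Peak packed throughput ratio under this simplified model.
peak_ratio = narrow_passes / wide_passes

print(narrow_passes, wide_passes, peak_ratio)
```

Under that model the wider unit doubles peak packed throughput, which is a ceiling and not what real code would see.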
However, a latency decrease on its own doesn't seem likely to produce much of a performance difference. Even a reduced memory latency is still no match for cache latency. This is where the L3 will get interesting (again, depending on how slow it is). Again, this is all speculation, and you are assuming nothing actually gets slower. Remember, Conroe's L2 is actually a little slower than Yonah's (on a per-clock basis).
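The memory-versus-cache argument can be sketched with the usual average access time estimate. Every cycle count and miss rate below is a made-up placeholder, not a measurement of K8, K8L, or Conroe; the point is only the shape of the trade-off.

```python
def average_access_cycles(levels, memory_cycles):
    """Average cycles per access for a simple cache hierarchy.

    levels: list of (hit_cycles, local_miss_rate), nearest cache first.
    memory_cycles: cost of going all the way to main memory.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_cycles, miss_rate in levels:
        total += reach * hit_cycles
        reach *= miss_rate
    return total + reach * memory_cycles


# All numbers are hypothetical.
baseline = average_access_cycles([(12, 0.05)], memory_cycles=200)
faster_memory = average_access_cycles([(12, 0.05)], memory_cycles=160)
with_l3 = average_access_cycles([(12, 0.05), (40, 0.5)], memory_cycles=200)

print(baseline, faster_memory, with_l3)
```

With these placeholders, cutting memory latency by 20% saves less than an L3 that catches half of the L2 misses, but a slow enough L3 or a poor hit rate would flip that.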
Originally posted by: Viditor
For Conroe, the major advantages appear to be:
Can decode and execute 4 commands per clock cycle (though I have yet to see an example where it actually does this, it has the ability and hence the headroom)
Process 128-bit SSE3 instructions without slowing down
While we certainly won't know how these 2 play out until samples are released, my own opinion is that K8L comes pretty darn close to Conroe in performance.
If you compare Yonah to Conroe (its closest derivative), the main performance increase came from going 4-issue wide compared to 3. "Simple" math would yield an optimal 33% per-clock improvement, whereas the real-world gain was more like 20% per clock on average.
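The "simple" math works out as follows. The 20% figure is the rough average quoted above, and the variable names are just for illustration.

```python
def ideal_gain(new_width: int, old_width: int) -> float:
    """Best-case per-clock gain if throughput scaled directly with issue width."""
    return new_width / old_width - 1.0


ideal = ideal_gain(4, 3)      # about 0.33, the "optimal 33%"
observed = 0.20               # rough real-world per-clock gain from the post
realized = observed / ideal   # share of the ideal gain actually seen

print(f"ideal {ideal:.1%}, observed {observed:.1%}, realized {realized:.0%} of ideal")
```

So the wider core delivered roughly 60% of its theoretical headroom, if you attribute the whole gain to issue width.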
Another performance factor is the larger but slower (per clock) L2 cache, as well as more L1 and L2 bandwidth.
There was also a series of tweaks here and there, but seriously, I doubt those would be noticeable beyond the margin of error.