OK, that makes more sense. Yes I would agree that ARM probably is "turboing" to hit the same speeds that Haswell will at low voltages.
Yeah, there's little question that at more appropriate operating frequencies their voltage is probably closer to 1.0 V or thereabouts. But I'd definitely be interested to get some of that data.
CLA is a ripple with added logic; it's purely adding logic to the basic full adder chain in a ripple :thumbsup: They have MASSIVE delay penalties when you get to something like 32- or 64-bit numbers (no way they would ever be used in a modern microprocessor). There are tradeoffs ARM can make to reduce power, and it's not 2.38x less logic; it's less, but not that much. It's hard to say exactly how much less, but it's certainly smaller when something like the queue or FP/integer unit in a Haswell core is the same size as an entire ARM core...
Last I checked, CLA removes the ripple carry logic (I believe that's 3 AND gates and 1 OR gate per bit in the typical full adder implementation), instead having four full adders feed into a single CLA block. The net effect is a similar amount of logic; it just uses a bit more power because that logic toggles more frequently than that of a ripple carry (that activity factor thing I'd mentioned previously). (And yes, I was definitely incorrect in my previous statement. I'd simply glanced at the applicable section in one of my textbooks, which compared the various adder designs in an 8-bit implementation, instead of actually looking at and remembering the differences.) Anyway, I never denied that there are trade-offs ARM could make; I'm just not at all convinced that they're still making those trade-offs now that they're trying to catch up on performance (on the big cores at least). Oh, and ya know, something like the queue or FP/integer unit in a Haswell core is also way larger than an entire P24C on a modern process.
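To make the ripple vs. CLA difference concrete, here's a rough bit-level sketch in Python. It's just a behavioral model under my own assumptions (4-bit block, function names `ripple_add` and `cla_add` are made up for illustration), not a gate-accurate netlist of anything ARM or Intel actually ships. The point is only that the lookahead version computes every carry from generate/propagate terms instead of waiting on the previous bit's carry.

```python
def ripple_add(a, b, width=4, cin=0):
    """Ripple carry: each bit's carry-out feeds the next bit's carry-in."""
    carry, total = cin, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        # Majority-style carry logic: 3 ANDs and 1 OR per bit.
        carry = (x & y) | (x & carry) | (y & carry)
    return total, carry


def cla_add(a, b, width=4, cin=0):
    """Carry lookahead: carries are built from generate (g) and propagate (p)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    carries = [cin]
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]cin
        # Every term depends only on g, p and cin, never on an earlier carry.
        c = g[i]
        for j in range(i - 1, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        term = cin
        for k in range(i + 1):
            term &= p[k]
        c |= term
        carries.append(c)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


if __name__ == "__main__":
    print(ripple_add(0b1011, 0b0110))
    print(cla_add(0b1011, 0b0110))
```

Both give the same sum and carry-out; the difference in real hardware is how many gate delays sit on the carry path and how often those gates toggle, which this model doesn't try to measure.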
Intel has optimized Haswell for idle efficiency. Load power usage can be worse than IB. Haswell just has S0ix and C7 states to drop into ultra-low power mode (and essentially emulate what an ARM or Jaguar or Atom core is doing, more or less). Haswell is about scaling down when its horsepower isn't needed. That means more battery life, and less power to do menial tasks, but that doesn't mean power savings in, for example, 3DMark or something. That means power savings when you're sitting idle at the desktop. Or maybe light web browsing or something (I'm not sure at what usage C7 kicks in).
I'd disagree that Haswell is only optimized for idle efficiency. Load power usage is better than IB when run at its intended frequencies. Haswell was optimized for a lower power target than Ivy Bridge, hence the desktop SKUs suffer, as they're running a fair bit above the intended frequency range. Yeah, it still works, but it requires a lot more voltage to do so. Optimizing for that lower power target was clearly the right move though, as mobile is reaping the benefits with those low operating voltages. Sadly, it's difficult to say exactly how much of a difference it really makes, as I haven't seen any battery life/power draw figures under full load. All we have are the numbers for desktop, which aren't really applicable.
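For a feel of why running above the intended frequency range hurts so much, here's a back-of-the-envelope sketch using the usual first-order dynamic power relation, P ≈ activity × C × V² × f. The helper name and the voltage/frequency numbers below are purely hypothetical placeholders, not measured Haswell or Ivy Bridge figures, and leakage is ignored entirely.

```python
def relative_dynamic_power(v, f, v_ref, f_ref, activity=1.0, activity_ref=1.0):
    """Dynamic power relative to a reference point, assuming P ~ activity * C * V^2 * f.

    Capacitance C is assumed identical at both points, so it cancels out.
    Leakage (static) power is not modeled.
    """
    return (activity / activity_ref) * (v / v_ref) ** 2 * (f / f_ref)


if __name__ == "__main__":
    # Hypothetical: 30% more clock that needs 20% more voltage to stay stable.
    ratio = relative_dynamic_power(v=1.2, f=3.9, v_ref=1.0, f_ref=3.0)
    print(f"{ratio:.2f}x the dynamic power")
```

Because voltage enters squared, a part pushed past its sweet spot pays for the extra clock speed twice over, which is roughly the situation the desktop SKUs are in.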