You didn't bother to reply, so...
If someone wanted to do this, I'd start by not trying to reverse-engineer stuff and instead look for public architecture documents from the manufacturers themselves.
A lot of this stuff is going to come down to process/lithography, picojoules per bit, etc., and is often bragged about at conferences like Hot Chips.
Power per operation is a thing in supercomputer research and development, where they are always looking to max out math performance at scale.
If you're just wondering whether using SSE vs AVX is going to use more power, or be faster, it's much easier to park all but one core and, if possible, run a couple of programs bare metal for long enough to measure the watts used by the proc.
Again, not easy, and probably not possible, if the first piece of test equipment that comes to mind is a DMM.
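
For what it's worth, here is a rough sketch of the "one core, run it long enough, measure watts" idea in software instead of with bench equipment. It assumes a Linux box that exposes the powercap/RAPL counters at the sysfs paths below (my assumption, not something from the original question; reading them often needs root, and the exact zone numbering varies by machine). The helper names are made up for illustration. Pinning the process to one CPU is also not the same as actually parking the other cores, and the package counter covers the whole socket, so treat the numbers as relative, not absolute.

```python
import os
import time

# Assumed sysfs paths for the package-level RAPL zone; they may differ or be absent.
RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"


def energy_delta_uj(start, end, max_range):
    """Microjoules between two counter reads, allowing for a single wraparound."""
    if end >= start:
        return end - start
    # Counter wrapped past max_range and restarted near zero (approximate).
    return (max_range - start) + end


def average_watts(delta_uj, seconds):
    """Average power in watts from an energy delta in microjoules."""
    return delta_uj / 1e6 / seconds


def read_int(path):
    with open(path) as f:
        return int(f.read().strip())


def measure(workload, cpu=0):
    """Run workload() pinned to one CPU; return (joules, average watts)."""
    if hasattr(os, "sched_setaffinity"):  # Linux-only call
        os.sched_setaffinity(0, {cpu})
    max_range = read_int(RAPL_MAX)
    e0, t0 = read_int(RAPL_ENERGY), time.monotonic()
    workload()
    e1, t1 = read_int(RAPL_ENERGY), time.monotonic()
    delta = energy_delta_uj(e0, e1, max_range)
    return delta / 1e6, average_watts(delta, t1 - t0)
```

You would call measure() once with a function that launches the SSE build and once with the AVX build, each running for a good many seconds, then compare joules per run. Subtracting an idle baseline taken the same way helps, since the package counter keeps ticking even when nothing is running.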