Someone actually did invent a radical new algorithm called "bitslice", but because of register sizes it's only usable on a handful of CPU lines (mainly the G4 series, which gets a kkeys/MHz rate of 8.x, versus 3.5 for the fastest "shifting" implementation). The P4 lacks the shifts needed and also lacks the larger registers for bitslicing, so it has to do emulated shifting. :Q
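
For anyone wondering what the two approaches look like, here's a rough Python sketch. It is not the actual client code, just an illustration under my own assumptions: 32-bit words, a fixed rotation amount, and made-up helper names (`rotl`, `to_slices`, `from_slices`, `rotl_sliced`). The "emulated" rotate builds a rotation out of two shifts and an OR. The bitsliced version stores bit i of every word in one wide integer, so rotating all the words at once is just a re-indexing of the slices. More lanes per register means more keys per instruction, which is why wide registers matter.

```python
WIDTH = 32                      # assumed word size for this sketch
MASK = (1 << WIDTH) - 1


def rotl(x, n):
    """Emulated rotate-left: two shifts and an OR, masked to WIDTH bits."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def to_slices(words):
    """Transpose words into bit slices: bit j of slices[i] is bit i of words[j]."""
    return [
        sum(((w >> i) & 1) << j for j, w in enumerate(words))
        for i in range(WIDTH)
    ]


def from_slices(slices, count):
    """Inverse of to_slices: rebuild `count` ordinary words."""
    return [
        sum(((slices[i] >> j) & 1) << i for i in range(WIDTH))
        for j in range(count)
    ]


def rotl_sliced(slices, n):
    """Rotate every lane left by a fixed n: bit i moves to bit (i + n) % WIDTH.

    No shifting happens here; the slices are only reordered.
    """
    n %= WIDTH
    return [slices[(k - n) % WIDTH] for k in range(WIDTH)]


if __name__ == "__main__":
    words = [0x12345678, 0xDEADBEEF, 0x00000001, 0x80000000]
    sliced = rotl_sliced(to_slices(words), 5)
    print([hex(w) for w in from_slices(sliced, len(words))])
    print([hex(rotl(w, 5)) for w in words])   # should match the line above
```

Both printed lists should be identical. Note this only covers a rotation by a constant; a rotation whose amount depends on the data would need extra selection logic per slice, which this sketch leaves out.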