It's kinda hard for me to tell after the complementary components really started shouldering the load (the 486DX kinda got that started, but that's OOS). Once we had DMA allowing disk writes without having to check with Papa Proc, and the GPU worked on the display without excessive supervision, the CPU could really start doing work.
I think the Pentium Pro wowed me the most, both with its performance for the time and with what was being done in its architecture and implementation.
The PPro brought us in-package L2 cache running at full processor speed (a separate die sitting in the same package as the core) and (I thought) really made serious inroads for SMP. It got dogged by the math bug and its 16-bit performance, though, and the price has already been discussed. Still, it brought about a lot of what makes its descendants speedy today (the P4EE picked up the price gene many years later).