Also, consider the difference in design methods. From what I hear, large portions of a GPU are synthesized from RTL as static CMOS with latch/flop sequentials. CPUs, by contrast, at the very least use domino logic (or, worse yet, self-resetting domino) to reach their speed targets, which massively increases design effort. Even hand-designed static CMOS takes roughly 2-4x the design effort of direct synthesis.
If you define "advanced" as "difficult", I'd agree with pm that CPUs are more difficult in the following areas:
- Definition: Protocols are more complicated in the non-systolic pipelines of CPUs than in the (as far as I know) more regular pipelines of GPUs and DSPs.
- Logic design: Speed targets demand aggressive design methods, such as hand placement and unusual circuit topologies.
- Validation: Given the breadth of the x86 ISA, it is very difficult to validate all functionality across every protocol corner case (see Definition).
- Debug: I'm not too sure about this one (maybe pm can answer), but silicon debug for CPUs is considered black magic...