IPC should be the first thing anyone ever optimizes, due to A) inherent scaling problems with multiple cores, except in ideal situations like graphics...
It's a common misconception that GPUs have many cores. The GTX 680 has 'only' 8 cores, with 6 vector units each, which are 32 elements wide. NVIDIA multiplies that all together to claim it has 1536 cores, but really there are only 8. For comparison, mainstream Haswell CPUs have 4 cores, with 2 (floating-point) vector units each, which are 8 elements wide, and run at over 3 times the clock frequency.
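To make that arithmetic explicit, here is a minimal sketch that multiplies out the figures quoted above. The `lanes` helper and the `CLOCK_RATIO` constant are illustrative names, not anything from a vendor tool. The 3.0 is only the lower bound implied by "over 3 times the clock frequency", and the comparison ignores FMA, memory bandwidth, and everything else that affects real throughput.

```python
# Vector lanes = cores x vector units per core x elements per unit.
def lanes(cores, units_per_core, elements_per_unit):
    return cores * units_per_core * elements_per_unit

# Figures as quoted in the post above.
gtx680_lanes = lanes(8, 6, 32)   # what NVIDIA markets as "cores"
haswell_lanes = lanes(4, 2, 8)   # floating-point vector lanes

# Assumption: lower bound of "over 3 times the clock frequency".
CLOCK_RATIO = 3.0

# Haswell lanes scaled by the clock advantage, for a rough like-for-like count.
haswell_clock_weighted = haswell_lanes * CLOCK_RATIO
gpu_advantage = gtx680_lanes / haswell_clock_weighted

print(gtx680_lanes)    # 1536
print(haswell_lanes)   # 64
print(gpu_advantage)   # 8.0
```

So on these numbers the gap in raw lane throughput is roughly 8x at most, not the 384x that comparing "1536 cores" against 4 cores would suggest.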
Also note that the GTX 680 reaches only a relatively small market, while Haswell is aimed at a very wide market. My point is that Intel does put a lot of effort into multi-core and vectorization. Haswell's TSX and AVX2 technologies underscore that. That said, IPC should indeed not be sacrificed for more cores like AMD did with Bulldozer.
On the other hand, I think it would be bad to optimize for IPC first and add more cores and SIMD capabilities as an afterthought. They really need equal attention, and I think Intel hit a home run with Haswell. Keep in mind that extracting more ILP is getting ever harder, while TLP and DLP are still up for grabs. Rest assured that we're a long way from hitting the "inherent scaling issues" of multi-core and vectorization. The real issues were the lack of efficient synchronization primitives, which TSX addresses, and not having wide vector equivalents of every scalar operation, which AVX2 addresses. So clearly Intel is very forward-thinking.
I wouldn't be surprised if their next major architecture barely improves IPC but instead features 8 cores with 4 vector units of 16 elements. The transistor budget required for that could come from ditching the IGP and doing graphics on the CPU cores. For power efficiency, the vector units could be split into two clusters running at half the base frequency, each dedicated to one thread and covering memory latencies with AVX-1024 instead of Hyper-Threading.
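As a rough sketch of what that speculated configuration would mean in numbers, the snippet below multiplies it out the same way as the Haswell figures. Two things here are assumptions rather than anything stated above: that elements are 32 bits wide (consistent with 8 elements in a 256-bit AVX register), and that a 1024-bit instruction would be issued to a 16-element unit in several passes, which is one way such an instruction could keep a unit busy while memory accesses are outstanding. All names are illustrative.

```python
# Assumption: 32-bit elements, matching 8 elements per 256-bit AVX register.
ELEMENT_BITS = 32

def lanes(cores, units_per_core, elements_per_unit):
    return cores * units_per_core * elements_per_unit

haswell_lanes = lanes(4, 2, 8)        # figures quoted earlier in the post
speculated_lanes = lanes(8, 4, 16)    # the speculated next architecture

# Width of one speculated vector unit, in bits.
unit_bits = 16 * ELEMENT_BITS

# Assumption: a 1024-bit instruction is split into passes over one unit.
avx1024_elements = 1024 // ELEMENT_BITS
passes_per_instruction = avx1024_elements // 16

print(speculated_lanes)                    # 512
print(speculated_lanes // haswell_lanes)   # 8
print(unit_bits)                           # 512
print(passes_per_instruction)              # 2
```

Under those assumptions each AVX-1024 instruction occupies a unit for two passes, so one thread can keep its cluster fed with half as many instructions in flight.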