Explain how EX is "drastically changed"? You do realize that Excavator is just a further evolution of Bulldozer, not some dramatically swapped-around uarch, right?
If AMD wants to keep up with Intel's Core series, they'll have to increase performance. Frequency scaling seems to be dead, as Kaveri's clocks are going down, not up. This leaves IPC increases, and those will require wider cores. And widening the core is a rather far-reaching change. You can't just slap 1-2 additional ALUs in there and be done; you need more scheduler dispatch ports, more scheduler entries, more register read ports, a more complex result forwarding network, a deeper reorder buffer, etc. Doing all this at once is indeed rather complicated, because so many parts of the core change in a single step.
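To put rough numbers on why "just add ALUs" doesn't work, here is a back-of-the-envelope sketch in Python. The scaling rules are textbook simplifications assumed for illustration (two source operands and one result per op, full bypass from every ALU to every ALU input), not figures from any AMD design, and the function and field names are made up.

```python
def widening_cost(alus, srcs_per_op=2):
    """Rough structure counts for an integer cluster with `alus` ALUs.

    Illustrative assumptions only (not AMD data): every op reads
    `srcs_per_op` registers and writes one result, and every ALU result
    can be bypassed to every source input of every ALU.
    """
    return {
        "issue_ports": alus,
        "reg_read_ports": alus * srcs_per_op,
        "reg_write_ports": alus,
        # each of the `alus` results feeds alus * srcs_per_op inputs
        "bypass_paths": alus * alus * srcs_per_op,
    }


for n in (2, 4):
    print(n, widening_cost(n))
```

Under these assumptions, going from 2 to 4 ALUs doubles the port counts but quadruples the bypass paths (8 to 32), which is the part that hurts timing and power.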
This doesn't even touch the problems of FP. In Kaveri, the FPU is very weak in multi-threaded workloads compared to the integer side. If AMD ever wants to participate in servers/workstations again, they'll have to remedy this problem. Either widen the FPU in both execution units and data paths, or give each core its own FPU again. Not an easy call, and I'm really curious which way they'll go; either way, this will also be a far-reaching microarchitectural change.
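A toy model of the two options, again in Python. The pipe counts are placeholder values picked for the example, not actual Kaveri specs, and the function name is hypothetical; the model only looks at peak FP pipes per thread when every thread sharing the FPU is FP-bound.

```python
def fp_pipes_per_thread(fp_pipes, threads_sharing):
    """Peak FP pipes available to each thread when all sharing threads are FP-bound."""
    return fp_pipes / threads_sharing


# Placeholder pipe counts, chosen only to illustrate the trade-off.
shared = fp_pipes_per_thread(fp_pipes=2, threads_sharing=2)     # one FPU shared by two cores
widened = fp_pipes_per_thread(fp_pipes=4, threads_sharing=2)    # option 1: widen the shared FPU
dedicated = fp_pipes_per_thread(fp_pipes=2, threads_sharing=1)  # option 2: one FPU per core

print(shared, widened, dedicated)
```

Both options double the per-thread peak in this model. What it ignores is exactly what makes the call hard: area, data path width, and scheduling complexity differ a lot between the two.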
Of course, it's also possible that they keep the basic design and only slightly improve the Steamroller architecture while moving to 20nm, but I really don't see the point. Against Broadwell/Skylake, that won't be competitive. Heck, they'd be lucky to compete with Sandy Bridge performance-wise (but should beat it in perf/W).
I'd also like a source for the claim of Kaveri being delayed due to process issues. Steamroller ver1, whatever it was, was canned at some point in 2012 and they decided to work on bdver3b, or Steamroller B.
No, I don't have a source, because AMD is tight-lipped about the reasons for its inability to deliver on its roadmap targets. However, I hope we can agree that AMD didn't delay Kaveri and then downgrade its performance targets voluntarily, but due to some unexpected technical problems. This pretty much leaves process/physical design problems, or problems related to architecture development.
Of these, process problems are the more common. They affect even companies with huge budgets and a development flow designed to minimize any process-related problems; see the mass production delays of Broadwell. If AMD ran into this, it would be pretty much an industry norm and thus "business as usual".
If the problems were in the architectural domain, I'd see this as far more problematic. It would imply that AMD's engineering teams can't really predict the performance and/or validity of a design until tapeout has occurred. They seemed to have this problem with the original 45nm Bulldozer scheduled for 2009, and if the design process hasn't been fixed to prevent such failures, this doesn't bode well. It would mean that AMD's big core development is pretty much a hit-or-miss affair, where they need plain luck to hit their design targets. And I really can't believe this to be the case. While AMD really screwed up during Bulldozer development, I'm pretty confident that they've fixed their overall development process.
But even assuming that their architecture development is borked, that could only explain the delay on the CPU side of things, as the GPU should already have been developed and is proven to work in stand-alone products. The performance downgrade isn't really explainable with design problems.
And this leads me to the opinion that AMD hasn't mastered 28nm (at least at GFL). If you can think of another option which explains both delays and the performance downgrade, then please let me know.