Power and die size both. AMD is behind Nvidia in efficiency: they need more power and more die space to match performance.
Vega 64 already has a bigger, more power-hungry die than Titan/1080 Ti, yet it only performs like a GTX 1080.
That is a large deficit.
Going to MCM is not a panacea, and it is much more problematic than using multiple CPU dies.
Even if you magically solved all the MCM GPU issues, the same design split over multiple dies would always be slower than on one monolithic die. It won't save power either; if anything it would be worse, since off-chip connections use more power than on-die wiring.
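To put a rough number on the off-chip power point, here's a back-of-envelope sketch. The pJ/bit energy figures are assumed ballpark values for illustration only, not measurements of any real product:

```python
# Illustrative only: energy cost of moving data off-die vs. on-die.
# The pJ/bit figures are assumed round numbers, not real-product data.

def link_power_watts(bandwidth_gb_per_s, energy_pj_per_bit):
    """Power needed to sustain a link: (bits/s) * (joules/bit)."""
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * energy_pj_per_bit * 1e-12

# Same 512 GB/s of traffic, two assumed energy costs per bit:
on_die = link_power_watts(512, 0.1)   # wires inside one die
off_die = link_power_watts(512, 1.0)  # die-to-die link over a package

print(f"on-die:  {on_die:.2f} W")   # on-die:  0.41 W
print(f"off-die: {off_die:.2f} W")  # off-die: 4.10 W
```

With these (assumed) numbers, every byte that has to cross a die boundary costs roughly 10x the energy of a byte that stays on-chip, which is the core of the power argument against splitting one design across dies.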
I am quite certain Navi won't be MCM, except for the possible Crossfire dual chip cards we have had for years.
GPU MCM will really only make sense if the die target size is ridiculously large.
That's because AMD is trying to compete with Titan and Nvidia's larger compute chips using a single chip. That was really all AMD could do, but they weren't (and still aren't) in a good place to make effective use of it: they don't have the leverage in the compute/pro space to capitalize on the chip there, and they don't have the resources to extract all of its graphics/gaming capability on the consumer side. Nvidia did a better job of timing how hard to push in the compute direction, which let them build a smaller, more efficient chip and focus on getting the most out of it. AMD had to put out the chips they did because they had to lure developers over, much like they pretty much had to move to open source once they fell further and further behind on the software side.

So even if AMD had done two chips, one compute-focused and one graphics-focused (by focused I mean basically all-in on that task, not the Nvidia situation of a leaner consumer chip plus a larger pro/enterprise one), I don't think that would have been a good setup either, since they were competing against Nvidia chips that did both while waiting on the industry. Plus, it would inevitably push them toward multi-chip setups anyway, and in the consumer space especially that would probably be a killer, since that's a lower-margin area where they'd need to ship two chips.
Absolutely it has issues that will need to be overcome. But I'm not entirely sure I'd agree with that, since many of the tasks GPUs are used for are more parallel than typical CPU workloads. I'm not saying it'll be easy, but I don't think there's anything that inherently makes it worse. There may be latency issues (latency matters a lot for game rendering, especially as we move to VR, where everything has to be synced well), but the tasks themselves are well suited to heavy parallelization. And people are working on the latency problem too (looking into optical interconnects, even at the chip level).
Er, you "magically solved all the MCM GPU issues," but then there are several other problems? Doesn't sound like you magically solved all the issues, then.
The problem is, you're acting like you could absolutely build such a monolithic die, and the fact that you can't is exactly why the industry as a whole is starting to look at this type of setup: they're hitting the physical limits of chip production. I'm not sure the power use will be that much different, since realistically it just means a bit of extra wiring between the components, and it might even make power gating easier.

Plus, it's not like there aren't potential benefits. Spreading out the processor could actually help thermal performance by spreading the heat over a larger area, enabling less throttling. As chips move toward more co-processing and new memory (even HBM), the dies can potentially be arranged in a better way, like putting shared memory between components, or placing processors that share data closer together. It could also improve efficiency by running multiple chips at their peak-efficiency clocks, because the combined chips have more compute units. And because the modules sit at a more optimal manufacturing size than a single large monolithic design, it could be more cost-efficient as well.
Mostly though, I'd say stop thinking of it as solely breaking a GPU into multiple pieces, and think of it as potentially enabling various chips to sit closer together. It also offers flexibility, so you're not stuck trying to balance trade-offs while designing one monolithic chip; you can adjust components for different uses, which will help a lot. Sure, if you could make a giant chip with all of that on it, it could probably win by some margin, but you'd either have to make more specialized chips (with correspondingly higher engineering expenses) or make them more general-purpose to suit more uses. And the industry is moving toward more specialized processing units, which means there will be even more need to slot in a variety of chips. Instead of the old method of add-in cards and the like, those chips can sit right next to the others, enabling more granular mixing and matching of processing components than we currently have.
I am quite certain that it will, because they're specifically developing it as such. I'm not expecting some all-conquering wonderchip (heck, I won't be surprised if it's a bit of a dud or has some major problem, but I don't think it'll be anything that couldn't be fixed, even if that means Nvidia or someone else is the one who properly implements an MCM GPU setup). But I expect even a 2-module Navi GPU will give AMD a better GPU than trying to compete with a monolithic one of similar capability. Plus, eventually it helps them build a complete stack of GPUs (and avoid the situation where we end up with a bunch of GPUs with minor differences, like the various GCN versions, or with fairly substantial differences, like Polaris versus Vega). Initially I think it'll be simpler: we'll see something like Polaris-level performance with 1 module and 1 stack of HBM, then a higher-end chip with 2 modules and 2 stacks placed between them.
I don't entirely agree. I think it will enable that (effectively we'll get more GPU), but even now it has financial benefits from focusing on a single module size to mass-produce, which helps on the chip design/engineering side, a substantial part of the costs. Granted, those resources will need to go into getting the modules to work together, so in the short term you probably won't save anything overall, but if you get it working well it brings big savings and/or lets you reallocate resources toward more product variability or toward improving the chip itself, like tweaking it for higher clocks. Doing that, though, requires figuring out the optimal module size. I think that's why Ryzen was a success: they figured out the right size to let them build an entire product stack. If they'd gone with 1 or 2 cores per module max, it probably wouldn't have been as beneficial (unless each core brought a ton of performance, which I doubt). The same goes for balancing other aspects, which they also did very well with Ryzen on the I/O side: if the memory controller had only scaled from 1 to 4 channels instead of 2 to 8, or if the I/O had topped out at 64 PCIe lanes, things wouldn't look as rosy.
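The single-module cost argument also shows up in manufacturing yield. Here's a minimal sketch using the simple Poisson yield model, Y = exp(-D * A); the defect density and die areas are assumed round numbers for illustration, not figures for any actual process:

```python
import math

# Poisson yield sketch: probability a die has zero defects falls off
# exponentially with die area. All numbers below are illustrative.

D = 0.002  # assumed defect density, defects per mm^2

def cost_per_good_die(area_mm2):
    """Relative silicon cost of one *working* die of the given area.
    Wafer cost scales with area consumed, so cost ~ area / yield."""
    yield_rate = math.exp(-D * area_mm2)
    return area_mm2 / yield_rate

big = cost_per_good_die(500)        # one monolithic 500 mm^2 die
small = 2 * cost_per_good_die(250)  # two 250 mm^2 modules instead

print(small / big)  # ~0.61: the split design needs ~40% less silicon
```

Under these assumptions, two working 250 mm^2 modules cost roughly 40% less silicon than one working 500 mm^2 die, because yield on the smaller die is so much higher. This is the same math that made the one-size Ryzen die across a whole product stack pay off.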
Something else I wonder: could this start enabling something a bit new (in this space at least), like stacking? Say, putting the memory sandwiched between two processors vertically rather than horizontally. For a video card, imagine a design with two GPUs, one on either side of the board, with the memory between them. Maybe that would have benefits for the interposer, where instead of one large interposer you could use a smaller one with direct connections between the GPUs, helping with latency?
It might be MCM in terms of a server-type deal: you build a custom arrangement of, say, 10 or 20 Navi GPUs linked together to provide hundreds of teraflops in server environments. It could also be used in supercomputers; imagine an array of 100 Navi GPUs all working in unison to provide petaflops of compute power.
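The aggregate numbers above are simple multiplication. Assuming roughly 10 TFLOPS per GPU (an illustrative Vega-class ballpark) and perfect scaling, which real multi-GPU setups never quite reach:

```python
# Aggregate-compute arithmetic for the server scenario. 10 TFLOPS per
# GPU is an assumed ballpark; perfect linear scaling is also assumed.

TFLOPS_PER_GPU = 10

for gpus in (10, 20, 100):
    total_tflops = gpus * TFLOPS_PER_GPU
    print(f"{gpus:3d} GPUs -> {total_tflops} TFLOPS "
          f"({total_tflops / 1000} PFLOPS)")
```

So the 10-to-20 GPU server lands in the 100-200 TFLOPS range, and the 100-GPU array crosses into petaflop territory, again assuming the inter-GPU links don't eat into the scaling.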
I don't think desktop MCM is a possibility or a reality, but for high-end computation, research, and supercomputer-type deals it makes a lot of sense. Nvidia has one Volta at ~$10k (pro drivers and all) or a $3,000 prosumer Volta that brings 1x performance; AMD could fuse together 4x Navi for 2x the performance.
I don't know, I feel like Ryzen is showing it's possible, and I think graphics should be well suited to it; then in the future they'll adjust the whole approach (mixing and matching units based on the needs of the customer). The real question is whether they'll actually implement it well in the end product. Both Intel and Nvidia seem to be following suit, or are at least looking at it seriously. Intel is more than just looking: the chips with Vega on-package are basically what I'm talking about with mixing processing components and memory. It enables companies to be flexible in ways that wouldn't have made sense in the past.