Originally posted by: Game Boy
What, in terms of performance, power consumption, package size, manufacturing cost and yield, are the differences between:
a) One microprocessor with two separate dies with one core each.
b) One microprocessor with one die containing two cores ("Native")
c) Two microprocessors with one core each
d) One large microprocessor with one core with twice as many transistors set out for double superscalar processing of the normal core
Assume they're all the same architecture so other factors don't influence it, and the layout and total number of transistors within each scenario is the same.
A is sometimes referred to as a multi-chip module, or MCM. I disagree with Extelleron's assertion that A is cheaper than B due to die size constraints -- I believe the MCM, A, is more expensive because 1) it requires an additional, highly precise manufacturing step and 2) core area is only a small component of the whole die (1/3, tops, for a single core), so adding a second core doesn't badly increase the die area.
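The die-area side of this argument can be put in rough numbers. Below is a minimal sketch, not a real cost model: it assumes the core is 1/3 of a single-core die (the upper bound quoted above), that each MCM die is a complete single-core die, and that yield follows the simple Poisson model Y = exp(-area * defect_density). The defect density and normalised areas are made-up illustrative values, and packaging and assembly cost (the extra manufacturing step) is ignored entirely.

```python
import math

def poisson_yield(area, defect_density):
    """Fraction of good dies under the simple Poisson yield model."""
    return math.exp(-area * defect_density)

def silicon_per_good_part(die_area, dies_per_part, defect_density):
    """Silicon area spent per working part, assuming bad dies are
    discarded individually before packaging."""
    return dies_per_part * die_area / poisson_yield(die_area, defect_density)

CORE_FRACTION = 1.0 / 3.0   # assumption: core is 1/3 of a single-core die
SINGLE_DIE_AREA = 1.0       # normalised area of a single-core die
DEFECT_DENSITY = 0.2        # hypothetical defects per unit of normalised area

# Native dual-core: one die, with only the core duplicated.
native_area = SINGLE_DIE_AREA * (1.0 + CORE_FRACTION)

mcm = silicon_per_good_part(SINGLE_DIE_AREA, 2, DEFECT_DENSITY)
native = silicon_per_good_part(native_area, 1, DEFECT_DENSITY)

print(f"MCM (A):    {mcm:.3f} area units per good part")
print(f"Native (B): {native:.3f} area units per good part")
```

With these particular numbers the native die needs less silicon per good part, but the result is sensitive to the assumptions: if the second core shares nothing (CORE_FRACTION = 1.0) or the defect density is much higher, the two smaller dies come out ahead, which is the die-size argument in its strongest form.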
B is your basic commodity 'multi-core' chip. There are two cores on the chip, and they present two thread contexts to the software. Sometimes they take advantage of being on the same chip by sharing resources, like caches, sometimes they don't.
C is your old-style SMP. It has basically NO sharing potential (aside from main memory), and communication latency between the two cores is pretty high.
D, as others have suggested, is pretty difficult to build because of complexity, power, and latency considerations. I'm quite sure everyone would be making D if it could be efficiently engineered. Superscalar hardware complexity/power/latency tends to grow quadratically with issue width. This wouldn't be a problem if signals could be communicated at light speed and it was cost-effective to manufacture 50-metal-layer chips.
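To see what quadratic growth means for the D-versus-two-cores comparison, here is a back-of-the-envelope sketch. It takes the "grows quadratically with issue width" rule of thumb above at face value; the function name, the base issue width of 4, and the unitless "complexity" measure are all hypothetical and only there for illustration.

```python
def relative_complexity(issue_width, exponent=2.0):
    """Unitless complexity estimate, assuming cost scales as width**exponent."""
    return issue_width ** exponent

BASE_WIDTH = 4  # hypothetical issue width of the "normal" core

# Option D: one core with double the issue width.
one_wide_core = relative_complexity(2 * BASE_WIDTH)

# Options A/B/C: two normal cores side by side.
two_narrow_cores = 2 * relative_complexity(BASE_WIDTH)

ratio = one_wide_core / two_narrow_cores
print(f"One double-width core costs {ratio:.1f}x two normal cores")
```

Under a purely quadratic model the ratio is 2 regardless of the base width, so doubling the issue width costs about twice as much issue logic as simply building a second core.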
So, assuming each of these systems is built to its peak potential:
D (or a single core made as aggressively as possible) would be the best performer for single-threaded applications. It would also be a power hog. It is totally unlike the other solutions, because it devotes all of its resources to a single thread's execution. That single fact has massive implications at all levels of the design -- the memory system especially.
A and B would have similar single-threaded and dual-threaded performance. B might win in a communication-intensive application because of slightly lower communication latency. A might have slightly better performance isolation for single threads. B would be much cheaper than A. MCMs like A are sold in mid-range or high-end servers; B's are sold as commodity chips. B will consume less power, as its power distribution is simpler and it probably has a smaller package.
There's almost no reason at all to prefer C these days. C's performance could be much worse than A/B, since a large inter-chip interconnect may be a bottleneck. Exceptions: failure isolation (properly built, C's chips could fail and be replaced independently) and backwards compatibility.