Comparing processor setups

Game Boy · Jul 18, 2007

What, in terms of performance, power consumption, package size, manufacturing cost and yield, are the differences between:

a) One microprocessor with two separate dies with one core each.
b) One microprocessor with one die containing two cores ("Native")
c) Two microprocessors with one core each
d) One large microprocessor with one core with twice as many transistors set out for double superscalar processing of the normal core

Assume they're all the same architecture so other factors don't influence it, and the layout and total number of transistors within each scenario is the same.

Extelleron · Jul 19, 2007

Originally posted by: Game Boy
What, in terms of performance, power consumption, package size, manufacturing cost and yield, are the differences between:

a) One microprocessor with two separate dies with one core each.
b) One microprocessor with one die containing two cores ("Native")
c) Two microprocessors with one core each
d) One large microprocessor with one core with twice as many transistors set out for double superscalar processing of the normal core

Assume they're all the same architecture so other factors don't influence it, and the layout and total number of transistors within each scenario is the same.

From a real-world performance standpoint, there will be no noticeable difference between a) and b). There should be a theoretical advantage to having one die, but I don't think it is significant enough to have any real-world effects.

The big difference between the two is yield, and thus manufacturing cost. The biggest problem when making a CPU is die size; as die size gets bigger, there is a greater chance of defects. Thus, the yield is reduced. A CPU with lower yield will be more expensive to produce. In general, a CPU with 2 cores on one die will be more expensive to produce than a dual-core CPU made up of 2 dies.
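To put rough numbers on the die-size argument, here is a minimal sketch using the simple Poisson yield model (yield = exp(-area * defect density)). The core area and defect density are made-up illustrative values, not figures for any real process, and the model deliberately ignores packaging cost and any non-core area on the die.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_part(core_area_cm2, defects_per_cm2):
    """Wafer area (cm^2) spent per working dual-core product.

    Returns (monolithic, two_die). Assumes dies are tested before
    packaging, so only known-good single-core dies get paired up.
    """
    mono_area = 2 * core_area_cm2
    monolithic = mono_area / poisson_yield(mono_area, defects_per_cm2)
    two_die = 2 * core_area_cm2 / poisson_yield(core_area_cm2, defects_per_cm2)
    return monolithic, two_die

# Hypothetical numbers: 1 cm^2 per core, 0.5 defects per cm^2.
mono, mcm = silicon_per_good_part(core_area_cm2=1.0, defects_per_cm2=0.5)
print(f"one die, two cores: {mono:.2f} cm^2 of wafer per good part")
print(f"two dies, one core each: {mcm:.2f} cm^2 of wafer per good part")
print(f"ratio: {mono / mcm:.2f}")
```

Under these assumptions the single large die always needs more wafer area per good part, by a factor of exp(core area * defect density). Whether that outweighs the extra packaging work for two dies is a separate question this sketch does not answer.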

When it comes to power usage, a) and b) should be exactly the same, AFAIK. In terms of heat, I would assume they are similar. The only thing you have to consider is that when overclocking a CPU w/ 2 dies, one of the dies might not be as good an overclocker and hold the other back. This can happen with a native multi-core CPU as well... sometimes one core can handle more than the other.

c) should be very similar to the other two as well. There is likely a theoretical advantage to having two CPUs, because unlike a multi-core CPU, they do not have to share the same FSB (AFAIK.) Looking at AMD's Quad FX, we see 2 CPUs with 2 cores each, and they perform pretty well.

As for d), you'll have to wait for somebody else.

Peter · Jul 21, 2007

It depends, mostly because the answer to "what's better" heavily depends on the application.

Weaknesses and strengths:

A: Puts an extra load on the common external bus, compromising its maximum frequency. All levels of cache are separate, which is good for running truly separate tasks, but bad for a single multithreaded application. Manufacturing yield is inherently only a little worse than with single units.

B: Single point of bus attachment, no speed compromise. Possibility for a shared level of cache. If the CPU design contains the RAM controllers (AMD), arbitration for RAM access can be intelligent.

C: Performs identically to A if the processors are on a common bus (Intel), maybe compromises the bus speed a bit more because of the extra socket. However, if the processors each have or even contain their own northbridge (AMD), you get an extra set of RAM controllers, doubling the available RAM bandwidth and maximum RAM size. The benefit of the higher bandwidth is biggest for separate tasks, but still quite noticeable for a multithreaded single application - provided the interconnect between nodes is no slower than the RAM. AMD's solution is quite close to that ideal.

D: Is pretty much the same as B really.

Game Boy · Jul 22, 2007

So why have chip manufacturers started to use a) and b) instead of d)? Are they less expensive or something?

Peter · Jul 22, 2007

Because (D) hit a brick wall.

How do you make a single core go faster? Either you make it more complicated to compute more per clock (higher Instructions Per Clock rate, IPC), or you ramp up the clock speed to insane levels.
The more-frequency approach was what the P4 at Intel was about: Not much done per clock, on a design that would - in theory - run at extremely high frequencies. It hit the wall at below 4 GHz whilst originally planned for 5 and beyond.
AMD in turn (and /much/ earlier, Cyrix) found that aiming for more throughput per clock limits the achievable frequency such that the resulting performance isn't any better either.

CTho9305 · Jul 22, 2007

For what it's worth, the wall you hit with d is power. If you're willing to spend big bucks on cooling, you can get quite a bit faster for single-threaded tasks.

If you have one 2GHz CPU, it's going to consume much more power than two of the same dies running at 1GHz. Power = c*v^2*f; with two CPUs, c is doubled and f is halved (half the clock speed), which cancels out, but the lower clock also lets you lower the voltage (which ends up being a huge win since power depends on the square).
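A quick worked version of that formula, as a sketch only. The capacitance is normalized to 1 and the two voltages (1.3 V at 2GHz, 1.0 V at 1GHz) are hypothetical values picked for illustration, not measurements from any real part. Only dynamic (switching) power is modeled; leakage is ignored.

```python
def dynamic_power(capacitance, voltage, frequency_ghz, cores=1):
    """Dynamic switching power, P = c * v^2 * f, summed over identical cores."""
    return cores * capacitance * voltage ** 2 * frequency_ghz

C = 1.0  # normalized switched capacitance of one core (assumption)

# One core at 2GHz, needing a higher voltage to reach that clock.
one_fast = dynamic_power(C, voltage=1.3, frequency_ghz=2.0)

# Two cores at 1GHz, which can run at a lower voltage.
two_slow = dynamic_power(C, voltage=1.0, frequency_ghz=1.0, cores=2)

# Two cores at 1GHz but left at the high voltage: no saving at all.
two_slow_same_v = dynamic_power(C, voltage=1.3, frequency_ghz=1.0, cores=2)

print(f"1 x 2GHz @ 1.3V: {one_fast:.2f}")
print(f"2 x 1GHz @ 1.0V: {two_slow:.2f}")
print(f"2 x 1GHz @ 1.3V: {two_slow_same_v:.2f}")
```

With these numbers the two slow cores use (1.0/1.3)^2, or roughly 59%, of the power of the single fast core. The third case shows that doubling c and halving f cancel exactly, so the whole saving comes from the voltage drop.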

evolucion8 · Jul 24, 2007

Originally posted by: Peter
Because (D) hit a brick wall.

How do you make a single core go faster? Either you make it more complicated to compute more per clock (higher Instructions Per Clock rate, IPC), or you ramp up the clock speed to insane levels.
The more-frequency approach was what the P4 at Intel was about: Not much done per clock, on a design that would - in theory - run at extremely high frequencies. It hit the wall at below 4 GHz whilst originally planned for 5 and beyond.
AMD in turn (and /much/ earlier, Cyrix) found that aiming for more throughput per clock limits the achievable frequency such that the resulting performance isn't any better either.

Intel also hit that wall (artificially, though) with the Pentium M. But AMD's approach with the Athlon 64 is what drove Intel to release the enhanced Pentium M designs known as the Intel Core Duo and Intel Core 2 Duo. AMD's approach offered much more performance per clock than any Pentium 4 CPU, and overclocking improved it further. My Pentium M overclocked to 2.6GHz outperformed my previous P4 EE 3.4GHz in everything except video encoding, which is quite linear and likes fast clock cycles, and even there it wasn't much slower.

Steve · Jul 24, 2007

Originally posted by: Peter
AMD in turn (and /much/ earlier, Cyrix) found that aiming for more throughput per clock limits the achievable frequency such that the resulting performance isn't any better either.

I'm curious as to how other architectures compare/contrast with this, such as Alpha, SPARC, PA-RISC, Itanium, G3/4/5, etc. - would it be reasonably safe to say, in general they are designed with that philosophy of more work done per clock?

degibson · Mar 21, 2008

Originally posted by: Game Boy
What, in terms of performance, power consumption, package size, manufacturing cost and yield, are the differences between:

a) One microprocessor with two separate dies with one core each.
b) One microprocessor with one die containing two cores ("Native")
c) Two microprocessors with one core each
d) One large microprocessor with one core with twice as many transistors set out for double superscalar processing of the normal core

Assume they're all the same architecture so other factors don't influence it, and the layout and total number of transistors within each scenario is the same.

A is sometimes referred to as a multi-chip module, or MCM. I disagree with Extelleron's assertion that A is cheaper than B due to die size constraints -- I believe the MCM, A, is more expensive because 1) it requires an additional, highly precise manufacturing step and 2) core area is only a small component of the whole die (1/3, tops, for a single core), so adding a second core doesn't increase the die area badly.

B is your basic commodity 'multi-core' chip. There are two cores on the chip, and they present two thread contexts to the software. Sometimes they take advantage of being on the same chip by sharing resources, like caches, sometimes they don't.

C is your old-style SMP. It has basically NO sharing potential (aside from main memory), and communication latency between the two cores is pretty high.

D, as others have suggested, is pretty difficult to build because of complexity, power, and latency considerations. I'm quite sure everyone would be making D if it could be efficiently engineered. Superscalar hardware complexity/power/latency tends to grow quadratically with issue width. This wouldn't be a problem if signals could be communicated at light speed and it was cost-effective to manufacture 50-metal-layer chips.
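One concrete place the quadratic growth shows up is the dependency check between instructions issued in the same cycle: every instruction's source registers have to be compared against the destination of every earlier instruction in the group. A rough sketch of that count, assuming two source operands per instruction (the function name and that default are mine, for illustration only):

```python
def dependency_comparators(issue_width, sources_per_instr=2):
    """Count register comparators needed to cross-check one issue group.

    Each instruction compares each of its source registers against the
    destination register of every earlier instruction in the same group.
    """
    count = 0
    for later in range(issue_width):
        for _earlier in range(later):
            count += sources_per_instr
    return count

for width in (1, 2, 4, 8):
    print(f"issue width {width}: {dependency_comparators(width)} comparators")
```

With two sources per instruction the count works out to width * (width - 1), so going from 4-wide to 8-wide takes the comparator count from 12 to 56 rather than just doubling it. This covers only one structure in the core; it is meant to illustrate the trend, not to model a real design.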

So, assuming each of these systems is built to its peak potential:

D (or a single core made as aggressively as possible) would be the best performer for single-threaded applications. It would also be a power hog. It is totally unlike the other solutions, because it devotes all of its resources to a single thread's execution. That single fact has massive implications in all levels of the design -- the memory system especially.

A and B would have similar single-threaded and dual-threaded performance. B might win in a communication-intensive application because of slightly lower communication latency. A might have slightly better performance isolation for single threads. B would be much cheaper than A. A is sold as a desktop chip; B is sold in mid-range or high-end servers. A will consume less power, as its power distribution is simpler and it probably has a smaller package.

There's almost no reason at all to prefer C these days. C's performance could be much worse than A/B, since a large inter-chip interconnect may be a bottleneck. Exceptions: Failure isolation (properly built, C's chips could fail (and be replaced) independently), Backwards compatibility.
