Originally posted by: myocardia
I'm guessing in 10 years, we won't still be using x86 processors, even 64-bit variations. I think we'll have reached the limits of what can be done with silicon and copper. Even the EEs will usually admit that anything smaller than 22nm is going to be extremely difficult, if not impossible, and 22nm will be debuting in four years. I can see possibly getting down to 17nm, if they make the transition to laser interconnects, but after that, then what? BTW, ask us this question in 4 or 5 years, and I bet you'll get very different answers.
We will never reach the limits of what can be done with the materials science; I can say this with absolute certainty from experience and education.
What will limit how far we go with the materials science, though, is the financial side of the industry.
8nm is certainly possible, but will the revenue (and margins) from the devices manufactured on an 8nm node be enticing enough to justify a company (Intel or other) allocating its shareholders' wealth 6yrs in advance in the pursuit of developing such a node?
For TI, the decision was that developing 32nm was not in the shareholders' best interest, for reasons of revenue ($-volume) and margins.
The financial side of Moore's law is the only thing that will limit continued process technology cadence. We are pretty much in the stone age when it comes to the materials science we have yet to uncover and exploit, IMO.
Originally posted by: myocardia
Originally posted by: Idontcare
process 45nm -> 8nm in 10yrs (0.18x)
So how does a 16-core, 60-billion-transistor, 25GHz CPU sound to you?
Assuming they are able to get the process node anywhere near 8nm, don't you think they'll be putting a lot more than 16 cores in there? We had 4 cores @ 65nm, and we have 6 @ 45nm, and @ 32nm we'll have no problem squeezing 8 cores onto each die. At that rate, I'm seeing at least 12 cores per die @ 22nm, and 16 cores @ 17nm. Wouldn't that put us @ 24 cores @ 12nm, and 32 cores @ 8nm?
You're pretty savvy when it comes to computer technology, programming technology, etc., so no doubt you are familiar with Gene Amdahl's law and the modifications proposed thereafter by Almasi and Gottlieb to incorporate the performance penalties associated with interprocessor communication overhead.
What happens is that while Amdahl's law explains to us why even a theoretically perfect 1:1 (linear) speedup is impossible when the code is anything less than 100% parallelized, the Almasi/Gottlieb part goes further to explain why you can actually see performance decrease (not just stop getting incrementally better with more cores, but actually start to decrease with more cores) if you scale the cores and threads faster than you scale the performance of the interprocessor communications. This has to do with what many computer scientists (yes, they exist) generically refer to as fine-grained and coarse-grained applications.
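To make that concrete, here's a minimal Python sketch of the shape of it. This is not the exact Almasi/Gottlieb formulation, and the parameter values (serial_frac, comm_cost) are my own illustrative picks, not measurements:

def amdahl_speedup(n, serial_frac):
    # classic Amdahl: speedup caps at 1/serial_frac no matter how many cores
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n)

def speedup_with_comm(n, serial_frac, comm_cost):
    # same model plus a communication overhead that grows with core count;
    # once that term dominates, adding cores makes the job slower, not just flat
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n + comm_cost * (n - 1))

for n in (1, 2, 4, 8, 16, 32, 64):
    print(n, round(amdahl_speedup(n, 0.05), 2),
          round(speedup_with_comm(n, 0.05, 0.004), 2))

With those toy numbers, plain Amdahl keeps creeping upward (9.1x at 16 cores, 12.6x at 32) while the communication-aware model peaks around 16 cores at 5.9x and then falls off (4.9x at 32, 3.2x at 64).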
Coincidentally, the Beowulf clusters I was building nearly 10yrs ago (1999/2000) hit their performance peak on my code of interest (Gaussian 98, a computational quantum chemistry package) right at 16 nodes (processing cores). Adding more cores, even though the software could scale the number of threads to take advantage of them, actually slowed performance down, and the entire cluster took longer to complete its jobs. Exactly as predicted by Almasi and Gottlieb.
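(For what it's worth, the illustrative constants in the sketch above were chosen so the predicted peak lands right around n = 16; the real constants for Gaussian 98 on that cluster would of course be different, but the shape of the curve is the same.)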
What does this have to do with 10yrs from now? My expectation is that we'll continue to see more xtors allocated towards massive shared caches to ensure the interprocessor communications continue to speed up, so they don't undermine the efforts of the programmers to make 8- and 16-threaded applications perform better than a fewer-threaded variant of the same application.
Balance must be maintained in the push for more cores, bigger cores, and more cache. I also wouldn't be surprised if we see heterogeneous cores on-die (in core size/capability as well as in clockspeed). The best way to deal with Amdahl's speedup limitations is to put the serial code on a faster "head" node while farming out the parallel code operations to more numerous but "dumber" nodes.
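Extending the same toy model makes the head-node point visible. Purely a hypothetical sketch: assume the serial fraction runs on one core that is head_speed times faster than the baseline cores (the function and the numbers are mine, just to show the shape of the win):

def hetero_speedup(n, serial_frac, head_speed):
    # serial part runs on the fast head core, parallel part on n baseline
    # cores; runtime on a single baseline core is normalized to 1.0
    return 1.0 / (serial_frac / head_speed + (1.0 - serial_frac) / n)

print(round(hetero_speedup(32, 0.05, 1.0), 1))  # homogeneous 32 cores: ~12.5x
print(round(hetero_speedup(32, 0.05, 2.0), 1))  # 2x-faster head core: ~18.3x
print(round(hetero_speedup(64, 0.05, 1.0), 1))  # homogeneous 64 cores: ~15.4x

Doubling the speed of the one core running the serial code buys more at 32 cores than doubling the core count does, which is the whole argument for heterogeneous dies.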