When will we see a completely new Intel uarch (similar to Netburst->Core jump)?

crashtech · Jun 16, 2013

BallaTheFeared said:
Depends on how you look at it.

Performance wise it's clearly better, no argument there of course even at lower MHz.

However looking at it from a perf/w perspective, "destroys" is exactly the right word.

I'm not clear on what you mean by "perf/w perspective," and I won't contest that Haswell beats SB. But by how much? It looks like it's routine to get SB close to 5 GHz, Haswell, maybe 4.5 if you are lucky. So the huge performance delta is just not there, not even sure what it is, single digit percentages, maybe? That's not destruction, it's a small increment.

P.S. I'm not complaining! Just call it what it is.

CakeMonster · Jun 16, 2013

crashtech said:
It looks like it's routine to get SB close to 5 GHz, Haswell, maybe 4.5 if you are lucky.

This misconception is even more common that I originally thought once the HW threads started popping up. I won't comment on expected OC on HW since it might be early to establish, but its anything but "routine" to get SB anywhere close to 5GHz.

crashtech · Jun 16, 2013

CakeMonster said:
This misconception is even more common that I originally thought once the HW threads started popping up. I won't comment on expected OC on HW since it might be early to establish, but its anything but "routine" to get SB anywhere close to 5GHz.

Speculation on both our parts. It seems clear that SBs DO overclock better, but the average speed delta is not clear.

BallaTheFeared · Jun 16, 2013

My chip disagrees, and also people complaining about 4.4-4.5GHz on Haswell is exactly what most people got with Sandy Bridge.

Those who get the best/worst chips speak the loudest.

And what I mean is OC vs OC Haswell completely trashes SB in perf/w, it's not even close.

crashtech · Jun 16, 2013

Oh, watts. OK. The added efficiency has not resulted in better thermals, though.

I guess if I end up buying a Haswell, I'll join the rah-rah club. Just not today.

john3850 · Jun 16, 2013

My sb is at 4700 and my ib is at 4500 I wont buy a Haswell to run at 4300.
With every new cpu your ipc goes up but your top speed goes down.
If you push it most sb hit 5000 while most ib hit 4.7

tynopik · Jun 16, 2013

Hulk said:
The easiest way to increase IPC across the board would be to increase clockspeed.

IPC = instructions per clock, which is unaffected by clockspeed

crashtech · Jun 16, 2013

Best to say single-threaded performance, perhaps.

BadThad · Jun 16, 2013

Fjodor2001 said:
The Intel Conroe CPUs were introduced in 2006. By that Intel moved from the old Netburst to the new Core uarch generation. It was a revolutionary change in the design of their CPUs.

Since then we've basically only seen incremental improvements of the Core uarch generation (going from Conroe->...->Haswell).

So now I just wonder when it's likely we'll see a new revolutionary jump in uarch design from Intel like when transitioning from Netburst->Core? How long can they continue just making incremental updates to the now 7 year old Core architecture? Isn't a complete redesign needed at some point to get any further major increases in performance (IPC, CPU clock frequency, etc)?

I've been wondering that too. Once Intel beat AMD in the IPC game, they went back to being Intel and milking an architecture for all it's worth. The only reason we got rid of NetBurst is because AMD trumped Intel in IPC. Intel was planning on using NetBurst for a long, long time. For that, I have to say I love AMD! Fanboy's aside, all of us need to support both AMD and Intel for the betterment of computing. It's competition that drives major revisions in computing.

IMO, Intel is in no hurry to move away from Core. It will be a couple more years at least unless AMD comes along with a big surprise. The release of the 8-core, 5GHz FX-9590 will help keep the pressure up at least.

IntelUser2000 · Jun 17, 2013

BadThad said:
I've been wondering that too. Once Intel beat AMD in the IPC game, they went back to being Intel and milking an architecture for all it's worth. The only reason we got rid of NetBurst is because AMD trumped Intel in IPC. Intel was planning on using NetBurst for a long, long time.

Ah, no. Once Intel beat AMD in the performance game to a point where their last generation processors were beating AMD, the real threat popped up.

That's ARM.

Everyone related to the computer industry woke up and realized that offering performance improvements beyond "good enough" is going to be a waste and one day, someone will swoop up and give them a genuine threat. And that's probably when Intel started disregarding AMD as much of a competitor anymore as well.

Also, Moore's Law allowed computers to go into smaller and more portable devices, but PC was basically going the other way for good part of 2000-2010. That was going to fail.

(The first time I've seen "good enough" performance come up is when Intel got the Core 2 chips out. I've wondered then how much it makes sense to continue performance gains at cost to everything else. It didn't make too much sense to me, and Intel gets it as well. It was bound to go in a different path sooner or later)

Lepton87 · Jun 17, 2013

tynopik said:
IPC = instructions per clock, which is unaffected by clockspeed

BS

http://www.anandtech.com/show/1230/24

cytg111 · Jun 17, 2013

Lepton87 said:
BS

http://www.anandtech.com/show/1230/24

What? Allright you can rationalize it but calling BS is BS.

Theoretical or practical IPC? Clocking one part of the system, the CPU, higher while leaving the rest of the subsytem intact will of course not yield linear performance scaling and thus fewer instructions PER clock will get chewed up in the processor. But talking about IPC, for talking about IPC in any sense that makes sense, we have to keep all other things equal, thus we are talking about theoretical IPC, which by definition is unaffected by clock.

Lepton87 · Jun 17, 2013

cytg111 said:
What? Allright you can rationalize it but calling BS is BS.

Theoretical or practical IPC? Clocking one part of the system, the CPU, higher while leaving the rest of the subsytem intact will of course not yield linear performance scaling and thus fewer instructions PER clock will get chewed up in the processor. But talking about IPC, for talking about IPC in any sense that makes sense, we have to keep all other things equal, thus we are talking about theoretical IPC, which by definition is unaffected by clock.

No it's not unaffected by clock, every part of that system was clocked exactly the same except for the CPUs, yet IPC increased faster on Prescott as clock increased then on Northwood. If that real world example doesn't cut it for you then I won't argue further.

cytg111 · Jun 17, 2013

Then dont argue further, cause it dont .

JimmiG · Jun 17, 2013

Fjodor2001 said:
The Intel Conroe CPUs were introduced in 2006. By that Intel moved from the old Netburst to the new Core uarch generation. It was a revolutionary change in the design of their CPUs.

Since then we've basically only seen incremental improvements of the Core uarch generation (going from Conroe->...->Haswell).

So now I just wonder when it's likely we'll see a new revolutionary jump in uarch design from Intel like when transitioning from Netburst->Core? How long can they continue just making incremental updates to the now 7 year old Core architecture? Isn't a complete redesign needed at some point to get any further major increases in performance (IPC, CPU clock frequency, etc)?

It's coming, but it won't revolutionize desktops or even laptops. It's all about bringing the power of desktop CPUs to SoCs used in tablets and smart phones. Desktops themselves have reached the end of the line.

Loki726 · Jun 17, 2013

JimmyG is exactly right. There is not a lot of headroom left to increase single threaded CPU performance beyond Haswell barring a major breakthrough (and btw, most architecture research labs have already decided that this problem isn't solvable and have moved onto other problems like power efficiency or parallel architectures). However, it is probably possible to design a processor with Haswell level performance in a phone power budget.

We have known this was coming since around 2004-2006, and that is exactly why there were early attempts at multicore processors. The industry isn't holding back CPU perf because it is good enough. If people knew how to make a faster system they would, and there are really cool disruptive technologies that this would enable.

There are a few ideas left that could be implemented in broadwell to squeeze out a few extra percent of ILP like speculative multithreading or dynamic trace optimization. Ideas like these have been considered too complex and too risky to implement in the past, but we are quickly getting to the point where they are the only thing left.

Lepton87 · Jun 17, 2013

Loki726 said:
JimmyG is exactly right. There is not a lot of headroom left to increase single threaded CPU performance beyond Haswell barring a major breakthrough (and btw, most architecture research labs have already decided that this problem isn't solvable and have moved onto other problems like power efficiency or parallel architectures).

I'm quite sure that designing a CPU that would have 50% better single-threaded performance then HW is possible,(standard air-cooled CPU) it's not like HW is an engineering marvel that bumps into the walls of human creativity and mental capacity. Given enough resources and changing that Intel's rule that 1% higher power consumption must give 2% more performance and we could easily see very good gains in ST performance at the expense of power and ultimately MT performance because there is only so much power you can dissipate. Even if we increased ST performance by 5% each year, we would have doubled ST performance in 14 years. That's the very least that's going to happen.

Loki726 · Jun 17, 2013

I meant within the design constraints of a desktop CPU that can be sold at a profit. If you throw cost out the window you could probably do somewhat better, although this isn't always true since bigger structures often have higher latency. You can never throw out power as a constraint in a CPU because increased power increases temperature, and that forces you to throttle.

The constraints on HW are imposed mainly by physics.

Lepton87 · Jun 17, 2013

Loki726 said:
I meant within the design constraints of a desktop CPU that can be sold at a profit. If you throw cost out the window you could probably do somewhat better, although this isn't always true since bigger structures often have higher latency. You can never throw out power as a constraint in a CPU because increased power increases temperature.

The constraints on HW are imposed mainly by physics.

I don't think power would be that much of an issue, 700mm2 is close to as big as IC can get.
If we designed a single core CPU with lots of cache effectively acting as dark silicon, then dissipating 350Watts shouldn't be a problem, GPUs already do that. 700mm2 at 22nm Intel process, how much cache would that be? Intel could devote 200mm2 to the CPU core and the rest to cache. There are application where ST performance is of utter importance, right now people mostly use overclocked desktop CPUs for such applications, Intel once catered to that audience by releasing 2-core 4.4GHz westmere. Sometimes custom ASICs are used instead.
In short, what's commercially viable and what's possible are two different things and in your previous post you somewhat sounded like you meant the latter rather then the former.

Loki726 · Jun 17, 2013

Yeah, I more or less agree that it is possible to increase perf if you are willing to arbitrarily increase cost. We probably will see small 3-5% improvements over time. There may even be a few ideas that can deliver more than this. This is what I meant when I said that there isn't a lot of headroom.

Although I don't strongly agree with your individual examples. Bigger caches have diminishing returns and you typically have to sacrifice something other than just area for more capacity (latency, associativity, NUMA effects, etc). There is often a sweet spot rather than more perf with more capacity.

ASICS are like "heroic compiler optimizations", they only work for a few apps and are not general purpose. So I don't think it is really fair to say they can speed up ST code. They can, but only in very limited circumstances, and it doesn't happen automatically behind the scenes. The app developer needs to be aware of it.

Lepton87 · Jun 17, 2013

Loki726 said:
Yeah, I more or less agree that it is possible to increase perf if you are willing to arbitrarily increase cost. We probably will see small 3-5% improvements over time. There may even be a few ideas that can deliver more than this. This is what I meant when I said that there isn't a lot of headroom.

Although I don't strongly agree with your individual examples. Bigger caches have diminishing returns and you typically have to sacrifice something other than just area for more capacity (latency, associativity, NUMA effects, etc). There is often a sweet spot rather than more perf with more capacity.

ASICS are like "heroic compiler optimizations", they only work for a few apps and are not general purpose. So I don't think it is really fair to say they can speed up ST code. They can, but only in very limited circumstances, and it doesn't happen automatically behind the scenes. The app developer needs to be aware of it.

About the caches, I don't think you need to sacrifice anything if you just add additional levels of cache. For example if you don't think increasing L3 would be beneficial, because the increased latency would more then offset the increase in capacity and it would actually be detrimental to performance, just add L4 cache etc.

Hulk · Jun 17, 2013

tynopik said:
IPC = instructions per clock, which is unaffected by clockspeed

Brain fart! I meant to write "overall performance."

Loki726 · Jun 17, 2013

Lepton87 said:
About the caches, I don't think you need to sacrifice anything if you just add additional levels of cache. For example if you don't think increasing L3 would be beneficial, because the increased latency would more then offset the increase in capacity and it would actually be detrimental to performance, just add L4 cache etc.

I don't want to derail the thread to go into too much detail on caches

However, adding additional levels still adds miss latency since you have to go through another state machine and buffers before hitting the memory controller. Adding an additional layer probably (depending on layout) also increases the physical distance that data needs to travel to get to the memory controller, adding latency.

Exophase · Jun 17, 2013

Lepton87 said:
Even if we increased ST performance by 5% each year, we would have doubled ST performance in 14 years. That's the very least that's going to happen.

No exponential rate of improvement will last forever, no matter how small you set the coefficient. Intel's rate of improvement in manufacturing will almost certainly slow down over the next 14 years (barring some major breakthrough). I wouldn't be so confident that even a 5% yearly increase in ST performance will be sustained.

For all of the disappointment in its IPC improvements Haswell made some pretty major changes. I fully expect Skylake to do less for legacy code. There's really a limit to how much you can improve average IPC for a large set of existing programs. A lot of them just don't have more ILP available, and there's not much left to further tweak average load to use latency, branch mispredict penalty, etc. Adding a ton of extra L3 or L4 cache won't make that much different for most code, and while changing the hierarchy again might help a little I suspect that what Intel has now is pretty strong.

It's true Intel is leaving single-threaded performance on the table right now with some clock headroom, unfortunately that headroom seems to be shrinking over time instead of growing. It could very well be that Intel isn't pushing 4.2+GHz processors today because they don't want to commit to that clock speed later, especially if the IPC improvements slow down enough to where they can't afford to launch at a slower clock speed.

Concillian · Jun 17, 2013

In our industry, related to the semiconductor industry, they talk about "S" curves.

Here's a quick youtube thing I found that explains s-curves and how they can be applied to most any industry to explain advancement:
http://www.youtube.com/watch?v=E5QmwMZ0voI

If you consider current x86 performance, we're likely just near the knee of the S-curve. Latest performance improvements have been in power consumption, but I believe that is rapidly approaching the top of the curve that IPC * clockspeed have been at for a couple generations. In reality this is a macro-level s-curve made up of smaller s-curves.... for example: 32nm process technology, 22nm process technology and other related s-curves like tri-gate technology. These all build into the larger scale s-curve for x86 performance... IPC * clock speed, power consumption and other high level performance goals.

Realistically, I don't think we'll see a clear uarch jump like netburst to Core. That was an optimization that Intel has under it's belt and likely won't see benefit from again. What we will see is PROCESS related improvements yielding technology that can be utilized for better IPC or higher clocks at the same IPC or lower power consumption for the same IPC. Until we get to quantum computing, we'll continue to see these improvements.

As we move along that path, we may reach a point where the current core pipeline is no longer optimum. It is at this point when we will see a uarch change, but I think there are enough checks and balances in the R&D process at Intel that it won't get to the same scale of massive mis-optimization that Netburst did.

I think we can expect to see small IPC or clock improvements going forwards. I think even power optimization is going to be pretty small post Haswell. Haswell took the last two "big" steps:
- SoC integration at the ULV level, finally moving the chipset to 32nm from 65nm on the desktop level
- Integration at the OS level to execute routine interrupts in 'chunks' to allow the CPU longer "micro-naps"

All levels of optimization are to the point where nothing's easy. Intel is at the point where large R&D efforts are resulting in small gains. I don't really see that changing unless we have a shift in technology that results in a new macro-level s-curve.

When will we see a completely new Intel uarch (similar to Netburst->Core jump)?

Lifer

Golden Member

Lifer

Diamond Member

Lifer

Golden Member

Diamond Member

Lifer

Lifer

Elite Member

Platinum Member

Lifer

Platinum Member

Lifer

Platinum Member

Senior member

Platinum Member

Senior member

Platinum Member

Senior member

Platinum Member

Diamond Member

Senior member

Diamond Member

Diamond Member