I worked on Itanium for about 12 years - I designed circuitry for McKinley, Montecito, Tukwila, and Poulson, and did post-silicon debug of most of them as well.
My understanding of the key reasons for Itanium is that it would use the latest-and-greatest architectural and compiler techniques to improve performance - people in this thread have mentioned this - but the other reason, as I see it, that I rarely see mentioned is that it was supposed to be a server-class CPU, built from the ground up to be a server CPU, to allow Intel to get into the high-end server market. You might reply, "But, hey, Patrick, that makes no sense, because Xeon sells really well and it's an IA32-based CPU, so why would Intel think it needed to make a special CPU like Itanium to get into servers?" - but that's looking back from the position of seeing how history turned out.
At the time, in the early (and mid) 1990s, the server market was dominated by RISC - the supposed successor to CISC (i.e. the type of instruction set in Intel's IA32 CPU line) - and Intel was having a very hard time selling into the server computing segment. It's hard even for me to remember, but servers were dominated by RISC CPUs running on proprietary platforms with proprietary OSes (Solaris, HP-UX, OpenVMS). Itanium, as I see it, came out of an idea of trying to move into that segment with an even more advanced architecture than RISC - partly from a perception standpoint ("RISC is so yesterday..."), but, as one of the people who worked on it, I know we thought it was going to be much more than just perception. So it was designed to be a server part, it would enable Intel to move into servers in partnership with several leading server vendors, and it was planned to be a very high-performance part.
As Nemesis said, Itanium sales are still respectable. As an engineer, I enjoyed working on Itanium, and I remember the first power-on of the Intel 9300-series as one of the coolest things I've done in my whole life. The launch of the Intel Itanium 9500-series (Poulson) in Nov. 2012 is proof that, despite everyone saying it's dead... it's still very much alive. But I think even the engineers who worked on it will admit that it didn't quite turn out like we thought it would.
Back to the original question about what, in theory, makes IA64 superior to x86 - things I can think of off the top of my head: x86 has 6 semi-dedicated registers, while IA64 has 128 general-purpose registers. So in x86 you're limited in what you can do in each register - when you're writing assembly code, you have to move things between registers to use different instructions - but in IA64 you have lots of them and you can do anything you want in any of them. x86 has semi-restrictive stack-based floating-point registers, while IA64 has 128 non-stack-based floating-point registers. So you have to load and unload the stack in x86, but in IA64 you don't have to do anything like this. Then IA64 has this concept of an instruction bundle, which makes it easier to write code that is meant to run in parallel by design; it has a bunch of features to help branch predictors be more accurate; it has speculative loads to start pulling things in from memory before you need them (parallelize the loads); and it has predication, which lets you run parts of both sides of a branch so you don't suffer branch misprediction penalties as much. Some of this stuff has since been added to x86 in newer instruction-set extensions like SSE and new instructions on newer processors, so to say x86 doesn't have these isn't strictly true any more - but at the time IA64 was conceptualized, x86 didn't have many of these things.
* As always my views are my own and I'm not a corporate spokesperson for Intel *