Intel "Haswell" Speculation thread

mrob27

Member
Aug 14, 2012
29
0
0
www.mrob.com
Have a look at the memory controllers below the L3 cache: the empty space is smaller than on Ivy Bridge GT2 (HD 4000) in the pic above. That only makes the die much smaller, and thus not GT3 but probably GT1.

About that empty space in Ivy Bridge: Refer to the Ivy Bridge die sizes article again.

The "biggest" Ivy Bridge is the HE-4, 8.141 x 19.361 mm and has 8MB of cache. The Ivy Bridge HM-4 still has 4 cores, but only 6MB of cache, and is only 7.656 mm wide, about 1/2 millimeter smaller. Clearly they slice a bit out of each piece of L3 cache, which makes the chip narrower. It also has GT1 graphics, so as this post suggested, the GPU gets a lot smaller. For the dual-core H-2 and M-2 variants, the empty space is used to accommodate the dual-channel memory controller.
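As a quick check of the arithmetic above, here is a minimal Python sketch. It uses only the figures quoted in this post (the HM-4 height is not given here, so only the HE-4 area is computed), and the variable names are made up for illustration.

```python
# Die dimensions (millimetres) and L3 cache sizes (MB) as quoted in the post.
HE4_WIDTH_MM, HE4_HEIGHT_MM = 8.141, 19.361
HM4_WIDTH_MM = 7.656
HE4_CACHE_MB, HM4_CACHE_MB = 8, 6

# How much narrower the 6MB die is, and what share of the L3 cache was removed.
width_saved_mm = HE4_WIDTH_MM - HM4_WIDTH_MM
cache_cut = (HE4_CACHE_MB - HM4_CACHE_MB) / HE4_CACHE_MB
he4_area_mm2 = HE4_WIDTH_MM * HE4_HEIGHT_MM

print(f"width saved:   {width_saved_mm:.3f} mm")
print(f"cache removed: {cache_cut:.0%}")
print(f"HE-4 area:     {he4_area_mm2:.1f} mm^2")
```

Going from 8MB to 6MB removes a quarter of the cache, which matches the roughly half-millimeter reduction in width.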

I don't completely agree with the photos in that post, clearly IntelUser2000 cut the L3 cache section in half, when actually it needs to be cut by 25%. If you look at the Sandy Bridge image from AtenRa's post:



You can see pretty easily the "dead space" underneath the ring bus stop in each of the four L3 cache slices, and that dead space is exactly 1/4 the height of the cache section. For Ivy it's similar, most easily seen in this image from Intel Sweden:



But the rest of IntelUser2000's analysis is pretty accurate and he predicted the die sizes pretty well.

Anyway...

...for Haswell it's a different game. They'll definitely cut the die in different ways and use different layouts, but suppose they might have a variant with less cache and the same GPU? Then they could use that "dead space" to accommodate the GPU.

I believe that in the maybe-Haswell-wafer image we're discussing, the GPU occupies the full width of the die, and that's why the visible dead space is so small.
 

mrob27

Member
Aug 14, 2012
29
0
0
www.mrob.com
There is one other Haswell die photo that appeared in news posts at about the same time. Here it is:


I found it in this article on newelectronics.co.uk

My analysis of this image is the same as the one in the BBC article: 4 cores with a new layout, GPU with 20 shader cores and a different fixed-function layout.

The nice thing about this image is that it makes it a bit easier to see the GPU layout is different from Ivy Bridge, and it's easier to see the dead space below the ring-bus stops in the L3 cache area.

We can also see more clearly that the space allocation within the L3 cache slices is different: only about 55% of each L3 section is taken up by the actual 2MB memory array, whereas in Ivy Bridge the memory takes up 65 or 70%. It's hard to see what that extra stuff is, but I suppose others might speculate... :whiste:
 

mrob27

Member
Aug 14, 2012
29
0
0
www.mrob.com
I forgot how many of these I scraped up last week. Here's another one:



which is in several blogs and re-posts apparently originating with this post by Intel on their Google+ page and dated 2012 Sep 13th.

This has the same new core layout, the GPU with 20 execution units, L3 cache with the memory banks taking up only about half of the space, etc.

Another new feature, seen in this photo and in both of the others but not in IVB and SNB dies, is the set of prominent "lines" between the cores. In the first "pin" image they're black; in the in-focus part of this image they're orange/yellow/green.

The odd-numbered lines (those between cores 0-1 and between cores 2-3) are a bit longer, extending almost all the way to the DDR3 controller. No such lines are seen in any of the Ivy or Sandy die photos I've looked at.
 

mrob27

Member
Aug 14, 2012
29
0
0
www.mrob.com
Not sure why but Wired seems to think this image of Haswell that we have been talking about is actually just an IB die?

(image here)

Wow... that image is available in a 12-megapixel version! It is at this Flickr post by "IntelFreePress" dated 2012 Sep 10th. On the Flickr page, use the "View all sizes" link to access the huge version.

image removed

Let's take pity on the bandwidth challenged, shall we?
admin allisolm


Sorry! - mrob27

This is obviously not a Sandy Bridge or Ivy Bridge. There is a lot to look at here... I'll try to upload a labeled collage soon...
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Am I the only person zooming in on the core to figure out which chip that is? The last set of pictures make it pretty clear what I'm looking at.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I don't completely agree with the photos in that post, clearly IntelUser2000 cut the L3 cache section in half, when actually it needs to be cut by 25%. If you look at the Sandy Bridge image from AtenRa's post:

You are right. It's cut more than I wanted. But I figured it delivers the general idea of how the cores are derived. I also agree it's a GT2 part for the Haswell GPU. The GT3 replicates almost EVERYTHING, so it can't be that small.
 

mrob27

Member
Aug 14, 2012
29
0
0
www.mrob.com
[...] I figured it delivers the general idea of how the cores are derived. I also agree it's a GT2 part for the Haswell GPU. The GT3 replicates almost EVERYTHING, so it can't be that small.

Yep. Your analysis was great, by the way, and gave us a lot to think about months before we had the hard numbers on Ivy Bridge. Also, you gave great Haswell predictions which still look pretty good (we have nothing solid yet on the 4c+GT3 variant, but the square die seems likely)
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
5 rows, 20 EUs, so that seems to be GT2.

Also, it seems that this layout was not designed for scaling down by removing cores. The iGPU is stretched from top to bottom without leaving empty space below it like in SB and IB. This is one reason why the iGPU is narrower than the iGPU in Ivy Bridge although it has more EUs.


 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Interesting. So it looks like at least GT1 variants will be GT2 with disabled parts.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
If someone here is actually Nicolas Capens, please note I mean no disrespect.
That would be me. But why would I feel disrespected? It's all just theories and I would rather be wrong so that Haswell has no noteworthy compromises!

However, my response to David and the many other discussions in the last couple of days might clarify that it's not exactly obvious what the implications are of Haswell's wide architecture. There could still be some surprises. Most people at RWT didn't anticipate gather support and dual FMA to be feasible or likely... I sure don't mean to disrespect anyone for bringing that up. It's just that when you discuss things at this level of detail, being wrong happens to the best of us. Lastly, it's also easier to argue against someone else's theories than to come up with your own and defend them. But I've learned a lot by thinking outside the box. So again, I won't feel bad about being wrong.
I wish RWT had nice forums like AT. Digging through them is a PITA!
Amen to that.
One thing is clear to me after reading many of these discussions: I'll have a heart attack if I ever need to understand the x86-64 ISA well enough to program at the assembly level. No wonder AVX was underutilized.
Actually the hardware's out-of-order scheduling is insanely powerful nowadays. So you typically don't have to worry about reordering your instructions, and you can just concentrate on using as few as possible and avoiding the expensive ones. Especially with vector instructions it's easy to still beat the compilers. With AVX2 the compilers can successfully auto-vectorize a lot more code though. So the days of being able to speed up code by writing assembly are probably numbered, unless indeed you really delve into the finer details of how specific CPU architectures behave.

That said, knowing how to program in assembly is still incredibly useful to know how to write fast code in a high-level language. It's also invaluable for debugging.
Now I'm almost rooting for ARM, so we eventually get some nice high powered RISC CPUs again in the mainstream** :thumbsup:
For the sake of getting some competition, or because you think RISC is that much easier to program?
**I may go back to embedded development after completing my M.S.C.S. next year; fortunately most of that is still RISC based!
Awesome, good luck!
 

mrob27

Member
Aug 14, 2012
29
0
0
www.mrob.com
I noticed that the Intel Sweden version Ivy Bridge die photo shows a slightly different layout (for example, in the size of the gap between the cores and the L3 cache) and also has fewer interconnect layers. Compare it to the official Ivy Bridge die photo from April 2012, and you can see that the L3 cache memory blocks are much more symmetrical in the "Intel Sweden" photo. In the April photo, parts of the L3 cache are covered by something else: metal interconnect layers. My guess is that that yellow section in each L3 cache slice is part of the connection between the core's L2 cache and the L3 cache's controller.

To my eye, these recent Haswell wafer photos look more like the earlier Ivy Bridge photo, in that the cache blocks and other architectural features are more recognizable. So I've made a new labeled die photo comparing to both of the Ivy Bridge photos:

 

craiggloyd

Member
Jul 1, 2011
33
7
81
Intel Ivy Bridge still isn't over 50% faster clock-clock than a 3MB/core Penryn CPU.

For instance, my X9100 in my G50VT laptop, overclocked to 3.5 GHz at 1.325 V (100% stable):
Super Pi: 1M: 14.7s, 2M: 34s

Ivy Bridge i5-3210 @ 3.1 GHz stock turbo:
Super Pi: 1M: 12.7s, 2M: 29s



So my 2009 laptop is still decent for real-world responsiveness. My rough estimate is that Ivy Bridge is ~33-50% faster clock for clock than Penryn in floating point. Of course, with multitasking/SMP it would be pwned.
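The clock-for-clock comparison can be reproduced from the Super Pi times above with a short Python sketch. It assumes the Ivy Bridge chip really held 3.1 GHz for the whole run and that time scales inversely with clock speed; the dictionary layout and function name are illustrative only.

```python
# Super Pi times (seconds) and clocks (GHz) as quoted in the post above.
penryn = {"clock_ghz": 3.5, "1M": 14.7, "2M": 34.0}  # overclocked X9100
ivy = {"clock_ghz": 3.1, "1M": 12.7, "2M": 29.0}     # i5-3210 at stock turbo


def per_clock_speedup(old, new, run):
    """Ratio of clock cycles spent: (old time * old clock) / (new time * new clock)."""
    return (old[run] * old["clock_ghz"]) / (new[run] * new["clock_ghz"])


for run in ("1M", "2M"):
    gain = per_clock_speedup(penryn, ivy, run) - 1
    print(f"{run}: Ivy Bridge is {gain:.0%} faster per clock")
```

With these two runs the per-clock gain comes out at roughly 31-32%, which sits at the low end of the rough estimate.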

From reading some of the Haswell reviews, it is only an improvement of about 10% clock-clock versus Ivy Bridge, so that means a 46 to 65% clock-clock single-threaded improvement versus Penryn.
Meh, still not worth the upgrade.
I understand the new instructions are of a benefit, but in the real world with mainstream software, responsiveness is clock-clock computational power.
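The 46 to 65% range quoted above is the Ivy Bridge estimate compounded with the roughly 10% Haswell gain. A minimal Python sketch of that multiplication, using only the percentages from this post (the function name is hypothetical):

```python
def compound(first_gain, second_gain):
    """Combine two successive fractional gains into one overall gain."""
    return (1 + first_gain) * (1 + second_gain) - 1


# Ivy Bridge over Penryn (low and high estimate), then Haswell over Ivy Bridge.
IVY_OVER_PENRYN = (0.33, 0.50)
HASWELL_OVER_IVY = 0.10

for gain in IVY_OVER_PENRYN:
    total = compound(gain, HASWELL_OVER_IVY)
    print(f"{gain:.0%} then {HASWELL_OVER_IVY:.0%} -> {total:.1%} over Penryn")
```

Gains multiply rather than add, which is why the upper bound is 65% and not 60%.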

Tasks such as viewing and editing large, complex Word documents or large PDFs, and even explorer.exe when deleting a lot of files off an SSD or searching for files, often peg one CPU core. There are lots more examples.

I guess I'll wait for Skymont with crazy integrated graphics
 

MrDudeMan

Lifer
Jan 15, 2001
15,069
92
91
The vast majority of "real-world" users have a very different opinion than you. 33-50% faster isn't enough for you, but that's a huge improvement for the majority of users.
 

craiggloyd

Member
Jul 1, 2011
33
7
81
I know the power consumption is much better, multitasking is much better, and it has some new instructions and a good onboard GPU, whereas Penryn has no integrated GPU, NB, or memory controller, no second-die VRM, and runs hotter. But Haswell's weakness is how little single-core performance has improved over the generations.
Maybe after Broadwell, when the SB (southbridge) goes into the CPU package, we'll see more improvements because they'll be done integrating more components into the package, unless they keep aggressively increasing integrated graphics performance by taking up more space on the die.
 

Revolution 11

Senior member
Jun 2, 2011
952
79
91
Prepare to be disappointed as Intel is not giving up on the integrated graphics. Also, you can't really increase clocks much more as we hit a power/heat wall more than a decade ago in the Pentium 4. Core size can't be increased much more without hitting diminishing returns. Core number can't be increased without hitting Amdahl's Law. IPC is already into diminishing returns unless you use new instructions.

That has been the primary form of innovation from Penryn to Haswell. Core size is not much bigger, core count is almost the same, clocks are not much faster, IPC is up by a good deal (but we can't count on continuous increases), and memory bandwidth is better (a one-time trick).

But we do have AVX, AVX2, FMA, and TSX (the latter only on certain Haswell SKUs, sadly).
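To put a number on the Amdahl's Law point above, here is a minimal Python sketch. The 80% parallel fraction is an assumed, illustrative value and not a measurement of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: overall speedup when only part of the work scales with core count."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workload: 80% of the run time parallelizes perfectly, 20% is serial.
PARALLEL_FRACTION = 0.80

for cores in (2, 4, 8, 16):
    print(f"{cores:2d} cores: {amdahl_speedup(PARALLEL_FRACTION, cores):.2f}x")
```

Under that assumption the speedup can never exceed 5x no matter how many cores are added, because the serial 20% always remains.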
 

craiggloyd

Member
Jul 1, 2011
33
7
81
" Core size can't be increased much more without hitting diminishing returns."
But luckily transistors are still shrinking so we can have more powerful execution units in each die shrink? Except all these components being integrated onto the die and separate dies in the package (NB, memory controller, GPU, VRM, SB) are slowing this growth.

Maybe you could help me understand something. Why can't they take all the execution units from a quad core die, and put them into 2 larger cores. Yes, you might have problems keeping all the execution units busy but then each core would be much more powerful.

Am I right that they have increased the core count in order to keep execution units more fully utilized when CPU usage is at 100%, with Hyper-Threading to help?

The reason why they don't make dual cores that are as powerful as quad cores is because there is a lack of efficiency?

Because I wouldn't mind giving up a few cores for much better single threaded performance.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,452
10,120
126
Making the cores "wider" is limited by the amount of ILP (instruction-level parallelism) in the code. Haswell is already wide: it has 8 execution ports, up from 6 in Sandy Bridge and Ivy Bridge.

So what you are asking, is already somewhat true in Haswell. They did make the cores "wider".
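The ILP limit can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions (every instruction takes one cycle, a simple greedy scheduler, a made-up workload of 4 independent dependency chains) and is not a model of Haswell's actual scheduler.

```python
def cycles_needed(deps, width):
    """Cycles to run a program on a core that issues up to `width` instructions per cycle.

    deps[i] lists the earlier instructions that instruction i must wait for.
    An instruction can issue only after all of them ran in a previous cycle.
    """
    done = set()
    cycle = 0
    while len(done) < len(deps):
        cycle += 1
        # Collect ready instructions before issuing any, so results are not
        # forwarded within the same cycle.
        ready = [i for i, needed in enumerate(deps)
                 if i not in done and all(p in done for p in needed)]
        done.update(ready[:width])
    return cycle


# Hypothetical workload: 4 independent chains of 6 dependent instructions each.
CHAINS, CHAIN_LENGTH = 4, 6
deps = [[] if step == 0 else [chain * CHAIN_LENGTH + step - 1]
        for chain in range(CHAINS) for step in range(CHAIN_LENGTH)]

for width in (1, 2, 4, 8):
    print(f"width {width}: {cycles_needed(deps, width)} cycles")
```

Going from width 4 to width 8 saves nothing here because the code only has 4 independent chains to work on, so the extra execution units would sit idle.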
 