Intel "Haswell" Speculation thread

Page 13

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
Would be cool to see 512 bit wide FPUs in Haswell that could dynamically work to aid the IGP.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Would be cool to see 512 bit wide FPUs in Haswell that could dynamically work to aid the IGP.
Haswell's new AVX2 instructions are 256-bit wide. But there are two 256-bit floating-point units per core, so they could definitely use them to assist the IGP.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
Haswell's new AVX2 instructions are 256-bit wide. But there are two 256-bit floating-point units per core, so they could definitely use them to assist the IGP.

Wow, 2x 256 bit units? That's interesting! Didn't AMD's Flex-FPU design give BD a general advantage in 128 bit SSE versus Sandy Bridge?
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Wow, 2x 256 bit units?
Three 256-bit units, actually, two of which are capable of floating-point operations.

In fact this has been the case since Sandy Bridge, but with Haswell each unit also becomes capable of specific 256-bit integer operations, and the ones that can also perform floating-point operations get support for fused-multiply-add (FMA) instructions. Gather support will also play a major role in maximizing the effective throughput of these three 256-bit units.
Didn't AMD's Flex-FPU design give BD a general advantage in 128 bit SSE versus Sandy Bridge?
Bulldozer has four 128-bit units per module, two of which are capable of floating-point operations (including FMA). Having an extra unit didn't make Bulldozer noticeably more powerful at 128-bit vector code though, because the floating-point units don't support any integer operations. Intel managed to do more with less.

I'm sure AMD's next major architecture will be less disappointing though. But that's another topic...
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
SB is 13.2% faster than Lynnfield in non-AVX workloads according to Hardware.fr (11.3% with HT on). Tom's Hardware also shows >10% IPC improvement here: http://media.bestofmicro.com/I/O/298752/original/overall.png

If Intel manages a consistent >10% IPC increase over SB (at least 5-10% over IB) with higher clocks and higher OC potential (3.7-4.0GHz quads that OC above 5GHz on <U$100 cooling solutions) it will convince me to finally replace my Bloomfield system. The GT3 GPU will be a very welcome addition on the mobile side too.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
They are probably aiming for another ~20% gain from a combination of clock speed and perf/clk increases. The caveat is that they are talking in mobile terms, so this may not apply to desktops. For example, desktop chips may only get the perf/clk part of the increase, not the clock frequency part.

Heck SB got 2x the load/store BW and is around ~8% faster at the same clock vs Nehalem in non-AVX workloads.

Core i7 975 vs Core i7 2600K

Sysmark: 9%
Adobe Photoshop: 23.9%
DiVX: 6.5%
X264 HD Encode Test 1st: 15.7%
X264 2nd: 10%
WME64: 15%
3DSMax: 11.7%
CB ST: 28.9%
CB MT: 12%
Par 2 MT: 38.7%
Blender: 17.2%
MS Excel: 8.1%
Sorenson Squeeze Pro 5: 23.8%
WinRar: 24.2%

That's a lot better than 8%, unless you want to cherry pick them. And that's not even considering that the direct predecessor is Lynnfield, which is a few % behind Bloomfield.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Core i7 975 vs Core i7 2600K

...

That's a lot better than 8%, unless you want to cherry pick them.
i7-2600K has a 2% higher base clock and 5.5% higher turbo clock...

Anyway, the point is that increasing IPC isn't easy and there are diminishing returns. Making Haswell process 20% more instructions per clock on average is downright impossible without severe compromises on other critical metrics. Meanwhile AVX2 doubles the maximum throughput per instruction, and TSX prevents losing performance and power in spin loops for multi-threaded synchronization.

So it will be a great chip if IPC remains the same. It will be a phenomenal chip if IPC improves by 10%.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
i7-2600K has a 2% higher base clock and 5.5% higher turbo clock...

That doesn't matter. Both the 975 and the 2600K can Turbo reliably for sustained periods of time. The 975 can Turbo to 3.46GHz on all 4 cores and the 2600K can do 3.5GHz. A 6% single-core frequency difference is negligible in light of a 28% performance difference (did I also forget to say clock speed scaling isn't linear?). They did that while reducing TDP by 35W and having a cheaper platform courtesy of LGA1155.
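To put rough numbers on the clock-speed argument, here is a minimal sketch that divides each 2600K-over-975 gain from the list above by the largest clock advantage mentioned in this thread (the 5.5% turbo difference). It assumes performance scales linearly with frequency, which if anything overstates the clock's contribution, so the per-clock figures lean conservative. The function name and the choice to apply 5.5% to every benchmark are illustrative assumptions, not anything from the reviews.

```python
# Gains of the Core i7 2600K over the Core i7 975 in percent, as listed in the thread.
GAINS_PCT = {
    "Sysmark": 9.0,
    "Adobe Photoshop": 23.9,
    "DiVX": 6.5,
    "X264 HD Encode Test 1st": 15.7,
    "X264 2nd": 10.0,
    "WME64": 15.0,
    "3DSMax": 11.7,
    "CB ST": 28.9,
    "CB MT": 12.0,
    "Par 2 MT": 38.7,
    "Blender": 17.2,
    "MS Excel": 8.1,
    "Sorenson Squeeze Pro 5": 23.8,
    "WinRar": 24.2,
}

# Assumption: apply the largest quoted clock advantage (5.5% turbo) to every test.
WORST_CASE_CLOCK_ADVANTAGE_PCT = 5.5


def per_clock_gain_pct(gain_pct, clock_advantage_pct):
    """Gain left over after dividing out a clock advantage (linear scaling assumed)."""
    speedup = 1 + gain_pct / 100
    clock_ratio = 1 + clock_advantage_pct / 100
    return (speedup / clock_ratio - 1) * 100


normalized = {
    name: per_clock_gain_pct(gain, WORST_CASE_CLOCK_ADVANTAGE_PCT)
    for name, gain in GAINS_PCT.items()
}
average = sum(normalized.values()) / len(normalized)

for name, value in normalized.items():
    print(f"{name:26s} {value:5.1f}%")
print(f"{'Average':26s} {average:5.1f}%")
```

Under those assumptions CB ST still comes out at roughly 22% per clock and the average at roughly 11%, so the clock difference alone does not bring the list down to 8%.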

So it will be a great chip if IPC remains the same. It will be a phenomenal chip if IPC improves by 10%.
Fairly certain for most of us here 0% perf/clk gain will equal mediocre to bad. Especially since it's more likely than ever before that we'll get mere 100MHz frequency increases, if any at all. But I'm glad we'll get Sandy Bridge-like IPC gains again.

(Actually I heard desktop TDP is going up to 105W, likely to extract more clocks)
 

nehalem256

Lifer
Apr 13, 2012
15,669
8
0
Would be cool to see 512 bit wide FPUs in Haswell that could dynamically work to aid the IGP.

I do not think that would be very helpful



Consider the size of the GPU portion to the size of one of the modules, of which only a part is the FPU.

Also, add in that the GPU is primarily constrained by bandwidth anyway.

And yes, I realize that is Trinity, not Ivy Bridge/Haswell, but it does illustrate the idea.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Fairly certain for most of us here 0% perf/clk gain will equal mediocre to bad.
I didn't say 0% perf/clock gain! I said it would still be a great chip if IPC didn't improve, and IPC is only one component of performance. Peak throughput per clock will double, and fewer cycles will be wasted on thread synchronization.
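For anyone wondering where "peak throughput per clock will double" comes from, here is a back-of-the-envelope sketch. It assumes single-precision math, that both 256-bit floating-point units can issue every cycle, and that an FMA counts as two floating-point operations per lane. The function name is made up for illustration, and these are theoretical peaks, not sustained rates.

```python
def peak_sp_flops_per_cycle(fp_units, vector_bits, flops_per_lane):
    """Theoretical single-precision FLOPs per cycle for one core."""
    lanes = vector_bits // 32  # 32-bit floats per vector register
    return fp_units * lanes * flops_per_lane


# Sandy Bridge: two 256-bit FP units, each doing one add or one multiply per lane.
sandy_bridge = peak_sp_flops_per_cycle(fp_units=2, vector_bits=256, flops_per_lane=1)

# Haswell: the same two units, but each FMA is a multiply plus an add per lane.
haswell = peak_sp_flops_per_cycle(fp_units=2, vector_bits=256, flops_per_lane=2)

print(f"Sandy Bridge: {sandy_bridge} SP FLOPs/cycle/core")
print(f"Haswell:      {haswell} SP FLOPs/cycle/core")
print(f"Ratio:        {haswell / sandy_bridge:.1f}x")
```

Real code only approaches the doubled figure when it is dominated by multiply-add pairs that can actually be fused.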
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Consider the size of the GPU portion to the size of one of the modules, of which only a part is the FPU.
And yet with AVX2 the computational density doesn't differ a whole lot...

What you're looking at on the GPU isn't all FPU. A big portion of it is the humongous register files it needs to store temporary results and cover latencies. And of course there's also a big difference in clock speed. Last but not least, peak performance isn't effective performance. GPUs can behave quite 'bursty' at times, and slow down to a crawl with complex workloads, while CPUs remain efficient in many situations.

A homogeneous 16-core 14 nm successor to Haswell could achieve 2 TFLOPS, which is plenty for graphics. The bigger problem is memory bandwidth, but L4$ eDRAM could offer a solution.
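As a sanity check on the 2 TFLOPS figure, here is a small sketch of the arithmetic. It assumes 16 cores, each with two 256-bit FMA-capable units working on single-precision floats, and solves for the clock speed that would be needed. The memory bandwidth line is purely an illustrative assumption (dual-channel DDR3-1600); it is not a figure from this thread or from Intel.

```python
CORES = 16
FP_UNITS_PER_CORE = 2      # two 256-bit floating-point units per core
SP_LANES = 256 // 32       # eight 32-bit floats per 256-bit vector
FLOPS_PER_FMA = 2          # one multiply plus one add per lane

# Theoretical single-precision FLOPs per cycle for the whole chip.
flops_per_cycle = CORES * FP_UNITS_PER_CORE * SP_LANES * FLOPS_PER_FMA

TARGET_FLOPS = 2e12  # the 2 TFLOPS mentioned in the post
required_clock_ghz = TARGET_FLOPS / flops_per_cycle / 1e9

# Assumption for illustration only: dual-channel DDR3-1600,
# i.e. 2 channels x 8 bytes per transfer x 1600 million transfers per second.
ASSUMED_BANDWIDTH_BYTES_PER_S = 2 * 8 * 1600e6
bytes_per_flop = ASSUMED_BANDWIDTH_BYTES_PER_S / TARGET_FLOPS

print(f"Chip-wide peak: {flops_per_cycle} SP FLOPs/cycle")
print(f"Clock needed for 2 TFLOPS: {required_clock_ghz:.2f} GHz")
print(f"Memory bytes available per FLOP: {bytes_per_flop:.4f}")
```

The required clock lands just under 4GHz, and the tiny bytes-per-FLOP figure under the assumed memory setup shows why bandwidth, not raw compute, is the harder problem.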
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
For those still doubting the importance of AVX2 for next-gen gaming, here's this year's SIGGRAPH papers on GPU + CPU computing: http://bps12.idav.ucdavis.edu/

The CPU plays a very significant role in these new algorithms, and so having twice the computing power (AVX2) and efficient multi-threading (TSX) will have a real impact on your gaming experience.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
For those still doubting the importance of AVX2 for next-gen gaming, here's this year's SIGGRAPH papers on GPU + CPU computing: http://bps12.idav.ucdavis.edu/

The CPU plays a very significant role in these new algorithms, and so having twice the computing power (AVX2) and efficient multi-threading (TSX) will have a real impact on your gaming experience.

lol, is it possible to die from hype? BenchPress is close to doing it XD

dude...
there is nothing about AVX2, TSX in the link...
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
"And no one believed me about Manbearpig, even though I was super cereal." The hype is approaching that level.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
lol, is it possible to die from hype? BenchPress is close to doing it XD

dude...
there is nothing about AVX2, TSX in the link...

Don't worry, plenty more AVX2/TSX posts and threads will come.

At least one post didn't include the 2. I was surprised.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
lol, is it possible to die from hype? BenchPress is close to doing it XD

dude...
there is nothing about AVX2, TSX in the link...
Is that a fact? What's he supposed to do, lead you by the nose? Here's just one of many, many statements that I got through that link. One man's junk is another's wealth.

“In the UE4 Elemental demo, the majority of the GPU’s FLOPS are going into general compute algorithms, rather than the traditional graphics pipeline...”
–Tim Sweeney
 

ShadowVVL

Senior member
May 1, 2010
758
0
71
I haven't really paid any attention to avx instructions or what they do.

I will probably google it later but what applications can/will use avx?
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
20,882
3,230
126
*dear intel*

i would like to replace my almost 2yr old 990x with something faster, and have it be a true upgrade and not side grade.

so when will my wish come true?


PCs have become so stagnant, I can see the spiders make complex webs outside.
 

mrob27

Member
Aug 14, 2012
29
0
0
www.mrob.com
I haven't really paid any attention to avx instructions or what they do.

I will probably google it later but what applications can/will use avx?

This is what I found amongst the programs I care about :sneaky: and assuming that benchmarks don't count:

  • FFmpeg (ffmpeg.org) starting with version 0.8
  • SETI@home starting with beta v7 6.94 (setiathome.berkeley.edu/beta)
  • Great Internet Mersenne Prime Search (GIMPS) version 27 (mersenne.org)

As for Folding@home, I only found something called "Native Folding Image", which added a Sandy Bridge AVX kernel at version 1.2.0 (linuxforge.net/docs/crunching/fah-native.php). It appears that in F@h a lot of people use something like CUDA or the Intel compiler and compile their own kernel to get the most out of their high-end rigs. Those guys are crazy competitive.
 

OCGuy

Lifer
Jul 12, 2000
27,227
36
91
*dear intel*

i would like to replace my almost 2yr old 990x with something faster, and have it be a true upgrade and not side grade.

so when will my wish come true?


PCs have become so stagnant, I can see the spiders make complex webs outside.

To be fair to Intel, aren't they kind of stagnated by lack of competition and lack of programmers taking advantage of their advanced technology?

I mean thank (whatever power you believe in) that they are pushing ahead with the AZ plant... they could easily just refresh the same crap at higher base clocks if they really wanted to.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
To be fair to Intel, aren't they kind of stagnated by lack of competition and lack of programmers taking advantage of their advanced technology?

Lack of competition? No.

Lack of software? Somewhat.

If a new CPU doesn't offer something worth upgrading to, Intel sells no CPUs. No sales, no revenue, no profit.

But I don't see them being stagnated.
 

Arzachel

Senior member
Apr 7, 2011
903
76
91
To be fair to Intel, aren't they kind of stagnated by lack of competition and lack of programmers taking advantage of their advanced technology?

I mean thank (whatever power you believe in) that they are pushing ahead with the AZ plant... they could easily just refresh the same crap at higher base clocks if they really wanted to.

While the desktop CPU market seems more stagnant now, that's because most software can't use all the processing power we already have, so demand for faster CPUs is pretty low. Intel has to keep selling chips to recoup its investment in process tech and fabs, but instead of shoveling more money into desktop CPUs that are hitting diminishing returns pretty badly, it is turning to laptops and tablets, which see far more growth and can't be kept current just by swapping out the GPU every year or two.
 

james1701

Golden Member
Sep 14, 2007
1,873
59
91
*dear intel*

i would like to replace my almost 2yr old 990x with something faster, and have it be a true upgrade and not side grade.

so when will my wish come true?


PCs have become so stagnant, I can see the spiders make complex webs outside.

+1
 