So I was thinking... it seems to be a concern that it takes 8 AMD BD (Bulldozer) cores to match/beat 4 SNB (Sandy Bridge) cores.
How so? The concern should be whether a single BD core can match up, not whether double the cores can, because doubling won't help enough in the areas where AMD needs to turn a profit. They are only going for 50% on servers, after all.
But my question is: why is this a big deal? In the GPU world, it takes 1536 AMD "cores" to match 480 NVIDIA "cores", but nobody cares, because when we choose our GPUs we care about the bottom-line performance of the part, not the per-core performance.
GPUs universally handle embarrassingly parallel workloads: the kind that scale linearly, or so close that they may as well. For their common work, ILP and TLP are typically limited only by the data size (2 million pixels of output means a minimum of 2 million small tasks). This is not so on the desktop, server, or mobile for most programs, though servers can sometimes come close. GPUs can reach such high core densities in large part because they can process many different data streams with the same instruction at once. That kind of operation tends not to work well for CPUs, even when the workload scales out very well, because CPU threads branch a lot and need to be executing different instructions at any given time, even when running the same binary across several threads; each thread needs to run independently.
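To make the "2 million pixels, 2 million tasks" point concrete, here is a hedged C++ sketch (function and variable names are mine, not from anyone's actual code): every output pixel depends only on its own input, never on a neighbor or an earlier result, which is exactly why this kind of work splits across however many lanes or cores exist.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel operation: each output pixel depends only on
// the matching input pixel, never on neighbors or earlier iterations.
static std::uint8_t brighten(std::uint8_t in) {
    int v = in + 40;
    return v > 255 ? 255 : static_cast<std::uint8_t>(v);
}

// A 1920x1080 frame is ~2 million independent tasks: iteration i never
// reads anything iteration j wrote, so the loop scales to any core count.
std::vector<std::uint8_t> process_frame(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = brighten(in[i]);  // same instruction, different data each time
    return out;
}
```

Contrast that with typical desktop code, where iteration i usually does depend on what came before it.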
GPGPU basically allows free added performance, free in the sense that the processing power is already going to be there, for situations where the CPU would previously do little bits of highly parallel work; but it is not going to work as a general-purpose CPU replacement. Theoretically, speculation could be replaced with predication, but even if the binary size stayed small enough (which should be possible if the hardware is trusted to manage the parallelism, so the compiler only has to describe it and insert some software preloads), the memory bandwidth needed would be far too high, leading to higher costs, even if we designed CPUs, and the software infrastructure for them, around high per-task parallelism.
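As a rough illustration of that speculation-vs-predication tradeoff, here is a hedged C++ sketch (names and the toy arithmetic are mine): the predicated form has no control flow to diverge, which suits GPU-style lockstep execution, but it computes and reads both sides of the condition every time, which is where the extra memory bandwidth comes from.

```cpp
#include <cstddef>

// Branchy version: the CPU speculates on the condition, so only the
// predicted side runs. A mispredict flushes the pipeline, but each
// iteration only needs to touch one of a[i] / b[i].
void select_branchy(const int* a, const int* b, const bool* cond,
                    int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (cond[i])
            out[i] = a[i] * 2;
        else
            out[i] = b[i] + 1;
    }
}

// Predicated version: no divergence, so it maps well to lockstep
// execution, but BOTH sides are computed and BOTH arrays are read on
// every iteration, roughly doubling the memory traffic.
void select_predicated(const int* a, const int* b, const bool* cond,
                       int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        int taken     = a[i] * 2;               // always computed
        int not_taken = b[i] + 1;               // always computed
        out[i] = cond[i] ? taken : not_taken;   // select, not a branch
    }
}
```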
So why can't we see the CPU in a similar fashion? Software is, in general, moving toward being multi-threaded, so we should see increasing the number of execution cores the same way we see increasing cache, widening or shortening the pipeline, improving branch predictors, and so on: just one aspect of the final product.
What do you all think?
Once software catches up and allows easy programming across hundreds or thousands of cores, we will. Each loop would be given either an iteration count or a timer, after which it must check back in with a scheduler (trust me, actually making this happen is much harder than it reads, and we're years away from good language and IDE support; see the sketch at the end of this post). In this way, any number of cores, up to however many can theoretically be used, can actually be used, be it 8, 16, or 500, by letting each small task wake a currently sleeping thread, or by running n small tasks per thread when fewer are available. It is going to take time, though, because everyone who was shouting about this at the top of their lungs years ago was considered fringe and ahead of their time. Even when this happens, you won't get double the performance for double the cores; that's Amdahl's law at work (if 90% of the work parallelizes, 8 cores top out at roughly a 4.7x speedup). With the kind of work CPUs generally do, having the extra cores available for short utilization periods will be what matters. Many things you want to get done depend on previous things getting done, so those things must be done in order, one after the other, not at the same time. There's no way around that, easy or hard.
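Here is a minimal sketch of that "check back in with a scheduler" idea, assuming the scheduler is just a shared atomic counter handing out chunks (everything here, names included, is hypothetical): each worker claims n iterations, runs them, then checks back for more, so the same binary adapts whether 8, 16, or 500 cores are present.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Minimal chunked scheduler: a shared atomic counter hands out work in
// chunks of `chunk` iterations. Each worker claims a chunk, runs it,
// then checks back in, so the same code drains the work on any core count.
void parallel_for(std::size_t total, std::size_t chunk, unsigned workers,
                  const std::function<void(std::size_t)>& task) {
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (;;) {
                std::size_t begin = next.fetch_add(chunk); // "check in" with the scheduler
                if (begin >= total) return;                // nothing left to claim
                std::size_t end = begin + chunk < total ? begin + chunk : total;
                for (std::size_t i = begin; i < end; ++i)
                    task(i);                               // one small task per iteration
            }
        });
    }
    for (auto& t : pool) t.join();
}

// Usage: with few cores each worker simply claims more chunks; with many
// cores the chunks spread out. That is the "n small tasks per thread" idea.
// parallel_for(2'000'000, 4096, std::thread::hardware_concurrency(),
//              [](std::size_t i) { /* small independent task i */ });
```

Note what this sketch cannot fix: if task i needs the result of task i-1, no scheduler helps, which is the serial-dependency point above.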