Return of FX : GA-990FXA-UD7 - OC'd AMD FX 8150 / 6990 Performance Comparison Review

Page 5

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Fact remains, it goes from being 48% behind at single threaded to DEAD even in the multithreaded test, compared to the 2500K. If you are going to call that "bad scaling", then what is wrong with Intel's scaling such that they lose so much ground when going to the multicore test?

Nothing is wrong. Intel just has a smarter approach to >4 threads than does AMD. Instead of devoting entire ALUs to a select few programs that can utilize it Intel has hyperthreading and it hasn't harmed per-core performance or clock speed. Therefore you're getting better performance for 4 threads and under (a vast majority of workloads) and you're still getting better performance with more than 4 threads (as we see the 2600K whooping the 8150 in nearly everything). So hyperthreading might look less efficient as far as scaling goes, but if you consider the entirety of the CPU and today's workloads it makes far more sense.

That's pretty much why I don't understand CMT. It might make more sense with their HSA agenda, but we're still not there yet.

BTW, the Thuban has fewer transistors and is able to keep up in most heavily threaded scenarios and that's entirely because of IPC and Bulldozer's low clock speeds.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
pelov,

Intel's Hyper-Threading has been with us for 10 years, yes, since 2002 and the Pentium 4 3.06GHz HT. If I'm not mistaken, the first HT generation caused problems in Windows and needed a patch. A Pentium 4 with HT enabled was performing worse than with HT off at that time. Look what HT can do now.

AMD's CMT in Bulldozer is the first generation of that technology. Wait a few years and we can talk again about whether it was successful or not.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
It'll get better with software and the new Win8 scheduler but CMT isn't going to be the big change. HSA, otoh, is a massive step and offers far more significant performance increases, but that's first coming in 2013 with Kaveri. It's going to require a lot more OpenCL/DirectCompute and AMD working with developers from the ground up, but historically that's always been AMD's weakest point.
 

polyzp

Member
Jan 4, 2012
161
0
71
Depends on the application used. More and more programs become Multithreaded in desktop. Not to mention you cannot have a 40% increase of IPC from one CPU generation to the next anymore. But you can have 30-50% more performance through higher core/thread count.

I'm not saying that IPC doesn't count, but it all depends on the applications we are using.

This is true, but Intel has the benefit of IPC always counting towards performance in benchmarks, games, and multithreaded applications, whereas more cores/threads do not help in all scenarios. Soon cores/threads will be a lot more valuable, and IPC will become less of a factor as it naturally becomes harder to improve than core count and as more multithreaded applications are released.
 

polyzp

Member
Jan 4, 2012
161
0
71
It'll get better with software and the new Win8 scheduler but CMT isn't going to be the big change. HSA, otoh, is a massive step and offers far more significant performance increases, but that's first coming in 2013 with Kaveri. It's going to require a lot more OpenCL/DirectCompute and AMD working with developers from the ground up, but historically that's always been AMD's weakest point.

I can confirm that Windows 8 CP does increase FX performance by up to 15% in certain Passmark Performance CPU tests when compared to Windows 7 at the same 4.9 GHz overclock. But due to GPU driver problems, I am working on Windows 7 SP1 now, and my benchmarks are all in Windows 7.
 

Don Karnage

Platinum Member
Oct 11, 2011
2,865
0
0
I can confirm that Windows 8 CP does increase FX performance by up to 15% in certain Passmark Performance CPU tests when compared to Windows 7 at the same 4.9 GHz overclock. But due to GPU driver problems, I am working on Windows 7 SP1 now, and my benchmarks are all in Windows 7.

Passmark is a garbage benchmark
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
I can confirm that Windows 8 CP does increase FX performance by up to 15% in certain Passmark Performance CPU tests when compared to Windows 7 at the same 4.9 GHz overclock. But due to GPU driver problems, I am working on Windows 7 SP1 now, and my benchmarks are all in Windows 7.

I think the gains were upwards of 10%? It remains to be seen if we'll see the scheduler outside of win8, though. If MS is holding the improved scheduler hostage -- which wouldn't surprise anyone, let's be honest -- it would suck for all BD/Vishera owners. Hopefully MS implements this into an update of some kind for win7 because paying $140 for a new really crappy OS to see that 10% would be ridiculous.
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
Nothing is wrong. Intel just has a smarter approach to >4 threads than does AMD. Instead of devoting entire ALUs to a select few programs that can utilize it Intel has hyperthreading and it hasn't harmed per-core performance or clock speed. Therefore you're getting better performance for 4 threads and under (a vast majority of workloads) and you're still getting better performance with more than 4 threads (as we see the 2600K whooping the 8150 in nearly everything).

Except, not. You and others keep bringing up OTHER examples.

Me: The hypothetical CPU that is 40% faster per thread for 4 threads will lose to FX-8150 in a full 8 thread test

you: But the 2600k beats FX-8150 in a fully threaded cinebench test

Me: the 2600k isn't 40% faster, it's more like 55% faster. You can't just change the CPU halfway through the argument.

Why do you and others continually refuse to stick to the original argument? I never said 2600K. I never said 2500K. I was referring to the poster who said some "ivy" CPU was 40% faster in the single core test. My point was very simple: based on the scaling of SB and FX, we know that a 40% advantage is NOT enough to counter the advantage of FX's superior 8 thread scaling.

BTW, the Thuban has fewer transistors and is able to keep up in most heavily threaded scenarios and that's entirely because of IPC and Bulldozer's low clock speeds.

Keep up? Most? Why do you use such imprecise terms? You can say that about anything and be technically true: "ah you see it wins in 2 tests and is only 15% slower in other tests, so it keeps up". Such things are pointless, it's like saying something is "maybe" faster. Take a stand and make a real point.

Also, the fastest Thuban costs more than the fastest FX CPU, uses more idle power, and loses in many tests. More or fewer transistors is really a detail that NOBODY in the real world cares about on its own. It's only relevant if it leads to lower prices or power or some other tangible benefit, which it does not in this case.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
So you're okay with comparing it to Ivy but not Sandy? Is that supposed to make it better?

Benchmark bomb.


Note how close the 2500K is despite the mixed workload (both integer and FP) and the fact that the BD can't seem to pull away from a 45nm Thuban with 6 independent cores.


Aaaand there's your reason why. Poor IPC and clock speeds hurt it in the long run regardless of workload, making BD's strong suits seem weaker and its weak points look even worse.

Now I know what you're thinking, who cares about single-threaded performance? Well, pretty much everyone, but let's consider multi-threaded performance without much of a cache impact (remember BD has atrociously slow cache)

and it's still shit. Considering the added threads here it should technically be doing better than the Thuban and the 2500K both and by a considerable margin but it doesn't win the benchmark it should have won...again.


That's right, folks. They managed to make the cache slower and the clock speeds have essentially stayed stagnant, which makes accessing that cache a bad idea (it's considerably slower than the Thuban/Deneb's, L2 being the worst).

http://images.anandtech.com/graphs/graph4955/41731.png
For anyone who knows AMD chip architecture, this last one isn't surprising as the L3 speeds are always asynch and slow.

http://images.anandtech.com/graphs/graph4955/41689.png
Now here BD would have won by a large margin despite those hampered clock speeds and IPC if it had more FPUs, but it doesn't so it lost again.

As I alluded to earlier, single threaded performance is going to be a bit of a disappointment with Bulldozer and here you get the first dose of reality. Even considering its clock speed and Turbo Core advantage, the FX-8150 is slower than the Phenom II X6 1100T. Intel's Core i5 2500K delivers nearly 50% better single threaded performance here than the FX-8150

That's a whole 50% per ALU/FPU for that workload and it still can only catch up despite twice as many ALUs and even # of FPUs


But hold on, it's not all bad.

As I mentioned earlier, these are the types of benchmarks you want to show and with good reason: they are almost entirely integer-based and scale well with the core count. But this isn't great news because of how close the 2600K is despite having half the number of integer cores. Considering Intel's hyperthreading is ~30% efficient (best case scenario) and CMT tax is only ~20% (worst case scenario) the 8150 should be pounding on the 2600K in this test. Yet it doesn't because of poor IPC and clock speeds.
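To put rough numbers on that expectation, here is a minimal sketch in Python. It assumes the ~30% Hyper-Threading gain and ~20% CMT tax quoted above, and it assumes the tax is charged once per module (one way to read the figure). The function names are made up for illustration.

```python
# Rough "core-equivalent" model. The 30% and 20% defaults are the post's
# estimates, not measured values, and the function names are hypothetical.

def ht_core_equivalents(cores, ht_gain=0.30):
    """Each physical core counts as 1 plus the Hyper-Threading gain."""
    return cores * (1.0 + ht_gain)

def cmt_core_equivalents(modules, cmt_tax=0.20):
    """Each module has two integer cores and loses cmt_tax of one core
    when both are busy (assumption: the tax applies once per module)."""
    return modules * (2.0 - cmt_tax)

i7_equiv = ht_core_equivalents(4)    # 2600K: 4 cores with HT
fx_equiv = cmt_core_equivalents(4)   # FX-8150: 4 modules, 8 integer cores

# Per-thread speed advantage the HT chip needs just to tie in this model.
break_even = fx_equiv / i7_equiv

print(f"2600K: {i7_equiv:.1f} core-equivalents")
print(f"FX-8150: {fx_equiv:.1f} core-equivalents")
print(f"per-core advantage needed to tie: {break_even:.2f}x")
```

Under those assumptions the HT chip needs roughly a 1.4x per-core advantage (IPC times clock) just to draw level on a purely integer, perfectly threaded load, which is why a close result points at per-core speed.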


Same thing here except it loses. The Thuban does poorly here because this benchmark takes advantage of newer instruction sets the Thuban lacks, otherwise I think it would likely have won this.

Speaking of AVX

Now I'm finally impressed. The FPUs aren't as broken as we thought and can actually perform quite well but still not nearly well enough to make it count and make up for the deficiencies in other areas (it's helped by being partially mixed and not an entirely FP dominated workload). AVX performance is definitely one of its strong suits, along with idle power due to very good gating, and their turbo which is more efficient than Intel's. Unfortunately, that's pretty much it as it has far more shortcomings than it does advantages

Like this, for instance


Chiropteran, I love AMD. I do. Really. They make amazing GPUs that are far better than nVidia's (imo) and I'm typing this on a Deneb 955, but you've got to admit they made a catastrophic failure with Bulldozer here. Defending it against me or against anyone with more than 2 brain cells isn't going to do you any good. Sorry
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
You are not seeing the point. I was responding to a single comment on a single benchmark. I don't really care about the results of any of those benchmarks you posted, as they have nothing to do with what I was talking about.

IGNORE the CPU model/manufacturer/number/etc, okay?

A CPU is 40% faster than B CPU for a single thread, HOWEVER B CPU has twice as many threads. The benchmark in question scales nearly perfectly with the number of threads.

140% performance (CPU A) X 4 threads = 560% overall
100% performance (CPU B) X 8 threads = 800% overall

CPU B will be faster, in a real test, as opposed to one that is limited to a single thread.

The ultimate result: single thread IPC is NOT the only important metric for measuring performance.
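The same arithmetic as a small Python sketch. It assumes, as the post does, that the benchmark scales nearly perfectly with thread count. The scaling_efficiency parameter is not from the post; it is added here only to show how much slack CPU B has before the result flips.

```python
# Aggregate throughput in "percent of one CPU B thread".
# scaling_efficiency is a hypothetical knob: 1.0 means perfect scaling.

def aggregate_throughput(per_thread_pct, threads, scaling_efficiency=1.0):
    return per_thread_pct * threads * scaling_efficiency

cpu_a = aggregate_throughput(per_thread_pct=140, threads=4)  # 40% faster, 4 threads
cpu_b = aggregate_throughput(per_thread_pct=100, threads=8)  # baseline, 8 threads

# CPU B keeps its lead as long as its scaling efficiency (relative to
# CPU A's) stays above this value.
min_efficiency_for_b = cpu_a / cpu_b

print(f"CPU A: {cpu_a:.0f}%  CPU B: {cpu_b:.0f}%")
print(f"CPU B still wins above {min_efficiency_for_b:.0%} relative scaling efficiency")
```

In this model CPU B only loses once its scaling drops to about 70% of CPU A's.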

Do you understand?
 

polyzp

Member
Jan 4, 2012
161
0
71
You are not seeing the point. I was responding to a single comment on a single benchmark. I don't really care about the results of any of those benchmarks you posted, as they have nothing to do with what I was talking about.

IGNORE the CPU model/manufacturer/number/etc, okay?

A CPU is 40% faster than B CPU for a single thread, HOWEVER B CPU has twice as many threads. The benchmark in question scales nearly perfectly with the number of threads.

140% performance (CPU A) X 4 threads = 560% overall
100% performance (CPU B) X 8 threads = 800% overall

CPU B will be faster, in a real test, as opposed to one that is limited to a single thread.

The ultimate result: single thread IPC is NOT the only important metric for measuring performance.

Do you understand?

Remember scaling on all 8 cores for FX is not double scaling on all 4 for the i5.

If an FX 8150 scales as 6.69 for 8 cores, the i5 2500K doesn't necessarily scale at half this (3.34) but closer to ~3.62.

Part II for my review will be up tomorrow or the next day!
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
Remember scaling on all 8 cores for FX is not double scaling on all 4 for the i5.

No, it's not. But the AnandTech Bench Cinebench results show that it is good *enough*.

This has really been blown way out of proportion. I just thought it was funny that the post I originally commented on was bragging about "ivy" being 40% faster when cinebench was artificially limited to a single core. Ironic that he picked that test, since it's one of the few that scale so well on FX that FX can match the 2500k and would beat a CPU that is "merely" 40% faster per thread.

Edit: Showing my work.

http://www.anandtech.com/bench/Product/288?vs=434

Cinebench single threaded:

i5 5860
FX 3938

Cinebench multithreaded:

i5 20381
FX 20254

Scaling:

i5 348%
FX 514%
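For anyone who wants to recheck the scaling figures, a short Python sketch using only the four scores listed above. The break-even number assumes the hypothetical faster CPU scales the same way the i5 does, which is an assumption and not something the Bench data shows.

```python
# Cinebench scores as listed in this post (i5-2500K vs FX-8150).
scores = {
    "i5": {"single": 5860, "multi": 20381},
    "FX": {"single": 3938, "multi": 20254},
}

def scaling(cpu):
    """Multithreaded score divided by single-threaded score."""
    s = scores[cpu]
    return s["multi"] / s["single"]

def break_even_advantage(fast_cpu, wide_cpu):
    """Single-thread speed ratio at which the two CPUs tie in the
    multithreaded run, assuming each keeps its own scaling factor."""
    return scaling(wide_cpu) / scaling(fast_cpu)

for cpu in scores:
    print(f"{cpu}: {scaling(cpu):.0%} scaling")

ratio = break_even_advantage("i5", "FX")
print(f"single-thread advantage needed to tie: {ratio:.2f}x")
```

With these scores the tie point lands a little under 1.5x, so a CPU that is only 1.4x faster per thread and scales like the i5 would still trail FX in the multithreaded run.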
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
140% performance (CPU A) X 4 threads = 560% overall
100% performance (CPU B) X 8 threads = 800% overall

It would be more like 720% because of -20% x 4 for CMT tax, but that doesn't matter as it's a completely hypothetical completely integer-based workload that is very rarely true. And it would do you good to actually read my post and not ignore it because I posted a benchmark to reflect exactly your hypothetical workload.
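A quick Python sketch of where the 720% comes from, assuming the 20% CMT tax is charged once per module when both of its cores are loaded (that is how "-20% x 4" reads). The function name and defaults are illustrative only.

```python
# Throughput in "percent of one unshared FX core".
# Assumption: each of the 4 modules loses 20 points when both cores are busy.

def cmt_throughput_pct(modules, per_core_pct=100, cmt_tax_pct=20):
    return modules * (2 * per_core_pct - cmt_tax_pct)

fx_ideal = 8 * 100                       # 800%, no sharing penalty
fx_with_tax = cmt_throughput_pct(4)      # 4 modules, tax applied per module
cpu_a = 4 * 140                          # the hypothetical 40%-faster quad

print(f"FX ideal: {fx_ideal}%  FX with CMT tax: {fx_with_tax}%  CPU A: {cpu_a}%")
```

Even with the tax applied, the 8-thread figure stays above the 560% from the quoted example, so the penalty narrows the gap without reversing it in this hypothetical.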

The major takeaway should be that BD does well in very few select applications (one of which is your hypothetical but you can't even seem to work out the actual figures here) but in general is just a poor approach to a problem that never existed: lack of coars. CMT has potential but currently you're far better off with 4 stronger cores and hyperthreading than you are with lower IPC and clock speeds for moar coars.
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
your hypothetical workload.

Wrong. The hypothetical is the cpu, "ivy" that is 40% faster than FX at the single threaded cinebench. The workload is 100% real and tested by anandtech.

I don't know why this is so hard for you: scroll up, click the link in my post, look at the cinebench results. I'm ignoring your multitude of irrelevant graphs because they have nothing to do with what I was responding to.

Dude said "ivy" was faster at Cinebench because its single threaded performance was higher. I said it wasn't and proved it.

Then you come in and are like "no way, see ivy is faster on these other tests".

Thanks friend, but I already know perfectly well how poorly FX does in the majority of workloads. I'm fine with that. I was merely pointing out the irony in the post where the guy was talking about Cinebench specifically, because in that case FX actually does really well when it comes to multithreaded performance.

Here it is:
http://forums.anandtech.com/showpost.php?p=33385811&postcount=65

Originally Posted by nyker96
the single core ipc for BD v Ivy in cinebench is at a 40% deficit running at about same speed. That's crazy poor.

I found this ironic, because being at a 40% deficit when limited to a single thread means FX would actually win in the multi-threaded test. Crazy poor? I think not.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
But Ivy is at a 50% deficit on number of cores, so BD is actually faster when 8 threads are used. See how useless "ipc per core" is as a number?

Thanks friend, but I already know perfectly well how poor FX does in the majority of workloads. I'm fine with that.

Yet I'm the one who has things backwards?

 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
Yet I'm the one who has things backwards?

Why is this so hard for you?

Something can be bad at many things but still be good at something. I found it extremely ironic that nyker96 posted an example about how "crazy poor" FX was because ivy was faster, when the truth is FX actually wins the benchmark in question when it isn't artificially limited to 1 thread.

So yes, you have things completely backwards if you think that my post somehow says FX is an awesome CPU and great at everything. I never said anything of the sort, I don't really care one bit. Your spamming of irrelevant graphs and charts doesn't change anything.

I really thought the audience here was a bit more logical and deductive. If I had realized I had to spell out every little detail as if I were talking to a 5 year old, I wouldn't have bothered posting.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
CMT has potential but currently you're far better off with 4 stronger cores and hyperthreading than you are with lower IPC and clock speeds for moar coars.

From a performance point of view that is not true.

There are applications where Bulldozer's 8 CMT threads are as fast as or faster than Intel's 4 cores / 8 threads with HT. Cinebench is not one of them, so don't take only that application and use it as a general rule.

As I have said before, IPC is application dependent. Some MT programs will run as fast or faster on BD vs Core i7 and some will be faster on Intel CPUs. AMD's Phenom II and Bulldozer have higher MT scaling than Intel's HT CPUs, even in Cinebench.

 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
AtenRa, I don't question its multi-threaded performance, what I question is the direction the architecture took in achieving that performance (which is underwhelming, let's be honest). Cinebench is also a benchmark that AMD doesn't like to use because it doesn't do well in it, but that's not the point (for the record, they use 3DMark CPU/Vantage and like measuring performance in GFLOPS).

The result of the above scaling is a pandering to the extreme end of the spectrum. AMD could have made BD with 40 cores and, because Cinebench can task up to 40 threads, it would have been amazing, but that's not how it works. In going CMT they sacrificed IPC and clock speeds, die size and power consumption, and all of that for heavily multi-threaded workloads that aren't common on the desktop. As a server first architecture this makes far more sense but I don't run a server so I don't care. On the desktop it's underwhelming as a whole and even if you find a place where it works you have to question its efficiency. IPC is application dependent but as a whole it's taken a very costly slide that's hurt it in every single application (minus that AVX workload above that I linked which is actually impressive).

They're forced to sell cheaper and this means less money in their pockets because it's not as good as a hyperthreading alternative nor as small so they make less $$$ per sale. That too doesn't make any sense. What Intel does is cater to the largest base by offering 4 strong cores and hyperthreading for those that need it for the fewer in number heavily multi-threaded workloads. AMD has attacked this problem backwards by going for the extreme end of the spectrum and pandering to the heavily-threaded (integer heavy) workloads (which are rare) and offer relatively weak performance in the largest base of applications, that being 1-4 threads. Because software is so slow to mature, both in ISAs and # of concurrent threads, hyperthreading makes far more sense than CMT. So even though it's got some great things going for it, CMT isn't one of them and it was what dictated the architecture. That's why I hate Bulldozer
 

polyzp

Member
Jan 4, 2012
161
0
71
AtenRa, I don't question its multi-threaded performance, what I question is the direction the architecture took in achieving that performance (which is underwhelming, let's be honest). Cinebench is also a benchmark that AMD doesn't like to use because it doesn't do well in it, but that's not the point (for the record, they use 3DMark CPU/Vantage and like measuring performance in GFLOPS).

The result of the above scaling is a pandering to the extreme end of the spectrum. AMD could have made BD with 40 cores and, because Cinebench can task up to 40 threads, it would have been amazing, but that's not how it works. In going CMT they sacrificed IPC and clock speeds, die size and power consumption, and all of that for heavily multi-threaded workloads that aren't common on the desktop. As a server first architecture this makes far more sense but I don't run a server so I don't care. On the desktop it's underwhelming as a whole and even if you find a place where it works you have to question its efficiency. IPC is application dependent but as a whole it's taken a very costly slide that's hurt it in every single application (minus that AVX workload above that I linked which is actually impressive).

They're forced to sell cheaper and this means less money in their pockets because it's not as good as a hyperthreading alternative nor as small so they make less $$$ per sale. That too doesn't make any sense. What Intel does is cater to the largest base by offering 4 strong cores and hyperthreading for those that need it for the fewer in number heavily multi-threaded workloads. AMD has attacked this problem backwards by going for the extreme end of the spectrum and pandering to the heavily-threaded (integer heavy) workloads (which are rare) and offer relatively weak performance in the largest base of applications, that being 1-4 threads. Because software is so slow to mature, both in ISAs and # of concurrent threads, hyperthreading makes far more sense than CMT. So even though it's got some great things going for it, CMT isn't one of them and it was what dictated the architecture. That's why I hate Bulldozer

With the next generation of Bulldozer chips, increased IPC and decreased power consumption will be the focus (15-20% per gen).

Looking at my overclocked results at 4.9 GHz, a 15-20% increase in IPC will push the FX 8350 ahead of Ivy Bridge in many scenarios, but it will still fall behind Ivy's IPC by around 25-30% per clock. Scaling for Vishera could actually increase as well.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
With the next generation of Bulldozer chips, increased IPC and decreased power consumption will be the focus (15-20% per gen).

Looking at my overclocked results at 4.9 GHz, a 15-20% increase in IPC will push the FX 8350 ahead of Ivy Bridge in many scenarios, but it will still fall behind Ivy's IPC by around 25-30% per clock. Scaling for Vishera could actually increase as well.

I don't think we'll be able to see significant gains on either until they abandon the socket compatibility -- server mainly, because that's generally what dictates the advancement, and abandoning it allows freedom for improvements without being hamstrung by backwards compatibility. That is supposedly happening with Steamroller, so 2013. They really need to be able to work with a clean slate and apparently they've realized that.

I guess we'll see how Trinity fares in order to preemptively judge how much they've gained in both perf-per-watt and IPC. I wouldn't expect 15-20% IPC for Vishera as Trinity looks to have roughly matched Llano, if we're to go by the leaked benchmarks. I'd be hesitant to guesstimate clock speeds because Trinity may be significantly bogged down by that extra 50% die size of VLIW4 it's lugging around, so clock speeds are still an unknown variable. L3 cache might add another +10% to gaming performance so that's another good sign, but it's unlikely they changed the L3 speed given they'll still be using the 9xx-series chipsets, so expect the L3 to be big and slow again in typical AMD fashion.

15-20% as a combination of both IPC and perf-per-watt I'd say would be a high end estimate but certainly not IPC or perf-per-watt alone. Clock speeds are another matter entirely.
 

polyzp

Member
Jan 4, 2012
161
0
71
have you done any 4-thread testing on Bulldozer, so we can see "without module contention, bulldozer is this much faster per clock than Ph2"?

Can you do a Fritz Chess benchmark run with 1 thread / 4 threads? And a WPrime 32M 1-thread run as well. Thanks again man! I'll be using your results in my next post!
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
With the next generation of Bulldozer chips, increased IPC and decreased power consumption will be the focus (15-20% per gen).

Looking at my overclocked results at 4.9 GHz, a 15-20% increase in IPC will push the FX 8350 ahead of Ivy Bridge in many scenarios, but it will still fall behind Ivy's IPC by around 25-30% per clock. Scaling for Vishera could actually increase as well.

LOL at the wishful thinking that AMD will be able to magically get 15-20% higher IPC from a slightly tweaked architecture on the same process node.
 