Good article: Analyzing Bulldozer: Why AMD's chip is so disappointing


apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
i'm trying to figure out how you test for a CPU wall with a GPU bound test by adding more GPU power
i thought it was pretty clear from my article.

It is accomplished by overclocking the CPU. Adding more GPU power generally requires that the CPU also speed up.

In this case, overclocking the FX-8150 from stock clocks to 4.4GHz made HD 6970-X2 CrossFire scale again. Adding the third GPU brought solid gains in some games and very little in others.

Overclocking the FX8150 further to 4.8GHz will show if X3 Tri-Fire will continue to scale and adding a 4th GPU for Quad-Fire will show the limitations of the CPU again.
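
To make the method above concrete, here is a minimal sketch of how one might flag a CPU wall from average frame rates measured at 1, 2, 3... GPUs. The function names, the 10% threshold, and the frame rates in the usage note are all made up for illustration; none of them come from the article's results.

```python
from typing import List, Optional


def scaling_gains(fps_by_gpu_count: List[float]) -> List[float]:
    """Fractional fps gain from each added GPU (index 0 is the 1 -> 2 GPU step)."""
    return [cur / prev - 1.0
            for prev, cur in zip(fps_by_gpu_count, fps_by_gpu_count[1:])]


def first_cpu_wall(fps_by_gpu_count: List[float],
                   min_gain: float = 0.10) -> Optional[int]:
    """Return the GPU count at which adding a GPU stopped helping, or None.

    min_gain is an assumed cut-off: a gain below it is treated as a sign
    that something other than the GPUs (likely the CPU) is the limit.
    """
    for i, gain in enumerate(scaling_gains(fps_by_gpu_count)):
        if gain < min_gain:
            return i + 2  # fps list starts at 1 GPU, so step i ends at i + 2 GPUs
    return None
```

With hypothetical numbers, `first_cpu_wall([60.0, 110.0, 115.0])` points at the third GPU, while `first_cpu_wall([60.0, 110.0, 150.0])` returns `None`. Rerunning the same check after a CPU overclock is what shows whether the wall moved.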
 

hooflung

Golden Member
Dec 31, 2004
1,190
1
0
If your particular workload actually utilizes that much CPU power, great. You aren't the typical case though. I don't cheer for a given manufacturer, so if thinking that this marketing blitz for "more cores" is asinine on AMD's part means I am attacking AMD, I guess I am. However, if Intel would have come out with the same marketing material and processor, I would have the same opinion of the strategy that I do right now. It may fit a few small niches (but does it fit better than offerings from Intel?), but it's not the proper direction for most of the market.

If you want to make broad generalizations... I'm game:

So what is your point exactly? The typical workload case you are trying to portray is being savagely eroded by tablets. If market constants hold true, the PC market will clearly start to move towards specialized users such as gamers and developers who actually need more than a spreadsheet and word processor.

What happens when Windows 8 breaks backwards compatibility and you need a Virtual PC instance to run your 2002 applications? Would be nice to have them cores eh? And what happens when AMD and Nvidia drivers start to break massive backwards compatibility with older DX8/9 games and you need to virtualize your retro gaming? It's coming faster than you think sir.

These edge cases are going to smack you in the face in the future you don't think exists.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Hooflung, we are not going to run datacenter workloads on tablet pcs. Sorry, just no. Wow. That's just... Wow.
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Thanks for putting an i7-920 at 3.8 up for comparison. That's the max stable I can reach with my 920, and I'm glad that it doesn't seem to form a bottleneck even with 6970 Trifire. It means my CPU will easily last until Haswell, perhaps even beyond that if it doesn't bottleneck HD8000/GTX700 series (single GPU config).
You're welcome. For 3 years old, it is still a very relevant CPU. i don't feel a personal need to upgrade to i5-2700K although i probably must for my tech site. i am getting CPUs from Intel again for review; next up is an i3-2105 i am putting against the Phenom IIs (and maybe the FX-4100) in gaming.

i haven't yet tested Quad-Fire; my X58 MB isn't set up for it. i will be starting a new article - Part 3 of SLI vs CrossFire Scaling which is going to explore HD 6970-X3 TriFire and GTX 580 SLI across several CPU platforms including dual core.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Vic, just an FYI, I ran a poll in a datacenter focused message board, and out of 19 respondents, the only one who wanted more CPU power in their virtualization environment was some guy who had netburst Xeons... (ugh)

The pain points were evenly split between needing disk performance, and memory volume. One person did need more disk volume though.

It's pretty universal that we're not hurting for CPU in general in the VM world.
 

blckgrffn

Diamond Member
May 1, 2003
9,199
3,185
136
www.teamjuchems.com
If your particular workload actually utilizes that much CPU power, great. You aren't the typical case though. I don't cheer for a given manufacturer, so if thinking that this marketing blitz for "more cores" is asinine on AMD's part means I am attacking AMD, I guess I am. However, if Intel would have come out with the same marketing material and processor, I would have the same opinion of the strategy that I do right now. It may fit a few small niches (but does it fit better than offerings from Intel?), but it's not the proper direction for most of the market.

Whatever man. +1 for another VM admin where we are wanking the CPU's on our 24 way (4S) and 32 way (4S) Intel servers with comparatively modest storage back end. If you are really creating a private cloud that is cost effective you are pushing density hard. Like 100+ VMs per box.

If our pooch Dunnington boxes could be new bulldozer boxes I would be pretty happy.

They must plan on competing based on price with Nehalem-EX though, as those chips are simply beastly from a performance standpoint, I would think in many cases their IPC advantages would outweigh the thread count advantage an AMD chip would bring to the table.

Pair that with the hard limit of vCPUs on a VMware instance... well... there is going to be some sort of limitation there.

If you don't tax your VMware servers you bought too much compute, sorry. It's about density and scaling and if you are in a modest environment it would be easy to get more than enough CPU, especially with Nehalem/Westmere.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
Vic, just an FYI, I ran a poll in a datacenter focused message board, and out of 19 respondents, the only one who wanted more CPU power in their virtualization environment was some guy who had netburst Xeons... (ugh)

The pain points were evenly split between needing disk performance, and memory volume. One person did need more disk volume though.

It's pretty universal that we're not hurting for CPU in general in the VM world.

Do people really abide by the one VM per core rule that AMD is always espousing? I've always thought you could fit more for low-utilization VMs...
 

blckgrffn

Diamond Member
May 1, 2003
9,199
3,185
136
www.teamjuchems.com
Do people really abide by the one VM per core rule that AMD is always espousing? I've always thought you could fit more for low-utilization VMs...

Hahahaha... Nope.

In my environment we are about 6 to 7 vCPUs per actual core with the typical VM having two cores, some having up to eight.

Our average gets skewed by a large number of virtual router appliances...

For a second I was confused there, VMware has never said that... then I re-read your post again
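
As a rough illustration of the overcommit arithmetic above, here is a small sketch. The helper names are hypothetical, and the 32-core host is an assumption (reading "32 way" as 32 physical cores); only the 6 to 7 vCPUs per core and the two-vCPU typical VM come from the post.

```python
def vcpus_per_core(total_vcpus: int, physical_cores: int) -> float:
    """Overcommit ratio: vCPUs handed out per physical core."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_vcpus / physical_cores


def vm_capacity(physical_cores: int, target_ratio: float, vcpus_per_vm: int) -> int:
    """How many VMs of a given size fit on a host at a chosen overcommit ratio."""
    if vcpus_per_vm <= 0:
        raise ValueError("vcpus_per_vm must be positive")
    return int(physical_cores * target_ratio // vcpus_per_vm)


# Assumed host: 32 physical cores, 6.5 vCPUs per core, 2 vCPUs per typical VM.
print(vm_capacity(32, 6.5, 2))   # VMs that fit at that ratio
print(vcpus_per_core(208, 32))   # ratio for 208 vCPUs on the same host
```

Under those assumptions the host lands a little over 100 two-vCPU VMs, which lines up with the "100+ VMs per box" density mentioned earlier in the thread.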
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
I didnt find it to be a very good article - Arstechnica.com has a much better one on the design decisions and tradeoffs.

This article read just like another review - examining the performance differences but not the whys and the hows. A mention of reduced IPC, but nothing about what AMD's goals were.
 

blckgrffn

Diamond Member
May 1, 2003
9,199
3,185
136
www.teamjuchems.com
I didnt find it to be a very good article - Arstechnica.com has a much better one on the design decisions and tradeoffs.

This article read just like another review - examining the performance differences but not the whys and the hows. A mention of reduced IPC, but nothing about what AMD's goals were.

I had skipped the ars article since the main CPU guy has moved on from there... but I did find it to be a good article just now. Thanks for referencing it

http://arstechnica.com/gadgets/news/2011/10/can-amd-survive-bulldozers-disappointing-debut.ars/1 if anyone else is interested...

If AMD is really putting its cards on floating point performance being driven by APUs... well... that explains a bit.
 

hooflung

Golden Member
Dec 31, 2004
1,190
1
0
Hooflung, we are not going to run datacenter workloads on tablet pcs. Sorry, just no. Wow. That's just... Wow.

Sir, you have as much IT imagination as a rock. Right now my gaming desktop also acts as my development platform. 1 Host OS, 5 Ubuntu server instances. How do you think you can simulate a load balanced web app with 1 computer?

What happens when I want to add more virtuals? It is either add another PC, degrade performance across the board and 'share' resources or I can up my cores. Bulldozer is a clear win for the workstation. My only gripe is the TDP.

It is also the only good desktop processor that can do all of this with Vt-d. I'd have to buy Intel's 990X for its closest competitor to get close to it in 1 package. No thanks.

Welcome to the world of Virtual Machines sir. It is not going away. And it is for a lot more than the datacenter. I don't have to deal with how it scales in the cloud but I do need to show them how it should be set up, prove it does scale and I have to do it the cheapest way possible.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Do people really abide by the one VM per core rule that AMD is always espousing? I've always thought you could fit more for low-utilization VMs...

1 vCPU per core would be retarded. We only do that in the rare cases where we are only virtualizing for portability. If you have a system that is utilizing all it's processing power 24/7, and you virtualize it to decouple that workload from the physical server, you *might* consider such a rule, but in 95% of the cases, umm, no.
 

JimmiG

Platinum Member
Feb 24, 2005
2,024
112
106
I keep seeing "it's not a bad cpu. It's just not competitive".

For you folks, what exactly would be a "bad" cpu if one that "is not competitive" is "not bad"?

If it cost more than a faster Intel CPU, and it also had over twice the transistor count while using more power, then it would be pretty bad. Wait. nm..
 

Gundark

Member
May 1, 2011
85
2
71
@apoppin, would you mind including some console emulator tests in your benchmarks (Dolphin and PCSX2)? I would be very grateful.
 

amenx

Diamond Member
Dec 17, 2004
4,013
2,286
136
Saw someone in another forum with an 8150 in his sig. This was also in his sig:

"Whoever says Bulldozer is a fail obviously doesn't live in Canada, it gets cold up here, the extra heat is gonna help. Now that's true multi-tasking."
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
@apoppin, would you mind including some console emulator tests in your benchmarks (Dolphin and PCSX2)? I would be very grateful.

Make sure you edit them in to the 10 or 20 times you've posted those graphs too!
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
I wish I could at least understand how the architecture made sense. 6 ALUs per module, two shared, bare minimum would have made sense. 8 ALUs per module, with 4 shared ALUs next to each shared FPU would have made even more sense. But 4 unshareable ALUs per module is just boneheaded.
And, how are you going to schedule all of them, and also hit decent speeds (the easiest way to speed up IO is to make transistors switch more often)? Just streamlining what had been in there since the Athlon, decoupling the branch prediction, and getting rid of reservation stations, should have provided significant speedups, though might still leave Nehalem and up doing better at stupid-simple high-IPC loops.

You seem to be looking at it like sharing execution was on the table. It wasn't, at least not for integer work. It's two separate cores, each with 2 ALU and 2 AGU pipes, that then got their front ends merged into one. There may be front end issues, but the core scaling is good enough that IPC limitations due to sharing threads does not seem to be a significant issue. I'm sure there are lingering CMT-related problems (AMD didn't exactly have a known-good physical implementation to work from), but that part of BD appears to be working as expected. Performance with several threads looks worse on Windows than Linux (indicating that performance drops with 2/4/6/8 threads is mostly suboptimal scheduling, just like with HT), but scaling in general seems to be as good as or better than Stars, even so. A single thread not sharing anything just happens to be slower, more often than not. With the slow caches, I would also wonder about the speed of threaded-style server code (DB engines, interpreted languages, and the like). While the file compression magic might translate to those workloads, I'll remain skeptical, for now.

It is just retarded that 2 ALUs in a module will sit totally idle when there is a cache miss. Those ALUs could have and should have been put to work on the module's other thread. If they couldnt figure out a way to do that without consuming too many transistors, then the design is worthless.
Having more available is an equally valid decision to keeping a smaller number better utilized, and each will, all else being equal, shine in different workloads, just as the PhII X6 was highly competitive until SB, and still makes sense for some users, today, v. a slower i5 (v. a 2500K, if you have the budget, is another story, of course); and the same for Core 2 and Nehalem Xeons v. Magny Cours.

As a general rule, if the terms real-time or Quality of Service (QoS) come up, more unshared execution units will be the way to go (sharing = contention = unpredictability, and yes, BD's shared L1 and L2 caches are a glaring exception to this line of thought); while in general, the need for ever-increasing operations/time will get better served by sharing execution units between threads, especially if some of those threads shared significant amounts of data (competition = efficiency = high throughput). To go with more execution units (BD "core") makes sense, with AMD's main competitor preferring more threads than execution units, and AMD's cache performance constantly being a problem.

The real problem is that each BD core has come up, "just on the wrong side of, 'meh'," rather than providing performance at least significantly superior to their previous CPUs across the board, if not being able to hang with Nehalem. Their competition is as much Stars as it is SB. It seems to scale, but that's only great news if you aren't paying the power bill (overclocked gaming box in a college dorm?), or if you expect them to rapidly improve performance per Watt by 20% or more.

After that, they could use a GCN core to replace the FPU altogether. But after seeing this BD I have so little confidence that they will do anything remotely intelligent.
Not for some time. The overhead is just too much, and I would suspect that as they merge, we'll be more likely to see vector units shared between them, than for the GPU core to replace the CPU core's FPU (the primary advantages each has over the other is in how it treats data access, not the potential FLOPS each can do).
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
It's a bad cpu - there's no scenario where you'd really choose it over the intel chips, and due to its huge size and power requirements it'll be too expensive to use as even a budget chip (costs too much to make, requires larger, more expensive psu/motherboard power circuits/cooling).

That's not to say everything about it is bad, perhaps in a few years they'll be able to tweak it into something more competitive, but that won't be bulldozer.
 
Feb 19, 2009
10,457
10
76
Terrible perf/watt makes it a bad CPU.

Also, being much larger, with more transistors than its competitor, yet failing to compete makes it a bad CPU.

AMD should have put together 6-8 Llano cores without the iGPU and they would have delivered a better product.
 

nanaki333

Diamond Member
Sep 14, 2002
3,772
13
81
at least it sits where it should for distributed computing (all 24 hours mine lasted before dying).
 

Genx87

Lifer
Apr 8, 2002
41,095
513
126
The new line of thinking is this is an HPC server chip? I think somebody should make a demotivational poster with an AMD bulldozer chip that says "AMD's Bulldozer, HPC server chip, released on the desktop".

This chip's power consumption doesn't make it a very attractive part in the market people are claiming it is designed for.
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
The new line of thinking is this is an HPC server chip? I think somebody should make a demotivational poster with an AMD bulldozer chip that says "AMD's Bulldozer, HPC server chip, released on the desktop".

This chip's power consumption doesn't make it a very attractive part in the market people are claiming it is designed for.

I agree 100%.


BD could be a good CPU with better IPC and much better perf/watt. The problem is these are not easy issues to fix, unless there are 'broken' aspects to the current design that are fixed. Let's hope this is the issue.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
The new line of thinking is this is an HPC server chip? I think somebody should make a demotivational poster with an AMD bulldozer chip that says "AMD's Bulldozer, HPC server chip, released on the desktop".

This chip's power consumption doesn't make it a very attractive part in the market people are claiming it is designed for.

I'm sure it will affect clocks for server BD but the 16 core chips won't be getting close to the ~4GHz range where power draw really starts to spike.
 