Bulldozer has better IPC when run 4C/4M?

Kevmanw430

Senior member
Mar 11, 2011
279
0
76
http://www.xtremesystems.org/forums/showthread.php?275873-AMD-FX-quot-Bulldozer-quot-Review-%284%29-%21exclusive%21-Excuse-for-1-Threaded-Perf.

This was posted before, but the thread got locked because of a misleading title. So, the guy in the link I posted disabled cores 1, 3, 5 and 7 on his FX-8150, and it performed about 20% better in single-threaded loads. (Need more benches to confirm; anyone here with an FX willing to run some?) It makes sense that the cores would perform better when not sharing resources (cache and FPU), so this is not unexpected. This means that if we can get Win7 or certain programs to schedule in the order 1,3,5,7,2,4,6,8 for the BD cores, it will perform much better in tasks with 4 threads or fewer. Think about it: 20% better IPC than PHII and ~4.5GHz when OC'd, and for 4 threads and under you'll be a hair behind SB. If you have a program that leverages more than 4 threads, chances are it can also leverage 8, so it won't be too far behind in MT either. Thoughts?
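To make the proposed order concrete, here is a minimal Python sketch that generates a "one thread per module first" ordering. It assumes 1-based core numbers where cores 1 and 2 share the first module, 3 and 4 the second, and so on; that numbering and the function name `module_first_order` are illustrative assumptions, not something confirmed by the linked review.

```python
def module_first_order(modules: int, cores_per_module: int = 2) -> list[int]:
    """Order cores so every module gets one thread before any module gets a second.

    Assumption: cores are numbered from 1 and adjacent numbers share a module.
    """
    order = []
    for slot in range(cores_per_module):      # first core of each module, then second
        for module in range(modules):
            order.append(module * cores_per_module + slot + 1)
    return order


print(module_first_order(4))  # [1, 3, 5, 7, 2, 4, 6, 8]
```

With 4 modules this reproduces the 1,3,5,7,2,4,6,8 order: the first four threads each land on a module of their own, and sharing only starts with the fifth thread.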
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
I get better performance on my 2600K when I disable HT as well. Unless the app is heavily threaded, then I don't.

CMT. SMT. It's the same story as soon as you go away from a CMP architecture.
 

Kevmanw430

Senior member
Mar 11, 2011
279
0
76
Which, again, makes perfect sense. But, the fact is, if we can get Windows to schedule threads in the right order, BD might not be such a stinker... Who knows.
 
Dec 30, 2004
12,554
2
76
http://www.xtremesystems.org/forums/showthread.php?275873-AMD-FX-quot-Bulldozer-quot-Review-%284%29-%21exclusive%21-Excuse-for-1-Threaded-Perf.

This was posted before, but the thread got locked because of a misleading title. So, the guy in the link I posted disabled cores 1, 3, 5 and 7 on his FX-8150, and it performed about 20% better in single-threaded loads. (Need more benches to confirm; anyone here with an FX willing to run some?) It makes sense that the cores would perform better when not sharing resources (cache and FPU), so this is not unexpected. This means that if we can get Win7 or certain programs to schedule in the order 1,3,5,7,2,4,6,8 for the BD cores, it will perform much better in tasks with 4 threads or fewer. Think about it: 20% better IPC than PHII and ~4.5GHz when OC'd, and for 4 threads and under you'll be a hair behind SB. If you have a program that leverages more than 4 threads, chances are it can also leverage 8, so it won't be too far behind in MT either. Thoughts?

Yes, that would be more than acceptable and would be what we wanted from Bulldozer.
Please keep us posted with more results! This looks very interesting.

edit: now wait a minute, that's showing us benchmarks where it's faster to use the chip in fully enabled mode, not half-enabled.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
I get better performance on my 2600K when I disable HT as well. Unless the app is heavily threaded, then I don't.

CMT. SMT. It's the same story as soon as you go away from a CMP architecture.

yeah...but, 20%?
 

masteryoda34

Golden Member
Dec 17, 2007
1,399
3
81
The shared resources (FPU) must have a method of synchronizing access by each core. Some synchronization overhead is bound to occur.

With one core of each module disabled, the cache of each core is effectively doubled.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
yeah...but, 20%?

AMD themselves stated a roughly 20% penalty for going CMT over 2 full cores. Problem is that the IPC of one core still seems to be only at Phenom II level. Now, odds are IPC will show improvements for tuned software. But why buy the first gen of a new uArch that needs software support? It makes much more sense to see the results of the second generation and buy if it looks good for your purposes. That's in addition to initial GF 32nm having leakage issues, i.e. these first Llanos and now FX have heat issues starting in the mid-3GHz range.
 

Kevmanw430

Senior member
Mar 11, 2011
279
0
76
See, this is the problem. BD just came out, so there is no software optimization, GF is having issues with 32nm, and core scheduling should be 1,3,5,7,2,4,6,8, not 1-8. If all of these things were resolved, BD might actually have a chance..
 

StrangerGuy

Diamond Member
May 9, 2004
8,443
124
106
So, disable 4 cores so the remaining 4 cores each get their own front-end?

And that defeats the purpose of BD being an 8-core, amirite?
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
When it was people saying that this wasn't really 8 cores but was more like 4, they were scoffed at. Now that there is proof, it is interpreted as a good thing?
 

Blue Shift

Senior member
Feb 13, 2010
272
0
76
When it was people saying that this wasn't really 8 cores but was more like 4, they were scoffed at. Now that there is proof, it is interpreted as a good thing?

Just stop. We all know how the modules are broken down, and that it depends on how you define cores. The FX-8150 has 8 integer cores. Move on.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
So the cores themselves actually catch back up to Phenom II IPC?

Not the cores, the modules. Disable one Integer core, convert the behemoth 30mm^2 module into a single functioning core (llano core is ~15mm^2 w/cache) and you get IPC parity.

It is rather crazy. Two Llano cores w/cache take up almost exactly the same footprint as one Bulldozer module.

So instead of having an 8-core Llano with no CMT downsides to single-thread throughput, we have an 8-core BD with all the downsides and none of the upsides, it would seem.

Phenom had similar issues wrt the Athlon X2 at the time though, so I am optimistic that Trinity/Piledriver will see quite a boost in IPC.
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
See, this is the problem. BD just came out, so there is no software optimization, GF is having issues with 32nm, and core scheduling should be 1,3,5,7,2,4,6,8, not 1-8. If all of these things were resolved, BD might actually have a chance..

Maybe, but this is still AMD's FAIL. Read the review thread where I put links about scheduling. Some people actually saw that the OS scheduler must be aware of the BD architecture for best performance (it knows about HT). So either AMD failed to get MS to commit to a patch or they thought it was unnecessary. Given that they have been working on BD for 4-5 years, claiming they did not have enough time would be a bad joke.
 

crazylocha

Member
Jun 21, 2010
45
0
66
When it was people saying that this wasn't really 8 cores but was more like 4, they were scoffed at. Now that there is proof, it is interpreted as a good thing?


Hence the trashing of BD and JF. We were forewarned about the different style of architecture and the different meaning of the word "core". Now everybody is starting to figure it out a piece at a time.

Timely flashback: dual cores? Why?!? Just make the single core chip oc and you get more calc's per cycle than trying to split tasks up between two cores. Do you realize how much that will bloat the software side having to figure out scheduling, let alone how long it will take just to optimize one? Who the heck wants to program that heavily.

That was just part of an old (very old) conversation under the old copper wires. A little vision goes a looooong way. What conversations will be had in 6 months-2 years from now after a few optimizations?

What were the trashing conversations like when DX11 and Dirt etc first came out?

And here I thought changing the Cobol subs was bad when Quantel came out with a 12 platter interchangeable drive (that's 3k per platter for the diaper aged groupies). Bad enough we had to rewrite our own compilers every time a new chipset came out (i.e. 8088's, Via's, even Timex Sinclairs). Try rewriting for the original Tandy "Color Comp". Those sissies playing with Trash 80's with their cassette players, wait, you mean it RECORDS too??? There goes the neighborhood!! Lol.

35+ years in this mess, and I hate to even open my laptops any more (yes, this is written on my dual core touch screen phone, how Jetsony of me). All this time has taught me patience if nothing else. Maybe some impatient nutter will take some of their precious time and write an optimized Linux version, and others will claim it as their own brilliance in their unique style. Then maybe I will play with it a while, then irritate my wife with yet another hunk of silicon sitting around my other piles of dusty hardware.

Always enjoy the inspiration of forward looking engineers. They push us all to new inspirations. Then we all skew history towards those that we admire when they succeed. What are you going to think when your cell/bank card/car+house key/medical records/ porn player/game machine/cell phone/whateverelsewecomeupwithnext falls in the toilet? Are you going to grab it? Hope that BD/Piledriver dries out well? Maybe it will be SB?!? instead. Who knows. Tomorrow is another day full of new wonders. Maybe you will be the one who tries some new avenue.

Edison and Tesla were both called nuts. AC or DC, both were insane. Gas lights were sooo much safer and easier!!

Ummmm yeah. That worked out well.
 

BD231

Lifer
Feb 26, 2001
10,568
138
106
When it was people saying that this wasn't really 8 cores but was more like 4, they were scoffed at. Now that there is proof, it is interpreted as a good thing?

Thank you. People were saying the FPUs were double the transistor size of a typical Phenom II FPU, which is entirely false by measure. By slapping two castrated FPUs together they were able to dub it a 256bit part, keyword, TOGETHER.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Measuring IPC per module is wrong, simply because a Bulldozer module has 80% of the performance of two cores.

It is like measuring a Core i7 2600K with one core + HT instead of measuring two cores without HT.

IPC must be measured on a single core; then we measure per module (HT in Intel CPUs) to measure the performance gains of CMT/HT OVER a single core.

What the OP's results show is the performance gain from the CMT architecture, not that IPC increases with CMT disabled.

People saying that IPC is lower than Deneb/Thuban are wrong if they compare 4M/8C Bulldozer performance to 6C Thuban performance. Measure a single core in both and then draw your IPC conclusion.
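The arithmetic behind the "80% of two cores" figure can be sketched in a few lines of Python. This is a toy model only: it assumes the roughly 20% CMT penalty quoted in this thread applies evenly to both threads whenever both cores of a module are busy, and the function name `cmt_model` is made up for illustration.

```python
def cmt_model(module_efficiency: float = 0.8, single_core: float = 1.0):
    """Toy CMT model: throughput relative to one unshared core.

    Assumption: with both cores of a module busy, each thread runs at
    `module_efficiency` times the speed of a core that has the module to itself.
    """
    per_thread_shared = single_core * module_efficiency
    module_throughput = 2 * per_thread_shared            # both cores busy
    gain_when_unshared = single_core / per_thread_shared - 1.0
    return per_thread_shared, module_throughput, gain_when_unshared


per_thread, module_total, gain = cmt_model()
print(f"per-thread throughput with both cores busy: {per_thread:.2f}")
print(f"module throughput with both cores busy:     {module_total:.2f}")
print(f"per-thread gain with the sibling idle:      {gain:.0%}")
```

Under that assumption a fully loaded module delivers about 1.6x one core, and a thread that gets a module to itself is about 25% faster than one that shares. The model says nothing about a lone thread on an otherwise idle chip, which is the single-threaded case the OP's link reports.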
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
I've been explaining it, AtenRa, but it seems like the "8 cores more like 4 cores" crowd just won't simmer down. It is 8 cores; it's just that each core is kind of lame, and then there is a 20% module hit to each pair's performance.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Thank you. People were saying the FPUs were double the transistor size of a typical Phenom II FPU, which is entirely false by measure. By slapping two castrated FPUs together they were able to dub it a 256bit part, keyword, TOGETHER.
What you've seen are colored powerpoint boxes. The real silicon looks much different, starting with the 128b FMACs being halved each and the lower half of FMAC#1 sitting next to the lower half of FMAC#2 at one end of the FPU and the other halves at the other end...
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
I've been explaining it, AtenRa, but it seems like the "8 cores more like 4 cores" crowd just won't simmer down. It is 8 cores; it's just that each core is kind of lame, and then there is a 20% module hit to each pair's performance.

And then there is inefficiently decodable code blocking important shared front end resources.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Yeah, there are some strange scenarios in certain tests where the cores can't be fed properly.
 

zlejedi

Senior member
Mar 23, 2009
303
0
0
So has anyone actually run strictly 4-threaded code on an 8-core BD in both the 4M/8C and 4M/4C scenarios?

Because 4M/4C being faster than 2M/4C is a discovery worthy of a Captain Obvious award.

If 4M/4C would be faster than 4M/8C in any scenario then it would prove that indeed there's some kind of problem with scheduling.
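For anyone who wants to try that comparison, the three scenarios can be written down as CPU sets and applied with process affinity. A rough Python sketch follows; it assumes 0-based logical CPU numbers where CPUs 2k and 2k+1 share module k (check the actual topology first), and it uses `os.sched_setaffinity`, which exists on Linux only. The names `cpu_set`, `SCENARIOS` and `pin_current_process` are made up for this example.

```python
import os

CORES_PER_MODULE = 2  # assumption: logical CPUs 2k and 2k+1 share module k


def cpu_set(modules_used: int, cores_used_per_module: int) -> set[int]:
    """CPU numbers for the first N modules, using the first K cores of each."""
    return {
        module * CORES_PER_MODULE + core
        for module in range(modules_used)
        for core in range(cores_used_per_module)
    }


SCENARIOS = {
    "4M/4C": cpu_set(4, 1),  # one core per module, nothing shared
    "2M/4C": cpu_set(2, 2),  # two modules, both cores of each in use
    "4M/8C": cpu_set(4, 2),  # everything enabled, scheduler decides placement
}


def pin_current_process(cpus: set[int]) -> bool:
    """Restrict this process to `cpus`; return False if that is not possible."""
    if not hasattr(os, "sched_setaffinity"):
        return False                      # not Linux
    if not cpus <= os.sched_getaffinity(0):
        return False                      # some requested CPUs are unavailable
    os.sched_setaffinity(0, cpus)
    return True


for name, cpus in SCENARIOS.items():
    print(name, sorted(cpus))
```

Pin the process to each set in turn, run the same 4-threaded benchmark, and compare. If 4M/4C beats 4M/8C, the scheduler is placing threads badly; on Windows the equivalent would be setting affinity from Task Manager or `start /affinity`.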
 

dawp

Lifer
Jul 2, 2005
11,345
2,705
136
So what are the chances of AMD releasing an optimizer like they did with the dual cores when they first came out?
 