Can AMD "rescue" the Bulldozer?

Page 2 - AnandTech community

Idontcare

Elite Member
Oct 10, 1999
IMO the shared module design is the biggest drawback, so the best real fix would be a pretty dramatic change. Duplicate the missing portions on the 'cores' so that each core is complete in and of itself as far as FPU/INT.

The problem with the shared core/module design is that it is something you should ONLY be resorting to IF your problem is that you already have too much single-threaded IPC and not enough cores.

Taking already weak cores and making them even weaker by hobbling them and sharing components is a doomed scenario. Time to go cruise the Titanic through the N.Atlantic in January :|
 

Xpage

Senior member
Jun 22, 2005
I think they really need to work on L3 latency. I'm not a designer, but it is getting killed by Intel in L2 and L3 latency. Go for a smaller, faster L3, save some die space, and make some more profit.
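The latency argument can be made concrete with the textbook average memory access time (AMAT) formula. A minimal sketch; the hit latencies, miss rates and DRAM latency below are hypothetical placeholders, not measured Bulldozer or Sandy Bridge figures.

```python
def amat(levels, memory_latency):
    """Average memory access time (AMAT) in core cycles.

    levels: list of (hit_latency_cycles, local_miss_rate) tuples, L1 first.
    memory_latency: cycles to reach DRAM after missing the last cache level.
    """
    cost = memory_latency
    # Walk from the outermost cache inward: each level costs its own hit
    # latency plus (local miss rate x cost of going one level further out).
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost


# Hypothetical latencies and miss rates, for illustration only.
larger_slower_l3 = [(4, 0.05), (20, 0.30), (65, 0.40)]
smaller_faster_l3 = [(4, 0.05), (20, 0.30), (40, 0.45)]  # misses more often

print(f"larger, slower L3:  {amat(larger_slower_l3, 200):.3f} cycles")
print(f"smaller, faster L3: {amat(smaller_faster_l3, 200):.3f} cycles")
```

With these made-up inputs the smaller, faster L3 comes out ahead (about 6.95 vs 7.18 cycles) even though it misses more often; push its miss rate high enough and the result flips, so the real answer depends on measured latencies and workload miss rates.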
 

Vesku

Diamond Member
Aug 25, 2005
I do think it's about time AMD got their cache running at core clock speed. The company fell flat after HyperTransport; HT development could have gone hand in hand with some serious cache and uncore improvements, but it seems they've only taken several baby steps forward. Even at the slow pace they've had, I think it's the main thing that may carve out a niche for BD Interlagos in servers: lots of memory channels combined with a good interprocessor interconnect.
 

Rifter

Lifer
Oct 9, 1999
Even if they respin it, fix the performance and adjust their out-to-lunch pricing, it still isn't going to be enough for some people to go back to AMD.

Remember, it's not just the performance that was a flop but the marketing as well. Personally, what gets me is the constant lies: that IPC would increase (even long after AMD knew that was BS), that it would work on current AM3 boards (said in 2010, also BS), and their brutal attacks on people posting leaked benchmarks, calling the leaks BS, accusing the posters of spreading FUD and insisting the numbers were in no way real (when they were true 90% of the time). Perhaps this is why they fired most of the marketing department, I don't know, but either way recovering from that will take as much work as fixing the CPU. I for one won't consider an AMD CPU again until their marketing department has a clean track record for 2-3 years straight with no BS.
 

Zor Prime

Golden Member
Nov 7, 1999
Bulldozer in fact does work on some AM3 boards. I'm not sure if it's due to board design, or something as simple as board manufacturers releasing a BIOS that supports it. But ... some AM3 boards can run Bulldozer processors.
 

Olikan

Platinum Member
Sep 23, 2011
Xpage said:
I think they really need to work on L3 latency. I'm not a designer, but it is getting killed by Intel in L2 and L3 latency. Go for a smaller, faster L3, save some die space, and make some more profit.

Well, there are some rumours that Bulldozer's L2, L3 and floating point are all working at half clock, and that the Trinity "IPC boost" is pretty much the L2 and floating point now running at full clock.
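If a cache really did run at half the core clock (the post presents this as a rumour), a latency counted in cache cycles would cost twice as many core cycles. A minimal sketch of the conversion; the 3.6 GHz core clock and the 20-cycle cache latency are illustrative assumptions, not measured figures.

```python
def to_core_cycles(domain_cycles, core_ghz, domain_ghz):
    """Convert a latency counted in another clock domain into core clock cycles."""
    nanoseconds = domain_cycles / domain_ghz  # 1 GHz = 1 cycle per nanosecond
    return nanoseconds * core_ghz


core_ghz = 3.6     # example core clock (assumption)
cache_cycles = 20  # hypothetical latency, counted in the cache's own clock
for ratio in (1.0, 0.5):
    cycles = to_core_cycles(cache_cycles, core_ghz, core_ghz * ratio)
    print(f"cache at {ratio:.0%} of core clock: {cycles:.0f} core cycles")
```

The same 20 cache cycles become 40 core cycles at half clock, which is why moving a cache back to full clock could show up as an IPC gain without any other change.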
 

LoneNinja

Senior member
Jan 5, 2009
Zor Prime said:
Bulldozer in fact does work on some AM3 boards. I'm not sure if it's due to board design, or something as simple as board manufacturers releasing a BIOS that supports it. But ... some AM3 boards can run Bulldozer processors.

It was discussed prior to Bulldozer's release that some AM3-listed motherboards actually had AM3+ sockets on them. There was a change made to the socket, but the motherboard and chipset remained the same, so it was still released as an AM3 motherboard. Basically they kept making the same motherboard but slapped an updated socket onto it without officially changing any of the motherboard's specs. Bulldozer technically doesn't work in AM3; I think the easiest way to tell them apart was the color of the socket.

Can anyone else confirm this? Or am I wrong?
 

Rezist

Senior member
Jun 20, 2009
Well, they used a new 32nm fabrication process and a new architecture, and still failed to beat their older processors. They're basically still in 2008.
 

denev2004

Member
Dec 3, 2011
Olikan said:
Well, there are some rumours that Bulldozer's L2, L3 and floating point are all working at half clock, and that the Trinity "IPC boost" is pretty much the L2 and floating point now running at full clock.

If you have seen the programming guide for AMD Family 15h (Bulldozer), the IMUL is indeed working at half clock...
 

videoclone

Golden Member
Jun 5, 2003
Rezist said:
Well, they used a new 32nm fabrication process and a new architecture, and still failed to beat their older processors. They're basically still in 2008.

This sums up Bulldozer pretty well.


They moved to 32nm, they had a whole new design, and still failed to improve over the old CPUs, let alone Intel's.

They couldn't even beat themselves LOL hahahah. AMD really did mess up big time with this.
 

nyker96

Diamond Member
Apr 19, 2005
BD probably needs to boost performance by 30-40% to have something that's remotely threatening to SB, much less the new Ivy Bridge that's coming up. I doubt that can be pulled off any time soon; even Intel only manages a 15-20% boost per new chip per year. A BD derivative that is 30-40% better will probably be at least a year away, and AMD's own presentation slides only project a 15-20% boost per year, so it will take them about 2 more years to catch up with the current crop of Intel chips. By then Intel will already have put out Haswell and its successor. Given AMD's track record, I'd call it doing pretty well if they just stop regressing in performance/efficiency.
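The "about 2 more years" estimate is compounding arithmetic: at a steady annual gain g, a total gain G takes ln(1+G)/ln(1+g) years. A quick check with the percentages from the post, assuming the gains compound smoothly every year (real product cycles arrive in steps):

```python
import math


def years_to_gain(total_gain, annual_gain):
    """Years of steady compounding at annual_gain needed to reach total_gain.

    Both arguments are fractions, e.g. 0.35 for +35%.
    """
    return math.log1p(total_gain) / math.log1p(annual_gain)


for total in (0.30, 0.40):
    for annual in (0.15, 0.20):
        years = years_to_gain(total, annual)
        print(f"+{total:.0%} at {annual:.0%}/year: {years:.1f} years")
```

The range works out to roughly 1.4 to 2.4 years, and it treats today's Intel chips as a fixed target, which is exactly why the post expects Haswell to have moved the goalposts by then.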
 

cbn

Lifer
Mar 27, 2009
nyker96 said:
BD probably needs to boost performance by 30-40% to have something that's remotely threatening to SB, much less the new Ivy Bridge that's coming up. I doubt that can be pulled off any time soon; even Intel only manages a 15-20% boost per new chip per year. A BD derivative that is 30-40% better will probably be at least a year away, and AMD's own presentation slides only project a 15-20% boost per year, so it will take them about 2 more years to catch up with the current crop of Intel chips. By then Intel will already have put out Haswell and its successor. Given AMD's track record, I'd call it doing pretty well if they just stop regressing in performance/efficiency.

My guess is that the current Bulldozer cores are pretty small. The decode isn't very wide: 4-issue shared by two integer cores. That comes out to 2-wide per core.

The old Phenom II/Lisbon CPUs were 3-wide and very likely had larger integer cores.
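The "2-wide per core" figure is an average. A toy model, assuming the shared 4-wide decoder serves one of the module's two cores per cycle, alternating when both are busy (as Bulldozer's decoder is usually described), and ignoring fetch limits and which instruction mixes can decode in parallel:

```python
def decode_rate_per_core(decode_width, busy_cores, cycles=1000):
    """Toy model of one decoder shared by the busy cores of a module.

    Each cycle the decoder serves one busy core (round-robin) at full width.
    Returns the average instructions decoded per cycle for each busy core,
    assuming every core always has instructions waiting to be decoded.
    """
    decoded = [0] * busy_cores
    for cycle in range(cycles):
        turn = cycle % busy_cores  # which core the decoder serves this cycle
        decoded[turn] += decode_width
    return [total / cycles for total in decoded]


print("Bulldozer module, 2 busy cores: ", decode_rate_per_core(4, 2))
print("Bulldozer module, 1 busy core:  ", decode_rate_per_core(4, 1))
print("Phenom II core (private 3-wide):", decode_rate_per_core(3, 1))
```

In this model a lone busy thread gets the full 4-wide decoder; the 2-wide figure only applies when both cores of the module are busy, and then it sits below the 3-wide private decoder of a Phenom II core.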

So my guess is that there is plenty of room to upgrade Bulldozer, assuming something else in the design isn't bottlenecking single-thread performance?

On the other hand, maybe AMD won't upgrade Bulldozer to wider decode and larger cores? Maybe they keep the design "small" overall and perfect it for low-power mobile APUs and many-core server SKUs? (This makes a degree of sense to me considering AMD's process tech disadvantage.)

Perhaps (for future designs with higher single-threaded performance) AMD will go back to Phenom II/Lisbon (or Llano) and build up from that 3-wide design? Maybe add SMT, beef up the core and add 4-wide decode like the Intel CPUs?
 

Riek

Senior member
Dec 16, 2008
nyker96 said:
BD probably needs to boost performance by 30-40% to have something that's remotely threatening to SB, much less the new Ivy Bridge that's coming up. I doubt that can be pulled off any time soon; even Intel only manages a 15-20% boost per new chip per year. A BD derivative that is 30-40% better will probably be at least a year away, and AMD's own presentation slides only project a 15-20% boost per year, so it will take them about 2 more years to catch up with the current crop of Intel chips. By then Intel will already have put out Haswell and its successor. Given AMD's track record, I'd call it doing pretty well if they just stop regressing in performance/efficiency.

If BD gets a 10% boost across the whole current lineup, they would be very competitive with Intel for all the models. 30-40% would bring them into the range of the currently released SB-E.

Whether you like BD or not, if it had launched at 3.9-4.5GHz as the top model instead of 3.6-4.2GHz, it would have been very competitive with the 2600, which is the current top model of Intel's mainstream line.
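How much those extra 300 MHz are worth can be bounded with simple ratios. A sketch assuming performance scales at most linearly with clock speed (memory-bound code scales less), using the clocks from the post:

```python
def best_case_uplift(old_ghz, new_ghz):
    """Speedup from a clock increase if performance scaled perfectly with frequency."""
    return new_ghz / old_ghz - 1.0


# Clocks from the post: 3.6 GHz base / 4.2 GHz turbo as shipped,
# versus the 3.9 / 4.5 GHz the post suggests.
for label, shipped, proposed in (("base", 3.6, 3.9), ("turbo", 4.2, 4.5)):
    print(f"{label}: +{best_case_uplift(shipped, proposed):.1%} at most")
```

That is roughly 7-8% even in the best case, a bit under the 10% boost mentioned in the previous paragraph.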

Whether AMD can 'fix' BD is a whole other thing. They would at least need to reach 3.9-4.2GHz (a process fix) and get an IPC gain (especially for the worst-case scenarios) to compete with IB in performance. Gaining 20% performance seems doable when you have manufacturing issues you might solve (clock limitations) and can do some refinement in BD. However, I don't think they can become competitive in performance/W, since they would need to increase performance while dropping 50W.
 

Red Hawk

Diamond Member
Jan 1, 2011
I read that Windows 8 is supposed to make better use of the Bulldozer architecture than 7 or XP do. That might help.
 

denev2004

Member
Dec 3, 2011
cbn said:
My guess is that the current Bulldozer cores are pretty small. The decode isn't very wide: 4-issue shared by two integer cores. That comes out to 2-wide per core.

The old Phenom II/Lisbon CPUs were 3-wide and very likely had larger integer cores.

That really sounds like what this website's editor believes...
Well, I once thought this was not that bad: a "core" can handle 4 Mops, but the 4 decode units can issue 4-8 Mops.
Doesn't that make Intel's design of 1 complex + 3 simple decoders sound more useful?
Anyway, I saw that Real World Tech wrote that Bulldozer can only let 4 Mops go through dispatch, which is information I can't find in the software optimization guide for Bulldozer.
If that's true, this will be a fundamental problem.
It sounds like AMD designed the decode unit wrongly :hmm:
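The worry here is a classic pipeline bottleneck: sustained throughput can be no higher than the narrowest stage. A toy sketch using the widths quoted in the post (the upper end of the 4-8 Mops decode range, and the 4-wide dispatch attributed to Real World Tech); it ignores dependencies, cache misses and branch mispredictions, which usually keep real throughput well below either width.

```python
def sustained_width(stage_widths):
    """Peak sustained ops per cycle through a chain of stages: the narrowest stage wins."""
    return min(stage_widths.values())


def bottleneck(stage_widths):
    """Name of the stage that caps throughput."""
    return min(stage_widths, key=stage_widths.get)


# Per-module widths as quoted in the post: decode up to 8 Mops
# (upper end of the 4-8 range), dispatch 4 Mops per cycle.
module = {"decode": 8, "dispatch": 4}
print(sustained_width(module), "Mops/cycle, limited by", bottleneck(module))
```

In this model any decode width beyond 4 is wasted unless dispatch widens too.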
 

Ancalagon44

Diamond Member
Feb 17, 2010
Quote:
and now the Athlon 2 and Phenom 2 line is no longer being made by AMD, so they are only good for mobile and GPUs.

http://www.tomshardware.com/news/amd-cpu-apu-athlon-phenom-Llano-Bulldozer,14173.html

So glad I got my 1055T last Friday!

Surely AMD would make a better profit on Thuban because it's cheaper to make anyway? I mean, due to its smaller die size, even at a larger process node.

AMD needs to:
1. Reduce L2 cache to 256KB per core and seriously reduce the latency of both L2 and L3.
2. Bump decode units back up to 3 per core.
3. Bump execution units up to 3 ALUs/AGUs per core.
4. Double the shared floating-point execution units.

That would just about fix it. They can keep the longer pipeline if GF can sort out its issues and boost clock speeds so that the top SKU can turbo up to at least 4.5GHz without needing 1.21 jigawatts.
 

Olikan

Platinum Member
Sep 23, 2011
denev2004 said:
If you have seen the programming guide for AMD Family 15h (Bulldozer), the IMUL is indeed working at half clock...

Thanks, I hadn't read it.

Red Hawk said:
I read that Windows 8 is supposed to make better use of the Bulldozer architecture than 7 or XP do. That might help.

The scheduler will try to use both cores of a module first, so the other modules can be power-gated, which lets the module in use turbo higher.
It will help more with the power-saving features; the performance gain won't be large.
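The policy described above amounts to packing threads onto modules instead of spreading them out. A toy sketch on a hypothetical four-module chip; it is not the actual Windows 8 scheduler, and it simply assumes that a module with no threads can be power-gated:

```python
def place_threads(threads, modules, cores_per_module=2, pack=True):
    """Return the number of threads placed on each module.

    pack=True fills both cores of a module before touching the next one;
    pack=False spreads threads across modules round-robin.
    """
    if threads > modules * cores_per_module:
        raise ValueError("more threads than cores")
    load = [0] * modules
    for i in range(threads):
        target = i // cores_per_module if pack else i % modules
        load[target] += 1
    return load


# Four threads on a hypothetical four-module (eight-core) chip.
for pack in (True, False):
    load = place_threads(threads=4, modules=4, pack=pack)
    print(f"pack={pack}: load {load}, {load.count(0)} module(s) idle and gateable")
```

Packing leaves whole modules idle for power gating and turbo headroom, but the paired threads then share their module's front end and FPU, which fits the expectation above of power savings rather than big performance gains.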
 

zlejedi

Senior member
Mar 23, 2009
Quote:
So Intel's AMD killer is selling at a measly 3:1 ratio. There's nothing there for Intel to be impressed about, considering every review on the net has proclaimed AMD's death due to Bulldozer's 'performance'. Gotta love the propaganda; keep up the good work, it seems to be working in AMD's favor!

Also, just as a reminder, Intel's virtualization is broken in SB.

It seems we underestimated the masses of idiots being lured by 'more cores, more GHz' marketing.
 

dawp

Lifer
Jul 2, 2005
LoneNinja said:
It was discussed prior to Bulldozer's release that some AM3-listed motherboards actually had AM3+ sockets on them. There was a change made to the socket, but the motherboard and chipset remained the same, so it was still released as an AM3 motherboard. Basically they kept making the same motherboard but slapped an updated socket onto it without officially changing any of the motherboard's specs. Bulldozer technically doesn't work in AM3; I think the easiest way to tell them apart was the color of the socket.

Can anyone else confirm this? Or am I wrong?

I have 2 ASUS M4A89GTD PRO/USB3 boards with AM3 sockets that I bought last February. There is a beta BIOS (3027) for these boards that adds support for BD. The socket is definitely AM3 and not AM3+. I'm not going to upgrade (downgrade?) to BD, as I already have a 1090T in both and see no need to change at the moment. If they improve BD within the next year, then maybe, but I don't think that will happen.

My understanding is that an AM3+ chip will physically fit in an AM3 socket; it's up to the vendor to add support for it, as AMD will not officially support that option.
 

Riek

Senior member
Dec 16, 2008
Ancalagon44 said:
So glad I got my 1055T last Friday!

Surely AMD would make a better profit on Thuban because it's cheaper to make anyway? I mean, due to its smaller die size, even at a larger process node.

AMD needs to:
1. Reduce L2 cache to 256KB per core and seriously reduce the latency of both L2 and L3.
2. Bump decode units back up to 3 per core.
3. Bump execution units up to 3 ALUs/AGUs per core.
4. Double the shared floating-point execution units.

That would just about fix it. They can keep the longer pipeline if GF can sort out its issues and boost clock speeds so that the top SKU can turbo up to at least 4.5GHz without needing 1.21 jigawatts.

So basically you want them to make 2 full cores again... waste a lot of die space in the process and call it a day.

Reducing L2 to 256KB is bad; 1MB sounds better. Reduce the latency of this cache and implement an L0-cache-like structure or other means to bypass branch hits (I think this is already in the pipeline for Steamroller).
Decode width is fine; combined with some enhancements like SB's, that should do the trick.
Bumping execution resources might be an option, although they would be far better off making their AGUs do more basic calculations.
Doubling the FPU is pure nonsense; improving execution times would have more effect...
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Realistically, in terms of what is somewhat possible, AMD could do the following:

Reduce cache latencies across all levels by about 25%.
Improve the prefetcher by 10%.
Add one more INT execution unit per core (2 per module).
Improve FPU throughput by 20%.

If they did ALL those things, they might match the IPC of SB. For a company like AMD that would take two+ years. And in two years Intel will have something with 20% more IPC than SB. With Intel milking it, it will be more like 10%.
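The closing point is relative arithmetic. A sketch using the post's figures, assuming AMD exactly matches Sandy Bridge IPC while Intel's newer core improves on it by 10-20% (IPC normalized so that Sandy Bridge = 1.0):

```python
def remaining_gap(amd_ipc, intel_gain):
    """How far AMD would still trail, as a fraction of Intel's future IPC.

    IPC is normalized so that Sandy Bridge = 1.0.
    """
    intel_ipc = 1.0 + intel_gain
    return 1.0 - amd_ipc / intel_ipc


# AMD reaches Sandy Bridge IPC (1.0) while Intel gains 10% or 20%, per the post.
for gain in (0.10, 0.20):
    print(f"Intel +{gain:.0%}: AMD still trails by {remaining_gap(1.0, gain):.1%}")
```

That leaves AMD roughly 9-17% behind per clock, before clock speed and core count enter the picture.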
 