Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Page 99
Status
Not open for further replies.

drizek

Golden Member
Jul 7, 2005
1,410
0
71
I think when someone like JFAMD responds directly to the question of "when is it coming out", vernacular, semantics, etc. don't matter. I don't have time to look for the quotes in this 99-page thread right now, but I don't think you can chalk this up to miscommunication. We all know exactly what we mean by "launch".

It isn't about what Newegg does with the chips; it's about when AMD gives the chips to Newegg, and it only takes a couple of days to send them from Malaysia to California. I think they can afford to air-mail $300 CPUs.

And if AMD literally has no idea what systems Dell and HP are planning to ship with Bulldozer, they're screwed. And if there is any possibility that Newegg or Frys or Microcenter would hesitate for even a second to submit the largest possible order they can get their hands on, AMD is screwed.
 

Terzo

Platinum Member
Dec 13, 2005
2,589
27
91
Well, isn't AMD going to be "announcing" Llano (Computex) and Bulldozer (E3) soon? Unless they're going to repeat stuff we already know, I imagine that we'll hear some info regarding price, release date, and maybe some demos. Either way, we'll know *something* in one and two weeks, respectively.

I really hope to see Bulldozer "released" in June. For whatever reason, the waiting and lack of information is becoming agonizing.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
If I remember correctly, AMD has said BD will launch in H1 2011, that is, between the first of January and the end of June 2011.

I haven't seen any AMD paper or slide to support a change in launch time, yet.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Just for the record: another free interpretation of 5% available information:



@CMT/SMT:
The BD architecture is a mix of multiple principles: CMT (2 single threaded int cores and vertically threaded front end) and SMT (FPU and also the L2):


Intel's Hyperthreading also uses this so-called vertical multithreading in the front end. Because this domain tolerates latency better, it is OK to switch threads there, and it also simplifies the implementation.

Even Andy Glew had problems placing the FPU into his MCMT architecture (CMT) :

http://citavia.blog.de/2010/04/28/andy-glew-s-multi-star-architecture-and-the-alus-again-8474038/

Further, I wonder why some assume that with enough execution resources the front end (even a narrow one) would automatically be saturated when running only 1 thread. If ILP is not high enough in different phases, it doesn't even need cache misses to underutilize the front end.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
I was thinking about the AGLU's, notably, why inc and not add? Then it hit me -- inc and dec both need only a single register read port.
So, if the design is:

MUL pipe = 2 read ports, 1 write port
DIV pipe = 2 read ports, 1 write port
AGLU 1 = 1 read port, 1 write port, access to IP and stack engine
AGLU 2 = 1 read port, 1 write port, access to IP and stack engine
(separate unit for data read on cache write, write ports on AGLU's double as the units used for data write on cache read)

total = 7 read ports, 4 write ports, the read port for cache write is not latency-sensitive and thus can be outside the forwarding network.

Then suddenly a lot of the things I've read make sense. The design itself is sensible because it optimizes for clock speed while keeping the most common operations as one-cycle -- most memory references are IP-relative (static data), based off stack pointer or base pointer (the C stack) or register + immediate (OOP object references), with only the odd one out needing two registers. It would explain why inc and dec and not add, and why complex LEA requires the ALU. The write ports on the AGLUs will block any one-cycle operations that need them when data arrives, but that's exactly what happens with mul add add add on the ALU pipes.

And keeping the port counts on the register file low makes it fly. Now, you can even make it simpler still by replicating it, leaving 3 read ports and 4 write ports per partition (plus the single read port that can be naive). Perhaps divide the forwarding network in two -- you can only zero-cycle forward inside the partition. This would make the scheduler a bit more complex, while making the forwarding network a lot simpler and cheaper, and keeping performance good in most cases, with one cycle of additional latency inserted whenever you mul after a div (or an add at the wrong port), or vice versa, while still keeping most operations at one cycle latency.

You are close. It's just a bit different. The IRF has 8 read and 4 write ports. The AGLUs also have 2 inputs. To reduce wire delays the IRF is duplicated. And the reason for adding INC to the AGLUs is also a different one:
ISSCC paper on integer scheduler said:
Duplicating the INC removes the vertical wire delay from the worst-case bypass path, which is from any INC to the far ALU.
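To make the port arithmetic in this exchange concrete, here is a small Python sketch that tallies register-file ports per pipe. The first layout is the guess quoted above (7 read, 4 write). The second is only one possible reading of the correction: it assumes all four pipes (two EX, two AGLU) read two operands and write one result, which happens to add up to the 8 read and 4 write ports stated for the IRF. The pipe names and the even 2-read/1-write split are assumptions for illustration, not taken from the ISSCC paper.

```python
# Tally integer register file (IRF) ports for two candidate layouts.
# Each pipe maps to (read_ports, write_ports).

# The layout guessed in the quoted post.
guessed = {
    "MUL pipe": (2, 1),
    "DIV pipe": (2, 1),
    "AGLU 1": (1, 1),
    "AGLU 2": (1, 1),
    "store-data read": (1, 0),  # separate read port for cache writes
}

# Assumption: with 2-input AGLUs, every pipe reads two operands.
corrected = {
    "EX 0": (2, 1),
    "EX 1": (2, 1),
    "AGLU 0": (2, 1),
    "AGLU 1": (2, 1),
}


def total_ports(layout):
    """Return (total_read_ports, total_write_ports) for a layout."""
    reads = sum(r for r, _ in layout.values())
    writes = sum(w for _, w in layout.values())
    return reads, writes


print(total_ports(guessed))    # (7, 4)
print(total_ports(corrected))  # (8, 4)
```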
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
That makes no sense. The max turbo on the 125W part would be 4.4GHz, whereas the 95W 6-core would have a max speed of 4.6GHz or higher.
That's what I wanted to show: There are several ways to match numbers to the left pixels of what was printed there. The "1.6" could also be a "1.0". But this is just speculation and the slide itself seems to be fake, since Asus denied having made it.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,422
1,759
136
You are close. It's just a bit different. The IRF has 8 read and 4 write ports. The AGLUs also have 2 inputs. To reduce wire delays the IRF is duplicated. And the reason for adding INC to the AGLUs is also a different one:

Do you know what port is used for cache write data?
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Do you know what port is used for cache write data?
It looks like EX<->LSU transfers use the EX/AGLU units' ports, since EX units have a store data bus to LSU and AGLUs have an address-generation bus to and a load data bus from LSU. Looking at this the AGLUs should be able to execute load-type MOVs w/o resorting to EX units at all.
 
Mar 11, 2004
23,181
5,645
146
That makes no sense. The max turbo on the 125W part would be 4.4GHz, whereas the 95W 6-core would have a max speed of 4.6GHz or higher.

Actually, based on what AMD has said about the new Turbo, it does make sense, as it is meant to scale up based on effective TDP, and an 8 core would have less head room to clock than a 6 core when they're in the same TDP.

I'm probably wrong about it being based on TDP, but that was the simplest way of explaining the factors. In short, the extra cores lower the headroom they have for clocking up. I'm not saying that slide is true, but in the vein of how the new Turbo is supposed to function, it could make sense.
 

Terzo

Platinum Member
Dec 13, 2005
2,589
27
91
Bright Side of News apparently contacted AMD to try and clarify the rumors of a delay.

In order to get to the bottom of this story, we have contacted Mr. Drew Prairie and Mr. Steve Howard, AMD's PR representatives close to heart of the matter. We inquired about the statement Rick Bergman made and received the following answer:

"Our public roadmap has not changed."
I'm hoping this means they're on schedule with Bulldozer and Llano. I just don't know when their schedule plans for consumer availability. Hopefully soon.
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
Actually, based on what AMD has said about the new Turbo, it does make sense, as it is meant to scale up based on effective TDP, and an 8 core would have less head room to clock than a 6 core when they're in the same TDP.

I'm probably wrong about it being based on TDP, but that was the simplest way of explaining the factors. In short, the extra cores lower the headroom they have for clocking up. I'm not saying that slide is true, but in the vein of how the new Turbo is supposed to function, it could make sense.

No, max turbo refers to the speed of just one core, or one module. So the 125W 8-core should have a much higher clock speed when operating only 2 cores versus a 95W 6-core also operating only 2 cores.

Of course, if they are referring to max turbo on all cores, and can get over 4GHz in normal operation on all 8 cores without overclocking... amazing.

And I think JFAMD said that it is based on TDP only, and not temperature like Intel. That means it should be more consistent.
 
Mar 11, 2004
23,181
5,645
146
No, max turbo refers to the speed of just one core, or one module. So the 125W 8-core should have a much higher clock speed when operating only 2 cores versus a 95W 6-core also operating only 2 cores.

Of course, if they are referring to max turbo on all cores, and can get over 4GHz in normal operation on all 8 cores without overclocking... amazing.

And I think JFAMD said that it is based on TDP only, and not temperature like Intel. That means it should be more consistent.

Ok, thanks for the clarification. My comparison was between the 8 and 6 core both at 95W TDP, which is why I was saying that the 8 core should have less headroom as it has more cores and is under the same TDP.
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
But even there, if they are only turboing one module and the other 3 gated, they should be able to max out at the same clock speed.
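A rough way to see both sides of this argument is a toy power-budget model in Python. It is a sketch only: it assumes per-core power grows with the cube of frequency, that gated cores draw nothing, and that every part shares the same frequency cap. The function name, the 4.5 GHz cap and the 0.5 W-per-core constant are made-up placeholders, not AMD figures, and this is not how AMD's turbo logic is known to work.

```python
# Toy model of TDP-limited turbo. All constants are illustrative placeholders.

def max_clock_ghz(tdp_watts, active_cores, cap_ghz=4.5,
                  watts_per_core_at_1ghz=0.5):
    """Highest clock (GHz) at which the active cores fit inside the TDP.

    Assumes per-core power scales with frequency cubed and that
    power-gated cores consume nothing.
    """
    budget_per_core = tdp_watts / active_cores
    power_limited = (budget_per_core / watts_per_core_at_1ghz) ** (1 / 3)
    # The cap stands in for a silicon/voltage limit shared by all parts.
    return min(cap_ghz, power_limited)


# One module (2 cores) active: both 95W parts reach the same cap.
print(max_clock_ghz(95, 2))
# All cores active at 95W: the 8-core gets less headroom than the 6-core.
print(max_clock_ghz(95, 6), max_clock_ghz(95, 8))
```

Under these assumptions the lightly threaded case is limited by the cap rather than by TDP, so core count stops mattering once the other modules are gated, while the all-core case still favours the part with fewer cores.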
 

OCGuy

Lifer
Jul 12, 2000
27,227
36
91

Um...that actually raises more questions than it answers. Read it.

"So what is AMD planning here? A big paper launch to infuriate forums even more with customers eager to get a new high-end chip of the processor underdog? Standard PR tactics to make people go crazy over rumors similar to how Apple does it with their upcoming products? At this point we have no hard info to state "AMD delayed the much-awaited desktop version of Bulldozer." The thing is though, that there is plenty of indication, that at least some things are going wrong. Even though AMD representatives said that "Our public roadmap has not changed," the fact is that AMD's public roadmap has Bulldozer launch pegged for 2011. Then again, if the Bulldozer is really delayed, strong words from AMD executives to financial analysts may not be worth much then - after all, CEO and SVP are saying the platform is on track"
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
As long as Llano and server are on track for Q3, as a stockholder I am happy.

As an enthusiast who really kinda wants to upgrade to Bulldozer just because I've been waiting for it for literally years, "blagahahgalgagh"
 

dma0991

Platinum Member
Mar 17, 2011
2,723
1
0
The thing is though, that there is plenty of indication, that at least some things are going wrong. Even though AMD representatives said that "Our public roadmap has not changed," the fact is that AMD's public roadmap has Bulldozer launch pegged for 2011.

That could be the writer's perception of the situation, a little bit of pessimism along with the good news. Any more delays to Bulldozer and it might ruin their chances of competing against Intel. I am more interested in this part of the article, which suggests that there will be more problems for AMD if they were to delay their launch.

In a "delay to third quarter 2011" scenario, AMD would face an aggravated competitive situation, with Intel having released speed bump versions of their high-end CPUs and the upcoming enthusiast platform featuring current generation sexa- and possibly octa-core CPUs based on socket 2011 around the corner. Now you might understand why motherboard manufacturers might not be very happy about a possible delay. It would make AMD motherboard sales go down the drain.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Doing the lazy quote again since I sadly don't have time to reply to each individual point at current. (em. added) Well, except for the one part that I did directly quote, which I can't help but laugh at - so true!

Suffice it to say, excellent responses over all. While I still don't agree with some points, that's more a matter of looking at them from a different perspective than actual disagreement.
/me fondly looks at a framed frilly piece of paper, on the wall: Quote Sniper of the Year 2005 .

Further, I wonder why some assume that with enough execution resources the front end (even a narrow one) would automatically be saturated when running only 1 thread. If ILP is not high enough in different phases, it doesn't even need cache misses to underutilize the front end.
It shouldn't, but why set it up not to be able to handle peaks? That could allow for compute loops to end up slightly limited by the front end. The case for a narrower shared front end would be that those are going to be rare enough, even in code that could potentially be high ILP, that they will practically never occur at the same time, so the shared front end can realistically service ILP bursts from both threads. For those occasions when it could happen, a wee speed increase, partly enabled by shaving off power consumption with the shared narrower front end, should suffice to make up for it, and will also improve the performance of the other 99.9999% of code being run.
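The "practically never occur at the same time" argument can be put into numbers with a back-of-envelope sketch. It assumes the two threads' high-ILP bursts are independent, and the burst fractions below are hypothetical values picked for illustration, not measurements of any real workload.

```python
# Back-of-envelope estimate: how often would two threads both want peak
# front-end bandwidth in the same cycle? Assumes the threads are independent.

def overlap_fraction(burst_fraction_a, burst_fraction_b):
    """Fraction of cycles in which both threads burst at once."""
    return burst_fraction_a * burst_fraction_b


for p in (0.20, 0.05, 0.01):  # hypothetical per-thread burst fractions
    both = overlap_fraction(p, p)
    print(f"each thread bursts {p:.0%} of cycles -> both at once {both:.2%}")
```

If bursts are correlated (for example two threads running the same loop in lockstep), the overlap would be higher than this product, so the estimate is a best case for the shared front end.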
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
It shouldn't, but why set it up not to be able to handle peaks? That could allow for compute loops to end up slightly limited by the front end. The case for a narrower shared front end would be that those are going to be rare enough, even in code that could potentially be high ILP, that they will practically never occur at the same time, so the shared front end can realistically service ILP bursts from both threads. For those occasions when it could happen, a wee speed increase, partly enabled by shaving off power consumption with the shared narrower front end, should suffice to make up for it, and will also improve the performance of the other 99.9999% of code being run.
Why is it important to handle peaks at the cost of a less power efficient wider implementation? It's a trade-off again: Invest X area, Y cycle time (clock headroom) and Z power to achieve T additional average performance/Watt.

Furthermore, we still haven't found out about the "accelerated mode" (GCC dispatch optimization discussion) or "fast and slow mode" (SOM) with regard to the front end (both have been mentioned while talking about instruction fetch windows). And then there is a patented redirect recovery cache (small cache of decoded dispatch packets), invented by 2 people working on the BD front end. Maybe we'll see it in BDv2.
 

JFAMD

Senior member
May 16, 2009
565
0
0
No, max turbo refers to the speed of just one core, or one module. So the 125W 8-core should have a much higher clock speed when operating only 2 cores versus a 95W 6-core also operating only 2 cores.

Of course, if they are referring to max turbo on all cores, and can get over 4GHz in normal operation on all 8 cores without overclocking... amazing.

And I think JFAMD said that it is based on TDP only, and not temperature like Intel. That means it should be more consistent.

Yes, turbo is tied to TDP only, so it will be much more consistent. Core count and clock speed are highly correlated. I do not have the frequencies in front of me to know how that carries through to boost frequencies. Every processor will have 3 speeds: base, all-core boost and max turbo boost.
 