AMD's 64-bit gamble

Dug · Sep 30, 2002

I think it will be very interesting to see how the Hammer handles a 64-bit version of Windows, or the other way around.

I hope the clock speed is up there for the geek enthusiast, otherwise it won't mean much unless you are into servers.

lambasa · Sep 30, 2002

Originally posted by: Nemesis77

Let's see.... The changes of *Hammer when compared to Athlon:

1. 64-bit chip
2. Integrated memory-controller
3. HyperTransport (equivalent to 800MHz FSB) instead of FSB
4. Reworked cache (More L2 in some models, wider datapaths?)
5. SSE2-support
6. Reworked pipeline
7. SOI
8. Generic improvements in the core.

IMO, 1, 2, or 3 alone would justify calling the chip "8th generation" instead of "7th generation". But they are doing a lot more than just 1, 2 or 3.

Burns works for Intel, it's his job to downplay competitors. What is he supposed to say? That Hammer is really an awesome design? Yeah, I would like to see THAT happening.

Depends on your definition... My definition is rather strict i.e. P.Pro -->PII -->PIII (Katmai) -->PIII (Coppermine)---> PIII (Tualatin)---> Timna (never saw the light of day)---> All 6th generation

If you don't consider the above all the same generation, then there is no need to argue. If you do, let's debate your argument point by point:
#1-- If you want to count this fine, but this leads to the Athlon with 64-bit argument Burns was trying to make
#2-- This was done on Timna, an intended value CPU. This is not a new idea, AMD just is taking a gamble it doesn't backfire (locked into the wrong/slow memory tech).
#3-- Possibly an improvement for MP systems, I fail to see how this improves performance on UP systems (other than the effect of #2). Timna had integrated hublink.
#4-- PII, and Coppermine both saw a new cache design (not just bigger) than their predecessor
#5-- PIII added SSE... new brand name, but same "generation"
#6-- How was it reworked? Most everything that I have seen says AMD has added a stage or two to the decoder, but the rest stays essentially the same.
#7-- Process improvements do not define an architecture... Tualatin moving to copper didn't mean it is a new architecture
#8-- Could you be more specific? Other than some changes to the decoders, I haven't seen much new.

Willamette shared no transistors with the P6 generation... I don't think you can come close to saying the same for K7 and Hammer

Varun · Sep 30, 2002

I don't see why it's such a big deal if the core is basically the K7. From articles I have read about the Hammer, the attempt is to utilize the core much more than it is now. I'm no computer engineer (yet), but basically the data pipes sit empty most of the time in the current Athlon. The attempt with the Hammer is to optimize the data so that the core is being used much closer to 100% than it is now. Plus add in things like SSE2 and other new features to improve performance. They can call it the K8 if they want because it isn't the K7. It may share some points with the K7 but only those that weren't saturated with workloads.

Just my 2c

Sohcan · Oct 1, 2002

*cracks knuckles*

Okay, I'm back.

Most previous x86 generational changes introduced not only a new microarchitecture, but a new execution paradigm. This unfortunately has caused most people to expect the same thing from subsequent microarchitectures. Ignoring the 80286, the major (Intel) x86 microarchitectures brought the following:

80386: (more) orthogonal register set and paging (perhaps the two most important items), 32-bit flat addressing, translation lookaside buffers
80486: Fully pipelined integer execution, integrated level 1 I/D caches, integrated floating-point unit
Pentium: 2-way issue statically scheduled superscalar
Pentium Pro: 3-way issue dynamically scheduled superscalar and speculative execution (and all the goodies that come with it: advanced dynamic branch prediction, branch target buffers, decoupled execution).

But since 1995-1996, when most MPU manufacturers introduced their dynamically scheduled superscalar processors, no one has really introduced any new paradigms (except Itanium / Itanium 2, but that's based on an older idea and a different tangent altogether). The "second-generation" dynamically scheduled superscalar processors that have been coming out the last year or two/will come out soon (Athlon, Hammer, P4, EV7, POWER4, etc) have merely improved upon the idea.

The P4 perhaps took the most radical approach, attempting to reduce wire delays as much as possible (through trace cache, pipeline stages devoted to signal propagation, double-speed ALUs to reduce data bypass delays) in the light of the increasing gap between wire delay and gate delay. Obviously it has proven to be an acceptable design decision, though by no means the only possible route.

Aside from the exterior changes to the Athlon core (routing links and integrated DDR controller), the Hammer core is very similar; a few integer reservation stations were added, the fetch and decode stages were modified, and the TLB and branch predictor were improved. This is by no means a bad thing; the Alpha 21364, due out soon, takes the 6-year-old 21264 core with almost no changes, and adds a 1.5 MB on-die L2 cache, 2 x 64-bit RDRAM controllers (12.8 GB/sec of memory bandwidth), 4 x 6.4 GB/sec routing links, and 1 x 6.4 GB/sec IO bus. The 21364 should be at the top of SPECint and SPECfp performance, at least until Madison (Itanium 2 follow-up) is released next year.
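For anyone who wants to check where a figure like 12.8 GB/sec comes from, here is a rough back-of-the-envelope sketch. The 800 MT/s RDRAM data rate is an assumption (the post does not state it); it is picked because it reproduces the quoted total. The function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and transfer rate."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9

# ASSUMPTION: 800 million transfers per second per controller (not stated in the post).
ASSUMED_RDRAM_TRANSFERS_PER_SEC = 800e6

# Figures taken from the post: 2 x 64-bit controllers, 4 x 6.4 GB/s routing links.
per_controller = peak_bandwidth_gb_s(64, ASSUMED_RDRAM_TRANSFERS_PER_SEC)
total_memory_bw = 2 * per_controller
total_link_bw = 4 * 6.4

print(f"per controller: {per_controller:.1f} GB/s")
print(f"memory total:   {total_memory_bw:.1f} GB/s")
print(f"routing links:  {total_link_bw:.1f} GB/s")
```

These are peak numbers only; sustained bandwidth in a real system would be lower.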

The really cool new ideas that academia has been toying with for a decade probably won't show up in commercial microprocessors for a few years: data value prediction, load address value prediction, trace processors, multithreading processors, data flow, multiscalar processors. The transistor requirements for many of these techniques are still too high. The P4's trace cache and upcoming simultaneous multithreading (Hyperthreading) put it the closest to some of these ideas, though it is a bit castrated compared to the real deal.

Unfortunately there are not a lot of mainstream articles written yet about the new upcoming processing paradigms...they mainly consist of a large number of academic papers written over the last ten years. Here are a few important ones if anybody is interested (you may be able to find them at various universities' web sites using Google):

L. Hammond, M. Willey, and K. Olukotun, "Data speculation support for a chip multiprocessor."
G. S. Sohi, S. E. Breach, T. N. Vijaykumar, "Multiscalar processors."
J. G. Steffan, T. C. Mowry, "The potential for using thread-level data speculation to facilitate automatic parallelization."
J. B. Dennis and D. P. Misunas, "A preliminary architecture for a basic dataflow processor."
D. M. Tullsen, S. J. Eggers, and H. M. Levy, "Simultaneous multithreading: Maximizing on-chip parallelism."
A. Roth and G. S. Sohi, "Speculative multithreaded processors."

ElFenix · Oct 1, 2002

alpha is an HP property, along with PA-RISC and partial credit for IA-64... compaq effectively killed alpha through their bungling (had a good chance of ruling the server market) so who is doing most of the development now? samsung?

Mrburns2007 · Oct 1, 2002

Doesn't matter what marketing crap they come up with or whether it is a warmed up K7. The bottom line is how well does it perform and how much does it cost.

If it's 30-40% faster than the K7 then it will rock but if it's overpriced I won't be buying it.

Nemesis77 · Oct 1, 2002

Originally posted by: lambasa
Depends on your definition... My definition is rather strict i.e. P.Pro -->PII -->PIII (Katmai) -->PIII (Coppermine)---> PIII (Tualatin)---> Timna (never saw the light of day)---> All 6th generation

If you don't consider the above all the same generation, then there is no need to argue. If you do, let's debate your argument point by point:
#1-- If you want to count this fine, but this leads to the Athlon with 64-bit argument Burns was trying to make

Making Athlon 64-bit (doubling the number of registers, making them 64 bits wide etc.) is a pretty major change IMO.

#2-- This was done on Timna, an intended value CPU. This is not a new idea, AMD just is taking a gamble it doesn't backfire (locked into the wrong/slow memory tech).

Just because Intel did it first with an unreleased CPU, means that Athlon can't use it and call the chip "next generation"? And *Hammer gets DDR2-support in its first die-shrink (H2 2003). Until then, 333MHz DDR is more than enough. Integrated mem-controller reduces latency and improves bandwidth-utilization. Also, the CPU may be designed in such a way that it doesn't need vast amounts of memory-bandwidth (compare Athlon XP and P4. P4 needs loads of memory-bandwidth, Athlon XP does not).

#3-- Possibly an improvement for MP systems, I fail to see how this improves performance on UP systems (other than the effect of #2). Timna had integrated hublink.

More bandwidth is always a good idea.

#4-- PII, and Coppermine both saw a new cache design (not just bigger) than their predecessor

Yes, so? That alone would not be enough to call *Hammer "new generation", but it's just one of several changes that they made to the core.

#5-- PIII added SSE... new brand name, but same "generation"

Yes, so? And the brand-change sure points to the direction that Intel thought it's "new and improved", maybe even "new generation". But if AMD adds SSE2 to their product (along with A LOT of other changes), they can't call their product "new generation"? Why can Intel do so with a lot fewer changes?

#6-- How was it reworked? Most everything that I have seen says AMD has added a stage or two to the decoder, but the rest stays essentially the same.

Yep, they added two stages to the pipeline. And in my book, that means "reworked". Also, I read an article (don't remember where) where it was said that the changes in the pipeline would be perfect for double-pumped ALUs. But take that with a grain of salt.

#8-- Could you be more specific? Other than some changes to the decoders, I haven't seen much new.

Significantly improved branch-prediction among others. And there are bound to be other changes and improvements to the core that simply get less attention than the main points (64 bits, memory-controller etc.)

Willamette shared no transistors with the P6 generation... I don't think you can come close to saying the same for K7 and Hammer

For practical purposes, the changes in *Hammer are just as significant, even more so, than the changes in P4 when compared to P3. Why then could you call P4 "new generation" while *Hammer is not? Just because AMD re-uses some of their previous tech (that is known to be good) in their new product, doesn't IMO mean that the new product is somehow less "new generation". Why design a brand-new tech if you already have kick-ass tech at your disposal?

Cosmic_Horror · Oct 1, 2002

... isn't much more than a warmed-over version of the existing Athlon, or K7, processor.

well what did you expect Intel to say? "it is the best thing since sliced bread"

as long as the chip performs well, i am not fussed what generation it is based on

Degenerate · Oct 1, 2002

Interesting thread.
The more important thing is when is Hammer going to see the light of day?

Sohcan · Oct 1, 2002

Originally posted by: ElFenix
alpha is an HP property, along with PA-RISC and partial credit for IA-64... compaq effectively killed alpha through their bungling (had a good chance of ruling the server market) so who is doing most of the development now? samsung?

The EV7 and EV79 teams are still working at Compaq (HPaq?); when they are done they will be transferred to Intel. HP is still promoting the EV7 for the current Alpha customers; it is the PA and new customers to which Itanium is being promoted.

lambasa · Oct 1, 2002

Originally posted by: Nemesis77

For practical purposes, the changes in *Hammer are just as significant, even more so, than the changes in P4 when compared to P3. Why then could you call P4 "new generation" while *Hammer is not? Just because AMD re-uses some of their previous tech (that is known to be good) in their new product, doesn't IMO mean that the new product is somehow less "new generation". Why design a brand-new tech if you already have kick-ass tech at your disposal?

Hammer is a significant extension of the K7 microarchitecture. The feature list might make it look more dissimilar to the Athlon than P4 was to P3, but not in the underlying microarchitecture. As sohcan mentions, new microarchitectures are usually new paradigms in design. The P4 brought us a trace cache which effectively decouples the x86 decoder from the execution resources, the first implementation of SMP (although currently only seen on Xeons), a significant change in the pipeline length, a much bigger out-of-order window, double-speed ALUs... and these are just the big-ticket differences between P4 and P3. These two do not share a single transistor. I'm sorry, but when the fundamental pipeline remains the same, I really don't consider it a new generation.

As a consumer, I couldn't care less about which generation it belongs to. If it performs well at a fair price... great! As somebody interested in microprocessor design, I see an "Athlon with a whole bunch of goodies." (Or if you are Intel--a warmed over Athlon)

Nemesis77 · Oct 1, 2002

Originally posted by: lambasa
Hammer is a significant extension of the K7 microarchitecture. The feature list might make it look more dissimilar to the Athlon than P4 was to P3, but not in the underlying microarchitecture. As sohcan mentions, new microarchitectures are usually new paradigms in design. The P4 brought us a trace cache which effectively decouples the x86 decoder from the execution resources

*Hammer brings us HyperTransport. Why is trace-cache more significant than HyperTransport?

the first implementation of SMP (although currently only seen on Xeons)

Symmetric MultiProcessing? Ummmm, we have had that long before P4. I think you meant SMT (Simultaneous MultiThreading)

a significant change in the pipeline length

Hammer's pipeline is also longer than the Athlon's. How much do you have to increase the length in order to be able to call it "new generation"?

Sunner · Oct 2, 2002

*Hammer brings us HyperTransport. Why is trace-cache more significant than HyperTransport?

I fail to see how going from the EV6 bus to HT makes it any more of a new generation?

Might as well claim that the move from GTL+ to AGTL is what makes the P4 a new generation.

Nemesis77 · Oct 2, 2002

Originally posted by: Sunner

*Hammer brings us HyperTransport. Why is trace-cache more significant than HyperTransport?

I fail to see how going from the EV6 bus to HT makes it any more of a new generation?

Might as well claim that the move from GTL+ to AGTL is what makes the P4 a new generation.

I fail to see how changing Pentium's cache makes it any more new generation.

imgod2u · Oct 2, 2002

I fail to see how changing Pentium's cache makes it any more new generation.

It's not just the cache. If that were the case it wouldn't be a "big" change. It's the entire working structure of the instruction-handling system. Although, I gotta say, even the reworked instruction handling wouldn't make it "next generation". The fact that every processing stage was redesigned and does a completely different thing than what was on the P6 core does though.

Sunner · Oct 2, 2002

Originally posted by: Nemesis77

Originally posted by: Sunner

*Hammer brings us HyperTransport. Why is trace-cache more significant than HyperTransport?

I fail to see how going from the EV6 bus to HT makes it any more of a new generation?

Might as well claim that the move from GTL+ to AGTL is what makes the P4 a new generation.

I fail to see how changing Pentium's cache makes it any more new generation.

Umm, GTL+ and AGTL are the bus protocols used by the P6 and P7 cores respectively.

Sohcan · Oct 2, 2002

Originally posted by: Nemesis77

I fail to see how changing Pentium's cache makes it any more new generation.

Don't get so hung up on a definition of an x86 "generation"...it's a rather artificial construct when assigned to x86 microprocessors. It is perfectly reasonable for a company's two consecutive microarchitectures to have similar design choices in addition to incremental improvements. The only real talk in the industry and academia of generations is about the true "fourth-generation" architectures that should start appearing in 3-5 years. Whereas the second- and third-generation architectures aimed at improving instruction-level parallelism (ILP) through pipelining then multiple issue (statically and dynamically scheduled superscalar), most of the "fourth-generation" ideas focus on implementing thread-level parallelism (TLP) on top of ILP within the microprocessor.

These microarchitectures include some ideas that are more immediately realizable (simultaneous multithreading (SMT) and on-chip multiprocessing (CMP)) and others that are more radical (trace, data flow, and multiscalar processors). For example, a trace processor goes beyond the fetch and branch predict decoupling present in today's dynamically scheduled superscalar by implementing a trace cache that decouples fetch, decode, and data and control prediction/speculation from the back-end execution. The front-end builds "traces," in which each trace contains multiple basic blocks of code with the branches removed through prediction and speculation. The traces are built to be largely independent from each other through the use of data value and load address prediction/speculation (only a minimal amount of intertrace dependency checking is required for trace issue). The back end then includes multiple "processing elements," each of which resembles the back-end of today's 4-way issue superscalar processor. Each processing element fetches one "trace" at a time from the trace cache, and can execute them independently using today's dynamically scheduled superscalar techniques. The trace processors proposed have four processing elements, capable of a total peak execution of 16 instructions/cycle (compared to 3 or 4 for today's superscalar). Not only would this 4 x 4-way trace processor (16 instructions/cycle) achieve a higher average IPC than a 16-way superscalar given the same transistor resources, but control logic and wire lengths are more distributed, leading to higher clock rates.
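To make the trace idea a bit more concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the block names, the prediction table, the limit of four basic blocks per trace, and the round-robin dispatch are all made up for the example, and it ignores data value prediction and inter-trace dependency checking entirely. It is not how any proposed trace processor actually works internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class BasicBlock:
    name: str
    taken: Optional[str]      # successor if the block's ending branch is taken
    not_taken: Optional[str]  # fall-through successor

def build_trace(blocks: Dict[str, BasicBlock], start: str,
                predict_taken: Dict[str, bool], max_blocks: int = 4) -> List[str]:
    """Follow predicted branch directions from `start`, gluing up to
    `max_blocks` basic blocks into one straight-line trace."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None and len(trace) < max_blocks:
        block = blocks[current]
        trace.append(block.name)
        # Hypothetical predictor: a lookup table, defaulting to "not taken".
        if predict_taken.get(block.name, False):
            current = block.taken
        else:
            current = block.not_taken
    return trace

def dispatch(traces: List[List[str]], n_elements: int = 4) -> List[List[List[str]]]:
    """Hand traces to processing elements round-robin (an assumed policy)."""
    queues: List[List[List[str]]] = [[] for _ in range(n_elements)]
    for i, trace in enumerate(traces):
        queues[i % n_elements].append(trace)
    return queues

def peak_issue(n_elements: int = 4, width: int = 4) -> int:
    """Peak instructions/cycle: processing elements times issue width of each."""
    return n_elements * width

# Made-up control-flow graph for illustration.
blocks = {
    "A": BasicBlock("A", taken="C", not_taken="B"),
    "B": BasicBlock("B", taken="D", not_taken="C"),
    "C": BasicBlock("C", taken="A", not_taken="D"),
    "D": BasicBlock("D", taken=None, not_taken=None),
}
predictions = {"A": True, "C": False}

example_trace = build_trace(blocks, "A", predictions)
print(example_trace)
print(peak_issue())
```

The last function is just the 4 x 4-way arithmetic from the paragraph above; the interesting part is that the front end resolves the branches by prediction once, so each processing element sees branch-free straight-line code.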

The POWER4 (with 2-way CMP) and the Pentium 4 (with its trace cache and 2-way SMT) are first steps towards the new "generation" of architectures, but they are implemented on a much smaller scale than the proposed ideas...at heart, they are still dynamic superscalar rather than TLP designs. Again, this concept of "generation" is rather meaningless to the consumer or enthusiast...make your decisions based on price and the performance in applications you use. The design decisions mentioned above are much more a concern to the MPU designers (do you care whether an MPU uses a history table or a reorder buffer to maintain precise interrupts?)

KF · Oct 3, 2002

I can't say I follow much of what Sohcan said about CPU innards, but it was ten times more understandable to me than what I usually read.

What is considered a generation to processor theorists is not like what the PR people at the chip companies extol. To Intel/AMD it is more like a total redesign, like the auto companies do occasionally. Auto companies do not really make a new type of automobile. Generally speaking, all the design elements that may be newly employed by AMD/Intel have been used before by others. Itanium may be an exception. AMD/Intel/Cyrix are bringing these things to a mass market.

Is Hammer a new generation for AMD? I think it is. Even while many blocks get the same name, I believe they won't be done the same way, because a change anywhere in the system requires changes elsewhere to gain any advantage, and even if it didn't, it presents opportunities that were previously unavailable.

Going to 64 bits is a major jump, even though obvious, which inevitably would have to happen, even if 64 bits is not likely to be used much for several years or be of much use to the average user today. People may recall that Windows programs up to 3.1 did not use 32-bit instructions, despite the fact that the 80386 had that available for many years. (But MS had an interim Win32 module.) With Windows 95, 32 bits became the programming model and the default mode. By the time that happened, programmers were getting very frustrated and feeling constrained. AMD, by providing this 64-bit solution early, gives programmers an easy solution that they can gradually avail themselves of. It is a shame Intel did not take the lead (or maybe Intel has a plan behind closed doors?) I suppose a version of Itanium was planned for the mass market segment, but it has turned out to be further from practicality than Intel predicted.

The AMD 64-bit programming model is so simple, it is difficult to believe programmers will not use it. That will put Intel in a difficult position. Still, Intel has always prevailed before, and one can predict that Intel will successfully suppress 64-bit programming until they have their own version. AMD is gambling otherwise. It is a long shot. It is an expensive gamble because producing a 64-bit chip is going to take more chip real estate and drive up AMD's cost of production.

majewski9 · Oct 3, 2002

Originally posted by: lambasa
I would like to know what Mr. Burns would say about the Xeon using the same logic

I don't think Intel has ever claimed the Xeon to be a next generation CPU (over the P4). AMD is saying the Hammer is 8th generation while the Athlon is 7th generation. From all the microarchitectural details released so far, I would say Mr. Burns is about right. The Hammer is a 64-bit Athlon with an integrated memory controller. The rest of the changes seem to be minor tweaks, rather than a new microarchitecture. Kind of like the Pentium Classic --> Pentium MMX... some new instructions (MMX), and a few tweaks to the microarchitecture (bigger L1). Nobody ever called the Pentium MMX the next generation.

Come to think of it, wasn't the Pentium 3 essentially a Pentium 2 with SSE?

I think that it is clear to everyone that the Hammer K8 is different enough from the Athlon K7 to earn its eighth-generation status.

Markfw · Oct 3, 2002

Let's see: It doesn't have the same number of pins, requires a totally new chipset, thus a new motherboard. If you didn't call it a K8, then you would really have a mess. Especially in light of the Pentium 2/3 (same chipset) situation, who could possibly argue that it is an enhanced K7 when it at least does require a new socket/chipset, etc...

imgod2u · Oct 3, 2002

Well, to clarify, the P2 -> P3 wasn't exactly a "next gen" shift. It was still the P6 core. The P3 -> P4 shift was a "next gen" shift as it transitioned to the P7 core. Now while I'm not inclined to call Hammer a "next gen" as it is built upon the previous K7 core, I can't exactly call it a "same gen" core either. Putting a new spoiler and adding a turbocharger on a BMW doesn't make it the same BMW, but it doesn't make it a new design either.
