Pentium 5? AMD has Hammer in production already


andreasl

Senior member
Aug 25, 2000
419
0
0
What I'm wondering is whether Intel will put 6 decoding units on Prescott, since it will be decoding 2 separate threads, each of which could have 3 x86 instructions decoded per clock. If so, that would make it a 6-way superscalar processor. That'd be sweet.

This doesn't make sense at all. Why would you slap 6 decoders on the P4 when it only has a single one now in the first place? And the P4 isn't limited by its decoder, thanks to the trace cache. The decoder is only used when there is a trace cache miss; the rest of the time it sits idle. With 6 decoders you would have them all sitting idle almost all the time. To improve HT performance Intel could increase the trace cache bandwidth (from the current 3 uops/cycle to perhaps 6 uops/cycle), increase the trace cache size, and/or add more execution units. I'm sure there could be other, even smaller tweaks that would increase HT performance. BTW, those changes I mentioned would also increase performance on single-threaded code somewhat.
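To put some rough numbers on that, here is a minimal sketch in C of the effective uop delivery rate as a function of trace cache hit rate. The rates are my own illustrative assumptions, not Intel's published figures:

#include <stdio.h>

/* Toy model of a P4-style front end: on a trace cache hit the cache
   streams uops directly; on a miss the single decoder takes over.
   Both rates below are assumptions for illustration. */
int main(void) {
    const double tc_rate  = 3.0;  /* uops/cycle out of the trace cache */
    const double dec_rate = 1.5;  /* assumed uops/cycle from the lone decoder */

    for (int pct = 80; pct <= 100; pct += 5) {
        double hit = pct / 100.0;
        double effective = hit * tc_rate + (1.0 - hit) * dec_rate;
        printf("hit rate %3d%% -> %.2f uops/cycle\n", pct, effective);
    }
    return 0;
}

Even at a 90% hit rate, the lone decoder only costs a fraction of a uop per cycle, so widening the trace cache path is the bigger lever.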
 

Damascus

Golden Member
Jul 15, 2001
1,434
0
0
Originally posted by: CrawlingEye
Don't forget that as any chip ramps up in clock speed it loses IPC, and at the current rate of things, AMD ramping up to the same speeds as current P4s could leave it with even lower IPC than the P4 has, giving Intel the edge even in a clock-for-clock comparison.

I remember you saying this in a thread from a few months ago. I still haven't received a
satisfactory explanation...
 

jbond04

Senior member
Oct 18, 2000
505
0
71
That makes no sense....That's like saying a P4 1.6A has a higher IPC than a 2.53GHz P4..... The Athlon would only lose IPC if they increased the number of stages in their pipeline without enhancing any other portions of the chip to compensate.

And CrawlingEye... are you related to SSXeon5 in any way? Your sigs look very similar....
 

HowDoesItWork

Member
Mar 20, 2001
110
0
0
One thing to throw in the hopper (I know all this is pointless, but I am bored at work) is that Intel's 64-bit processor will not be backward compatible; it will have to run 32-bit applications through software emulation, which, as I understand it, will make it very slow in 32-bit. AMD's solution will run both 64-bit and 32-bit applications natively. This seems like an advantage to me. Backwards compatibility is always nice. Besides, how will you benchmark the first 64-bit processors? Hell, they will probably still run a Quake 3 benchmark on it.
 

Wingznut

Elite Member
Dec 28, 1999
16,968
2
0
The Itanium processors do not use software emulation for 32-bit applications. It is done in hardware. Sure, it's not all that fast, but you have to remember the intentions of the cpu. It isn't intended for desktops. It's intended for high-end server environments where 32-bit code is hardly needed. But it's very nice to have that capability if needed.

And don't presume that because the Itanium isn't very strong at 32-bit code, some future desktop version of IA-64 (if one is ever designed) wouldn't be backwards compatible with 32-bit.
 

CrawlingEye

Senior member
May 28, 2002
262
0
0
Originally posted by: Damascus
Originally posted by: CrawlingEye
Don't forget that as any chip ramps up in clock speed it loses IPC, and at the current rate of things, AMD ramping up to the same speeds as current P4s could leave it with even lower IPC than the P4 has, giving Intel the edge even in a clock-for-clock comparison.

I remember you saying this in a thread from a few months ago. I still haven't received a
satisfactory explanation...

I already told you before, but I'll state it again.

The P4 already has a long pipeline, which could take them to or near 5 GHz or so without having to lengthen it much more, so the rate of decay in IPC would be greater for the AXP than for the P4, since the P4 is already closer to those speeds.

Are you understanding things now?
The AXP @ 3 GHz might be able to keep its IPC relatively close, but on a clock-for-clock basis the AXP would be losing out at perhaps a speed of, say... 5GHz.


As I've said before, IPC decreases linearly as clock speed rises. Why else do you think they're having such a hard time ramping up, even with a .13u CPU?
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: jbond04
That makes no sense....That's like saying a P4 1.6A has a higher IPC than a 2.53GHz P4..... The Athlon would only lose IPC if they increased the number of stages in their pipeline without enhancing any other portions of the chip to compensate.

If you keep all other components equal but merely replace a 1.6A with a 2.4A, for the majority of applications you will not see a 50% increase in performance, because most applications don't fit in the CPU's caches and must go out to RAM or the hard drive. The 2.4A will spend more CPU cycles than the 1.6A waiting for data, which decreases its IPC.
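A toy model makes the effect concrete (all figures below are made-up assumptions, just to show the shape of it): core work scales with the clock, but RAM latency in nanoseconds doesn't, so the faster chip loses more cycles per miss and its measured IPC falls.

#include <stdio.h>

/* Made-up workload: 1 billion instructions at an ideal IPC of 1.0,
   with 10 million cache misses that each stall for 150 ns of RAM
   latency. Stall cycles grow with clock speed; the work doesn't. */
static double effective_ipc(double ghz) {
    const double instructions = 1e9;
    const double misses       = 1e7;
    const double mem_ns       = 150.0;

    double core_cycles  = instructions;            /* base IPC = 1.0 */
    double stall_cycles = misses * mem_ns * ghz;   /* ns -> cycles */
    return instructions / (core_cycles + stall_cycles);
}

int main(void) {
    printf("1.6 GHz: effective IPC = %.3f\n", effective_ipc(1.6));
    printf("2.4 GHz: effective IPC = %.3f\n", effective_ipc(2.4));
    return 0;
}

The 2.4 GHz part still finishes sooner in wall-clock time; it just gets less done per cycle, which is all IPC measures.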
 

SSXeon5

Senior member
Mar 4, 2002
542
0
0
Originally posted by: CrawlingEye
A couple things:

Steve (SSXeon) already made posts showing pictures of live Prescotts in reference boards.
Search for it, if you want.

Granite Bay won't be a Prescott chipset; it's intended for the current P4s, as a workstation mobo.
It'll be expensive and late. Springdale will originally be released as DC DDR (the first Intel dual-channel DDR desktop chipset) and later re-released with DDR-II capabilities. Springdale will be a Prescott chipset.


Originally posted by: WarCon
When Intel demonstrated Prescott, didn't it run at 4GHz to start? I am also curious what NetBurst is all about. I think I understand Hyper-Threading somewhat. Isn't it where the processor allows a dual-processor-enabled OS to view the processor as two separate processors? I am curious what benefits this will bring to a single-processor system. If I am wrong, please correct me, because I would like to understand why this would be beneficial. If anyone knows what NetBurst is, can you explain it for me? Or is it just a name for how the processor is pipelined?



HERE is the link you are talking about. And it was the Prescott; the Northwood core can't even reach 4GHz on .13um... the .09um sure as hell can... probably even 6GHz, given Moore's law.

As for this whole topic, I think Hammer is a POS... the idea of an onboard mem controller is pretty horrible when you think of everyone screaming about upgrade paths, since with the Hammer you will need a new mem/mobo/chip for every upgrade. The current Hammer chip has a DDR-I memory controller on it and isn't supposed to get a DDR-II one onboard till around 2004 (coming from AMD themselves). Prescott will have dual-channel DDR333 when launched and get an 800MHz FSB boost and DDR-II support later that year. With that said, I don't see how Intel will not keep leading like they are now. The 2.53GHz with RDRAM killed the fastest AMD, and even the 2.0A kicked the 1.73GHz AMD's butt. I got that info from the new Maximum PC, with a 2.0A Northie/512MB PC-800/Intel D850MV mobo/Ti4600 vs. a 1.73 XP/1.5GB DDR333/MSI KT3 Ultra/Ti4600. Sad really, when that's a 2100+ and a 2.0A beats it with 1/3 the RAM too, lol.




Originally posted by: mechBgon
Originally posted by: 7757524
FYI, those benchmarks of the 800MHz Hammer by that weird German site were faked. They used fake strings in the CPUID and erased the code at the bottom so that you can't verify the results. AMD says that the Hammer will be approx 20% faster clock-for-clock than the current Athlon. If that's so, it's not even close to twice as fast as the P4 clock-for-clock. I'm not worried about Intel losing its speed lead, esp with the 2.8 being released in less than two weeks and the Hammer coming in 7 months.
That's the first I've heard of those benchmarks not being authentic. Can you point out a source for that?

Oh man, you made me laugh my ass off... you thought those were real, ahahahhaah. So just because 1 (ONE) German site got hold of a Q3 bench, the 800MHz Hammer is going to always be equal to a 1.6GHz Willy... give me a break. I never believed those benchmarks when I saw them... I mean, come on, and they did erase the code at the bottom. But whatever... it's not out, so there are no final results, just OPINIONS and one freaking stupid benchmark.

SSXeon

 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Originally posted by: andreasl
What I'm wondering is whether Intel will put 6 decoding units on Prescott, since it will be decoding 2 separate threads, each of which could have 3 x86 instructions decoded per clock. If so, that would make it a 6-way superscalar processor. That'd be sweet.

This doesn't make sense at all. Why would you slap 6 decoders on the P4 when it only has a single one now in the first place? And the P4 isn't limited by its decoder, thanks to the trace cache. The decoder is only used when there is a trace cache miss; the rest of the time it sits idle. With 6 decoders you would have them all sitting idle almost all the time. To improve HT performance Intel could increase the trace cache bandwidth (from the current 3 uops/cycle to perhaps 6 uops/cycle), increase the trace cache size and/or add more execution units. I'm sure there could be other, even smaller tweaks that would increase HT performance. BTW, those changes I mentioned would also increase performance on single-threaded code somewhat.

Even if the trace cache were able to issue 6 micro-ops per clock, that still wouldn't guarantee that all the execution units would be utilized; data dependencies usually prevent that. However, in order to determine data dependencies, you STILL have to decode the x86 instructions into micro-ops. This means that by being able to decode 3 or 6 x86 instructions per clock and put them in the trace cache, more independent micro-ops can be found and fetched from the trace cache. If you decode, say, 3 or 6 x86 instructions into 5 or 9 micro-ops and only 3 or 5 of them can be executed independently, that'd still be better than if you only decoded 1 x86 instruction into 3 micro-ops and only 2 of them could be executed. Also, the trace cache is only 12k uops on the P4. That is way too small for any significant loop to be stored in. While it's a great idea, rarely do you ever get code whose recurring parts are small enough to be run solely from the trace cache. But I will agree that increasing the trace cache's issue rate is more important than adding decoding ability.
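To see why raw issue width can't beat a dependency chain, here's a small sketch that computes the earliest issue cycle for each uop from its producers (the uop table is a made-up example, and it assumes single-cycle latencies):

#include <stdio.h>

#define NUOPS 5

int main(void) {
    /* dep[i][j]: indices of up to two producer uops (-1 = none) */
    int dep[NUOPS][2] = {
        {-1, -1},   /* uop0: a = load      */
        {-1, -1},   /* uop1: b = load      */
        { 0,  1},   /* uop2: c = a + b     */
        { 2, -1},   /* uop3: d = c * 2     */
        { 3, -1},   /* uop4: store d       */
    };
    int cycle[NUOPS];
    int span = 0;

    for (int i = 0; i < NUOPS; i++) {
        cycle[i] = 0;
        for (int j = 0; j < 2; j++)
            if (dep[i][j] >= 0 && cycle[dep[i][j]] + 1 > cycle[i])
                cycle[i] = cycle[dep[i][j]] + 1;   /* wait for producers */
        if (cycle[i] + 1 > span)
            span = cycle[i] + 1;
        printf("uop%d can issue no earlier than cycle %d\n", i, cycle[i]);
    }
    printf("critical path: %d cycles for %d uops, even with 6-wide issue\n",
           span, NUOPS);
    return 0;
}

Five uops still need four cycles here no matter how wide the machine is; decoding more instructions per clock only helps by exposing uops from elsewhere to fill the empty slots.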
 

lRageATMl

Senior member
Jun 19, 2002
327
0
0
What is the point of all this arguing? If you like Intel, buy it; if you like AMD, buy it. Who the fck cares?

Me personally? AMD is the better buy when it comes to bang for the buck.
 

alexruiz

Platinum Member
Sep 21, 2001
2,836
556
126
Originally posted by: SSXeon5
As for this whole topic, I think Hammer is a POS... the idea of an onboard mem controller is pretty horrible when you think of everyone screaming about upgrade paths, since with the Hammer you will need a new mem/mobo/chip for every upgrade. The current Hammer chip has a DDR-I memory controller on it and isn't supposed to get a DDR-II one onboard till around 2004 (coming from AMD themselves). Prescott will have dual-channel DDR333 when launched and get an 800MHz FSB boost and DDR-II support later that year. With that said, I don't see how Intel will not keep leading like they are now. The 2.53GHz with RDRAM killed the fastest AMD, and even the 2.0A kicked the 1.73GHz AMD's butt. I got that info from the new Maximum PC, with a 2.0A Northie/512MB PC-800/Intel D850MV mobo/Ti4600 vs. a 1.73 XP/1.5GB DDR333/MSI KT3 Ultra/Ti4600. Sad really, when that's a 2100+ and a 2.0A beats it with 1/3 the RAM too, lol.

Oh man, you made me laugh my ass off... you thought those were real, ahahahhaah. So just because 1 (ONE) German site got hold of a Q3 bench, the 800MHz Hammer is going to always be equal to a 1.6GHz Willy... give me a break. I never believed those benchmarks when I saw them... I mean, come on, and they did erase the code at the bottom. But whatever... it's not out, so there are no final results, just OPINIONS and one freaking stupid benchmark.

SSXeon

Well, and they say the AMD users are the fanboys. I totally disagree with the assumption that having an on-board mem controller is bad for upgrades... You will need a new motherboard if you want the latest, and that is the same for a conventional CPU-chipset arrangement... it's not as if you could switch "chipsets" in your current mobo. LOL

The integrated mem controller is, in fact, easier on the chipset/mobo manufacturers. The chipset manufacturers don't have to worry about implementing a new kind of memory, and for the mobo makers it means only a new CPU/mem, not a new chipset. Granted, it is harder on AMD, but as an overall picture for computer makers/mobo makers/chipset makers that is the better approach. And let me know how having a separate mem controller is easier on the upgrade... maybe I don't know how to implement it in my machine without replacing the mobo.

About those benchmarks from Maximum PC, come on guys. This is a forum of users in search of the truth, who stop to analyze what they see and don't believe it blindly. Regardless of what you want and praise, we want accurate, cold facts. Maximum PC uses a set of benchmarks consisting of SYSmark 2001/PCMark 2002, Quake III and some Photoshop. Those benchmarks are well known for being Intel-friendly. BAPCo itself has said that "AMD doesn't belong to the consortium and doesn't help, and Intel is a helpful member who even suggests benchmarks..." Stop and think a moment and tell me: who among you uses Windows Media Encoder??? Why didn't they choose POV raytracing, Serious Sam for gaming (which has no optimizations) and ScienceMark??? How about Unigraphics or ProE?? FEA??? Simulations??? PSpice??? The most powerful PC is the one that gives you the best performance in what YOU do, and if you don't use WME, who cares about it being twice as fast on a P4??? How about that new benchmark created by Van Smith, COSBI (that is a pure mathematical benchmark, so pick the winner...)???

Sorry guys, if you take the numbers given by BAPCo's benchmarks as facts, then you need to be looking for answers about computers on Cnet.com or pcworld.com... According to those benchmarks, even a Coppermine-128 Celeron beats a higher-clocked Morgan Duron... and yes, I know, that sounds ridiculous.

I admit the P4 is the fastest overall right now at stock speeds, but in no way by as much as those numbers say. I play Serious Sam, and only a 2.4 GHz or higher P4 with RDRAM can beat an Athlon... so why should I get a P4??? I do some scientific simulations, and my Athlon at home beats the living crap out of the workstations at work... The P4 wins in memory throughput, no question, and IF and ONLY IF the application is optimized with SSE2 it wins also; otherwise the Athlon is still the best. Gaming is still very close.

Regarding SSE2 optimizations, they are not magical; they help if the data is easily parallelizable (witness ScienceMark). The NetBurst architecture has been out for almost 2 years now, and I have yet to see the majority of software saying "optimized for the P4". There are things that cannot be optimized using SSE2.

How about the benchmarks at Heise.de?? I think they are a serious site. Just check some of their older reviews of other CPUs. They also made it clear that those were preliminary numbers from A0 silicon, and not to put much faith in them. They said they didn't have the hardware for very long, so no careful testing was going to be run. Deleted string??? I'll bet you AMD asked for the deletion of that data as part of an NDA...

Finally, for those really excited about Granite Bay (SSXeon5), I am launching a public challenge: you get a high-end desktop/workstation with a P4 and Granite Bay, and with the SAME AMOUNT of money you spend (or a little less) I'll get a Hammer config of my choice... and we pit them face to face in ANY benchmark you want... and the only results that count are the highest scores, not bang for the buck, only brute force. Do you take it???

My point here is just to remind everyone that even if AMD hasn't been doing good marketing or promotion, the market segmentation they are approaching with the Hammer seems logical and healthy for the company:

Budget and mainstream: Clawhammer with single-bank DDR... Joe Sixpack doesn't need dual-channel DDR
High-end desktop: dual Clawhammer or single Sledgehammer with twin-bank DDR. (I'll take dual Clawhammer.) HyperTransport will make dual-CPU machines as common as any desktop because of the glueless MP capability, which is, once again, easy on the chipset/mobo makers.
Higher than those before: at least 2 Sledgehammers...

Remember, in a HyperTransport setup the bandwidth of the system increases with the number of CPUs, so a wimpy 2.7 GB/s in a single Clawhammer becomes a fast 5.4 GB/s in a dual setup... and sorry, if you think that a single CPU (even with Hyper-Threading enabled) with 6.4 GB/s can beat a dual-brain machine with 5.4 GB/s... well, I'll offer to pay for a subscription to pcworld.com (which, by the way, still shows the Athlon on top in its benchmarks... it is a weird world, isn't it??)
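The arithmetic behind those numbers, assuming a single-channel DDR333 controller on a 64-bit bus per Hammer node (my assumption, picked to match the figures quoted):

#include <stdio.h>

/* Each Hammer node brings its own memory controller, so aggregate
   memory bandwidth scales with the CPU count. */
int main(void) {
    const double transfers = 333e6;  /* DDR333: 333 million transfers/s */
    const double bus_bytes = 8.0;    /* 64-bit memory bus */
    double per_node = transfers * bus_bytes / 1e9;   /* ~2.7 GB/s */

    for (int cpus = 1; cpus <= 4; cpus++)
        printf("%d CPU(s): %.1f GB/s aggregate\n", cpus, cpus * per_node);
    return 0;
}

333e6 x 8 bytes is roughly 2.7 GB/s per node, hence 5.4 GB/s for the dual setup.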
 

alexruiz

Platinum Member
Sep 21, 2001
2,836
556
126
I almost forgot. Who has read the HyperTransport papers?? I keep reading about the FSB in future P4s and how it compares to the "FSB" of the Hammer. Remember, the Hammer doesn't have an FSB; it communicates using HT "tunnels" with a starting bandwidth of 3.2 GB/s ("equiv" to 800 MHz) that can easily be doubled to 6.4 GB/s ("equiv" to a 1.6 GHz "FSB")... and those tunnels don't apply to the memory controller, which is integrated on the chip (that is why the 6.4 GB/s differs from the 5.4 GB/s in my previous post).
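For reference, the link math works out if you assume a 16-bit HyperTransport link clocked double-data-rate (the width and clocks here are my assumptions, chosen to match the quoted figures):

#include <stdio.h>

/* HT link bandwidth per direction: width * clock * 2 (DDR). */
static double ht_gbs(double clock_mhz) {
    const double width_bytes = 2.0;   /* 16-bit link */
    return width_bytes * clock_mhz * 2.0 / 1000.0;
}

int main(void) {
    printf("800 MHz link: %.1f GB/s\n", ht_gbs(800.0));   /* 3.2 GB/s */
    printf("1.6 GHz link: %.1f GB/s\n", ht_gbs(1600.0));  /* 6.4 GB/s */
    return 0;
}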

I wait for the flames.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Originally posted by: alexruiz


Well, and they say the AMD users are the fanboys. I totally disagree with the assumption that having an on-board mem controller is bad for upgrades... You will need a new motherboard if you want the latest, and that is the same for a conventional CPU-chipset arrangement... it's not as if you could switch "chipsets" in your current mobo. LOL

The integrated mem controller is, in fact, easier on the chipset/mobo manufacturers. The chipset manufacturers don't have to worry about implementing a new kind of memory, and for the mobo makers it means only a new CPU/mem, not a new chipset. Granted, it is harder on AMD, but as an overall picture for computer makers/mobo makers/chipset makers that is the better approach. And let me know how having a separate mem controller is easier on the upgrade... maybe I don't know how to implement it in my machine without replacing the mobo.

An onboard memory controller means the processor determines the speed and type of memory. So every time you want to change to a different type or speed of memory, you'd have to replace the processor with a whole new one. In contrast, with a separate memory controller you'd only have to replace the motherboard, as is the case with the P4 and its 533MHz FSB: you can run it with SDR SDRAM, DDR SDRAM (of all speeds), RDRAM (of varying speeds), etc. And as future chipsets come out (Granite Bay) with faster or better memory, you still wouldn't need to replace the CPU. Of course, I can't recall the last time I replaced my motherboard without replacing my CPU as well, but for those who upgrade constantly...
On a side note, I think this is ironic: when Intel released the s423 P4's and later switched to s478, every single AMD fanboy jumped on that, saying it didn't provide an upgrade path.

About those benchmarks from Maximum PC, come on guys. This is a forum of users in search of the truth, who stop to analyze what they see and don't believe it blindly. Regardless of what you want and praise, we want accurate, cold facts. Maximum PC uses a set of benchmarks consisting of SYSmark 2001/PCMark 2002, Quake III and some Photoshop. Those benchmarks are well known for being Intel-friendly. BAPCo itself has said that "AMD doesn't belong to the consortium and doesn't help, and Intel is a helpful member who even suggests benchmarks..." Stop and think a moment and tell me: who among you uses Windows Media Encoder??? Why didn't they choose POV raytracing, Serious Sam for gaming (which has no optimizations) and ScienceMark??? How about Unigraphics or ProE?? FEA??? Simulations??? PSpice??? The most powerful PC is the one that gives you the best performance in what YOU do, and if you don't use WME, who cares about it being twice as fast on a P4??? How about that new benchmark created by Van Smith, COSBI (that is a pure mathematical benchmark, so pick the winner...)???

Do you run COSBI? Do you use it? Do you use PSpice? The same could be said about the set of benchmarks you listed, but that's your bias, I guess. I'll agree SYSmark isn't exactly a reliable benchmark, but Windows Media Encoder is actually a very useful tool. For those who do video over the internet, there is no better choice. And Q3A has been the de facto benchmark for ages. No, they're not ALL that's out there, but bashing those sites for picking what you call "useless" benchmarks, and then listing a set of benchmarks which are also useless and suggesting that they count for more, is sheer hypocrisy.

Sorry guys, if you take the numbers given by BAPCo's benchmarks as facts, then you need to be looking for answers about computers on Cnet.com or pcworld.com... According to those benchmarks, even a Coppermine-128 Celeron beats a higher-clocked Morgan Duron... and yes, I know, that sounds ridiculous.

I admit the P4 is the fastest overall right now at stock speeds, but in no way by as much as those numbers say. I play Serious Sam, and only a 2.4 GHz or higher P4 with RDRAM can beat an Athlon... so why should I get a P4??? I do some scientific simulations, and my Athlon at home beats the living crap out of the workstations at work... The P4 wins in memory throughput, no question, and IF and ONLY IF the application is optimized with SSE2 it wins also; otherwise the Athlon is still the best. Gaming is still very close.

Gaming uses almost no SSE/SSE2 enhancements. The P4 still does pretty well.

Regarding SSE2 optimizations, they are not magical; they help if the data is easily parallelizable (witness ScienceMark). The NetBurst architecture has been out for almost 2 years now, and I have yet to see the majority of software saying "optimized for the P4". There are things that cannot be optimized using SSE2.

You do much video editing? FlaskMpeg has long been SSE2-optimized, along with the DivX codec itself since version 4. Not to mention Pinnacle Studio, Premiere, Photoshop. SSE2 is useful for those things. And just so you know, it could help the P4 to use SSE2 even in scalar form. In other words, put only one value in each register (sure, you'd be wasting space, but you'd be bypassing the limitations of the x87 FP stack). SSE/SSE2 registers form a flat register file. That saves a lot of time, in that FXCH instructions are almost never required anymore.
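For anyone curious what scalar SSE2 looks like in practice, here's a minimal sketch using the standard <emmintrin.h> intrinsics: one double per XMM register, a flat register file, and no x87 stack juggling (so no FXCH):

#include <emmintrin.h>
#include <stdio.h>

int main(void) {
    __m128d a = _mm_set_sd(3.0);   /* only the low lane is used */
    __m128d b = _mm_set_sd(4.0);
    __m128d aa = _mm_mul_sd(a, a); /* scalar ops touch just the low lane */
    __m128d bb = _mm_mul_sd(b, b);
    double hyp;
    /* sqrt of the low lane of (aa + bb); upper lane copied from aa */
    _mm_store_sd(&hyp, _mm_sqrt_sd(aa, _mm_add_sd(aa, bb)));
    printf("%f\n", hyp);           /* prints 5.000000 */
    return 0;
}

The upper half of each register goes unused, as noted above, but every operand is addressed directly instead of through the top of the x87 stack.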

How about the benchmarks at Heise.de?? I think they are a serious site. Just check some of their older reviews of other CPUs. They also made it clear that those were preliminary numbers from A0 silicon, and not to put much faith in them. They said they didn't have the hardware for very long, so no careful testing was going to be run. Deleted string??? I'll bet you AMD asked for the deletion of that data as part of an NDA...

Finally, for those really excited about Granite Bay (SSXeon5), I am launching a public challenge: you get a high-end desktop/workstation with a P4 and Granite Bay, and with the SAME AMOUNT of money you spend (or a little less) I'll get a Hammer config of my choice... and we pit them face to face in ANY benchmark you want... and the only results that count are the highest scores, not bang for the buck, only brute force. Do you take it???

I plan on getting a P4 config (with a watercooling kit) and overclocking it on a Granite Bay board when it comes out. I'll post benchmarks as I'm sure many others will.

My point here is just to remind everyone that even if AMD hasn't been doing good marketing or promotion, the market segmentation they are approaching with the Hammer seems logical and healthy for the company:

Budget and mainstream: Clawhammer with single-bank DDR... Joe Sixpack doesn't need dual-channel DDR
High-end desktop: dual Clawhammer or single Sledgehammer with twin-bank DDR. (I'll take dual Clawhammer.) HyperTransport will make dual-CPU machines as common as any desktop because of the glueless MP capability, which is, once again, easy on the chipset/mobo makers.
Higher than those before: at least 2 Sledgehammers...

Remember, in a HyperTransport setup the bandwidth of the system increases with the number of CPUs, so a wimpy 2.7 GB/s in a single Clawhammer becomes a fast 5.4 GB/s in a dual setup... and sorry, if you think that a single CPU (even with Hyper-Threading enabled) with 6.4 GB/s can beat a dual-brain machine with 5.4 GB/s... well, I'll offer to pay for a subscription to pcworld.com (which, by the way, still shows the Athlon on top in its benchmarks... it is a weird world, isn't it??)

If you're talking about Prescott, we don't know yet what kind of enhancements it will have. It's way too early to tell, and you're guessing even about how well Hammer will perform. And btw, HyperTransport may increase the bandwidth, but it also increases the usage of that bandwidth. Since each CPU gets its own dedicated memory, a significant part of the bandwidth between the CPUs and to the memory banks has to be spent coordinating tasks, so that the two processors aren't working on the same thing at once and wasting processing power (or worse, one processor gets one result and the other gets another). This is not a problem with a single, shared memory interface.
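A deliberately crude model of that coordination cost (the remote-traffic fractions are made up; the point is just the shape of the curve): count each access to the other CPU's memory as costing roughly twice as much, since it crosses the HT link on top of using the home node's memory bus.

#include <stdio.h>

int main(void) {
    const double per_node_gbs = 2.7;   /* single-channel DDR333 per node */
    const double nodes        = 2.0;

    for (int pct = 0; pct <= 50; pct += 25) {
        double remote = pct / 100.0;
        /* remote accesses burn bus time twice, so they shrink the total */
        double usable = nodes * per_node_gbs / (1.0 + remote);
        printf("%2d%% remote traffic -> ~%.1f GB/s usable\n", pct, usable);
    }
    return 0;
}

So the 5.4 GB/s aggregate is a best case; how close a real dual Hammer gets depends on how well the OS and software keep each CPU working out of its own bank.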
 

alexruiz

Platinum Member
Sep 21, 2001
2,836
556
126
Good post imgod2u, that is exactly what I want to read: people being inquisitive and asking the "why", not only the "how"... I won't quote, to make it easier to read (I hope).

Obviously, we still disagree in some things that I will try to clarify.

1) Changing mobos without a new CPU...??? Well, I don't think any power user does that. The average user might, but they don't care about the most powerful setup.

2) Benchmarks: you are supporting my point (even if you don't accept it). Carefully pick one set of benchmarks and one CPU will win; change the set and the other will win.

I do some video for the internet, and DivX 5.02 with carefully chosen parameters can give a file almost as good as WME (which is still the best for low bitrates). The advantage?? Even a Morgan Duron encoding DivX will be faster than the fastest P4 encoding WME... And you can be my witness here.

Yes, I use PSpice. Yes, I use COSBI. No, I am not a hypocrite; I just wanted to stress the point about benchmark selection.

3) Yes, I do video, quite a lot. I agree most of the video software is SSE2-optimized, but remember, it doesn't matter how well optimized the software is IF the codec isn't. Again, a Morgan Duron will beat a P4 if the Duron uses VirtualDub + DivX 5.02 and the P4 uses Pinnacle + Indeo... I can see you do video too, so you can confirm all my facts. The P4 is the fastest for video, no question.

However, common sense also plays a part in computer stuff. All my encodes fall into 2 groups: low bitrate for the internet, or high quality. The low-bitrate ones are done in seconds (320x240, 150 kbps, DivX 5.02 Pro with B-frames); the long ones are done overnight... at least for me, it makes no sense to stare at the computer screen for hours.

If I were a video pro, I would get a dual Athlon, which can beat a P4 at the same price, even encoding video... and you should agree.

4) Good point about a scalar use of SSE2.

5) Well, you want a Granite Bay, I'll have a Hammer. Benchmarks will show the winner. The challenge is set.

6) The memory bandwidth is NOT wasted in a dual HT CPU array. You keep thinking "FSB" and memory controller on the chipset. The bandwidth I wrote about is ONLY memory bandwidth; the communication between CPUs goes over the HT tunnel. What I mean when I say the CPU communicates using HT is communication with the outside world (memory not included). That is what HT is for: fast communication between devices.
 

gennro

Member
May 20, 2002
50
0
0
All I have to say is: whoever can offer top performance at the lowest price, I'll buy it first... and right now AMD seems to be the winner.
MIPS (million instructions per second): my 1.3GHz T-bird does the same MIPS as a 2GHz P4.
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
Too bad that isn't all that matters..LOL!!!


My 130 dollar 1.6A @ 2.733GHz easily does higher MIPS than a T-bred 2200+... and by a bit... Big f^ing deal...



alexruiz,

Hey since I have actually owned both and ran on same platform I think I will speak....

1.4 T-bird @ 1.5 w/ GKnot (same program, movie, settings, etc.) = 4hr 15min

1.6A @ 1.6 w/ GKnot = 4hr 10min... mind you, the 1.6A performs close to a 1.8 in apps, and this may be one that utilizes the extra cache...

1.8A @ 2.4 w/ GKnot = 2hr 39min

1.6 @ 2.66 w/ GKnot = 2hr 15min

1.6 @ 2.733 w/ GKnot = 2hr 11min

GKnot is optimized for SSE, which the XP has as well now...

 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Originally posted by: alexruiz
Good post imgod2u, that is exactly what I want to read, people being inquisitive and asking the "why", not only the "how"..... I won't quote to make it easier to read (I hope)

Obviously, we still disagree in some things that I will try to clarify.

1) Changing mobos without a new CPU....??? Well, I don't think any power user does it. The average user can do it, but they don't care about the most powerful setup.

The sheer number of people who've tossed out their old KT133 boards and replaced them with KT333 boards should be some indication of who would change their motherboards (and therefore memory) without replacing the processor.
I don't see what the big deal is, as I usually get my parts right the first time. I'm just pointing out the irony.

2) Benchmarks: you are supporting my point (even if you don't accept it). Carefully pick one set of benchmarks and one CPU will win; change the set and the other will win.

The difference is, a lot of people (including me) DO use WME. So we would care about performance in it.

I do some video for the internet, and DivX 5.02 with carefully chosen parameters can give a file almost as good as WME (which is still the best for low bitrates). The advantage?? Even a Morgan Duron encoding DivX will be faster than the fastest P4 encoding WME... And you can be my witness here.

I wouldn't say so. I have a P3, and my WME encodes (with SSE enhancements) are faster than any DivX (5.02 or 4.11) encode that achieves the same quality, PLUS the WMV file is smaller than the DivX files AND it can be streamed more easily. Again, you're choosing a specific situation and software. What I'm pointing out is that SSE/SSE2-optimized software IS indeed a good test IF it is actually used by people. And I think you'll agree with me that a lot of people do use WME.

Yes, I use PSpice. Yes, I use COSBI. No, I am not a hypocrite; I just wanted to stress the point about benchmark selection.

By doing the opposite and providing a list of benchmarks which practically nobody uses, but which are favorable towards the K7 design?

Yes, I do video, quite a lot. I agree most of the video software is SSE2-optimized, but remember, it doesn't matter how well optimized the software is IF the codec isn't. Again, a Morgan Duron will beat a P4 if the Duron uses VirtualDub + DivX 5.02 and the P4 uses Pinnacle + Indeo... I can see you do video too, so you can confirm all my facts. The P4 is the fastest for video, no question.

Put the P4 on VirtualDub + DivX 5.02 and it'll encode faster. What's yer point? DivX is SSE/SSE2-optimized too. Are you going to say it's an unfair test because of it? Obviously the P4 would be slower when using a very poor codec like Indeo. What is the significance of that?

However, common sense also plays a part in computer stuff. All my encodes fall into 2 groups: low bitrate for the internet, or high quality. The low-bitrate ones are done in seconds (320x240, 150 kbps, DivX 5.02 Pro with B-frames); the long ones are done overnight... at least for me, it makes no sense to stare at the computer screen for hours.

That's all well and good for long videos. But for short 10-minute clips, I'd like them to be done encoding as soon as possible so that I can check the results and adjust the color/brightness/hue as needed, along with running a few more filters to make it look better. I'd hate to wait 10 minutes for each re-encode.

If I were a video pro, I would get a dual Athlon, which can beat a P4 at the same price, even encoding video... and you should agree.

I'd overclock a P4 so that it'd cost less than that dual Athlon and still be almost as fast. I'd also want my comp to be a good all-purpose computer because I just don't have enough room for a second box.

4) Good point about a scalar use of SSE2.

5) Well, you want a Granite Bay, I'll have a Hammer. Benchmarks will show the winner. The challenge is set.

6) The memory bandwidth is NOT wasted in a dual HT CPU array. You keep thinking "FSB" and memory controller on the chipset. The bandwidth I wrote about is ONLY memory bandwidth; the communication between CPUs goes over the HT tunnel. What I mean when I say the CPU communicates using HT is communication with the outside world (memory not included). That is what HT is for: fast communication between devices.

Whenever you have 2 separate pools of memory, you always have to spend time coordinating between them, which means a lot more memory fetches and writes than would normally be required. That is wasted bandwidth, both of the memory and of the HT link between the CPUs, which, btw, has to be used whenever one CPU needs to fetch data from the other CPU's memory bank.
 

alexruiz

Platinum Member
Sep 21, 2001
2,836
556
126
imgod2u, well, I guess we are going off topic, so I will refrain from adding more data. Some of your answers make no sense, but let's show maturity here and stop that debate.

The point here is that you said you will have a Granite Bay + P4 and I'll have the Hammer of my choice, so the challenge is set. No need to argue about something that hasn't been released; the facts will speak for themselves. We will remind each other when they are released.
 

Rainsford

Lifer
Apr 25, 2001
17,515
0
0
Originally posted by: Duvie
Too bad that isn't all that matters..LOL!!!


My 130 dollar 1.6A @ 2.733GHz easily does higher MIPS than a T-bred 2200+... and by a bit... Big f^ing deal...
While I think the 1.6A overclocking is cool (takes me back to the Celeron 300A days), you can't compare prices when you overclock. Well... I guess you can, but it's meaningless, since such a small percentage of users overclock. But the point is well taken: performance in apps is what counts.
alexruiz,

Hey since I have actually owned both and ran on same platform I think I will speak....

1.4 T-bird @ 1.5 w/ GKnot (same program, movie, settings, etc.) = 4hr 15min

1.6A @ 1.6 w/ GKnot = 4hr 10min... mind you, the 1.6A performs close to a 1.8 in apps, and this may be one that utilizes the extra cache...

1.8A @ 2.4 w/ GKnot = 2hr 39min

1.6 @ 2.66 w/ GKnot = 2hr 15min

1.6 @ 2.733 w/ GKnot = 2hr 11min

GKnot is optimized for SSE, which the XP has as well now...

So how would an XP 1900+ stack up in that comparison? XP may have SSE, but the Tbird doesn't, so comparing them in an SSE application proves nothing.
 