The reason is simple. The memory footprint used to be large enough that it would basically only fit into the cache of large-cache CPUs. Now it's smaller, and fits into the cache of smaller-cache CPUs. To make up for the reduced time it would take to complete a WU on an "average" smaller-cache CPU, they gave it more work to do (they have waaay more processing power available than they need at this point, so why not soak it up doing more useful stuff, right?). So while the net effect is that it's faster for the smaller-cache CPUs, the larger-cache ones were just fine before. But now, they have more work to do.
Basically, the smaller-cache CPUs used to be bound by the fact that the memory footprint was just too large. That's not the case anymore, so they added more work to do. The large-cache ones were nearly as fast as they were going to get, because the memory footprint/bandwidth issue, well, wasn't an issue for them. Now, they have more work to do.
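To make the trade-off concrete, here's a toy model of it. Every number in it (cache size, footprint, work amount, miss penalty) is made up for illustration, not a real figure from the project:

```python
# Toy model: a WU is roughly compute-bound unless its memory footprint
# spills out of cache, in which case every access gets much slower.
# All numbers are hypothetical, chosen only to illustrate the argument.

CACHE_KB = 256  # hypothetical L2 size of a "smaller cached" CPU

def wu_time(footprint_kb, compute_units, cache_kb=CACHE_KB):
    """Crude WU time estimate: 1 time unit per unit of work,
    with a 10x penalty (arbitrary) when the footprint exceeds cache."""
    memory_penalty = 10 if footprint_kb > cache_kb else 1
    return compute_units * memory_penalty

# Old core: big footprint, modest work -> memory-bound on the small cache.
old_small = wu_time(footprint_kb=512, compute_units=100)   # 100 * 10 = 1000

# New core: footprint fits in cache, so they piled on more work.
new_small = wu_time(footprint_kb=128, compute_units=400)   # 400 * 1 = 400

# A large-cache CPU never paid the penalty, so it only sees the extra work.
old_big = wu_time(512, 100, cache_kb=2048)   # 100
new_big = wu_time(128, 400, cache_kb=2048)   # 400

print(old_small, new_small, old_big, new_big)
```

The small-cache CPU goes from 1000 to 400 (a big win), while the large-cache CPU goes from 100 to 400, which matches the post's point: it was already fine, and now it simply has more work to do.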
[EDIT]
It'll certainly be interesting. The P4 has as much L2 cache as the P3, and the P4's L2 cache has the same latencies (per clock) but twice the bandwidth per clock. This is because it transfers data EVERY cycle, whereas the Cumine only did it every other cycle.
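The bandwidth claim works out with back-of-the-envelope arithmetic. The every-cycle vs. every-other-cycle transfer rates are from the post; the 256-bit (32-byte) bus width is my assumption for illustration only:

```python
# Back-of-the-envelope L2 bandwidth per clock.
# Assumes a 256-bit (32-byte) cache bus on both parts -- an illustrative
# assumption, not a verified spec.

BUS_BYTES = 32  # 256-bit bus

def l2_bandwidth(clock_hz, cycles_per_transfer):
    """Bytes per second delivered over the L2 bus."""
    return BUS_BYTES * clock_hz / cycles_per_transfer

cumine = l2_bandwidth(1_000_000_000, 2)  # transfers every other cycle
p4     = l2_bandwidth(1_000_000_000, 1)  # transfers every cycle

print(p4 / cumine)  # same clock, half the cycles per transfer
```

At the same clock, halving the cycles per transfer doubles the bytes per clock, which is exactly the "twice the bandwidth per clock" figure above.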
The latencies on RAMBUS aren't quite as bad as you'd expect at first, because it's in dual-channel form. I don't fully understand it, but adding another channel does more than double the bandwidth: it also lowers latencies. It's like a beefed-up i840, as far as I know.
But it's not RAMBUS technology that sucks. Generally speaking, it's Intel's implementation of it that sucks. If RAMBUS were really that bad, API wouldn't be using it. I've read a post where an API architect even said the only problem with RAMBUS on x86 is that Intel doesn't know how to deal with it, and that the chipsets for it are crap (I'm paraphrasing a lot there).