In DrMrLordX's defense, if anything this shows what absurd crap can happen to simple loops in VMs or JITs. Overflow of int tanking perf? No easy IEEE 754 control? Weird as crap reactions to massive thread counts?
I may have over-estimated the expense of int overflow. It's something I need to isolate and test on an individual basis.
The 9/24 build is problematic. Not only was it running too few loop iterations, it also had serious issues with report time fluctuations, probably due to some flaws left over in my loop code from when I was experimenting with ways to get away from rounding failure at float1/float2 = 2^32. Some of my loop code was still doing float1 += float2 and float2 -= float1 instead of float1++ and float2++. So, the 9/24 results are at least partially garbage. The cool part was that spawning such a large number of threads still worked. Pity I hosed up some other parts of the code.
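For anyone curious why float1++ can quietly stop doing anything, here is a minimal sketch of one well-known float rounding effect. It is not the benchmark code: the class name FloatStallSketch and the method stallsAt are made up for illustration, and I am only assuming this is related to the rounding failure described above. A Java float has a 24-bit significand, so once the value reaches 2^24 (16777216), adding 1 rounds straight back to the same value.

```java
public class FloatStallSketch {
    // Hypothetical helper: true if adding 1 to f leaves it unchanged.
    static boolean stallsAt(float f) {
        return f + 1.0f == f;
    }

    public static void main(String[] args) {
        // Below 2^24 every integer is exactly representable, so ++ works.
        System.out.println("stalls at 16777215? " + stallsAt(16777215f));
        // At 2^24 the next representable float is 2 away, so +1 rounds back.
        System.out.println("stalls at 16777216? " + stallsAt(16777216f));
    }
}
```

A loop condition that waits for such a float to grow past 2^24 by single increments would never finish, which is one reason to be careful with float loop counters.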
This problem carried over to what was going to be the 9/25 build. On top of that problem, I found yet another problem with how I arranged my nested while loop: all variable declarations moved inside the new for loop. Apparently this caused the VM to decide that the contents of the for loop were something it could afford to simply not carry out. It's easy to understand why. The while loop no longer performed any reads from or writes to variables or objects outside the for loop or even the runloops() method. I knew there was a problem when increasing the number of for loop iterations to over 600000 barely dented performance. That's over ten trillion while loop iterations per thread. No way that's getting done in a few seconds on an x2 220, unlocked or no.
Basically, if you've got a nested loop arrangement like this:
Code:
for (int j = 0; j < 1337; j++)
{
    int stuff = 0;
    int morestuff = 0;
    while (stuff < 31337)
    {
        stuff++;
        morestuff++;
    }
}
the VM is going to ignore the while loop. Or, at least, it certainly APPEARS to do that.
The feared float1++/float2++ NOP problem became a very real problem in which the entire loop was apparently reduced to a NOP. To correct the problem, I moved all variable declarations out of runloops() and, voilà, the VM is no longer skipping the while loop, despite the fact that each iteration of the for loop resets all variables to 0.
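Here is a minimal sketch of a different, commonly used way to keep a JIT from discarding a loop like this: make the loop's result visible outside the method. This is not what the Awesomeballs code does (there the fix was moving the declarations out of runloops()), and the names LoopSketch, runLoops and sink are made up for illustration. The idea is that if the result is returned and also written to a volatile field, the VM cannot treat the whole computation as dead code. It may still simplify the inner loop, so this is no guarantee that every iteration actually executes.

```java
public class LoopSketch {
    // Hypothetical sink: a volatile field the result is written to,
    // so the loop's work is observable outside the method.
    static volatile long sink;

    // Runs the nested loops and returns the accumulated counter values.
    static long runLoops(int outer, int inner) {
        long total = 0;
        for (int j = 0; j < outer; j++) {
            int stuff = 0;
            int morestuff = 0;
            while (stuff < inner) {
                stuff++;
                morestuff++;
            }
            // The counters feed into a value that escapes the loop.
            total += stuff + morestuff;
        }
        sink = total;
        return total;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long result = runLoops(1337, 31337);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("result=" + result + " in " + elapsed + " ms");
    }
}
```

Printing the result alongside the time is also a cheap sanity check: if the number is wrong, the loop did not do what you thought it did.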
Code performance is now falling more in line with what it was in versions from before 9/24, though it is still a bit faster . . . which is interesting. You'd think it would get slower now that float1 and float2 are iterating properly, but no, it's faster. That's probably the real, measurable effect of reducing the number of int rollovers and operations involving a float at Infinity. Maybe. If I can find an elegant way to stop int3 from rolling over itself and float3 from reaching Infinity without adding if statements or increasing the number of operations per while loop iteration, I may do that next.
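For reference, a minimal sketch of the two behaviors in question, int rollover and a float reaching Infinity, using only standard Java arithmetic. The class name OverflowSketch and its helper methods are made up for illustration and are not part of the benchmark. It shows what the values do, not how much either case costs, which is still something that needs to be isolated and measured.

```java
public class OverflowSketch {
    // Hypothetical helper: int addition wraps silently on overflow.
    static int incrementInt(int value) {
        return value + 1;
    }

    // Hypothetical helper: float multiplication saturates to Infinity.
    static float doubleFloat(float value) {
        return value * 2.0f;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE + 1 wraps around to Integer.MIN_VALUE.
        int rolled = incrementInt(Integer.MAX_VALUE);
        System.out.println("rolled over to MIN_VALUE? " + (rolled == Integer.MIN_VALUE));

        // Float.MAX_VALUE * 2 overflows to positive Infinity.
        float inf = doubleFloat(Float.MAX_VALUE);
        System.out.println("reached Infinity? " + Float.isInfinite(inf));

        // Once at Infinity, further additions leave it at Infinity,
        // and Infinity - Infinity produces NaN.
        System.out.println("still Infinity after +1? " + Float.isInfinite(inf + 1.0f));
        System.out.println("Infinity - Infinity is NaN? " + Float.isNaN(inf - inf));
    }
}
```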
His code is open source and is becoming more messy the more real it gets. This is why it is more interesting than mickey mouse benches that are proprietary where no one validates functionality and forums like these slurp up results between ISAs as gospel comparisons.
Messy? Tell me about it. Of course, I'm the one messing it up, so I have no real reason to complain. Now that you mention it, I should put this code (and future versions) under GPL or Creative Commons or something.
I wish SPEC would come back. Till then, keep up these efforts. This exposes the massive flaws in the weak benchmarking we have available, and if he iterates enough with open feedback, he may yet have a more valid JIT cross compare than is avail elsewhere.
Hah! Well, we'll see where it all leads. For now it's just a tinker-toy. But it is fun tinkering with it. Speaking of which . . .
The new not-messed-up version of the Awesomeballs test is here! Hooray!
Here are the .class files. I've included a handy .bat file to run the command-line program for all you Windows users out there. Just unzip the contents and run the .bat file and away you go.
The source code is here.
I do appreciate all the people who have taken the time to run this, and I'll keep working on it as time allows to make it more interesting and informative where possible.
Run times for the 9/25 build:
Stars chip:
It took 545133 milliseconds to complete IntegerLoop.
It took 141815 milliseconds to complete FloatLoop.
It took 26995 milliseconds to complete IntegerLoopNoDiv.
It took 76675 milliseconds to complete FloatLoopNoDiv.
It took 544263 milliseconds to complete IntegerLoopWithLatch.
It took 139666 milliseconds to complete FloatLoopWithLatch.
It took 27023 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 76350 milliseconds to complete FloatLoopWithLatchNoDiv.
Total execution time for your selection is 1577920 milliseconds.
Jaguar chip:
It took 608504 milliseconds to complete IntegerLoop.
It took 583426 milliseconds to complete FloatLoop.
It took 125737 milliseconds to complete IntegerLoopNoDiv.
It took 231941 milliseconds to complete FloatLoopNoDiv.
It took 600236 milliseconds to complete IntegerLoopWithLatch.
It took 579095 milliseconds to complete FloatLoopWithLatch.
It took 125735 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 228563 milliseconds to complete FloatLoopWithLatchNoDiv.
Total execution time for your selection is 3083237 milliseconds.
Hmm. Looks like some stuff actually got slower on the Jaguar, such as IntegerLoop.