java test: unexpected results on AMD CPUs


biostud

Lifer
Feb 27, 2003
18,566
5,227
136
Impressive, though somewhat perplexing. The only architectural difference between your chip and Maximilian's is that yours has HT and his does not. On his i5, the FloatLoop and IntegerLoop performance is about the same, but on yours, FloatLoop is almost twice as fast. Hmm!

Mine is a hex-core @ 4.2GHz vs. a quad @ 3.2/3.6GHz.

Ah, you mean mine is twice as fast in float vs. int, while on the no-HT chip the two are about the same.
 

cytg111

Lifer
Mar 17, 2008
23,834
13,335
136
It took 60166 milliseconds to complete the Integer loop.
It took 32046 milliseconds to complete the Float loop.

i7-5820K @ 4.2GHz

See, compared to Maximilian's results that makes no sense whatsoever (116053 / 111856). Too many variables here for this to be a viable benchmark of anything.

edit: already pointed out, I see ...
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
If there is any plausible explanation for it, I'm sure janeuner is furthest along in providing one.

Biostud, could you disable HT on your 5820k and re-run the original test? It's contained in the new code if you want to use it, or you can just re-use what you've already got, makes no difference to me.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
Just ran the GUI version of this non-benchmark for fun (not sure if the GUI was updated as mentioned or not) on my FX 9370 currently running 210x24.5 for 5145MHz. Had to install Java 1.8 for it to work.

41861 milliseconds to complete Integer loop.
69894 milliseconds to complete Float loop.
Total - 1:50
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
My guess is the GUI version has not yet been updated. That's up to Maximilian as to whether or not he wants to update that.

Your times are pretty darn good. Go Vishera go!
 

biostud

Lifer
Feb 27, 2003
18,566
5,227
136
With HT (2nd run)

It took 57683 milliseconds to complete the Integer loop.
Starting Float bench!
It took 31436 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 29 seconds.

No HT

It took 61189 milliseconds to complete the Integer loop.
Starting Float bench!
It took 56622 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 57 seconds.
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
With HT (2nd run)

It took 57683 milliseconds to complete the Integer loop.
Starting Float bench!
It took 31436 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 29 seconds.

No HT

It took 61189 milliseconds to complete the Integer loop.
Starting Float bench!
It took 56622 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 57 seconds.

Fascinating. HT does practically nothing for the original IntegerLoop. I'll bet it would work out okay for the nodiv version.
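To put a number on that, here is a minimal Java sketch that divides the no-HT time by the HT time for each loop, using biostud's figures quoted above. The class and method names are made up for illustration and are not part of the benchmark source.

```java
// Hypothetical helper for illustration; not part of the benchmark's source.
final class HtSpeedup {
    /** Returns how many times faster the HT run was (no-HT time / HT time). */
    static double speedup(long noHtMs, long htMs) {
        return (double) noHtMs / htMs;
    }

    public static void main(String[] args) {
        // Times in milliseconds from biostud's i7-5820K runs quoted above.
        System.out.printf("IntegerLoop: %.2fx%n", speedup(61189, 57683));
        System.out.printf("FloatLoop:   %.2fx%n", speedup(56622, 31436));
    }
}
```

By that measure IntegerLoop gains roughly 6% from HT, while FloatLoop gains roughly 80%.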

https://www.dropbox.com/s/trnrpmc3ji9mlfw/benchmark2.jar?dl=0

There ya go chaps, updated GUI version, now with more accurate title.

Thanks!

I updated the OP to add that link, plus the console version and source code.
 

biostud

Lifer
Feb 27, 2003
18,566
5,227
136
Starting OriginalCode run!
It took 57396 milliseconds to complete the Integer loop.
It took 32628 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 16933 milliseconds to complete the Integer loop.
It took 20823 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 57946 milliseconds to complete the Integer loop.
It took 31149 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 16914 milliseconds to complete the Integer loop.
It took 20979 milliseconds to complete the Float loop.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
My guess is the GUI version has not yet been updated. That's up to Maximilian as to whether or not he wants to update that.

Your times are pretty darn good. Go Vishera go!


Vishera has its moments. I'll run all the tests when I'm home later just for the sake of comparison.
 

Maximilian

Lifer
Feb 8, 2004
12,603
9
81
i5 4570 again

Starting OriginalCode run!
It took 116734 milliseconds to complete the Integer loop.
It took 115734 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 34405 milliseconds to complete the Integer loop.
It took 48270 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 114748 milliseconds to complete the Integer loop.
It took 109579 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27761 milliseconds to complete the Integer loop.
It took 46969 milliseconds to complete the Float loop.
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
Fascinating.

As has been cited earlier, there are so many variables in play that it is difficult to nail down exactly what is going on here, hence the questionable (if not non-existent) value of this test as a benchmark. However, a number of interesting trends have emerged:

The use of a latch has produced very little gain on a tri-core Stars chip and on Haswell-E. On no-HT Haswell, it makes a pretty big difference, disproportionately leaning towards FloatLoop. On Jaguar dual-core, it makes a difference, but overall the difference is really pretty small (compared to what it did on no-HT Haswell, it's tiny).

When running the original code, the Haswell-E with HT disabled still has a larger bias towards FloatLoop than regular Haswell, though it is far less pronounced than with HT enabled. If I had to guess, I'd figure that the i5-4570's turbo mode has something to do with this. We know Biostud's chip is going to be a flat 4.2 GHz, but there's no way of knowing how exactly turbo will behave while running this test. Regardless, it is clear that HT is helping with the original FloatLoop code.

Disabling turbo mode on the i5-4570 could provide some additional clues as to what's going on.

Also, sef's Piledriver performance is showing a trend very similar to SlowSpyder's. Namely:

On sef's chip, he's getting ~.625 ms spent on IntegerLoop operations per ms spent on FloatLoop operations running the original code.
On SlowSpyder's chip, he's getting ~.599 ms spent on IntegerLoop operations per ms spent on FloatLoop operations running the original code.

Pretty darn close. Would like to see Steamroller here if possible.
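For anyone who wants to check the ratio arithmetic, here is a small Java sketch that divides the IntegerLoop time by the FloatLoop time. The class and method names are hypothetical. The inputs are OriginalCode times posted on this page; sef's raw numbers are not on this page, so they are left out.

```java
// Hypothetical helper for illustration; not part of the benchmark's source.
final class LoopRatio {
    /** Integer-loop ms per float-loop ms; below 1.0 means IntegerLoop finished first. */
    static double intPerFloat(long integerMs, long floatMs) {
        return (double) integerMs / floatMs;
    }

    public static void main(String[] args) {
        // SlowSpyder's FX 9370, GUI run posted earlier on this page.
        System.out.printf("FX 9370:  %.3f%n", intPerFloat(41861, 69894));
        // Maximilian's i5 4570, OriginalCode run.
        System.out.printf("i5 4570:  %.3f%n", intPerFloat(116734, 115734));
        // biostud's i7-5820K with HT, OriginalCode run.
        System.out.printf("i7-5820K: %.3f%n", intPerFloat(57396, 32628));
    }
}
```

The first line works out to about 0.599, matching the figure quoted for SlowSpyder's chip.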
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
My numbers below. Odd thing about this test, my CPU usage was almost always pegged at 100%. But my temps were quite low. I have Cool & Quiet running, sometimes it'd down clock to ~1500MHz but still show 100% load. It only did it a few times and briefly, so I don't think the numbers will be too affected. *edit - Now that I think of it, the low clocks might have been between tests. Numbers should be right.

Starting OriginalCode run!
It took 41732 milliseconds to complete the Integer loop.
It took 65356 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 34660 milliseconds to complete the Integer loop.
It took 44968 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 63840 milliseconds to complete the Integer loop.
It took 90189 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 40849 milliseconds to complete the Integer loop.
It took 44362 milliseconds to complete the Float loop.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Ugh. I just cannot get this to run. After installing the latest JDK, it just opens a command line for a split second and then it's gone.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Yep that is the one I used. But I figured it out anyway. You have to open it with javaw.exe, not java.exe

I did the last one only:

LatchEnabledNoDiv
It took 53219 milliseconds to complete the Integer loop.
It took 74960 milliseconds to complete the Float loop.

Xeon X5550 2.67GHz

Hey, it's nice to know that AMD finally made a chip that can beat my 6-year-old Xeon!
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
Yep that is the one I used. But I figured it out anyway. You have to open it with javaw.exe, not java.exe

I did the last one only:

LatchEnabledNoDiv
It took 53219 milliseconds to complete the Integer loop.
It took 74960 milliseconds to complete the Float loop.

Xeon X5550 2.67GHz

Hey, it's nice to know that AMD finally made a chip that can beat my 6-year-old Xeon!


Let's not turn this thread into that.
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
My numbers below. Odd thing about this test, my CPU usage was almost always pegged at 100%. But my temps were quite low.

That's pretty common actually. Just because the OS is reading 100% CPU utilization doesn't mean the code is keeping all execution resources busy. Think of how different stress tests produce different temperatures on the same CPU.

As a stress test, Awesomeballs is an abject failure. That's quite alright, as I never expected it to produce significant CPU temperatures.

Interestingly enough, I would like to point out that the E1-2500 machine I've used for testing hard-locks during any of the test modes. It would be interesting to put something like coretemp on there to see what happens in that department. The Jaguar really does work hard at this test.

I have Cool & Quiet running, sometimes it'd down clock to ~1500MHz but still show 100% load. It only did it a few times and briefly, so I don't think the numbers will be too affected. *edit - Now that I think of it, the low clocks might have been between tests. Numbers should be right.

That is probably the case, but you never know. The code is not streamlined.

Starting OriginalCode run!
It took 41732 milliseconds to complete the Integer loop.
It took 65356 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 34660 milliseconds to complete the Integer loop.
It took 44968 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 63840 milliseconds to complete the Integer loop.
It took 90189 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 40849 milliseconds to complete the Integer loop.
It took 44362 milliseconds to complete the Float loop.

Wow. What? Using a latch killed your performance in everything except FloatLoopNoDiv. That is really weird. Would you be willing to re-run the tests with CnQ disabled?

Yep that is the one I used. But I figured it out anyway. You have to open it with javaw.exe, not java.exe

I did the last one only:

LatchEnabledNoDiv
It took 53219 milliseconds to complete the Integer loop.
It took 74960 milliseconds to complete the Float loop.

Xeon X5550 2.67GHz

Thanks! Hmm. Your results are quite a bit different from the Haswell results. Would you be willing to run the other three tests? I'm curious as to how the latch is affecting your scores.
 

cytg111

Lifer
Mar 17, 2008
23,834
13,335
136
With HT (2nd run)

It took 57683 milliseconds to complete the Integer loop.
Starting Float bench!
It took 31436 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 29 seconds.

No HT

It took 61189 milliseconds to complete the Integer loop.
Starting Float bench!
It took 56622 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 57 seconds.

Damn .. HT almost doubles your float throughput .. But why? For god's sake, why? It is pretty straightforward code, can't imagine a lot of branch misses -> stalls -> HT time. So what gives? Is the JVM just crappy at constructing/JIT'ing floating point code? If it's the context switching, why isn't integer affected? ... baffled.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
Wow. What? Using a latch killed your performance in everything except FloatLoopNoDiv. That is really weird. Would you be willing to re-run the tests with CnQ disabled?


I checked, I do actually have C&Q disabled. I've heard some things about this BIOS being a little borked. Might be the case. I ran it again, some numbers were a little higher, some a little lower. The cores would drop to 1500MHz during the latch tests. Don't know why. They were probably at 5.145GHz 85% of the time, but sometimes a few cores would drop, sometimes all. Temps are great, 47C is the highest Coretemp registered. But most of the time it was lower than that, low 40s C. So, I'm not too sure what's happening there, maybe someone else with an FX 8xxx or 9xxx can run it. I think the clockspeed drops are what is hurting the numbers there.

I'll mess around with it more, see if I can't figure it out and get it to stay clocked up.
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
You guys are over my head on the Java internals but I'll contribute..

fx9590/16gig, stock, all the power management stuff enabled, no tuning other than manually setting RAM to 1600 CAS7.

Starting OriginalCode run!
It took 46767 milliseconds to complete the Integer loop.
It took 75830 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 27943 milliseconds to complete the Integer loop.
It took 36077 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 46279 milliseconds to complete the Integer loop.
It took 78668 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27699 milliseconds to complete the Integer loop.
It took 36730 milliseconds to complete the Float loop.

I was only able to roast one tiny marshmallow while it was running.


Edit: Second run just because.

Starting OriginalCode run!
It took 47488 milliseconds to complete the Integer loop.
It took 70959 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 28088 milliseconds to complete the Integer loop.
It took 31138 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 45168 milliseconds to complete the Integer loop.
It took 75313 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27796 milliseconds to complete the Integer loop.
It took 36480 milliseconds to complete the Float loop.


FWIW I saw no dips from the stock 4700mhz but I was only using Core Temp to monitor visually.
It got warm and there was load going on for sure. More than I get with any actual application by far.

 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
Damn .. HT almost doubles your float throughput .. But why? For god's sake, why? It is pretty straightforward code, can't imagine a lot of branch misses -> stalls -> HT time. So what gives? Is the JVM just crappy at constructing/JIT'ing floating point code? If it's the context switching, why isn't integer affected? ... baffled.

I think there may be a clue in some of what JoeRambo was saying earlier in the thread. He's running the code with division operations intact, and the division operations can't be pipelined . . . if I knew more about current implementations of HT, I might be able to carry on further in that line of thinking.

I checked, I do actually have C&Q disabled. I've heard some things about this BIOS being a little borked. Might be the case. I ran it again, some numbers were a little higher, some a little lower. The cores would drop to 1500MHz during the latch tests. Don't know why. They were probably at 5.145GHz 85% of the time, but sometimes a few cores would drop, sometimes all. Temps are great, 47C is the highest Coretemp registered. But most of the time it was lower than that, low 40s C. So, I'm not too sure what's happening there, maybe someone else with an FX 8xxx or 9xxx can run it. I think the clockspeed drops are what is hurting the numbers there.

I'll mess around with it more, see if I can't figure it out and get it to stay clocked up.

Hmm! Curious as to why the CountDownLatch is making your CPU speed tank like that. One thing you could do is use an application like msrtweaker to alter your p-states. If you can figure out which p-state you're hitting during the latch tests, you can adjust it to max clockspeed and off you go. Sort of the same thing Kaveri owners are having to do to avoid downclocking during iGPU operation.

You guys are over my head on the Java internals but I'll contribute..

fx9590/16gig, stock, all the power management stuff enabled, no tuning other than manually setting RAM to 1600 CAS7.

Starting OriginalCode run!
It took 46767 milliseconds to complete the Integer loop.
It took 75830 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 27943 milliseconds to complete the Integer loop.
It took 36077 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 46279 milliseconds to complete the Integer loop.
It took 78668 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27699 milliseconds to complete the Integer loop.
It took 36730 milliseconds to complete the Float loop.

I was only able to roast one tiny marshmallow while it was running.


Edit: Second run just because.

Starting OriginalCode run!
It took 47488 milliseconds to complete the Integer loop.
It took 70959 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 28088 milliseconds to complete the Integer loop.
It took 31138 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 45168 milliseconds to complete the Integer loop.
It took 75313 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27796 milliseconds to complete the Integer loop.
It took 36480 milliseconds to complete the Float loop.


FWIW I saw no dips from the stock 4700mhz but I was only using Core Temp to monitor visually.
It got warm and there was load going on for sure. More than I get with any actual application by far.

Looks like the latch isn't doing anything significant to boost your performance. I sort of expected that from Vishera, especially after what it did (or rather, didn't do) for Haswell-E. Also, the variations in completion time between your runs . . . I wonder if that has anything to do with CnQ + Turbo being active or what. Might be. I'm not going to lose any sleep over it.

That being said, nice scores from the 9590 @ stock. It's interesting that you made such large gains on the nodiv test (even the one without the latch) versus SlowSpyder who beat you on the original IntegerLoop.

FWIW I'm still messing around with different ways to handle multiple threads in this test. Here is some preliminary data:

Reducing the number of threads in the pool causes performance to plummet for both the latch-enabled and original version of the code, though the latch-enabled code suffers less. It's rather obvious that using the while loop as in the original code really does come with a performance penalty, though that penalty is muted if you bury that one thread in a pile of four dozen other demanding threads. The latch seems to mitigate the effects somewhat, but reducing the thread pool from 48 threads to 3 still hurts performance. In other words, the latch comes with a performance penalty of its own which, again, can be limited by spawning an absurd number of other threads.

You would think forcing a limited number of cores to jump between threads frantically would make things worse, but in this case, it doesn't seem to do so.
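For readers who have not seen the source, here is a rough Java sketch of the two waiting styles being compared: a main thread that spins in a while loop until the workers finish, versus one that blocks on a CountDownLatch. It is only an assumed shape. The class name, the `busyWork` stand-in, and the pool sizes are hypothetical, and the real IntegerLoop/FloatLoop bodies are not reproduced here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; the benchmark's real classes are not reproduced here.
final class WaitStyles {
    // Collects results so the JIT cannot discard the work as dead code.
    static final AtomicLong SINK = new AtomicLong();

    // Stand-in workload: a chain of dependent integer divisions (assumed shape).
    static long busyWork(int iterations) {
        long acc = 1;
        for (int i = 1; i <= iterations; i++) {
            acc = acc / i + i;
        }
        return acc;
    }

    /** Waits for the workers by spinning on a counter, which occupies a core. */
    static long timeWithPolling(int workers, int poolSize, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger remaining = new AtomicInteger(workers);
        long start = System.nanoTime();
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    SINK.addAndGet(busyWork(iterations));
                } finally {
                    remaining.decrementAndGet();
                }
            });
        }
        while (remaining.get() > 0) {
            // Busy-wait: this loop competes with the workers for CPU time.
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        return elapsedMs;
    }

    /** Waits for the workers by blocking on a CountDownLatch. */
    static long timeWithLatch(int workers, int poolSize, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(workers);
        long start = System.nanoTime();
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    SINK.addAndGet(busyWork(iterations));
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // Parks this thread until every worker has counted down.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("polling: " + timeWithPolling(cores, cores, 50_000_000) + " ms");
        System.out.println("latch:   " + timeWithLatch(cores, cores, 50_000_000) + " ms");
    }
}
```

With the pool sized to the core count, the spinning main thread is one more busy thread than there are cores, which is one plausible reason the while-loop version suffers when the pool is small. Whether that matches what the real code does on a given machine would need measuring.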

I've started some testing by having the classes extend Thread so I can use the .setPriority() method to put a damper on the offending while loop while increasing the priority on IntegerLoop/FloatLoop. It looks promising so far. Hopefully I'll have some numbers up soon. It'll take a few minutes on the slow-arsed x2 though . . . and longer on the E1.

edit: using .setPriority() was extremely useful in cases when I tried setting the thread pool value down to the number of cores on the machine. Without using .setPriority(), even the latch code could be almost twice as slow. Using .setPriority(), I got it up to about 90-95% of its original speed. Using .setPriority() along with a pool value of 48 showed no notable difference (execution times were actually about two seconds slower this way). I guess I'm going to try abandoning the ExecutorService altogether and see what the performance is like if I just use join.
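A minimal sketch of the Thread-based approach described above, assuming workers that extend Thread, get a raised priority, and are waited on with join instead of an ExecutorService. The class names and the loop body are placeholders, not the benchmark's own. Thread priorities are only a hint to the scheduler, so the effect depends on the OS and JVM.

```java
// Illustrative sketch; class names are hypothetical, not the benchmark's own.
final class PriorityJoinSketch {
    static final class Worker extends Thread {
        private final int iterations;
        private long result; // read only after join(), which makes the write visible

        Worker(String name, int iterations) {
            super(name);
            this.iterations = iterations;
        }

        @Override
        public void run() {
            long acc = 1;
            for (int i = 1; i <= iterations; i++) {
                acc = acc / i + i; // stand-in for the real loop body
            }
            result = acc;
        }

        long result() {
            return result;
        }
    }

    /** Starts the workers at raised priority, joins them, and returns the summed results. */
    static long runAndJoin(int workers, int iterations) {
        Worker[] threads = new Worker[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Worker("loop-" + i, iterations);
            threads[i].setPriority(Thread.MAX_PRIORITY); // a hint; may be ignored
            threads[i].start();
        }
        long sum = 0;
        for (Worker t : threads) {
            try {
                t.join(); // blocks without spinning until this worker finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
            sum += t.result();
        }
        return sum;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        long sum = runAndJoin(cores, 50_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum=" + sum + ", took " + elapsedMs + " ms");
    }
}
```

Because join blocks the calling thread, there is no polling loop left to deprioritize in this variant.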
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
It's possible it's a CnQ/motherboard difference. I'm on a Sabertooth R2 and owned an Asrock Extreme9 before, they both treated thermal control very differently, possibly clock related functions too. Just a thought.

I had a browser with 47 tabs open while it ran and a dozen random things in the tray, so it wasn't exactly a clean run either, but hey..
It definitely didn't back off clocks other than between tests and there was heat made fwiw. If I can assist further yell, I got cores to spare...
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Damn .. HT almost doubles your float throughput .. But why? For god's sake, why? It is pretty straightforward code, can't imagine a lot of branch misses -> stalls -> HT time. So what gives? Is the JVM just crappy at constructing/JIT'ing floating point code? If it's the context switching, why isn't integer affected? ... baffled.


Without seeing generated assembly it is barely a guess, but in my opinion HT helps floating point much more due to these reasons:


1) The workload is dominated by division, with no chance for out-of-order execution since everything depends on everything.
2) The integer version works on 64-bit values and executes IDIV instructions, which are comparatively a pain in the ass to execute and tie down registers and CPU execution ports, leaving little chance for HT to shine during division.
3) The float version works on 32-bit values and executes DIVSS instructions, which are very tight in what ports they use, leaving a lot of opportunities for the HT sibling to execute.
4) Due to SIMD requirements, the FP divider implementation is much more flexible and can probably work on two parallel divisions at the same time, something that is not possible for IDIVs (they both use the same port on Haswell afaik).
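To make the "everything depends on everything" point concrete, here is a small Java sketch of two dependent division chains, one on 64-bit longs and one on 32-bit floats. It is an assumed stand-in, not the benchmark's actual loops, and it says nothing by itself about which instructions the JIT emits. Confirming IDIV vs. DIVSS would mean inspecting the generated assembly.

```java
// Illustrative sketch; not the benchmark's IntegerLoop/FloatLoop code.
final class DivChain {
    /** Each iteration divides the previous result, so divisions cannot overlap. */
    static long longChain(long seed, long divisor, int n) {
        long x = seed;
        for (int i = 0; i < n; i++) {
            x = x / divisor + seed; // next division needs this result
        }
        return x;
    }

    /** Same dependency chain on 32-bit floats. */
    static float floatChain(float seed, float divisor, int n) {
        float x = seed;
        for (int i = 0; i < n; i++) {
            x = x / divisor + seed; // next division needs this result
        }
        return x;
    }

    public static void main(String[] args) {
        int n = 200_000_000;
        // Divisors are passed in as values rather than written as literals in the loop.
        long longDivisor = args.length > 0 ? Long.parseLong(args[0]) : 3L;
        float floatDivisor = args.length > 1 ? Float.parseFloat(args[1]) : 3.0f;

        long t0 = System.nanoTime();
        long longResult = longChain(1_000_003L, longDivisor, n);
        long t1 = System.nanoTime();
        float floatResult = floatChain(1_000_003f, floatDivisor, n);
        long t2 = System.nanoTime();

        // Printing the results keeps the JIT from discarding the loops.
        System.out.println("long chain:  " + (t1 - t0) / 1_000_000 + " ms, result " + longResult);
        System.out.println("float chain: " + (t2 - t1) / 1_000_000 + " ms, result " + floatResult);
    }
}
```

This is a crude single-threaded timing with no warm-up, so treat the numbers as a rough indication only. Running two copies on sibling logical cores versus separate physical cores would be one way to probe the HT effect discussed here.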
 

richierich1212

Platinum Member
Jul 5, 2002
2,741
360
126
3570K @ 4.3GHz (10 tabs open in Chrome, youtube streaming while test was running. Win7 64-bit):

Starting OriginalCode run!
It took 90794 milliseconds to complete the Integer loop.
It took 89126 milliseconds to complete the Float loop.

Starting OriginalCodeNoDiv run!
It took 31110 milliseconds to complete the Integer loop.
It took 37568 milliseconds to complete the Float loop.

Starting LatchEnabled run!
It took 90872 milliseconds to complete the Integer loop.
It took 89602 milliseconds to complete the Float loop.

Starting LatchEnabledNoDiv run!
It took 30918 milliseconds to complete the Integer loop.
It took 36732 milliseconds to complete the Float loop.
 