Java test: unexpected results on AMD CPUs

DrMrLordX

Lifer
Apr 27, 2000
Mathtester has moved. The discussion here relates to older builds of the program.

Note: The unexpected Jaguar (AMD E1-2500) and Stars (X2 220) results have been explained. Jaguar was scoring very high in IntegerLoop due to improvements in the way it handles division operations. So, you can ignore the second half of the topic.

Okay, so I put together a really simple test to see how fast some basic mathematical operations could be carried out under Java. It doesn't take that long to run, and the latest builds will have a .bat file to make startup easy under Windows (you Linux folks should know how to use the command line. Right? Right).

The newest build (10/3), which adds result logging and a working warmup, is available now. The .class files:

https://www.dropbox.com/s/5hvxpnnl2kap50x/mathtester1032014classes.zip?dl=0

There is now a .bat file included with the .class files, so running the test is as easy as unzipping the archive and double-clicking the .bat (for all you Windows users out there). All it does is run "java mathtester.Mathtester", but whatever. And yeah, it requires JRE 1.8.

If you want the source:

https://www.dropbox.com/s/l3wlxax4mpigzb9/mathtester1032014source.zip?dl=0

Thanks to Maximilian, there is a GUI version of the 9/26 build available here:

https://www.dropbox.com/s/bkjhe5l2ycixm6e/benchmark2609.jar?dl=0

It also requires JRE version 1.8 to run if you're interested. The source for Maximilian's GUI version is here:

https://www.dropbox.com/sh/pwbe2pr42z4h6xb/AADv7_wCU7ONZqeZZvaVzzIaa?dl=0

The test attempts to run 48 different worker threads in a thread pool via ExecutorService. Each thread contains two loops:

A do while loop that iterates 16777216 times
A do while loop nested inside the first do while loop that iterates 128 times per iteration of the main loop, for a total of 2147483648 iterations per thread.
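The loop counts above multiply out as 2^24 × 2^7 = 2^31. That is exactly one more than `Integer.MAX_VALUE`, so a single `int` tracking the per-thread total would wrap around, while a `long` holds it. A minimal check (the class and constant names are illustrative, not taken from the Mathtester source):

```java
// Illustrative check of the per-thread iteration count; names are hypothetical.
class IterationMath {
    static final int OUTER = 16_777_216; // main do-while loop: 2^24 iterations
    static final int INNER = 128;        // nested do-while loop: 2^7 iterations

    // Widening one operand to long first keeps the product exact.
    static long totalIterations() {
        return (long) OUTER * INNER;
    }

    // The same product in int arithmetic silently wraps past Integer.MAX_VALUE.
    static int totalIterationsAsInt() {
        return OUTER * INNER;
    }

    public static void main(String[] args) {
        System.out.println(totalIterations());      // 2147483648
        System.out.println(totalIterationsAsInt()); // -2147483648
    }
}
```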

During each iteration of the inner loop, the worker thread performs the following operations, depending on the code version (normal vs nodiv):
normal: add + add, mult + add, div + add, add, add, add*
nodiv: add + add, mult + add, add, add, add*
*The last add is always an integer add(increment++)
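One way to read the operation list above, as a single-threaded sketch of an integer worker. The exact statements in Mathtester may differ: the grouping of the adds, the constant 3 and the `checksum` return value are assumptions made so the example is concrete and its work has an observable result. The divisor `int2` starts at 1 and is reset on every pass of the outer loop, in line with the division-by-zero note later in the thread.

```java
// Hypothetical IntegerLoop-style worker body; the op mix follows the
// "normal" vs "nodiv" description, but the statements are assumptions.
class WorkerSketch {
    static final int OUTER = 1 << 24; // main loop: 16777216 iterations
    static final int INNER = 1 << 7;  // nested loop: 128 iterations

    // withDiv = true models the "normal" variant, false the "nodiv" variant.
    static long run(boolean withDiv, int outer, int inner) {
        long checksum = 0; // returned so the work has an observable result
        int i = 0;
        do {
            int int1 = 0;
            int int2 = 1;       // divisor starts at 1 on every pass
            int increment = 0;
            do {
                int1 = int1 + int2 + 1;     // add + add
                int1 = int1 * 3 + int2;     // mult + add
                if (withDiv) {
                    int1 = int1 / int2 + 1; // div + add ("normal" only)
                }
                int2 = int2 + 1;            // add
                int1 = int1 + 2;            // add
                increment++;                // final integer add
            } while (increment < inner);
            checksum += int1;
            i++;
        } while (i < outer);
        return checksum;
    }
}
```

Calling `run(true, WorkerSketch.OUTER, WorkerSketch.INNER)` would perform one thread's full 2^31 inner iterations. A real test harness would more likely use separate worker classes per variant than a branch inside the hot loop, which is how the thread describes the normal and nodiv versions.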

Whether each operation is integer- or float-based depends on which part of the code path is executing. Each path runs one set of integer loops, followed by one set of float loops.

Currently there are seven code paths:
OriginalCode uses what is essentially the first version of the code, with an ExecutorService and a Thread.sleep() call to wait for all worker threads to complete.
OriginalCodeNoDiv is the same thing, but it runs worker threads that perform no division.
LatchEnabled is the same as OriginalCode except that it replaces Thread.sleep() with a CountDownLatch and .await().
LatchEnabledNoDiv is LatchEnabled running the nodiv worker threads.
CastFloatToInt is mixed-mode and converts floats to ints via casting. It has integer iterators.
RoundFloatToInt is the same as CastFloatToInt, but it uses Math.round() instead of casting.
CastIntToFloat is mixed-mode and converts ints to floats via casting. It has float iterators.
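The three mixed-mode paths hinge on standard Java conversions whose semantics differ. A small sketch (method names are illustrative, not from the Mathtester source): a `(int)` cast truncates toward zero, `Math.round(float)` rounds to the nearest `int` with ties going toward positive infinity, and an `int`-to-`float` cast can lose precision above 2^24. Rounding does more work than a plain cast, which may be one reason the RoundFloatToInt results trail CastFloatToInt.

```java
// Illustrative sketch of the conversions behind the cast/round code paths.
class ConversionSketch {
    // CastFloatToInt: a narrowing cast truncates toward zero.
    static int castFloatToInt(float f) {
        return (int) f;
    }

    // RoundFloatToInt: Math.round(float) returns an int; ties round up.
    static int roundFloatToInt(float f) {
        return Math.round(f);
    }

    // CastIntToFloat: a float keeps only 24 significand bits, so large ints
    // are rounded to the nearest representable float.
    static float castIntToFloat(int i) {
        return (float) i;
    }
}
```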

All three of the cast/round paths have nodiv variants.

Note: there is now a separate set of tests that I will describe in greater detail later as time permits, perhaps in another thread, if there is one. Feel free to examine the source code to see what it does. Long story short: we've isolated individual tests (int add, int mult, etc.) and the test has only 10 internal loops instead of 128. They're very quick.

Bugs/Other Problems:
Numerous problems have been encountered. A few have been quashed. A summary:

float1 and float2 in the various float-based worker thread variants were failing to iterate past 2^32. This issue has been addressed by nesting loops and resetting float1 and float2 per iteration of the main for loop.
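A related limit worth keeping in mind with `float` counters: a `float` has a 24-bit significand, so once a counter reaches 2^24 = 16777216, adding 1 rounds back to the same value and the counter stops advancing. The sketch below (names are illustrative) shows the stall; it does not claim this is the exact failure seen with float1 and float2 in the older builds.

```java
// Illustrative sketch: a float counter stalls at 2^24.
class FloatCounterSketch {
    // Increments a float counter maxSteps times and returns its final value.
    static float countWithFloat(long maxSteps) {
        float counter = 0f;
        for (long step = 0; step < maxSteps; step++) {
            counter++; // becomes a no-op once counter == 16777216f
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(countWithFloat(16_777_316L));     // 1.6777216E7
        System.out.println(16_777_216f + 1f == 16_777_216f); // true
    }
}
```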

There was an open concern that Thread.sleep() was chewing up a lot of CPU time. So long as the thread count stays high (48 or higher), it doesn't seem to make a big difference. Different threading models have shown practically no improvement, though I have yet to try Futures. The CountDownLatch code has been removed.
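For comparison, a minimal sketch of the two completion-wait styles mentioned above, assuming simple placeholder tasks (the task body, class and method names are hypothetical, not Mathtester's). The polling version wakes the main thread every millisecond to check a counter; the latch version parks the main thread in `CountDownLatch.await()` until every worker has called `countDown()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch contrasting Thread.sleep() polling with a CountDownLatch.
class WaitStrategies {
    static volatile long sink; // keeps the placeholder work observable

    // Placeholder workload standing in for a worker thread's loops.
    static void busyWork() {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += i;
        }
        sink = acc;
    }

    // Polling: the main thread repeatedly sleeps, wakes and checks a counter.
    static int waitBySleeping(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        AtomicInteger finished = new AtomicInteger();
        for (int t = 0; t < tasks; t++) {
            pool.execute(() -> {
                busyWork();
                finished.incrementAndGet();
            });
        }
        try {
            while (finished.get() < tasks) {
                Thread.sleep(1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        } finally {
            pool.shutdown();
        }
        return finished.get();
    }

    // Latch: the main thread blocks once and wakes when the count reaches zero.
    static long waitWithLatch(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        CountDownLatch done = new CountDownLatch(tasks);
        for (int t = 0; t < tasks; t++) {
            pool.execute(() -> {
                try {
                    busyWork();
                } finally {
                    done.countDown(); // count down even if the task throws
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return done.getCount(); // 0 once every worker has finished
    }
}
```

In this sketch the polling thread mostly sleeps, so with many long-running workers its 1 ms wakeups are small next to the work itself, which would be consistent with the two styles measuring about the same.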

There was concern that division operations were dominating the test results. To compensate, I have introduced nodiv variants of all tests so that people may compare and contrast div vs nodiv versions of the test at their leisure.

It always bugged me that part of the FloatLoop code involved an integer operation (increment++). I've switched increment to a float, and it is functioning properly.

I've restructured the loops to iterate the for loop 2^23 times and the while loop 2^7 times. The net effect is that there are no more int rollovers or float operations involving Infinity.
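The two failure modes the restructuring avoids are easy to reproduce in isolation. A sketch with illustrative names: `int` arithmetic wraps silently past `Integer.MAX_VALUE` (`Math.addExact` reports the overflow instead), repeated `float` multiplication overflows to `Infinity`, and 2^23 × 2^7 = 2^30 still fits comfortably in an `int`.

```java
// Illustrative sketch of int rollover and float overflow to Infinity.
class OverflowSketch {
    // int arithmetic wraps silently past Integer.MAX_VALUE.
    static int wrappedIncrement() {
        int x = Integer.MAX_VALUE;
        x++;
        return x; // Integer.MIN_VALUE
    }

    // Math.addExact detects the same overflow and throws instead.
    static boolean addExactThrows() {
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Counts multiplications until a float overflows to Infinity.
    // Assumes factor > 1, otherwise the loop never ends.
    static int stepsUntilInfinity(float start, float factor) {
        float v = start;
        int steps = 0;
        while (!Float.isInfinite(v)) {
            v *= factor;
            steps++;
        }
        return steps;
    }

    // Loop bounds of 2^23 and 2^7 give 2^30 iterations, which fits in an int.
    static int restructuredTotal() {
        return (1 << 23) * (1 << 7);
    }
}
```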

I removed the if test designed to prevent division by 0 in the division-enabled tests. To prevent division by zero, int2 and float2 start at 1 and reset to 1 at the completion of each internal loop.
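Why the divisor must never reach 0 differs by type, as this illustrative sketch shows (names are hypothetical): integer division by zero throws `ArithmeticException`, while `float` division by zero never throws and instead yields `Infinity` (or `NaN` for 0/0), which could quietly feed special values into the float loops.

```java
// Illustrative sketch: division by zero for int vs float.
class DivisionSketch {
    // Returns true if dividend / divisor throws, as int division by zero does.
    static boolean intDivisionThrows(int dividend, int divisor) {
        try {
            int ignored = dividend / divisor;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // float division by zero never throws; it produces Infinity or NaN.
    static float floatDivide(float dividend, float divisor) {
        return dividend / divisor;
    }
}
```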

I converted all loops to do while loops to reduce overall loop iteration overhead.

Logging to a simple .txt file is now functional. The code warms itself up, so loops should be fully optimized on the first "actual" run.
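A generic warm-up-then-measure pattern looks roughly like the sketch below. It is not Mathtester's actual warmup code; the workload, run count and names are assumptions. The idea is to run the workload a few times untimed so the JIT has a chance to compile the hot loops, then time one run with `System.nanoTime()`, storing results in a `volatile` field so the work is not treated as dead code. For serious measurements, OpenJDK's JMH harness handles warm-up and these pitfalls more rigorously.

```java
import java.util.function.LongSupplier;

// Generic warm-up-then-measure sketch (not Mathtester's actual code).
class WarmupSketch {
    // Keeps results observable so the JIT cannot treat the work as dead code.
    static volatile long sink;

    // Runs the workload untimed warmupRuns times, then returns the elapsed
    // milliseconds of one timed run.
    static long timeAfterWarmup(LongSupplier workload, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = workload.getAsLong(); // untimed: gives the JIT time to compile
        }
        long start = System.nanoTime();
        sink = workload.getAsLong();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Placeholder workload: sum of i % 7 for i = 1..10,000,000.
    static long sampleWorkload() {
        long acc = 0;
        for (int i = 1; i <= 10_000_000; i++) {
            acc += i % 7;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(timeAfterWarmup(WarmupSketch::sampleWorkload, 5) + " ms");
    }
}
```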

Older builds:

https://www.dropbox.com/s/0coj24zbag39x23/mathtester.zip?dl=0
The 9/22 build, .class files. No .bat file is included.

https://www.dropbox.com/s/tzj5auejc20kjdw/mathtestercode.zip?dl=0
Source for the 9/22 build.

https://www.dropbox.com/s/vuopff5wlvswi7p/mathtester9242014fixedclass.zip?dl=0
.class files for the 9/24 build. Has some fatal flaws that skew performance. Uses a ton of memory compared to the other builds.

https://www.dropbox.com/s/bbhgmiqf5fd61wx/mathtester9242014fixedsource.zip?dl=0
Source for the 9/24 build.

https://www.dropbox.com/s/cnf01pbdjze87xz/mathtester9252014classes.zip?dl=0
.class files for the 9/25 build. Still uses integer loop increments in the FloatLoop variants, and still has some int rollovers and float operations at Infinity.

https://www.dropbox.com/s/3boychgsejgmeb0/mathtester9252014source.zip?dl=0
Source for the 9/25 build.

https://www.dropbox.com/s/9vqbhvjxoacvhgk/mathtester9262014classes.zip?dl=0
.class files for the 9/26 build. Slower than 9/27 due to if tests and the for/while loop structure. Doesn't have the cast/round code paths.

https://www.dropbox.com/s/m8vf6yhbpclqf5e/mathtester9262014source.zip?dl=0
Source for the 9/26 build.

https://www.dropbox.com/s/zy8dqha25wg343q/mathtester9272014classes.zip?dl=0
.class files for the 9/27 build.

https://www.dropbox.com/s/wstb2erq2asu7ds/mathtester9272014source.zip?dl=0
Source for the 9/27 build.

https://www.dropbox.com/s/d446msmyq9z211n/mathtester9302014classes.zip?dl=0
.class files for the 9/30 build.

https://www.dropbox.com/s/w2rpoeaybmiwlow/mathtester9302014source.zip?dl=0
Source for the 9/30 build.

https://www.dropbox.com/s/i96bfq2xnycn223/benchmark3.jar?dl=0
Maximilian's GUI version of 9/22.

https://www.dropbox.com/s/bkjhe5l2ycixm6e/benchmark2609.jar?dl=0
Maximilian's GUI version of 9/26.

Performance:

Below are some results that have been submitted in this thread. They are organized by user and code revision.

Code:
[b]DrMrLordX[/b]

9/22 build:
Stars chip:

Original code:
IntegerLoop: 582682 ms
FloatLoop: 217536 ms

Original, no division:
IntegerLoop: 56775 ms
FloatLoop: 95434 ms

Latch code:
IntegerLoop: 582138 ms
FloatLoop: 216541 ms

Latch code, no division:
IntegerLoop: 56677 ms
FloatLoop: 96114 ms

Jaguar chip:

Original code:
IntegerLoop: 531171 ms
FloatLoop: 655941 ms

Original, no division:
IntegerLoop: 327075 ms
FloatLoop: 308441 ms

Latch code:
IntegerLoop: 522423 ms
FloatLoop: 635689 ms

Latch code, no division:
IntegerLoop: 321423 ms
FloatLoop: 304188 ms

9/25 build (9/24 results omitted: they were flawed)

Stars chip:

It took 545133 milliseconds to complete IntegerLoop.
It took 141815 milliseconds to complete FloatLoop.
It took 26995 milliseconds to complete IntegerLoopNoDiv.
It took 76675 milliseconds to complete FloatLoopNoDiv.
It took 544263 milliseconds to complete IntegerLoopWithLatch.
It took 139666 milliseconds to complete FloatLoopWithLatch.
It took 27023 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 76350 milliseconds to complete FloatLoopWithLatchNoDiv.

Total execution time for your selection is 1577920 milliseconds.

Jaguar chip:

It took 608504 milliseconds to complete IntegerLoop.
It took 583426 milliseconds to complete FloatLoop.
It took 125737 milliseconds to complete IntegerLoopNoDiv.
It took 231941 milliseconds to complete FloatLoopNoDiv.
It took 600236 milliseconds to complete IntegerLoopWithLatch.
It took 579095 milliseconds to complete FloatLoopWithLatch.
It took 125735 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 228563 milliseconds to complete FloatLoopWithLatchNoDiv.

Total execution time for your selection is 3083237 milliseconds.

9/26 build:

Stars chip:

It took 377182 milliseconds to complete IntegerLoop.
It took 199608 milliseconds to complete FloatLoop.
It took 40406 milliseconds to complete IntegerLoopNoDiv.
It took 100853 milliseconds to complete FloatLoopNoDiv.
It took 353902 milliseconds to complete IntegerLoopWithLatch.
It took 171373 milliseconds to complete FloatLoopWithLatch.
It took 86568 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 99511 milliseconds to complete FloatLoopWithLatchNoDiv.

Total execution time for your selection is 1429403 milliseconds.

Jaguar chip:

It took 614220 milliseconds to complete IntegerLoop.
It took 592927 milliseconds to complete FloatLoop.
It took 148752 milliseconds to complete IntegerLoopNoDiv.
It took 466238 milliseconds to complete FloatLoopNoDiv.
It took 604581 milliseconds to complete IntegerLoopWithLatch.
It took 586611 milliseconds to complete FloatLoopWithLatch.
It took 149063 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 473314 milliseconds to complete FloatLoopWithLatchNoDiv.

Total execution time for your selection is 3635706 milliseconds.

9/27 build:

Stars chip:

Beginning entire batch.
It took 147884 milliseconds to complete IntegerLoop.
It took 176483 milliseconds to complete FloatLoop.
It took 37215 milliseconds to complete IntegerLoopNoDiv.
It took 102077 milliseconds to complete FloatLoopNoDiv.
It took 108602 milliseconds to complete IntegerLoopWithLatch.
It took 169686 milliseconds to complete FloatLoopWithLatch.
It took 36515 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 100453 milliseconds to complete FloatLoopWithLatchNoDiv.
It took 194541 milliseconds to complete CastFloatToInt.
It took 114170 milliseconds to complete CastFloatToIntNoDiv.
It took 400444 milliseconds to complete RoundFloatToInt.
It took 263148 milliseconds to complete RoundFloatToIntNoDiv.
It took 398612 milliseconds to complete CastIntToFloat.
It took 134204 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 2384034 milliseconds.

Jaguar chip:

Beginning entire batch.
It took 353819 milliseconds to complete IntegerLoop.
It took 587758 milliseconds to complete FloatLoop.
It took 147106 milliseconds to complete IntegerLoopNoDiv.
It took 481985 milliseconds to complete FloatLoopNoDiv.
It took 340907 milliseconds to complete IntegerLoopWithLatch.
It took 591690 milliseconds to complete FloatLoopWithLatch.
It took 146562 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 477767 milliseconds to complete FloatLoopWithLatchNoDiv.
It took 977271 milliseconds to complete CastFloatToInt.
It took 428923 milliseconds to complete CastFloatToIntNoDiv.
It took 1513947 milliseconds to complete RoundFloatToInt.
It took 1010457 milliseconds to complete RoundFloatToIntNoDiv.
It took 861010 milliseconds to complete CastIntToFloat.
It took 600603 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 8519805 milliseconds.

9/30 build:

Stars chip:

"Awesomeballs" test:

Beginning entire batch.
It took 60318 milliseconds to complete IntegerLoop.
It took 178747 milliseconds to complete FloatLoop.
It took 34547 milliseconds to complete IntegerLoopNoDiv.
It took 98585 milliseconds to complete FloatLoopNoDiv.
It took 190617 milliseconds to complete CastFloatToInt.
It took 103623 milliseconds to complete CastFloatToIntNoDiv.
It took 407643 milliseconds to complete RoundFloatToInt.
It took 271224 milliseconds to complete RoundFloatToIntNoDiv.
It took 415065 milliseconds to complete CastIntToFloat.
It took 130431 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 1890800 milliseconds.

Isolated test:

Beginning batch mode (all tests).
It took 19 milliseconds to complete integer addition.
It took 33 milliseconds to complete integer multiplication.
It took 28 milliseconds to complete integer division.
It took 17 milliseconds to complete float addition.
It took 22 milliseconds to complete float multiplication.
It took 8855 milliseconds to complete float division.
It took 13073 milliseconds to complete rounding float to integer.
It took 6586 milliseconds to complete casting float to integer.
It took 25 milliseconds to complete casting integer to float.

Total execution time for your selection is 28658 milliseconds.

Jaguar chip:

"Awesomeballs" test:

Beginning entire batch.
It took 327581 milliseconds to complete IntegerLoop.
It took 586038 milliseconds to complete FloatLoop.
It took 143055 milliseconds to complete IntegerLoopNoDiv.
It took 495517 milliseconds to complete FloatLoopNoDiv.
It took 979270 milliseconds to complete CastFloatToInt.
It took 428087 milliseconds to complete CastFloatToIntNoDiv.
It took 1567989 milliseconds to complete RoundFloatToInt.
It took 1028298 milliseconds to complete RoundFloatToIntNoDiv.
It took 857251 milliseconds to complete CastIntToFloat.
It took 587177 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 7000263 milliseconds.

Isolated test:

Beginning batch mode (all tests).
It took 126 milliseconds to complete integer addition.
It took 136 milliseconds to complete integer multiplication.
It took 137 milliseconds to complete integer division.
It took 136 milliseconds to complete float addition.
It took 137 milliseconds to complete float multiplication.
It took 38160 milliseconds to complete float division.
It took 51780 milliseconds to complete rounding float to integer.
It took 29447 milliseconds to complete casting float to integer.
It took 173 milliseconds to complete casting integer to float.

Total execution time for your selection is 120232 milliseconds. 

10/3 build:

Stars chip:

It took 29 milliseconds to complete integer addition.
It took 24 milliseconds to complete integer multiplication.
It took 32 milliseconds to complete integer division.
It took 23 milliseconds to complete float addition.
It took 25 milliseconds to complete float multiplication.
It took 8852 milliseconds to complete float division.
It took 13083 milliseconds to complete rounding float to integer.
It took 6516 milliseconds to complete casting float to integer.
It took 24 milliseconds to complete casting integer to float.

Total execution time for your selection is 28608 milliseconds.

It took 60340 milliseconds to complete IntegerLoop.
It took 178892 milliseconds to complete FloatLoop.
It took 34555 milliseconds to complete IntegerLoopNoDiv.
It took 97999 milliseconds to complete FloatLoopNoDiv.
It took 205274 milliseconds to complete CastFloatToInt.
It took 103446 milliseconds to complete CastFloatToIntNoDiv.
It took 386211 milliseconds to complete RoundFloatToInt.
It took 270569 milliseconds to complete RoundFloatToIntNoDiv.
It took 412343 milliseconds to complete CastIntToFloat.
It took 131067 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 1880696 milliseconds.

Jaguar chip:

It took 119 milliseconds to complete integer addition.
It took 133 milliseconds to complete integer multiplication.
It took 145 milliseconds to complete integer division.
It took 181 milliseconds to complete float addition.
It took 129 milliseconds to complete float multiplication.
It took 38152 milliseconds to complete float division.
It took 51837 milliseconds to complete rounding float to integer.
It took 29739 milliseconds to complete casting float to integer.
It took 152 milliseconds to complete casting integer to float.

Total execution time for your selection is 120587 milliseconds.

It took 324612 milliseconds to complete IntegerLoop.
It took 585148 milliseconds to complete FloatLoop.
It took 144685 milliseconds to complete IntegerLoopNoDiv.
It took 483505 milliseconds to complete FloatLoopNoDiv.
It took 879674 milliseconds to complete CastFloatToInt.
It took 424309 milliseconds to complete CastFloatToIntNoDiv.
It took 1551693 milliseconds to complete RoundFloatToInt.
It took 1024541 milliseconds to complete RoundFloatToIntNoDiv.
It took 859844 milliseconds to complete CastIntToFloat.
It took 587406 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 6865417 milliseconds.

[b]Maximilian[/b]

9/22 build (GUI)

i5 4570 again

Starting OriginalCode run!
It took 116734 milliseconds to complete the Integer loop.
It took 115734 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 34405 milliseconds to complete the Integer loop.
It took 48270 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 114748 milliseconds to complete the Integer loop.
It took 109579 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27761 milliseconds to complete the Integer loop.
It took 46969 milliseconds to complete the Float loop.

9/26 build (GUI)

i5 4570 @ stock
Running batch mode (all code)...
It took 110155 milliseconds to complete IntegerLoop.
It took 78311 milliseconds to complete FloatLoop.
It took 19136 milliseconds to complete IntegerLoopNoDiv.
It took 57978 milliseconds to complete FloatLoopNoDiv.
It took 110787 milliseconds to complete IntegerLoopWithLatch.
It took 78052 milliseconds to complete FloatLoopWithLatch.
It took 18937 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 58678 milliseconds to complete FloatLoopWithLatchNoDiv.
Total execution time for your selection is 532034 milliseconds.

[b]Biostud[/b]

pre-9/22 build (same code as 9/22, but only runs OriginalCode)

i7-5820K @ 4.2GHz 

With HT (2nd run)

It took 57683 milliseconds to complete the Integer loop.
Starting Float bench!
It took 31436 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 29 seconds.

No HT

It took 61189 milliseconds to complete the Integer loop.
Starting Float bench!
It took 56622 milliseconds to complete the Float loop.
Stopped, total execution time was 1 minutes and 57 seconds.

9/22 build (GUI)

Starting OriginalCode run!
It took 57396 milliseconds to complete the Integer loop.
It took 32628 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 16933 milliseconds to complete the Integer loop.
It took 20823 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 57946 milliseconds to complete the Integer loop.
It took 31149 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 16914 milliseconds to complete the Integer loop.
It took 20979 milliseconds to complete the Float loop.

9/26 build:

5820K @ 4.4GHz

Original code:
IntegerLoop: 53551 ms
FloatLoop: 40257 ms

Original, no division:
IntegerLoop: 9422 ms
FloatLoop: 31806 ms

Latch code:
IntegerLoop: 48481 ms
FloatLoop: 40303 ms

Latch code, no division:
IntegerLoop: 9503 ms
FloatLoop: 31909 ms

9/30 build:

"Awesomeballs" test:

Beginning entire batch.
It took 38864 milliseconds to complete IntegerLoop.
It took 35691 milliseconds to complete FloatLoop.
It took 9997 milliseconds to complete IntegerLoopNoDiv.
It took 32312 milliseconds to complete FloatLoopNoDiv.
It took 29374 milliseconds to complete CastFloatToInt.
It took 22069 milliseconds to complete CastFloatToIntNoDiv.
It took 75265 milliseconds to complete RoundFloatToInt.
It took 52424 milliseconds to complete RoundFloatToIntNoDiv.
It took 59655 milliseconds to complete CastIntToFloat.
It took 31999 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 387650 milliseconds.

Isolated

It took 16 milliseconds to complete integer addition.
It took 29 milliseconds to complete integer multiplication.
It took 27 milliseconds to complete integer division.
It took 25 milliseconds to complete float addition.
It took 25 milliseconds to complete float multiplication.
It took 1977 milliseconds to complete float division.
It took 2555 milliseconds to complete rounding float to integer.
It took 1554 milliseconds to complete casting float to integer.
It took 14 milliseconds to complete casting integer to float.

Total 6222 milliseconds 

10/3 build:

Original

It took 17279 milliseconds to complete IntegerLoop.
It took 35492 milliseconds to complete FloatLoop.
It took 8396 milliseconds to complete IntegerLoopNoDiv.
It took 31688 milliseconds to complete FloatLoopNoDiv.
It took 29575 milliseconds to complete CastFloatToInt.
It took 21703 milliseconds to complete CastFloatToIntNoDiv.
It took 75033 milliseconds to complete RoundFloatToInt.
It took 51759 milliseconds to complete RoundFloatToIntNoDiv.
It took 56265 milliseconds to complete CastIntToFloat.
It took 31679 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 358869 milliseconds.

Isolated
It took 20 milliseconds to complete integer addition.
It took 19 milliseconds to complete integer multiplication.
It took 47 milliseconds to complete integer division.
It took 22 milliseconds to complete float addition.
It took 200 milliseconds to complete float multiplication.
It took 1963 milliseconds to complete float division.
It took 2734 milliseconds to complete rounding float to integer.
It took 1623 milliseconds to complete casting float to integer.
It took 16 milliseconds to complete casting integer to float.

Total execution time for your selection is 6644 milliseconds. 

[b]sefsefsefsef[/b]

Pre-9/22 build:

A10-5750m

It took 154648 milliseconds to complete the Integer loop.
It took 247604 milliseconds to complete the Float loop.

[b]Fox5[/b]

Pre-9/22 build:

On a Phenom II x6 1100T, I got:

It took 324227 milliseconds to complete the Integer loop.
It took 122040 milliseconds to complete the Float loop.

[b]SlowSpyder[/b]

9/22 build (GUI)

FX-9370 @ 4.784 GHz

Starting OriginalCode run!
It took 43917 milliseconds to complete the Integer loop.
It took 62788 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 27448 milliseconds to complete the Integer loop.
It took 34409 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 44446 milliseconds to complete the Integer loop.
It took 70787 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27396 milliseconds to complete the Integer loop.
It took 28607 milliseconds to complete the Float loop

9/26 build:

A8-3400M (1.4GHz @ 1.075v / 2.3GHz turbo @ 1.375v) <- Factory settings and voltages.
~18 watts idle, ~39 watts IntegerLoop, ~38 watts FloatLoop.
Running original code...
It took 578590 milliseconds to complete IntegerLoop.
It took 301283 milliseconds to complete FloatLoop.
Total execution time for your selection is 879873 milliseconds.


A8-3400M OC (1.667GHz @ 1.0375v / 2.733GHz turbo @ 1.3125v) <- P-states changed.
~16.5 watts idle. ~39 watts IntegerLoop. ~ )
Running original code...
It took 503550 milliseconds to complete IntegerLoop.
It took 260329 milliseconds to complete FloatLoop.
Total execution time for your selection is 763879 milliseconds.

Just ran the 9/26, clocks at 220MHz x 22 = 4840MHz.(FX-9370)

Running original code...
It took 47928 milliseconds to complete IntegerLoop.
It took 70033 milliseconds to complete FloatLoop.
Total execution time for your selection is 117961 milliseconds. 

[b]sm625[/b]

9/22 build:

LatchEnabledNoDiv
It took 53219 milliseconds to complete the Integer loop.
It took 74960 milliseconds to complete the Float loop.

Xeon X5550 2.67GHz

[b]Ramses[/b]

9/22 build (GUI)

FX-9590/16 GB, stock, all the power management stuff enabled, no tuning other than manually setting the RAM to 1600 CAS7.

Starting OriginalCode run!
It took 46767 milliseconds to complete the Integer loop.
It took 75830 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 27943 milliseconds to complete the Integer loop.
It took 36077 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 46279 milliseconds to complete the Integer loop.
It took 78668 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27699 milliseconds to complete the Integer loop.
It took 36730 milliseconds to complete the Float loop.

I was only able to roast one tiny marshmallow while it was running.


Edit: Second run just because.

Starting OriginalCode run!
It took 47488 milliseconds to complete the Integer loop.
It took 70959 milliseconds to complete the Float loop.
Starting OriginalCodeNoDiv run!
It took 28088 milliseconds to complete the Integer loop.
It took 31138 milliseconds to complete the Float loop.
Starting LatchEnabled run!
It took 45168 milliseconds to complete the Integer loop.
It took 75313 milliseconds to complete the Float loop.
Starting LatchEnabledNoDiv run!
It took 27796 milliseconds to complete the Integer loop.
It took 36480 milliseconds to complete the Float loop.

9/26 build:

Beginning entire batch.
It took 46642 milliseconds to complete IntegerLoop.
It took 87099 milliseconds to complete FloatLoop.
It took 15346 milliseconds to complete IntegerLoopNoDiv.
It took 45124 milliseconds to complete FloatLoopNoDiv.
It took 46036 milliseconds to complete IntegerLoopWithLatch.
It took 85097 milliseconds to complete FloatLoopWithLatch.
It took 15347 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 46140 milliseconds to complete FloatLoopWithLatchNoDiv.

Total execution time for your selection is 386831 milliseconds. 

[b]richierich1212[/b]

9/22 build (GUI)

3570K @ 4.3GHz (10 tabs open in Chrome, youtube streaming while test was running. Win7 64-bit):

Starting OriginalCode run!
It took 90794 milliseconds to complete the Integer loop.
It took 89126 milliseconds to complete the Float loop.

Starting OriginalCodeNoDiv run!
It took 31110 milliseconds to complete the Integer loop.
It took 37568 milliseconds to complete the Float loop.

Starting LatchEnabled run!
It took 90872 milliseconds to complete the Integer loop.
It took 89602 milliseconds to complete the Float loop.

Starting LatchEnabledNoDiv run!
It took 30918 milliseconds to complete the Integer loop.
It took 36732 milliseconds to complete the Float loop.

[b]Enigmoid[/b]

Pre-9/22 build (GUI)

3630QM @ 3.2 GHz
Original bench
Quote:
Starting Integer bench!
It took 114815 milliseconds to complete the Integer loop.
Starting Float bench!
It took 70992 milliseconds to complete the Float loop.
Stopped, total execution time was 3 minutes and 5 seconds.

[b]NostaSeronx[/b]

Factory FX-8320; 9/30 build

"Awesomeballs" test:

Beginning entire batch.
It took 73400 milliseconds to complete IntegerLoop.
It took 88730 milliseconds to complete FloatLoop.
It took 23446 milliseconds to complete IntegerLoopNoDiv.
It took 65422 milliseconds to complete FloatLoopNoDiv.
It took 85362 milliseconds to complete CastFloatToInt.
It took 40775 milliseconds to complete CastFloatToIntNoDiv.
It took 205737 milliseconds to complete RoundFloatToInt.
It took 130333 milliseconds to complete RoundFloatToIntNoDiv.
It took 107337 milliseconds to complete CastIntToFloat.
It took 83268 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 903810 milliseconds.

Isolated test:

Beginning batch mode (all tests).
It took 1933 milliseconds to complete integer addition.
It took 23 milliseconds to complete integer multiplication.
It took 22 milliseconds to complete integer division.
It took 22 milliseconds to complete float addition.
It took 22 milliseconds to complete float multiplication.
It took 4965 milliseconds to complete float division.
It took 6135 milliseconds to complete rounding float to integer.
It took 3614 milliseconds to complete casting float to integer.
It took 19 milliseconds to complete casting integer to float.

Total execution time for your selection is 16755 milliseconds.
 

Abwx

Lifer
Apr 2, 2011
Quote:
Stars chip:
Integer: 583406 ms
Float: 218687 ms

Jaguar chip:
Integer: 543615 ms
Float: 659535 ms

Interesting, don't you think? Anyone else care to take a crack at it?

The difference in FP is possible only if the X2 has just two cores running. The integer results are completely off whether the X2 has 2 or 3 functional cores, all this assuming that your application is fully multithreaded. It should be tested on other platforms to get some clues.
 

Sequences

Member
Nov 27, 2012
I'm not sure what this test is supposed to accomplish. Your timings also include the time to shut down a 48-thread pool. Is that supposed to be included? Also, why 48 threads? Are you testing context-switching performance as well?
 

DrMrLordX

Lifer
Apr 27, 2000
Quote:
Your question? Why the relatively good Jaguar score? Many variables.
Don't know if something like this made it into Java 8:
http://www.tomshardware.com/news/amd-oracle-java,18090.html
You could look into that (Jaguar being GCN and all).

Project Sumatra is coming in 2015 with Java 9/1.9.

As to the performance of the Jaguar chip, yeah, I find it to be rather interesting. I'd like more data points if anyone else is willing to run this code. Would it help if I provided .class files? I don't have anywhere to host files just now, though I guess I could fix that . . .

Quote:
The difference in FP is possible only if the X2 has just two cores running. The integer results are completely off whether the X2 has 2 or 3 functional cores, all this assuming that your application is fully multithreaded. It should be tested on other platforms to get some clues.

I would like to see some other results as well, especially from other Stars chips and maybe Visheras or something. And Intel chips. Definitely Intel chips.

Quote:
I'm not sure what this test is supposed to accomplish. Your timings also include the time to shut down a 48-thread pool. Is that supposed to be included? Also, why 48 threads? Are you testing context-switching performance as well?

The thread shutdown is included in the reported time. I haven't timed the startup, but the shutdown alone takes about 700-800 ms. I made it 48 threads for two reasons:

1). To test context switching
2). To make sure the test could scale up to quad socket 12T-per-socket systems if I got a chance to test the code on such a machine. Really I should have made it 64 threads. It wouldn't be hard to fix that, though.
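If the pool shutdown should be reported separately from the work, the two phases can be timed independently. A sketch under assumed names (not the Mathtester code): the work phase ends when a `CountDownLatch` reaches zero, and the shutdown phase covers `shutdown()` plus `awaitTermination()`. A fixed thread pool creates its threads lazily as tasks are submitted, so thread startup falls inside the work phase here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: time the work and the pool shutdown separately.
class ShutdownTiming {
    // Returns {workNanos, shutdownNanos} for one batch of identical tasks.
    static long[] measure(int threads, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        try {
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    try {
                        task.run();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await(); // every task has finished its work
            long workEnd = System.nanoTime();

            pool.shutdown(); // no new tasks; idle worker threads begin exiting
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("pool did not terminate");
            }
            long shutdownEnd = System.nanoTime();
            return new long[] { workEnd - start, shutdownEnd - workEnd };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } finally {
            pool.shutdownNow(); // no-op if the pool already terminated
        }
    }
}
```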

The overall purpose of the test was to see how well Java was performing on my system using some fairly common arithmetic operations. I had expected integer performance to be stronger, and it wasn't. It may be that my chip is bugged, but I'd like to see more Stars results before I jump to that conclusion.

Update: I disabled the third core (core 4) and re-ran the test. Performance worsened by the expected amount: both times increased to 150% from the previous run. I then locked the CPU completely, restored it to stock settings, and re-ran the test. The cores were abysmal at that point. Not pretty.
 

Maximilian

Lifer
Feb 8, 2004
12,603
9
81
i5 4570 @ stock, so that's either 3.2 or 3.6 GHz; I don't know how far it turbos with 4 cores maxed.

Windows 8.1 JRE 1.8

It took 116053 milliseconds to complete the Integer loop.
It took 111856 milliseconds to complete the Float loop.
BUILD SUCCESSFUL (total time: 3 minutes 48 seconds)
 

JoeRambo

Golden Member
Jun 13, 2013
On the software side, this is a "benchmark" that is way off. There is a boatload of results on Google about Java "micro benchmarks" and how to design them properly. You certainly need a warm-up so that methods get compiled to native code, plus a ton of flags.

On the hardware side, I am not sure there is any point at all. What is being tested? The throughput of the division op? It is a very expensive operation and will completely dominate the loop; if you comment out the addition/multiplication, the results will come out about the same. So division breaks OoE (out-of-order execution), the loop design breaks OoE, and there is no chance to SIMD anything, resulting in code straight from the 1980 playbook.



P.S. Please never name your own classes Float or Integer; Java already has built-in classes with those names, and it hurts the eyes a lot.
 

Borealis7

Platinum Member
Oct 19, 2006
2,901
205
106
Don't forget that you have 1 thread that is "busy waiting" with sleep(1), so on a dual-core CPU that would detrimentally affect your result.

Perhaps instead of sleeping, you can tell the main thread to "join" on all of the executor's threads. That would suspend the main thread until all executors finish their jobs. I don't remember how you "join" with an executor, though...
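One way to get that join-like behaviour, sketched with hypothetical names (`JoinSketch`, `runAndJoin`) and a trivial counter standing in for the real loop work: call shutdown() so the pool stops accepting new tasks, then block in awaitTermination(). awaitTermination() only returns early once shutdown() has been requested, so with this approach the pool teardown falls inside any timing wrapped around it. The one-hour timeout is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class JoinSketch {
    // Submits `tasks` trivial jobs, then blocks the calling thread (no sleep
    // polling) until the pool has run all of them. Returns how many completed.
    static long runAndJoin(int threads, int tasks) throws InterruptedException {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(completed::incrementAndGet); // placeholder for real work
        }
        pool.shutdown(); // stop accepting new work; already-queued tasks still run
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return completed.get();
    }
}
```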
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
On a Phenom II x6 1100T, I got:

It took 324227 milliseconds to complete the Integer loop.
It took 122040 milliseconds to complete the Float loop.
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136

I could not have named the test better myself, though it is hardly a benchmark (see Joe's comments below).

Your results are very strong compared to the Stars chip, which is not surprising given that you are running a Haswell. ARK reports a max turbo of 3.6 GHz and a base clock of 3.2 GHz.

On the software side, this "benchmark" is way off. There are a boatload of results on Google about Java "micro benchmarks" and how to design them properly. You certainly need a warm-up so methods get compiled to native code, plus a ton of JVM flags.

It isn't a benchmark. I was careful not to use that word, since it isn't optimized for maximum throughput at all. As far as warm-up is concerned, Oracle swears up and down in their FAQ that people don't need to (or shouldn't) do that. Experience tells Java developers otherwise: there are plenty of Java apps out there that run "fake" loop iterations first (or something similar) so that the actual loops run at full speed.

For the purposes of this test, I see no real reason to warm up the loops. Each loop is iterated over 2 billion times, and there are 48 copies of the same loop running simultaneously. If the JVM can't optimize within the first 5000-10000 iterations, then someone should probably throw a boot at Oracle (or just at me, for good measure). Even if the absence of a warm-up procedure does increase run times, that could still be a useful element of the test.

FWIW I'll try a warm-up later after I mess with executor.awaitTermination and/or latches. Or maybe I'll do it first, who knows?
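A minimal warm-up sketch, assuming a hypothetical `work()` method in place of the real loop body and an arbitrary number of untimed warm-up runs. It only gives HotSpot a chance to compile the method before the timed run; it does not guarantee compilation has finished. For careful measurements, a dedicated harness such as OpenJDK's JMH handles warm-up and dead-code pitfalls more rigorously.

```java
final class WarmupSketch {
    // Written after every run so the JIT cannot treat the work as dead code.
    static volatile long sink;

    // Hypothetical stand-in for one worker's arithmetic loop.
    static long work(int iterations) {
        long acc = 0;
        for (int i = 1; i <= iterations; i++) {
            acc += i * 3L;
        }
        return acc;
    }

    // Runs work() `warmupRuns` times untimed, then returns the nanoseconds
    // taken by a single timed run.
    static long timedAfterWarmup(int warmupRuns, int iterations) {
        for (int r = 0; r < warmupRuns; r++) {
            sink = work(iterations);
        }
        long start = System.nanoTime();
        sink = work(iterations);
        return System.nanoTime() - start;
    }
}
```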

On the hardware side, I am not sure there is any point at all. What is being tested? The throughput of the division op? It is a very expensive operation and will completely dominate the loop; if you comment out the addition/multiplication, the results will come out about the same. So division breaks OoE (out-of-order execution), the loop design breaks OoE, and there is no chance to SIMD anything, resulting in code straight from the 1980 playbook.

You are correct that division dominates the benchmark performance. I sort of expected division to be slow, which is why I threw it in there. There is plenty of bad (or at least mediocre) code out there which is essentially a bunch of loops handling arithmetic operations that are not necessarily ordered properly for any sort of optimization, which can easily get you stuck running x87 (if the JVM still allows that, which I'm guessing it does). And hey, Java runs on 3 billion devices! Why not?

Originally I had planned to put together maybe 10-20 statements and pick randomly which one to execute per iteration of the loop, which would be even worse from an optimization point of view. But I was a bit too lazy to do that, and setting up something to test int/long vs float/double performance also looked like it might be at least vaguely informative. Maybe I'll go back and do what I had planned originally once I'm done mucking with this test.

Frankly I was a little surprised to see Stars fail so badly on the integer portion of the test. I was expecting it to choke on the fp math. There is still the possibility of the chip being bugged, but with it scaling as expected with clockspeed and # of cores, I'm beginning to think that's not the case.

P.S. Please never name your own classes Float or Integer; Java already has built-in classes with those names, and it hurts the eyes a lot.

Glad you caught that. I altered the class names to get away from the wrapper class names. Thanks!

Don't forget that you have 1 thread that is "busy waiting" with sleep(1), so on a dual-core CPU that would detrimentally affect your result.

Perhaps instead of sleeping, you can tell the main thread to "join" on all of the executor's threads. That would suspend the main thread until all executors finish their jobs. I don't remember how you "join" with an executor, though...

When using an ExecutorService, there are several ways to monitor task completion. One option is to use a latch. Another is executor.awaitTermination(), or you can use Futures (Future.get()). I suppose I could try all three to see which one performs best. The existing solution was just sort of thrown together.
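The Futures option can be sketched with ExecutorService.invokeAll(), which submits a batch of Callables and returns only once all of them have completed. The class name, the per-task result (`id * id`) and the pool size are hypothetical placeholders, not Mathtester code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FuturesSketch {
    // Submits `tasks` callables via invokeAll, which blocks until every task
    // is done, then sums the per-task results through Future.get().
    static long sumOfTasks(int threads, int tasks)
            throws InterruptedException, ExecutionException {
        List<Callable<Long>> jobs = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final long id = i;
            jobs.add(() -> id * id); // hypothetical per-task result
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long total = 0;
            for (Future<Long> f : pool.invokeAll(jobs)) {
                total += f.get(); // already complete; rethrows any task exception
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike the sleep loop, nothing polls here: the calling thread simply blocks inside invokeAll() until the batch is finished.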

As much as I would like the contents of IntegerLoop and FloatLoop to be, shall we say, delightfully sloppy/unoptimized, wasting a bunch of CPU time on thread.sleep seems a bad idea. That being said, the program should be running 49 different threads simultaneously. Assuming identical priority on all threads, having 1 thread out of 49 doing essentially nothing would probably not amount to a great waste of system resources. The larger the thread pool/number of tasks, the smaller the impact of wait commands.
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
On a Phenom II x6 1100T, I got:

It took 324227 milliseconds to complete the Integer loop.
It took 122040 milliseconds to complete the Float loop.

Interesting! On your Stars chip, you get:

~0.376 ms spent on FloatLoop per ms spent on IntegerLoop.

On my Stars chip, I get:

~0.375 ms spent on FloatLoop per ms spent on IntegerLoop.

Almost identical. So we have a pattern. Stars sucks at unoptimized Java operations involving integer primitives.

That being said, your FloatLoop score is similar to Maximilian's i5-4570. Are you running at stock speeds?

I get a "Java Exception has occurred" error.


Are you guys running a JRE older than 1.8? I got the same thing on the E1-2500 when it was running 1.7. I had to update.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
As much as I would like the contents of IntegerLoop and FloatLoop to be, shall we say, delightfully sloppy/unoptimized, wasting a bunch of CPU time on thread.sleep seems a bad idea. That being said, the program should be running 49 different threads simultaneously. Assuming identical priority on all threads, having 1 thread out of 49 doing essentially nothing would probably not amount to a great waste of system resources. The larger the thread pool/number of tasks, the smaller the impact of wait commands.

Actually, the impact is a pretty much constant waste of resources. If you have 2 cores to execute on and 48 tasks to complete, doing 1000 context switches per second to check for completion will burn a constant amount of CPU time every second.

Using something like:

Code:
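import java.util.concurrent.atomic.AtomicInteger;

public class Counter {

  // Number of submitted tasks that have not finished yet.
  private final AtomicInteger count = new AtomicInteger(0);
  // Monitor the waiting thread parks on; notified when count reaches zero.
  private final Object syncObj = new Object();

  // Call from the producer loop before submitting each task.
  public void incrementCount() {
    count.incrementAndGet();
  }

  public int getCount() {
    return count.get();
  }

  // Blocks the caller (no sleep polling) until every task has decremented.
  public void waitForZero() throws InterruptedException {
    synchronized (syncObj) {
      while (count.get() > 0) {
        syncObj.wait();
      }
    }
  }

  // Call from each worker when its task finishes; the last one wakes the waiter.
  // Taking the lock before notifyAll() means the waiter is either still before
  // its count check or already inside wait(), so the wakeup cannot be lost.
  public void decrementCount() {
    if (count.decrementAndGet() == 0) {
      synchronized (syncObj) {
        syncObj.notifyAll();
      }
    }
  }
}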
and incrementing it in the producer loop, decrementing it in the consumer threads, and calling waitForZero() from the main thread is as good as lazy code can get.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
You are correct that division dominates the benchmark performance. I sort of expected division to be slow, which is why I threw it in there. There is plenty of bad (or at least mediocre) code out there which is essentially a bunch of loops handling arithmetic operations that are not necessarily ordered properly for any sort of optimization, which can easily get you stuck running x87 (if the JVM still allows that, which I'm guessing it does). And hey, Java runs on 3 billion devices! Why not?

....

Frankly I was a little surprised to see Stars fail so badly on the integer portion of the test. I was expecting it to choke on the fp math. There is still the possibility of the chip being bugged, but with it scaling as expected with clockspeed and # of cores, I'm beginning to think that's not the case.


Your test has nothing to do with real-world code... Divisions are dead slow, and only lately has the hardware seen any improvement. Division is a 70-100 cycle, NOT PIPELINED operation (compared to add/mul, which can have 3-4 cycle latency and 1/2 cycle throughput). So all your test does is check the divider hardware implementation.

I have looked up the latencies for division, and just as I guessed, Jaguar has a much improved divider, capable of 12-43 cycle division, compared to 24-87 cycles on K10. It does not take a rocket scientist to see how 2 cores at 1.4 GHz can match 3 cores @ 3 GHz+ if all you do is division.
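A back-of-envelope bound for that point, assuming each core retires at most one divide per divide latency (the "not pipelined" case above), two Jaguar cores at 1.4 GHz, and taking "3 GHz+" as 3.0 GHz for the three Stars cores:

```latex
R = \frac{N_{\text{cores}}\, f}{L_{\text{div}}}, \qquad
R_{\text{Jaguar}} \in \left[\frac{2 \cdot 1.4\times10^{9}}{43},\ \frac{2 \cdot 1.4\times10^{9}}{12}\right] \approx \left[6.5\times10^{7},\ 2.3\times10^{8}\right]\ \text{divides/s}, \qquad
R_{\text{K10}} \in \left[\frac{3 \cdot 3.0\times10^{9}}{87},\ \frac{3 \cdot 3.0\times10^{9}}{24}\right] \approx \left[1.0\times10^{8},\ 3.75\times10^{8}\right]\ \text{divides/s}
```

The two ranges overlap, so depending on where the test's actual operands fall within each divider's latency range, the Jaguar can plausibly match or beat the Stars chip on a division-bound loop. Everything other than the divider is ignored in this estimate.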
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
i5-4570 @ stock, so that's either 3.2 or 3.6 GHz; I don't know how far it turbos with all 4 cores maxed.

Windows 8.1 JRE 1.8

It took 116053 milliseconds to complete the Integer loop.
It took 111856 milliseconds to complete the Float loop.
BUILD SUCCESSFUL (total time: 3 minutes 48 seconds)

That just about sums it up. AMD was already falling behind with Stars, before even switching to BD. They had serious integer problems, and their solution was to cripple their FPU. It would be nice to see the assembly code for both the i5 and the AMD chips. Could it have something to do with Intel-only optimizations?
 

biostud

Lifer
Feb 27, 2003
18,566
5,229
136
It took 60166 milliseconds to complete the Integer loop.
It took 32046 milliseconds to complete the Float loop.

i7-5820K @ 4.2GHz
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
A10-5750m

It took 154648 milliseconds to complete the Integer loop.
It took 247604 milliseconds to complete the Float loop.

This workload uses a lot more floating point divide operations than you'd ever see in a real workload, so that's why Bulldozer's philosophy of sharing the FP unit seems to "fail" here. For integer, I'd say this laptop CPU compares quite favorably to the i5-4570 desktop CPU. Interesting results, but not really that meaningful, because this is the very definition of "synthetic."

EDIT: Compared to Haswell, in this test Piledriver seems to have just *barely* lower integer IPC. Too bad that doesn't pan out for them in real workloads.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
Interesting! On your Stars chip, you get:

~0.376 ms spent on FloatLoop per ms spent on IntegerLoop.

On my Stars chip, I get:

~0.375 ms spent on FloatLoop per ms spent on IntegerLoop.

Almost identical. So we have a pattern. Stars sucks at unoptimized Java operations involving integer primitives.

That being said, your FloatLoop score is similar to Maximilian's i5-4570. Are you running at stock speeds?





Are you guys running a JRE older than 1.8? I got the same thing on the E1-2500 when it was running 1.7. I had to update.

The CPU runs at 3.4 GHz, or 3.7 GHz with turbo.
And I have my L3 cache overclocked to 2.4 GHz.
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
Your test has nothing to do with real-world code... Divisions are dead slow, and only lately has the hardware seen any improvement. Division is a 70-100 cycle, NOT PIPELINED operation (compared to add/mul, which can have 3-4 cycle latency and 1/2 cycle throughput). So all your test does is check the divider hardware implementation.

I have looked up the latencies for division, and just as I guessed, Jaguar has a much improved divider, capable of 12-43 cycle division, compared to 24-87 cycles on K10. It does not take a rocket scientist to see how 2 cores at 1.4 GHz can match 3 cores @ 3 GHz+ if all you do is division.

Just because you mentioned it, I've implemented a "nodiv" version of the code (see update in OP or below once I finish with it). The results reflect many of the facts you're stating here. I've also implemented a latch mechanism as a replacement for the original thread.sleep() code. Those results follow as well.

That just about sums it up. AMD was already falling behind with Stars, before even switching to BD. They had serious integer problems, and their solution was to cripple their FPU. It would be nice to see the assembly code for both the i5 and the AMD chips. Could it have something to do with Intel-only optimizations?

In all fairness to AMD, the 1100T results for FloatLoop line up pretty well with the i5-4570. Too bad it took Stars two more cores to pull off a similar score.

It took 60166 milliseconds to complete the Integer loop.
It took 32046 milliseconds to complete the Float loop.

i7-5820K @ 4.2GHz

Impressive, though somewhat perplexing. The only architectural difference between your chip and Maximilian's is that yours has HT and his does not. On his i5, the FloatLoop and IntegerLoop performance is about the same, but on yours, FloatLoop is almost twice as fast. Hmm!

A10-5750m

It took 154648 milliseconds to complete the Integer loop.
It took 247604 milliseconds to complete the Float loop.

This workload uses a lot more floating point divide operations than you'd ever see in a real workload, so that's why Bulldozer's philosophy of sharing the FP unit seems to "fail" here. For integer, I'd say this laptop CPU compares quite favorably to the i5-4570 desktop CPU. Interesting results, but not really that meaningful, because this is the very definition of "synthetic."

EDIT: Compared to Haswell, in this test Piledriver seems to have just *barely* lower integer IPC. Too bad that doesn't pan out for them in real workloads.

It would be interesting to see how your chip would fare with the nodiv version of the code.



The CPU runs at 3.4 GHz, or 3.7 GHz with turbo.
And I have my L3 cache overclocked to 2.4 GHz.

Okay, so basically stock except for your NB. Good scores for an old chip!
 

janeuner

Member
May 27, 2014
70
0
0
Impressive, though somewhat perplexing. The only architectural difference between your chip and Maximilian's is that yours has HT and his does not. On his i5, the FloatLoop and IntegerLoop performance is about the same, but on yours, FloatLoop is almost twice as fast. Hmm!

It probably takes just as long for the non-FPU instructions to set up the FLOP as it does for the FLOP to complete on the FPU. Because Java.

We seem to have found a niche case where hyperthreading helps mask terrible byte-code.
 

DrMrLordX

Lifer
Apr 27, 2000
21,924
11,426
136
I've tweaked the code to account for some of the issues raised in this thread. There are now four codepaths:

The original code (save that the timer no longer includes the executor.shutdown() operation)
The original code with division operations removed
The code rewritten to use a CountDownLatch instead of a while loop containing thread.sleep()
CountDownLatch version with division operations removed
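A minimal sketch of the latch approach, assuming hypothetical names (`LatchSketch`, `timeWithLatch`) and a caller-supplied Runnable standing in for IntegerLoop/FloatLoop. Each task counts the latch down in a finally block, the main thread parks in await() instead of polling with sleep(), and the clock stops before shutdown() so pool teardown is excluded from the reported time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class LatchSketch {
    // Runs `tasks` copies of `work` on `threads` threads and returns the elapsed
    // milliseconds, measured before the pool is shut down.
    static long timeWithLatch(int threads, int tasks, Runnable work)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        work.run();
                    } finally {
                        done.countDown(); // count down even if work throws
                    }
                });
            }
            done.await(); // parks the main thread until the count reaches zero
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown(); // teardown happens after the clock has stopped
        }
    }
}
```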

I ran all four paths on both available test machines and got the following results:

Stars chip:

Original code:
IntegerLoop: 582682 ms
FloatLoop: 217536 ms

Original, no division:
IntegerLoop: 56775 ms
FloatLoop: 95434 ms

Latch code:
IntegerLoop: 582138 ms
FloatLoop: 216541 ms

Latch code, no division:
IntegerLoop: 56677 ms
FloatLoop: 96114 ms

Jaguar chip:

Original code:
IntegerLoop: 531171 ms
FloatLoop: 655941 ms

Original, no division:
IntegerLoop: 327075 ms
FloatLoop: 308441 ms

Latch code:
IntegerLoop: 522423 ms
FloatLoop: 635689 ms

Latch code, no division:
IntegerLoop: 321423 ms
FloatLoop: 304188 ms
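To put a number on how much of each run went to division, a trivial helper (the name `slowdown` is hypothetical) divides the with-division time by the no-division time for the same loop and codepath:

```java
final class DivisionCostSketch {
    // How many times longer a run took with division than without it.
    static double slowdown(long withDivMs, long noDivMs) {
        return (double) withDivMs / noDivMs;
    }
}
```

Applied to the original-code IntegerLoop results above, slowdown(582682, 56775) is about 10.26 on the Stars chip, while slowdown(531171, 327075) is about 1.62 on Jaguar. That fits the earlier point that the integer test mostly measured the divider, and that Jaguar's divider is much faster.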

A few extra notes:

In the nodiv version of the code, the division operations are simply removed. Nothing replaces them.
I did try executor.awaitTermination(), but the results were inconclusive. Using a CountDownLatch seemed to work better. Haven't bothered with Futures yet but maybe later . . .
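The nodiv note above can be pictured with a hypothetical integer worker that has the nested do/while shape described for Mathtester (16777216 outer iterations x 128 inner ones in the real test). The statements inside are placeholders, since the actual IntegerLoop body isn't reproduced in this thread; the point is only that the nodiv path skips the divide and puts nothing in its place. Because each iteration feeds `acc` into the next, the divide sits on the loop's dependency chain, so a slow, unpipelined divider adds its full latency to every iteration.

```java
final class WorkerSketch {
    // Hypothetical worker: nested do/while loops (outer x inner). With
    // useDivision == false the divide is simply skipped, mirroring "nodiv".
    // Long overflow on large counts wraps silently, which Java defines.
    static long integerWork(int outer, int inner, boolean useDivision) {
        long acc = 1;
        int i = 0;
        do {
            int j = 0;
            do {
                acc = (acc + j) * 3;   // add, then multiply
                if (useDivision) {
                    acc = acc / 2;     // divide on the same dependency chain
                }
                j++;
            } while (j < inner);
            i++;
        } while (i < outer);
        return acc;
    }
}
```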

Anyway if anyone wants the .class files for the expanded version of the test:

https://www.dropbox.com/s/0coj24zbag39x23/mathtester.zip?dl=0

If you want the code (in case you don't trust my .class files):

https://www.dropbox.com/s/tzj5auejc20kjdw/mathtestercode.zip?dl=0

No GUI goodness here, but hey, I'm lazy.
 