III-V
Senior member
- Oct 12, 2014
One can afford to, the other cannot.
Isn't Intel doing the same thing?
Both companies are deciding to communicate less and less and keep their next moves more secretive.
Bad compilation with deprecated instructions.
Yep, don't use Visual Studio 2008.
So why is this code running so badly on AMD's uarch compared to Intel's? Is there any optimization that could be done to take advantage of AMD's uarch without advanced extensions?
I just have to wonder what an evolution of Phenom II X4/X6 would have been like on 32nm. Transistors capable of 5 GHz coupled with a move to a four-wide front end (up from three-wide) probably would have been pretty exciting. (With any luck we will see a return of this evolution.)
In response to claims that 3DPM is biased against AMD made by Abwx.
3DPM is written, at its base level, very simply.
A for loop is declared to be multithreaded, and the code within the loop deals with x/y/z co-ordinates for trigonometric transforms on a struct with three main float class members.
I used the code to publish several scientific papers regarding electrochemical motion and interaction with surfaces. This is code written by a scientist, rather than a computer scientist with a background in code or programming languages. For lack of a better word, a self-taught noob. I'm a physical chemist first, programmer second.
So one could argue that the loops involved require integers, and the random number generator is predominantly bitshifts, but the bulk of the mathematics that takes time is basic floating point trig functions.
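To make the description above concrete, here is a minimal sketch of that kind of loop. It is not the 3DPM source: the `Particle` struct, the xorshift32 generator `next_uniform`, the seeding, and the `simulate` function are all hypothetical stand-ins, chosen only to match the description of a multithreaded for loop, a bitshift-based random number generator, and float trig on a struct with three float members.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical particle type: three float members, as described in the post.
struct Particle {
    float x = 0.0f, y = 0.0f, z = 0.0f;
};

// Hypothetical xorshift32 generator (shifts and XORs only); returns a float in [0, 1).
inline float next_uniform(std::uint32_t& state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return (state >> 8) * (1.0f / 16777216.0f);  // top 24 bits divided by 2^24
}

// Moves each particle `steps` unit-length steps in random 3D directions.
std::vector<Particle> simulate(int count, int steps) {
    assert(count >= 0 && steps >= 0);
    const float two_pi = 6.2831853f;
    std::vector<Particle> particles(count);
    #pragma omp parallel for  // ignored when OpenMP is not enabled
    for (int i = 0; i < count; ++i) {
        // Assumed per-particle seed; "| 1u" keeps the xorshift state nonzero.
        std::uint32_t state = (2463534242u + static_cast<std::uint32_t>(i)) | 1u;
        Particle p;
        for (int j = 0; j < steps; ++j) {
            float z_step = 2.0f * next_uniform(state) - 1.0f;  // in [-1, 1)
            float alpha = two_pi * next_uniform(state);        // azimuth angle
            float r = std::sqrt(1.0f - z_step * z_step);       // radius in the x/y plane
            p.x += r * std::cos(alpha);
            p.y += r * std::sin(alpha);
            p.z += z_step;
        }
        particles[i] = p;
    }
    return particles;
}
```

In a sketch like this the integer work is limited to the loop counters and the generator, while the square root and the two trig calls per step account for most of the time, which is the point being made about floating point.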
Disclosure: I'm the Senior Editor Ian Cutress on the main site. I don't visit the forums that much(!) If anyone wants to double confirm, I use this handle on Twitter as a secondary account as well as over at OCN. You can tweet me at @IanCutress or @borandi with this link and I'll respond.
This benchmark is heavily cache-oriented.
The Celeron J1900 is 2x faster in multithread than the Celeron J1800 simply because the J1900 has 2MB of L2 cache vs 1MB on the J1800.
And performance doubles again with the Atom C2750, which has 4MB of L2 cache and 8 cores. Also, the Atom C2750 is faster than the Core i5 2500K.
Nonsense.
The J1800 is 2 cores, J1900 4 cores.
One of the algorithms looks like this, in mixed C++/pseudocode using OpenMP:
.
As pointed out by Shintai, the core count matters, but there must be a hell of a CPU dispatcher given the Bay Trail score relative to the 5800K. It is just impossible that it has 50-60% better IPC in FP. If I were the bench designer I would double-check everything, as it's obvious that it's literally a viral marketing bench in its current form. In his post Ian Cutress somewhat admits that he did only the mathematical part and that he doesn't actually know how this bench is optimised/unoptimised with respect to code paths. For me it's obvious that the Intel CPUs have a much better optimised path than Piledriver. For the record, I get a single-core score of 71 with a Pentium T4400 at 2.2 GHz. This suggests that the ST FP performance of my T4400 is as good as a 4.1 GHz Piledriver, hence it has at least 80% more FP IPC, which is just ridiculous.
sorry but this code makes no sense, for ex. the induction variable "j" isn't used in the inner loop but there is a series of overwrites to particle instead,
also using a statement like
if(particle.z < 0) {particle.z -= particle.z;}
for a clamp to 0.0 (or a buggy absolute value ?) looks beyond clumsy
I really hope this was a fixed version of this nonsense, after proper peer review
Piledriver has only two FPUs compared to Bay Trail's 4. Piledriver and all the modular CPUs simply suck at this test; PII is significantly further ahead.
Again as I said before this is complex mathematical code involving a lot of high latency instructions.
Hint hint, Nobody really cares about how optimized the code is, only that it works and returns the correct values.
If it were an Intel CPU that was that disadvantaged we would hear a totally different discourse, wouldn't we? It just shows that ultra-biased benches suit you as long as it's AMD that is handicapped...
Your explanations are just irrelevant. If Piledriver were that weak compared to a Bay Trail in FP, this would show in Cinebench or POV-Ray.
You have Cinebench R10, 11.5, R15 and POV-Ray, which are FP. In the former three the 5800K is twice as fast as a J1800, and 2.5x with POV-Ray. But keep on spreading your thoughts that 3D particle could be an accurate representation of those two CPUs' respective FP performance, sure that it will help increase your credibility.
http://www.anandtech.com/bench/product/675?vs=1227
Kabini, with four (4) FPUs which are faster than Bay Trail's, also sucks running this code.
And it seems Intel's HT is working very well, unlike AMD's CMT.
Also, the AMD Phenom X6 is faster than the quad-core Haswell.
http://www.anandtech.com/show/8427/amd-fx-8370e-cpu-review-vishera-95w/2
this code is very obviously wrong, so it can't return correct values
but will it work?
obviously not, if you replace the "=" with "+=" it makes some sense, though
hint: the "starting position" comment where positions are initialized, this should be a summation and the code, as is, is clearly buggy as hell
It's very likely he simply extracted code from the program and it's missing pieces.
I still don't understand why you do not like that line. The variable is negative. The line subtracts the negative variable from itself (-=) giving 0. Perhaps setting directly to 0 would be better but this looks to me like it should work (I am no expert on programming).
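The behaviour of that line can be checked with a short sketch. The function names below are made up for illustration; only the `if (z < 0) { z -= z; }` statement comes from the thread. It shows that the quoted form does clamp negative values to 0, that `std::max` says the same thing more directly, and that an absolute value (a reflection at z = 0) would give a different result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The statement as quoted in the thread: subtracting z from itself.
inline float reset_if_negative(float z) {
    if (z < 0) { z -= z; }  // z - z is 0 for any finite z
    return z;
}

// A more direct way to write the same clamp to zero.
inline float clamp_to_zero(float z) {
    return std::max(z, 0.0f);
}

// What an absolute value (reflection at z = 0) would do instead.
inline float reflect_at_zero(float z) {
    return std::fabs(z);
}

// Usage: both clamps agree, while the reflection differs for negative input.
inline void demo_clamp() {
    assert(reset_if_negative(-2.5f) == clamp_to_zero(-2.5f));
    assert(reflect_at_zero(-2.5f) == 2.5f);
}
```

Whether a clamp or a reflection is the physically intended behaviour at the surface is a separate question that the quoted line alone does not settle.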
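// With "=": particle_i_x is overwritten on every pass, so only the last step survives.
float particle_i_x = 0.0;
for (int j=0; j < 100; j++)
{
    float alpha = 2 * pi * randgen();
    particle_i_x = cosf(alpha);
}
// With "+=": the 100 steps are summed.
double particle_i_x = 0.0;
for (int j=0; j < 100; j++)
{
    float alpha = 2 * pi * randgen();
    particle_i_x += cosf(alpha);
}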
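The difference between the two loops can be made testable with a small sketch. The function names and the idea of passing the angles in as a vector are assumptions made here so the result is deterministic; the thread's `randgen()` is not reproduced.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Overwrite form ("="): equivalent to keeping only the last step.
inline float walk_overwrite(const std::vector<float>& angles) {
    float x = 0.0f;
    for (float alpha : angles) { x = std::cos(alpha); }
    return x;
}

// Accumulate form ("+="): the position is the sum of all steps.
inline double walk_accumulate(const std::vector<float>& angles) {
    double x = 0.0;
    for (float alpha : angles) { x += std::cos(alpha); }
    return x;
}

// Usage: 100 steps at angle 0 end at 1 when overwriting and at 100 when summing.
inline void demo_walk() {
    std::vector<float> angles(100, 0.0f);
    assert(walk_overwrite(angles) == 1.0f);
    assert(walk_accumulate(angles) == 100.0);
}
```

With the overwrite form the result always stays within [-1, 1] no matter how many steps are taken, which is why it cannot represent a walk of 100 steps.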
I agree with bronxzv, it's very, very bad programming. There is no way around that fact, I'm afraid.