"PhysX hobbled on CPU by x87 code"


Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Feel free to correct me if I'm mistaken, but I don't think threads and core (thread parallelism) have anything to do with SSE (instruction parallelism) ...




I also agree: being quite familiar with ray tracing et al., I know for a fact that it is usually easy to extract instruction parallelism from already embarrassingly parallel computations, and it's quite feasible to strongly optimize critical sections of the code using SSE (given a little more work, it's even possible to keep the same source and switch between SSE and x87 using templates or ifdefs). It is thus a bit surprising that no effort has been made on that matter (or at least none made it to us mere mortals).
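To make the "same source, switch with ifdefs" idea concrete, here is a minimal sketch. It has nothing to do with the actual PhysX code: `saxpy4` and the `DEMO_USE_SSE` macro are names made up for this example, and the only assumption is a compiler that ships `<xmmintrin.h>` when SSE is enabled.

```cpp
#include <cassert>
#include <cstddef>

// Pick the code path at compile time. DEMO_USE_SSE is a hypothetical macro for this sketch.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define DEMO_USE_SSE 1
#else
  #define DEMO_USE_SSE 0
#endif

// y[i] += a * x[i] for n floats. n must be a multiple of 4 to keep the sketch short.
inline void saxpy4(float a, const float* x, float* y, std::size_t n) {
    assert(n % 4 == 0);
#if DEMO_USE_SSE
    const __m128 va = _mm_set1_ps(a);                 // broadcast a into all four lanes
    for (std::size_t i = 0; i < n; i += 4) {
        const __m128 vx = _mm_loadu_ps(x + i);        // unaligned loads: no alignment assumed
        const __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(vy, _mm_mul_ps(va, vx)));
    }
#else
    for (std::size_t i = 0; i < n; ++i) {             // scalar path (x87 on a 32-bit build without SSE flags)
        y[i] += a * x[i];
    }
#endif
}
```

Callers only ever see `saxpy4`, so the same source builds either way. Whether the SSE path is actually faster depends on the workload and has to be measured.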

Just to be clear, I don't think nVIDIA (nor AEGIA for that matter) crippled anything ... They just don't have any motive to invest some time into something that's not their priority (CPU PhysX). That is what I don't understand, given how slow CPU PhysX is : developers would be more inclined to learn the PhysX API if they could benefit from acceptable performance on CPU (but again, maybe nVIDIA did some benchmarks and decided the performance gain was not worth the cost) ...




As for multi-threading, I didn't dig into the library, so I can't tell if it's almost free to implement with PhysX or if you have to struggle to make it happen. What I gathered from my readings is that some complain that multi-threading is not embedded in the engine (the library does not spawn threads on its own). We developers all know how lazy we are ...

Finally, I do think the article is misleading, suggesting the code is *artificially* crippled ...

You have much to learn about people. They won't address AVX as it's a threat to PX.
All you will hear is how SSE isn't suited to this type of work. I'm not a programmer, so I don't know. But coding for SSE is probably easier than PX code.

This is a good question for the guys at Beyond 3D. It's a great place to get this info. But that forum has been invaded by wannabes that just don't cut it.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
This is a good question for the guys at Beyond 3D . Its a great place to get this info .

Not really.
Beyond3D was started by Dave Baumann. Originally he was an ATi shill, but now he has an official function with AMD/ATi.
So officially Dave stepped back as forum administrator, but he's still pulling the strings behind the scenes.
Beyond3D is not exactly a good place to get good info on nVidia-related technology. It's very biased towards AMD.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
It was a trap, Scali; I knew you would bite. I even thought you might mention Dave, a good man. Although I didn't know who would be first to reply, you or the Force. Calling Dave a shill isn't right considering you're an NV shill. Don't get me wrong, I enjoy your posts and you are knowledgeable. It's like pot, meet kettle. Same as your attack against JFAMD in the CPU section. It was completely uncalled for, as JF makes no secret about who he works for. Unlike AEG members who hide in secrecy, who are gifted to spread the word of NV light.

Kr@n, you see what I mean about understanding people. The replies here reflect the answer you got.

You have one warning for this admitted bait and accusation. Post carefully from here on out.
Anandtech Moderator - Keysplayr
 
Last edited by a moderator:

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
All tho I didn't know who would be first to reply you or the Force.

I'll chime in with some corrections: Beyond3D was formed by Dave Barron and Kristof Beets, both of whom went on to work for 3dfx; after they collapsed, Kristof went on to PowerVR and Dave went to work for a startup 3D company that has since been bought out by ATi (not sure if he is with them now or not). When Kristof and Dave went to 3dfx, Rev took over for a while (he was previously running a now-gone site, The Pulpit), and after a while he decided it was time to move on, and then Wavey took over, which as everyone here likely knows led to his job at ATi (and his presence there saved the 4x00 parts from what would have been disaster). Nothing at all against the newer management, but it was nice back in the day when things were simpler and if we had a question about how something was handled in a game we could just shoot out an email to Carmack/Epic/Tarolli and have it answered and get some decent conversation going. The way things are now, the reality of everyone's situation has made that less realistic, but it was nice back then.

Calling Dave a shill isn't right

It's Dave's job to be a shill, saying he is one is a compliment to his work ethic at this point.

Most of the guys over at Beyond 3D, at least the old school ones, simply believe in a certain technological direction. A lot of really smart guys, shockingly bad track record on seeing where the industry wants to go over the life of that site, but smart guys nonetheless. There is a big difference between the reasoning over there and most of the reasoning here. Over there, the people are interested in seeing the technology evolve in a certain direction, mainly minimalist approaches and just-in-time feature sets that place a high priority on elements that are the polar opposite of what a company like nV does. They don't support AMD because it is AMD; they like the direction they are headed in. This was true of Kristof and Dave in the past also: they simply liked the direction 3dfx was headed in (post-processing effects on existing games was a better route than increased features for games coming out down the road). What you confuse is when people like a certain direction and you get it mixed up with liking a certain company. For myself, I have 11 years' worth of posting history here (October of '99 is when these boards went live); the common element you will always find is I always support the company pushing forward the hardest in terms of features (and the best AF support possible). Doesn't matter which company it is, ever. I also have never been a supporter of major shakeups in terms of moving from rasterization, as to date we haven't seen anything that works better (that is subject to immediate change on my end as soon as it changes on the tech end). Feel free to dig all you'd like; doesn't matter which company it is, those pushing technology forward are those that I get behind (back in the late 90s there was actually a fairly big divide between the fans of 3dfx and those on the ATi/nV bandwagons; yes, ATi and nV were regularly grouped together in that timeframe).
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Keys, it wasn't baiting. It was an out-and-out trap. I know the baiting rules. If a thing looks like a fish and swims in water and smells like a fish, it's likely a freakin' fish. Yes, I know better than to reply like that. But honesty trumps points. I fully expect to see the same warning passed to the Force, as this sentence right here is bait.

It's Dave's job to be a shill, saying he is one is a compliment to his work ethic at this point.


At this time. So he is referring to now, when Dave works for ATI. Nevertheless it's bait. Because before Dave went to work for ATI, where is the proof he was a shill? Yeah, I have read it all. Maybe true too. But unless Dave says so, it's wrong. The difference is I don't hide behind BS sentences. I plainly stated my intention. Those who try to protect NV band together to get rid of those who rail against NV. How many times was I banned because I called Rollo out? So it would seem that wrong trumps right if it suits a purpose. If you recall, how many times did I ask Rollo if he worked for NV? He lied every time. But I ended up with the ban. So lying and cheating < baiting. This must be new-school thinking. Lawyers bait witnesses all the time to get to the truth.
You object to this. What I did here was acceptable, and I was honest about my motive. Your points against me are wrong and you damn well know it.

Traps aren't really that effective without bait.
Please take a few days to consider this, and how there isn't any difference.
Thanks
Anandtech Moderator - Keysplayr
 
Last edited by a moderator:

SirPauly

Diamond Member
Apr 28, 2009
5,187
1
0
Not really.
Beyond3D was started by Dave Baumann. Originally he was an ATi shill, but now he has an official function with AMD/ATi.
So officially Dave stepped back as forum administrator, but he's still pulling the strings behind the scenes.
Beyond3D is not exactly a good place to get good info on nVidia-related technology. It's very biased towards AMD.

I knew Wavey Dave before the days at Beyond3d and he is a straight shooter in my book. I remember his first journalistic views back in the old Voodoo Lounge. Dave Barron and Kristof Beets started Beyond3d, which handed the responsibility to Reverend and then to Wavey Dave, from my recollection.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
I'm not an NV shill.
Is this a blogpost of an NV shill? <link removed by Railven>

Seems like you're the one who needs to learn a lot about people. Some of them actually have something called integrity.
Heck, I currently have a Radeon 5770 in my machine, I don't even have a machine CAPABLE of running GPU PhysX.

Just a heads up, you should be careful about posting your blog. There was another poster here who got temp banned and I think eventually perm banned (haven't seen him post in a while.)

Just a heads up since I know this wasn't the first time and I enjoy reading your input on some of the subjects.

EDIT: Oh snap just read your blog. I remember that thread and I don't think its a good idea posting that here considering that "idiot" is in this thread. Doesn't seem like a good way to keep things amicable. Haha, good read though.
 
Last edited:

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Having dealt with x87 hardware a ton, that really makes me laugh. Why don't we go do some MMX code while we're at it. And yes, the processors are being optimized towards SSE/SSE2 and not so much x87/MMX.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
I knew Wavey Dave before the days at Beyond3d and is a straight shooter in my book.

Perhaps he was at some point...
But if you saw the PMs I've had with him regarding 3DMark05 and the 3Dc/DST/PCF issues (there was a discussion on Beyond3D, but he just kept deleting my posts)...
I came to two conclusions:
1) He doesn't have a lot of knowledge on the subject himself. Clearly he's not a D3D developer, doesn't know how to write shaders. Seemed like for every reply I gave, he had to 'consult' with some people with knowledge of shader programming... and even then, some parts got lost in translation.
(The lead programmer of 3DMark05 was an old friend of mine, I had discussed with him how the shaders were designed... and he even said that ATi devrel had worked with them to optimize the shaders, and that 3Dc was decided against, given the shader design. It wouldn't lead to better quality or performance, and I knew exactly why).
2) He doesn't have any integrity. He kept claiming that DST/PCF was nVidia-proprietary... but I pointed out that it would be part of the DX10 standard, and that he knew even better than I did that ATi was going to support it as well.
He also wouldn't admit that Richard Huddy just stabbed Futuremark in the back with his comments on 3Dc and how it would improve quality and performance. ATi devrel themselves didn't see a point in 3Dc in 3DMark05, and Huddy knew it.
And this was BEFORE Dave was officially an ATi/AMD employee.
 
Last edited:

Scali

Banned
Dec 3, 2004
2,495
0
0
EDIT: Oh snap just read your blog. I remember that thread and I don't think its a good idea posting that here considering that "idiot" is in this thread. Doesn't seem like a good way to keep things amicable. Haha, good read though.

Well, at least it should prove beyond a shadow of a doubt that I'm not an NV shill.
I am critical of nVidia hardware where I think it's necessary (as I am of all hardware/software).
And the most obvious point: I don't HAVE an nVidia card currently. If I were somehow involved with NV in a focus group or anything, don't you think NV would have donated me some hardware?

As for the "idiot", well, he seems to have made a fool out of himself again, regarding SSE2 instructions.
 
Last edited:

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
As for the "idiot", well, he seems to have made a fool out of himself again, regarding SSE2-instructions

I was going to let you slide, you were amusing enough- can't do bilinear and anisotropic at the same time under D3D, heh, but since you brought it up-

SSE's movntq should be fine

movntq is for a 64bit value, movnti is for a 32bit value.

As for the rest-

http://softpixel.com/~cwright/programming/simd/sse.php

http://softpixel.com/~cwright/programming/simd/sse2.php

You know another thing? A black and white checkerboard pattern that doesn't gray out on mip 9 will never gray out on mip 0. Can't happen(if filtered properly)
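A rough way to check that claim, assuming the mip chain is generated with a plain 2x2 box filter (the thread doesn't say which filter is meant, so that is an assumption). The sketch builds a checkerboard, downsamples it repeatedly, and reports the first mip level that is uniform 50% gray. All names here are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;  // square, row-major, texel values in [0, 1]

// Checkerboard with cells of `cell` x `cell` texels: 0 = black, 1 = white.
inline Image make_checkerboard(std::size_t size, std::size_t cell) {
    Image img(size * size);
    for (std::size_t y = 0; y < size; ++y)
        for (std::size_t x = 0; x < size; ++x)
            img[y * size + x] = ((x / cell + y / cell) % 2 == 0) ? 0.0f : 1.0f;
    return img;
}

// One mip step: average each 2x2 block (box filter).
inline Image downsample(const Image& src, std::size_t size) {
    const std::size_t half = size / 2;
    Image dst(half * half);
    for (std::size_t y = 0; y < half; ++y)
        for (std::size_t x = 0; x < half; ++x)
            dst[y * half + x] = 0.25f * (src[(2 * y) * size + 2 * x] +
                                         src[(2 * y) * size + 2 * x + 1] +
                                         src[(2 * y + 1) * size + 2 * x] +
                                         src[(2 * y + 1) * size + 2 * x + 1]);
    return dst;
}

inline bool is_uniform_gray(const Image& img) {
    for (float v : img)
        if (v != 0.5f) return false;
    return true;
}

// First mip level that is uniform gray, or -1 if none is (down to 1x1).
inline int first_gray_mip(std::size_t size, std::size_t cell) {
    assert(size > 0 && (size & (size - 1)) == 0);  // power-of-two texture
    assert(cell > 0 && (cell & (cell - 1)) == 0);  // power-of-two cells
    Image img = make_checkerboard(size, cell);
    for (int level = 0;; ++level) {
        if (is_uniform_gray(img)) return level;
        if (size == 1) return -1;
        img = downsample(img, size);
        size /= 2;
    }
}
```

Under that box-filter assumption the contrast only ever drops as you go down the chain: every level before the first gray one still has pure black and white texels, so a pattern that is still not gray on a coarse mip cannot be gray on a finer one.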

1) He doesn't have a lot of knowledge on the subject himself.

Anistoropic/anisotropic/point is far simpler matter. As of now, Dave Baumann has a major position at one of the two major graphics IHVs and had major oversight over one of their largest product launches in history; he carries far more credibility than you.

He kept claiming that DST/PCF was nVidia-proprietary... but I pointed out that it would be part of the DX10 standard

DXTC was part of the DirectX standard and was still proprietary to S3 (S3TC). MS licensed the technology and reached an agreement where other IHVs were allowed to use it under DirectX but not under other APIs. I recall the hoops we went through to try and get it working properly under Quake3. Saying something is proprietary in no way prevents it from being in DirectX if MS decides it is important enough to license.
 
Last edited:

Scali

Banned
Dec 3, 2004
2,495
0
0
movntq is for a 64bit value, movnti is for a 32bit value.

I never claimed otherwise.
I just wondered why you'd think it's so important.


Yea, as you see, rcp* and rsqrt* are in SSE.
You still haven't explained what you'd want to do with rcp* though.

Remember, you claimed that SSE2 would be a minimum requirement ("SSE is far too limited to be considered viable."), so I ask you to explain why regular SSE would be so much worse...
Clearly SSE2 is even more powerful than SSE, but shouldn't SSE already give you most of the advantages for the performance boost over x87? (flat register file, packed arithmetic, fast partial precision operations etc).
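As an illustration of the "fast partial precision" point, here is a small sketch of a four-wide reciprocal square root using the plain-SSE `rsqrtps` instruction (`_mm_rsqrt_ps`), refined with one Newton-Raphson step. It is not taken from PhysX or Bullet; `rsqrt4` and `DEMO_HAVE_SSE` are hypothetical names, and builds without SSE fall back to ordinary scalar math.

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define DEMO_HAVE_SSE 1   // hypothetical macro, only for this sketch
#else
  #define DEMO_HAVE_SSE 0
#endif

// out[i] = 1 / sqrt(in[i]) for exactly four positive floats.
inline void rsqrt4(const float* in, float* out) {
#if DEMO_HAVE_SSE
    const __m128 x = _mm_loadu_ps(in);
    __m128 y = _mm_rsqrt_ps(x);                       // fast estimate, roughly 12 bits of precision
    // One Newton-Raphson step: y = y * (1.5 - 0.5 * x * y * y)
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    const __m128 xyy = _mm_mul_ps(x, _mm_mul_ps(y, y));
    y = _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
    _mm_storeu_ps(out, y);
#else
    for (int i = 0; i < 4; ++i)                       // full-precision scalar fallback
        out[i] = 1.0f / std::sqrt(in[i]);
#endif
}
```

The refined result is close to, but not bit-identical with, `1.0f / std::sqrt(x)`, which is one reason an SSE build and an x87 build of the same physics code can produce slightly different simulations.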

You know another thing? A black and white checkerboard pattern that doesn't gray out on mip 9 will never gray out on mip 0. Can't happen(if filtered properly)

Anistoropic/anisotropic/point is far simpler matter.

Yes, but someone with no programming/3D experience such as yourself still doesn't understand it.

As of now, Dave Baumann has a major position at one of the two major graphics IHVs and had major oversight over one of their largest product launches in history; he carries far more credibility than you.

Yea whatever. As if that was really Dave Baumann's doing. He's just a talking head, and not a very good one at that.

DXTC was part of the DirectX standard and was still proprietary to S3 (S3TC). MS licensed the technology and reached an agreement where other IHVs were allowed to use it under DirectX but not under other APIs. I recall the hoops we went through to try and get it working properly under Quake3. Saying something is proprietary in no way prevents it from being in DirectX if MS decides it is important enough to license.

The point he was trying to make was that it would only work on nVidia hardware, while at the same time, ATi was developing hardware with support (as MS put it in DX10 and took care of the licensing issues), and he knew about it. You just don't have any integrity if you try to argue such points.
 
Last edited:

Kr@n

Member
Feb 25, 2010
44
0
0
That being said, not all developers want SSE enabled by default, because they still want support for older CPUs for their SW versions.
I am inclined to believe nVidia when they state they did some research and found non-SSE code would be faster than SSE code (I find it hard to swallow, given how parallel physics computations are *edit: said to be*, but I certainly don't have their expertise on the matter). However, I don't understand the part I quoted, since I am quite sure it is possible to activate / deactivate SSE optimisations depending on CPU capacities (choose the code path accordingly at runtime, or when installing) ... I must be wrong somewhere ...

But I'm sure even if it's possible, the gain would not outweigh the pain (since you have to rewrite your code to make it SSE friendly, and it's especially painful if you want to keep only one source for both SSE/non-SSE binaries)


[...] a task-based approach that was developed in conjunction with Nvidia Apex product to add in more automatic support for multi-threading [...]
Automatic multi-threading would be good : at the very least, it should end this everlasting argument.


Unfortunately, previous authors are missing few vital points: PhysX SDK is used in many games running on CPU, and physics level in those titles can be easily compared to physics content in games based on other “non crippled” physics engines, like Havok; nor there are any games, that can offer content, similar to GPU PhysX effects, but running on CPU with stable framerate.
I was under the (maybe wrong) impression that some games (Bad Company, Ghost Busters) achieved much heavier physics than what was previously done with CPU PhysX (but obviously not as heavy as GPU PhysX), and did that without taxing the CPU ...
 
Last edited:

Scali

Banned
Dec 3, 2004
2,495
0
0
I am inclined to believe nVidia when they state they did some research and found non-SSE code would be faster than SSE code

I don't know who to believe as neither bothered to present an actual comparison to support their case.

However, I don't understand the part I quoted, since I am quite sure it is possible to activate / deactivate SSE optimisations depending on CPU capacities (choose the code path accordingly at runtime, or when installing) ... I must be wrong somewhere ...

No, you're right, I've mentioned that option as well.
In fact, nVidia knows how to do that very well... their OpenGL driver will select the proper code version and report that in the version string.
You'll see something like "GeForce 9800 GTX+/PCI/SSE2" reported (you can check with a tool such as GPU Caps Viewer, in the OpenGL tab). They have various versions. I recall that I had a 3DNow!-optimized version with my Athlon Thunderbird and GeForce2 at the time.
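A minimal sketch of that kind of runtime selection, assuming GCC or Clang on x86 for the detection part (`__builtin_cpu_supports` is their builtin; other compilers would use `cpuid` differently). This is not how the nVidia driver or PhysX does it. Every name below is hypothetical, and the SSE and SSE2 kernels are stand-ins that simply forward to the scalar version; a real build would compile them in separate files with the matching compiler flags.

```cpp
#include <cstddef>

enum class SimdLevel { X87, SSE, SSE2 };

// Ask the CPU what it supports. On non-x86 or other compilers this sketch reports X87.
inline SimdLevel detect_simd() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return SimdLevel::SSE2;
    if (__builtin_cpu_supports("sse"))  return SimdLevel::SSE;
#endif
    return SimdLevel::X87;
}

using DotFn = float (*)(const float*, const float*, std::size_t);

// Reference scalar kernel.
inline float dot_x87(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Stand-ins for separately compiled SSE / SSE2 kernels (assumption: same results as scalar).
inline float dot_sse(const float* a, const float* b, std::size_t n)  { return dot_x87(a, b, n); }
inline float dot_sse2(const float* a, const float* b, std::size_t n) { return dot_x87(a, b, n); }

struct Kernels {
    const char* suffix;  // e.g. appended to a version string, like ".../SSE2"
    DotFn dot;
};

// Pick the kernel set once at startup; the rest of the code calls through the pointers.
inline Kernels select_kernels(SimdLevel level) {
    switch (level) {
        case SimdLevel::SSE2: return {"SSE2", &dot_sse2};
        case SimdLevel::SSE:  return {"SSE",  &dot_sse};
        default:              return {"x87",  &dot_x87};
    }
}
```

Typical use would be `Kernels k = select_kernels(detect_simd());` once at load time, so older CPUs still get a working path and newer ones get the optimized one.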


But I'm sure even if it's possible, the gain would not outweigh the pain (since you have to rewrite your code to make it SSE friendly, and it's especially painful if you want to keep only one source for both SSE/non-SSE binaries)

I don't think this is a valid excuse really. I mean, assuming it's possible that the code can be sped up significantly with SSE, then saying "it's difficult to maintain the code" is not a good excuse for "Your code could be optimized significantly".
Then it's a fact that your code is suboptimal, and that you aren't willing to put in the effort.
Although you're free to make that choice as a company, the criticism would be justified.
 
Last edited:

Kr@n

Member
Feb 25, 2010
44
0
0
@Scali : I think we agree, then :

SSE optimisation is possible and within their means (though neither trivial nor cheap), but would not necessarily bring any major gain to the table (I still think they could squeeze out some 1.5x if it scales even half as well as ray tracing does with SSE), hence it is perfectly acceptable for nVidia to not commit any resources to that (even if it seems they did, if we believe the leaks for PhysX 3.0). It is also perfectly acceptable for us to criticize this position (but no need to make a complete fuss or state that nVidia is evil ^^).

That aside, I am sure PhysX makes much better use of SSE than its AMD counterpart ... (niark)
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Kr@n: Yes, pretty much.
As usual, the truth is somewhere in the middle... The RWT article paints a very bleak picture of PhysX' CPU implementation, with some points that are dubious, and some that are downright false.
On the other hand, nVidia's responses paint things a bit too rosy. There certainly is room for improvement, that much is true.
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
So all we need now is for someone to show us a CPU havok physics game doing it "right"...

*waiting*

*waiting*

*waiting*

*waiting*

*waiting*

*waiting*

*waiting*

*endless loop*
 

Scali

Banned
Dec 3, 2004
2,495
0
0
I took the liberty of doing the Bullet-test myself.
I've downloaded the latest Bullet SDK (version 2.76).

I then compiled it with Visual Studio 2008, with the default Bullet project settings, which use SSE.
Then I added a new configuration, where I disabled SSE, but left all other options untouched, so I'd get a 'vanilla' x87 version.

I then ran the included benchmarks on my Core2 Duo 3 GHz machine:
SSE results.
x87 results.

As you can see, the difference is marginal at best. Sometimes x87 comes out on top. I see no indication of 1.5-2x speedup with the SSE code anywhere.

If anyone wants to try it on their PC, you can download my precompiled binaries here:
http://bohemiq.scali.eu.org/bullet/bullet-2.76-x87-sse.zip

And if you don't trust me... well, it's open source, you can build it yourself.
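For anyone who wants to repeat this kind of comparison on their own code, here is a tiny timing harness, purely as a sketch. The workload is a made-up position integration loop, not one of the Bullet benchmarks, and the exact Bullet project settings are not given above. The idea is to build the same source twice: with Visual Studio 2008 a 32-bit build uses x87 unless `/arch:SSE` or `/arch:SSE2` is set, and with GCC the equivalent is `-m32 -mfpmath=387` versus `-m32 -msse2 -mfpmath=sse`.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical float-heavy workload: advance positions by velocity * dt, `steps` times.
inline float integrate(std::vector<float>& pos, const std::vector<float>& vel,
                       float dt, int steps) {
    for (int s = 0; s < steps; ++s)
        for (std::size_t i = 0; i < pos.size(); ++i)
            pos[i] += vel[i] * dt;
    float checksum = 0.0f;
    for (float p : pos) checksum += p;
    return checksum;  // returned so the compiler cannot throw the work away
}

struct BenchResult {
    double milliseconds;
    float checksum;
};

// Time one run. Compare the numbers from the x87 build and the SSE build of this same file.
inline BenchResult run_benchmark(std::size_t n, int steps) {
    std::vector<float> pos(n, 0.0f);
    std::vector<float> vel(n, 1.0f);
    const auto t0 = std::chrono::steady_clock::now();
    const float checksum = integrate(pos, vel, 0.5f, steps);
    const auto t1 = std::chrono::steady_clock::now();
    return {std::chrono::duration<double, std::milli>(t1 - t0).count(), checksum};
}
```

Run it several times per build and compare the checksums as well as the timings, since the two builds may round differently.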
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
I took the liberty of doing the Bullet-test myself.
I've downloaded the latest Bullet SDK (version 2.76).

I then compiled it with Visual Studio 2008, with the default Bullet project settings, which use SSE.
Then I added a new configuration, where I disabled SSE, but left all other options untouched, so I'd get a 'vanilla' x87 version.

I then ran the included benchmarks on my Core2 Duo 3 GHz machine:
SSE results.
x87 results.

As you can see, the difference is marginal at best. Sometimes x87 comes out on top. I see no indication of 1.5-2x speedup with the SSE code anywhere.

If anyone wants to try it on their PC, you can download my precompiled binaries here:
http://bohemiq.scali.eu.org/bullet/bullet-2.76-x87-sse.zip

And if you don't trust me... well, it's open source, you can build it yourself.

As I thought...pure anti-NVIDIA bullshit.

Drawing conclusions from a flawed foundation.

YHPM
-ViRGE
 
Last edited by a moderator:

GaiaHunter

Diamond Member
Jul 13, 2008
3,634
180
106
I took the liberty of doing the Bullet-test myself.
I've downloaded the latest Bullet SDK (version 2.76).

I then compiled it with Visual Studio 2008, with the default Bullet project settings, which use SSE.
Then I added a new configuration, where I disabled SSE, but left all other options untouched, so I'd get a 'vanilla' x87 version.

I then ran the included benchmarks on my Core2 Duo 3 GHz machine:
SSE results.
x87 results.

As you can see, the difference is marginal at best. Sometimes x87 comes out on top. I see no indication of 1.5-2x speedup with the SSE code anywhere.

If anyone wants to try it on their PC, you can download my precompiled binaries here:
http://bohemiq.scali.eu.org/bullet/bullet-2.76-x87-sse.zip

And if you don't trust me... well, it's open source, you can build it yourself.

So basically you are telling us that SSE isn't worth it and we should still be using x87?

And does running Bullet in a GPU get any performance gains?
 

Scali

Banned
Dec 3, 2004
2,495
0
0
So basically you are telling us that SSE isn't worth it and we should still be using x87?

No, I'm saying that in this particular case, getting a performance boost from SSE is far from trivial.
The Bullet library actually uses some of the SSE intrinsics from VS2008 as well, so it has received at least a bit of hand-optimization.

As I said before in the thread, if the computational part is not the bottleneck in the first place, you're not going to gain much by optimizing that part.
I think this small Bullet-test at least shows two things:
1) David Kanter was jumping to conclusions with his figures of 1.5-2x speedup. It's not that simple.
2) nVidia was correct in stating that some things are just faster with x87 than with SSE (just like the example I gave, the dotproduct).
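The bottleneck argument is basically Amdahl's law. A quick sketch with made-up numbers (the fractions in the note below are illustrative, not measurements of PhysX or Bullet):

```cpp
#include <cassert>

// Overall speedup when a fraction p of the runtime is made s times faster (Amdahl's law):
//   speedup = 1 / ((1 - p) + p / s)
inline double overall_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, if only 20% of the frame time were vectorizable arithmetic and SSE made that part 4x faster, `overall_speedup(0.2, 4.0)` gives about 1.18x overall. To reach 1.5x from a 2x kernel speedup, the kernel would have to account for about two thirds of the runtime.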

And does running Bullet in a GPU get any performance gains?

The only thing we have running on a GPU so far is the Cuda demo released with Bullet 2.74, and that performs better than a CPU yes.
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
Oddly I got a PM:

Would you care to revise your post to actually engage in a technical discussion without thread crapping and name calling, or would you like a vacation for repeated trolling in the Video forum?

For stating the obvious.

DKanter did a borked "analysis".
He PRESUMES that the use of x87 (instead of SSE) means that PhysX is borked on the CPU and running much slower than it could.
But he didn't verify his findings by any means.


Scali actually put this to the test... a test with a different physics API, and found no major difference between x87 and SSE in Bullet Physics.

That can only lead you to conclude that DKanter's piece was directed against NVIDIA on a false premise... one that he never tested, but nonetheless he still (with no factual evidence) concluded that "those are the facts".

It's no secret that quite a few people dislike PhysX.
It's also no secret that no one can show another physics API on the CPU doing it better.

Those are the facts.
Prove me wrong.

But saying so... means you get accused of "thread crapping" and "name calling".

Makes you wonder eh?
 
Last edited: