BenSkywalker
Diamond Member
- Oct 9, 1999
Hold on a sec. How would an SM3.0 shader get around doing an intersection test to see if the light source can illuminate the object?
They don't get around it; that is the way it has to be done. Under SM3.0 you run a vertex test on each light source, then pass the data to the pixel shaders to determine which shader routines need to be run. You can't do this under SM2.0 the way it is possible with 3.0 (no dynamic branching); you would need to calculate a per-pixel coverage map, which wouldn't be close to reasonable.
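As a rough illustration of the SM3.0 path being described: with dynamic branching, a ps_3_0 shader can skip an entire lighting routine for pixels the light can't reach. A hypothetical HLSL sketch (the names `lightPos`, `ambientColor`, and `EvaluateLighting` are illustrative, not from the discussion):

```hlsl
float3 lightPos;       // light position in world space (assumed constant)
float4 ambientColor;   // base term every pixel receives
float4 lightColor;

// Stand-in for an expensive per-light shading routine.
float4 EvaluateLighting(float3 worldPos, float3 normal, float3 toLight)
{
    return saturate(dot(normal, toLight)) * lightColor;
}

float4 main(float3 worldPos : TEXCOORD0,
            float3 normal   : TEXCOORD1) : COLOR
{
    float3 toLight = normalize(lightPos - worldPos);
    float  NdotL   = dot(normalize(normal), toLight);

    float4 color = ambientColor;
    if (NdotL > 0)   // a real branch in SM3.0, not just predication
    {
        // Expensive path runs only for pixels facing the light.
        color += EvaluateLighting(worldPos, normalize(normal), toLight);
    }
    return color;
}
```

Under SM2.0 the `if` here would have to be flattened (both sides evaluated), which is the crux of the argument that follows.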
The 'normal' sequence I would assume for culling back-facing surfaces (which is what I *thought* you were talking about; suddenly now we're doing intersection tests as well?) during dynamic lighting would be to have a shader that looks something like this in SM3.0:
It is beyond culling back-facing surfaces; it is also culling unneeded shader routines from forward-facing pixels.
In SM2.0, you can get a very close effect by running a shader like this:
{
calculate surface normal to light
store surface normal
}
and then for each pixel that is facing the light, you run another pass of a shader that looks like:
{
calculate contribution of light to color of pixel
}
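In ps_2_0 terms, the two passes quoted above might look roughly like this. This is a sketch under the assumption that the first pass writes its result to a render target that the second pass samples; all names (`normalMap`, `PassOne`, `PassTwo`) are illustrative:

```hlsl
float3 lightPos;      // light position in world space (assumed constant)
float4 lightColor;
sampler normalMap;    // render target written by the first pass

// Pass 1 (hypothetical): compute and store the per-pixel N.L term.
float4 PassOne(float3 worldPos : TEXCOORD0,
               float3 normal   : TEXCOORD1) : COLOR
{
    float3 toLight = normalize(lightPos - worldPos);
    float  NdotL   = dot(normalize(normal), toLight);
    return float4(NdotL, 0, 0, 0);   // stored for the second pass
}

// Pass 2 (hypothetical): read the stored term and accumulate this
// light's contribution. Without dynamic branching, every pixel still
// runs this shader; pixels facing away can only be masked out via
// alpha test or stencil.
float4 PassTwo(float2 uv : TEXCOORD0) : COLOR
{
    float NdotL = tex2D(normalMap, uv).r;
    return saturate(NdotL) * lightColor;
}
```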
You need to calculate from the light to the surface, not the other way around. How would you go about storing the data for this? You could tile the scene into six or eight full-frame-sized textures to store the relevant shader data (given the lack of precision supported under SM2.0) in terms of which pixels need which light routines run. Then you would have to take that data, run a shader program to create a shader routine for each pixel, and apply that at the final run time. You are talking radiosity levels of complexity (far worse than the overhead of simply handling all of the shaders in the first place).
Perhaps I'm not understanding the exact problem, but I think the solution I just outlined above would work (doing one or two low-instruction-count passes to cull out the pixels that will not be affected, then do the bulk of the work on the remaining ones).
The major problem with your solution is that you can't use conditionals in SM2.0 (that is a 3.0 feature), so you move from a simple visibility test to calculating an intersection map and then creating a shader routine per frame. Not viable.
I thought it still had the capability of fully working on 24 pixels, but 16 could actively output per clock (due to the 16 ROPs). Perhaps the article I read on the card's architecture misled me.
It has 24 ALUs for pixel shaders. The 7800GTX is a four-quad GPU; it simply has more shader units than it can output. I think you will find a lot of sites misleading people, for simplicity's sake, by naming parts going forward by the number of ALUs they have instead of how many pixels they can draw.
"You should pay for SM3.0 because at some indeterminate point in the future, it might provide larger performance gains" is hardly an overwhelming argument in favor of SM3.0 ATM.
Pay what, is the question. If ATi had a part out right now that performed identically to an X800 but had SM3.0, and it was charging a 10% premium, would you say to go for it or not?