Originally posted by: BenSkywalker
That is not at all what I meant. If you have to compute some mathematical function, the number of mathematical operations required to compute it does not change going from SM2.0 to SM3.0.
Using your example, certainly not -- although I can't imagine why you would try to write code like that, crippling your hardware. You present a case that insists the entire shader routine must be run again instead of accumulating the output of each per-light interaction -- why? That is a good example of a game written with 2.0 in mind that won't speed up much when moving to 3.0, but that isn't what we are talking about.
Well, you said:
Could you explain how you can use loops, branches and collapse passes and not reduce the computational complexity? I can't think of a single example where you can do all of the former and not reduce overhead considerably.
I tried to give a counterexample. I also don't understand your reply here -- I was assuming that both the SM2.0 and SM3.0 shaders would, as you put it, accumulate the output of each per-light interaction. How would a program structured like this 'cripple your hardware' in either SM2.0 or SM3.0?
Perhaps you could provide a better example of something that would see a significant (say, 10+%) improvement? I'm having a hard time seeing where enormous improvements in performance (on shaders of a reasonable length) are going to come from.
If you set a fixed maximum number of lights, and the instruction count per light source is not ridiculous, you can write multiple SM2.0 shaders (e.g. a shader to handle one light, a shader to handle two lights, etc.) to do this without incurring noticeable performance losses.
Is the SM3.0 code simpler? Sure. Can you make SM2.0 do this just as fast as SM3.0? Usually, yes.
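To make the comparison concrete, here's a minimal CPU-side sketch (plain C rather than real shader code; the toy lighting math and the function names are invented for illustration): one routine with a runtime loop standing in for the SM3.0 shader, and an unrolled two-light variant standing in for one of the precompiled SM2.0 shaders. Note that the per-light math is identical either way:

```c
#include <stdio.h>

typedef struct { float r, g, b; } Color;

/* One light's contribution -- the same math runs under both models. */
static Color light_contribution(int i) {
    Color c = { 0.10f * (i + 1), 0.05f * (i + 1), 0.02f * (i + 1) };
    return c;
}

/* "SM3.0 style": one shader with a runtime loop over the light count. */
static Color shade_looped(int num_lights) {
    Color out = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < num_lights; i++) {
        Color c = light_contribution(i);
        out.r += c.r; out.g += c.g; out.b += c.b;
    }
    return out;
}

/* "SM2.0 style": one fixed variant per supported light count, standing
 * in for a set of separately compiled shaders chosen by the engine. */
static Color shade_two_lights(void) {
    Color c0 = light_contribution(0), c1 = light_contribution(1);
    Color out = { c0.r + c1.r, c0.g + c1.g, c0.b + c1.b };
    return out;
}

int main(void) {
    Color a = shade_looped(2);    /* dynamic loop        */
    Color b = shade_two_lights(); /* precompiled variant */
    printf("looped:  %.2f %.2f %.2f\n", a.r, a.g, a.b);
    printf("variant: %.2f %.2f %.2f\n", b.r, b.g, b.b);
    return 0;
}
```

Both paths do the same number of per-light operations; the loop only saves you from compiling and managing one shader per light count.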
If you jump through a bunch of hoops and restrict yourself in a serious fashion, then sometimes 2.0 can be as fast as 3.0 - I don't think anyone is arguing against that.
Right -- but in many cases, dropping those restrictions will produce code that runs far too slowly to be useful on any of today's hardware. You're just not going to be able to run a scene with 64 dynamic pixel-shaded light sources on a GF6, whether it takes one pass or 10. Granted, it'll be a little faster in one pass, but not enough to overcome the raw number of mathematical operations the GPU needs to do.
A 6800GT running at 400MHz gets ~16 * 400,000,000 = 6.4B pixel shader ops/sec (assuming you're not further bound by memory bandwidth/latency, everything runs at 100% efficiency all the time, all ops are equal in execution time, yada yada yada). If you want to put out 1600x1200@60FPS, that's ~115MPixels/sec, which gives you a computational budget of only ~55 shader ops/pixel (amortized over the whole frame; obviously not every pixel will be shaded all of the time). Even a 7800GTX (430MHz, 24 pixel shaders) only has ~10.3B pixel shader ops/sec to play with, which at 1600x1200@60FPS is ~90 shader ops/pixel. My point is that SM3.0 does nothing to change this number, which rapidly becomes the limiting factor when trying to write longer and more complex shaders.
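For what it's worth, here's that back-of-the-envelope budget spelled out (a throwaway C sketch; the unit counts and clocks are the ones above, and the "every unit retires one op per clock" assumption means the output is a ceiling, not a real-world number):

```c
#include <stdio.h>

/* Peak shader ops divided by pixels output per second. */
static double ops_per_pixel(double units, double clock_hz,
                            double w, double h, double fps) {
    return (units * clock_hz) / (w * h * fps);
}

int main(void) {
    /* 6800GT: 16 pixel shader units at 400MHz -> ~55.6 ops/pixel */
    printf("6800GT:  ~%.1f ops/pixel\n",
           ops_per_pixel(16, 400e6, 1600, 1200, 60));
    /* 7800GTX: 24 pixel shader units at 430MHz -> ~89.6 ops/pixel */
    printf("7800GTX: ~%.1f ops/pixel\n",
           ops_per_pixel(24, 430e6, 1600, 1200, 60));
    return 0;
}
```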
Okay, but in your last post you alluded to things such as using shaders to replace (or at least supplement) surface textures. If you want to shift to that sort of model (where you use shaders everywhere), you simply don't need as much fixed-function hardware, and you would be better off having more transistors devoted to programmable elements. Current PC graphics cards are still maintaining both in roughly equal amounts.
What fixed-function hardware do you think you are going to be able to give up? Actually, the fixed-function elements still need increased complexity to handle a fully shaded environment. Even in fully shaded games you still need the ability to render to a texture and then read that data back - we also still need our Z-buffer, cache space and filtering hardware. What's more, we need to add complexity to the filtering hardware to handle higher loads and the increased precision that will be required moving forward (particularly once we get into radiosity). What fixed-function hardware do you think we can do away with?
I'm not saying we won't need fixed-function hardware, just that if you shift to a programming model where you're doing more and more things on the fly with shaders, you will need less fixed-function hardware relative to the amount of shader power you will need. Barring big improvements in display density, the number of pixels being output per second is not going to keep increasing much longer (something in the neighborhood of 1080p seems likely to be the target for the next few years), so simply adding fillrate, bandwidth and memory size is not going to help you (or at least not help you as much).
If/when the game industry really starts to move in this direction, the GPU designs will shift to ones more like the R500/XBox360 (and eventually to designs even more heavily skewed towards shader ops).
To some extent, yes, but the ratio of transistors devoted to shaders and fixed-function rasterization hasn't shifted all that much. A 16-pipe card with 16 pixel shaders and 6 vertex shaders is fundamentally just a 'bigger' version of an 8-pipe card with 8 pixel shaders and 3 vertex shaders. The 7800GTX changes this somewhat by only having 16 ROPs for its 24 pipelines, but fundamentally the architecture remains the same.
I know that you know that this is complete and utter BS. We started off with considerably less than one quarter of a shader unit per four pixel pipes (GF DDR) to now having 30 total shader units to 16 pixel pipes - there has been a staggering shift in a relatively short period of time. The R500 moves this to another level, packing 48 (albeit simplified) shader units to 8 ROPs. This is over the course of five years - can you point to a more dramatic shift of die space in any other sector?
And if we compare against the Voodoo2, the percentage of transistors devoted to shaders has increased by (infinity * eleventy billion)%! :roll:
Compare starting with DX8.1 cards (which is what I would consider the start of the current line of programmable GPU evolution). While the balance has shifted somewhat (from maybe 2:1 against in a GF4Ti to somewhere in the neighborhood of 1:1 in the 7800GTX), we still have not moved to an architecture that heavily favors shader ops over fixed-function triangle rendering.
And which card are you referring to as having 30 shaders to 16 pixel pipes? The 7800GTX has 24 pixel and 8 vertex shaders (which is, if I'm adding right, 32), with 24 pixel pipes and 16 ROPs. The 6800GT/Ultra have 16 pixel and 6 vertex shaders (22), with 16 pipelines and 16 ROPs.
The 6800GT is (slightly) faster than the X800XL (and slower at a few things, like HL2), and has SM3.0, but is significantly more expensive.
snip -- much more detailed pricing info
I should have been more specific with my prices. I was comparing PCIe cards, using in-stock, non-MIR prices (since MIRs change on a daily/weekly basis, plus I really dislike MIRs personally). However, I will take MIRs into account and redo my analysis (all prices from Newegg, PCIe cards only):
6800GT: $319-335 (there is also a card for $346 with $50 MIR, which would be $296)
X800XL: $270-280 (there's a $20 MIR on one of the $270 ones, which would make it $250)
Without MIRs, you'd be paying around 18% ($49) more. With MIRs, it's an 18.4% increase ($46). While the 6800GT is a *little* faster than the X800XL, it's not 20% faster most of the time (and much of that gap is closed when using AA/AF).
6600GT: $165-170 (there is one for $154, but it's OOS with no arrival date given -- and I could not find a card that was $138 after MIR... link?)
X800: $175-180 (there is also one for $193 with $30 MIR)
Without MIR, the X800 costs 6% ($10) more. With MIR, the X800 is actually cheaper. The X800 is generally in the range of 10-30% faster than the 6600GT (depending on application, settings, etc.)
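Spelling out the percentage math in the same throwaway-C style as above (using the low end of each price range, and the post-MIR figures quoted earlier):

```c
#include <stdio.h>

/* Print the premium of the pricier card over the cheaper one. */
static void premium(const char *label, double pricier, double cheaper) {
    printf("%-26s $%.0f vs $%.0f -> %.1f%% ($%.0f) more\n",
           label, pricier, cheaper,
           100.0 * (pricier - cheaper) / cheaper, pricier - cheaper);
}

int main(void) {
    premium("6800GT vs X800XL (no MIR)", 319, 270);
    premium("6800GT vs X800XL (w/ MIR)", 296, 250);
    premium("X800 vs 6600GT (no MIR)",   175, 165);
    return 0;
}
```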
I stand by my previous conclusion.
In any case, my point is that SM3.0 should be treated as just one factor (and IMO, a minor one) when buying a videocard right now. It certainly shouldn't be the only thing driving your purchasing decision.