SM3.0 is a scam.


hans030390

Diamond Member
Feb 3, 2005
7,326
2
76
Originally posted by: Pete
Originally posted by: TTLKurtis
SM3.0 is better because more future games will use SM3.0.
There's a small disconnect in your statement. How many times have we said not to buy a card now for a game that isn't out yet (especially one that's a year or more away)?

However, I do agree that
nvidia is currently better ... because ... SM3.0 is just an added bonus
when you compare it to a similarly-priced and performing ATI card. So I would recommend a nV card over an ATI one if price and performance are similar, just because the nV card lets you experience some extra special effects in some current games, and may have an edge in future ones.

Lots of people buy cards to keep for over two years. So wouldn't SM3 make sense?

You do have to understand that some people don't have an unlimited amount of money coming out of their ears.

Like me.

So naturally, I got the best card in the $200 price range (at the time), and I knew I'd be getting SM3. That was my plan. I was happy. I am happy. The 6600GT will be with me for a couple years.

Make sense?
 

Pete

Diamond Member
Oct 10, 1999
4,953
0
0
No need to be so defensive, hans. Really read my post and you'll see I said if an SM3 card costs and performs the same as an SM2 one, you might as well get the SM3 card for potential future-proofing. I believe that's what you did.

But I wonder how playable your 6600GT will be two years down the line at a decent res like 1024x768, SM3 or not. Obviously, you'll make do with 8x6 or even 6x4 if you have to, but I don't think SM3 will give a card that old a significant step up from SM2 (like, say, the ability to play at a higher resolution or with nicer IQ at the same framerate*). And it remains a gamer's truth that you really shouldn't buy a card now thinking you know how it'll fare with a game coming out in a year. Near future, sure, and you can certainly make educated guesses as to future performance. But it gets trickier when you weigh more money now against *potentially* better performance then.

BTW, do you mind compressing your sig? You could probably fit all that into the 2-3 lines that netiquette dictates by separating the items with a | rather than a line break. Actually, maybe you'll be kind enough to replace your sig with this (seeing as I took the time to reformat and edit it):

Sony VAIO RS420 | P4 2.8GHz HT (800MHz FSB) | 512MB PC2700 RAM | Leadtek 6600GT AGP (latest official drivers) | 300W PSU | 120GB HDD | DVD+/-RW | 8150 3DM03, 3440 3DM05.

As long as I'm imposing, maybe you could avoid quoting an entire post when replying close, much less adjacent, to it. Just start your reply with a "Pete, " and I might just be able to figure out the context all by myself!


* This would make an interesting retrospective article, comparing the performance of an X800XL and a 6800GT in a year or two with the games available then.
 

Patrick Wolf

Platinum Member
Jan 5, 2005
2,443
0
0
I WANT HDR WITH SM3; unfortunately, I have an X850 XT PE... I didn't know HDR existed when I bought this card. I think I'll eat a now.
 

Ackmed

Diamond Member
Oct 1, 2003
8,498
560
126
Originally posted by: Gamingphreek
Correction: they didn't make Scan Line Interleave. However, they did make Scalable Link Interface (they did borrow some tech from 3DFX, though).

Additionally, although the Rage Fury Maxx did fail, it still counts (according to rule number 3.431A Section 43Xx, which says the Rage Fury Maxx counts), as it was launched and it was in the retail channels.

-Kevin

Yes, I know that. However, SLI = multi cards in my post, and NV didn't "make" them first, as I stated.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
That is not at all what I meant. If you have to compute some mathematical function, the number of mathematical operations required to compute it does not change going from SM2.0 to SM3.0.

Using your example, certainly not, although I can't imagine why you would try to write code like that, crippling your hardware. You present the case and insist that the entire shader routine must be run again instead of utilizing a cumulative impact on the output of per light interaction- why? That is a good example of a game written with 2.0 in mind that won't speed up much when moving to 3.0- but that isn't what we are talking about.

If you set a fixed maximum number of lights, and the instruction count per light source is not ridiculous, you can write multiple SM2.0 shaders (e.g. a shader to handle one light, a shader to handle two lights, etc.) to do this without incurring noticeable performance losses.

Is the SM3.0 code simpler? Sure. Can you make SM2.0 do this just as fast as SM3.0? Usually, yes.

If you jump through a bunch of hoops and restrict yourself in a serious fashion then sometimes 2.0 can be as fast as 3.0- I don't think anyone is arguing against that.

Introducing a lot of branch instructions into a shader can increase the total instruction count and/or execution time (especially if branch instructions are significantly more expensive than other instructions). Hence loop unrolling is used as an optimization technique.

Of course- which is why I stated a while back that the 6x00 hardware isn't a good example while the 7800GTX is.

Okay, but in your last post you alluded to things such as using shaders to replace (or at least supplement) surface textures. If you want to shift to that sort of model (where you use shaders everywhere), you simply don't need as much fixed-function hardware, and you would be better off having more transistors devoted to programmable elements. Current PC graphics cards are still maintaining both in roughly equal amounts.

What fixed function hardware do you think you are going to be able to give up? Actually, the fixed function elements still need increased complexity to handle a fully shaded environment. Even using fully shaded games, you still need the capability of rendering to texture and then reading that data back- we also still need our Z-buffer, cache space and filtering hardware. What's more- we need to add complexity to the filtering hardware to handle higher loads and the increased precision that will be required moving forward (particularly once we get into radiosity). What fixed function hardware do you think we can do away with?

To some extent, yes, but the ratio of transistors devoted to shaders and fixed-function rasterization hasn't shifted all that much. A 16-pipe card with 16 pixel shaders and 6 vertex shaders is fundamentally just a 'bigger' version of an 8-pipe card with 8 pixel shaders and 3 vertex shaders. The 7800GTX changes this somewhat by only having 16 ROPs for its 24 pipelines, but fundamentally the architecture remains the same.

I know that you know that this is complete and utter BS. We started off with considerably less than one quarter of a shader unit to four pixel pipes (GF DDR) to now having 30 total shader units to 16 pixel pipes- there has been a staggering shift in a relatively short period of time. The R500 moves this to another level, packing 48 (albeit simplified) shader units to 8 ROPs. This is over the course of five years- can you point to a more dramatic shift of die space in any other sector?

The 6800GT is (slightly) faster than the X800XL (and slower at a few things, like HL2), and has SM3.0, but is significantly more expensive.

6800GT $296 (AMIR)
X800XL $250 (AMIR)

The 6800GT is 18.4% more expensive. Both are PCIe prices, as these are high-end boards ($253 vs. $269 for AGP- 6.3%). I suppose you may consider either $46 or $16, depending on your platform, significant.

X800 $163 (AMIR)
6600GT $138 (AMIR)

The X800 is 18.1% more expensive- went with the lowest price on each of these. If you compare PCIe models strictly-

X800 $163 (AMIR)
6600GT $154 (AMIR)

Only a 5.8% premium there, but AGP only-

X800 $245
6600GT $138 (AMIR)

Roughly 77.5% premium there. So what exactly are you trying to say? Why is it that one is significantly more expensive and one is roughly the same price? I could spin it the opposite way for you and compare only AGP boards and point out that the X800 is significantly more expensive (plus 75% I think qualifies) while the 6800GT and X800XL are roughly the same price (I think about 6% qualifies there). All prices are the best off of NewEgg as of now, btw. In reality, I think comparing PCIe makes sense for the high-end and AGP makes sense for the low-end (by our terms) parts.
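For reference, here are those premiums recomputed with one consistent convention (difference over the cheaper card), as a quick C check; the prices are the NewEgg figures quoted above:

/* Sanity check on the premiums above, one consistent convention:
 * premium = (higher price - lower price) / lower price.
 * Prices are the NewEgg figures quoted in this post. */
#include <stdio.h>

static void premium(const char *pair, double high, double low)
{
    printf("%-38s %5.1f%%\n", pair, 100.0 * (high - low) / low);
}

int main(void)
{
    premium("6800GT $296 vs X800XL $250 (PCIe)", 296.0, 250.0); /* 18.4% */
    premium("6800GT $269 vs X800XL $253 (AGP)",  269.0, 253.0); /*  6.3% */
    premium("X800 $163 vs 6600GT $138 (lowest)", 163.0, 138.0); /* 18.1% */
    premium("X800 $163 vs 6600GT $154 (PCIe)",   163.0, 154.0); /*  5.8% */
    premium("X800 $245 vs 6600GT $138 (AGP)",    245.0, 138.0); /* 77.5% */
    return 0;
}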
 

hans030390

Diamond Member
Feb 3, 2005
7,326
2
76
Pete:

Not trying to get defensive, but it's kinda hard when everyone bashes me when I talk about SM3. Not saying you were bashing me or anything.

But no, I don't expect my 6600GT to play next gen that great (low/medium settings at 8x6 most likely, if I get more RAM), BUT there's a chance that maybe with SM3 it will, in which case I could bump up the graphics some (which I wouldn't be able to do with an SM2 card, say an X800), or I could leave it and enjoy some free framerate increases.

If a 6800GT can beat an X850 in FEAR (and it seems that SM3 is starting to give bigger boosts) on the same IQ settings, I think that's evidence of some good performance increases for next-gen games. Or whatever. It's kinda hard to explain exactly what I mean.

I guess I'm just the type that likes the newest features if I can get them. It's not like I'm at a loss for getting it. So I'm just speculating that SM3 will be fairly beneficial to me (even on crappy graphics settings) in next-gen games. Yet some people still like to tell me I'm wrong... which... no one really knows.

So we all know SM3 is good, it will be used heavily next gen, and my 6600GT probably will play next-gen games on lower settings. We also know that I'll get a performance boost from using SM3, regardless of whether I use it to bump up graphics settings.

So what exactly are we arguing about now... I got kinda lost... (sorry, that whole post wasn't geared just towards Pete)
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: BenSkywalker
That is not at all what I meant. If you have to compute some mathematical function, the number of mathematical operations required to compute it does not change going from SM2.0 to SM3.0.

Using your example, certainly not, although I can't imagine why you would try to write code like that, crippling your hardware. You present the case and insist that the entire shader routine must be run again instead of utilizing a cumulative impact on the output of per light interaction- why? That is a good example of a game written with 2.0 in mind that won't speed up much when moving to 3.0- but that isn't what we are talking about.

Well, you said:

Could you explain how you can use loops, branches and collapse passes and not reduce the computational complexity? I can't think of a single example where you can do all of the former and not reduce overhead considerably.

I tried to give a counterexample. I also don't understand your reply here -- I was assuming that both the SM2.0 and SM3.0 shaders would, as you put it, 'utilize the cumulative impact on the output of per light interaction'. How would a program structured like this "cripple your hardware" in either SM2.0 or SM3.0?

Perhaps you could provide a better example of something that would see a significant (say, 10+%) improvement? I'm having a hard time seeing where enormous improvements in performance (on shaders of a reasonable length) are going to come from.

If you set a fixed maximum number of lights, and the instruction count per light source is not ridiculous, you can write multiple SM2.0 shaders (e.g. a shader to handle one light, a shader to handle two lights, etc.) to do this without incurring noticeable performance losses.

Is the SM3.0 code simpler? Sure. Can you make SM2.0 do this just as fast as SM3.0? Usually, yes.

If you jump through a bunch of hoops and restrict yourself in a serious fashion then sometimes 2.0 can be as fast as 3.0- I don't think anyone is arguing against that.

Right -- but in many cases, dropping those restrictions will produce code that runs far too slowly to be useful on any of today's hardware. You're just not going to be able to run a scene with 64 dynamic pixel-shaded light sources on a GF6, whether it takes one pass or 10. Granted, it'll be a little faster in one pass, but not enough to overcome the raw number of mathematical operations the GPU needs to do.

A 6800GT running at 400MHz gets ~16 * 400,000,000 = 6.4B pixel shader ops/sec. (assuming you're not further bound by memory bandwidth/latency, and everything runs at 100% efficiency all the time, all ops are equal in execution time, yada yada yada). If you want to put out 1600x1200@60FPS, that's ~120MPixels/sec. That gives you a computational budget of only ~50 shader ops/pixel (amortized over the whole frame; obviously not every pixel will be shaded all of the time). Even a 7800GTX (430MHz, 24 pixel shaders) only has ~10.3B pixel shader ops/sec. to play with, which at 1600x1200@60FPS is ~90 pixel shader ops/pixel. My point is that SM3.0 does nothing to change this number, which rapidly becomes the limiting factor when trying to write longer and more complex shaders.
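That budget math is easy to reproduce. Here's a minimal C sketch of the same back-of-the-envelope calculation, carrying over the same simplifications (one shader op per pipe per clock, 100% efficiency, no bandwidth limits, equal-cost ops):

/* Back-of-the-envelope shader budget, reproducing the numbers above.
 * Assumes one shader op per pipe per clock, 100% efficiency, no
 * bandwidth/latency limits, and equal-cost ops -- the same
 * simplifications as in the text. */
#include <stdio.h>

static void budget(const char *card, double mhz, int pipes,
                   double w, double h, double fps)
{
    double ops_per_sec = pipes * mhz * 1e6;   /* peak pixel shader ops/sec */
    double pix_per_sec = w * h * fps;         /* pixels output per second  */
    printf("%s: %4.1fB ops/sec, ~%3.0f ops/pixel at %.0fx%.0f@%.0fFPS\n",
           card, ops_per_sec / 1e9, ops_per_sec / pix_per_sec, w, h, fps);
}

int main(void)
{
    budget("6800GT ", 400.0, 16, 1600, 1200, 60);   /* ~6.4B ops/sec, ~56 ops/pixel */
    budget("7800GTX", 430.0, 24, 1600, 1200, 60);   /* ~10.3B ops/sec, ~90 ops/pixel */
    return 0;
}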

Okay, but in your last post you alluded to things such as using shaders to replace (or at least supplement) surface textures. If you want to shift to that sort of model (where you use shaders everywhere), you simply don't need as much fixed-function hardware, and you would be better off having more transistors devoted to programmable elements. Current PC graphics cards are still maintaining both in roughly equal amounts.

What fixed function hardware do you think you are going to be able to give up? Actually, the fixed function elements still need increased complexity to handle a fully shaded environment. Even using fully shaded games, you still need the capability of rendering to texture and then reading that data back- we also still need our Z-buffer, cache space and filtering hardware. What's more- we need to add complexity to the filtering hardware to handle higher loads and the increased precision that will be required moving forward (particularly once we get into radiosity). What fixed function hardware do you think we can do away with?

I'm not saying we won't need fixed-function hardware, just that if you shift to a programming model where you're doing more and more things on the fly with shaders, you will need less fixed-function hardware relative to the amount of shader power you need. Barring big improvements in display density, the overall number of pixels being output per second is not going to keep increasing much longer (it seems likely that something in the neighborhood of 1080p will be the target for the next few years), so more fillrate, bandwidth, and memory size alone are not going to help you (or at least not help you as much).

If/when the game industry really starts to move in this direction, the GPU designs will shift to ones more like the R500/XBox360 (and eventually to designs even more heavily skewed towards shader ops).

To some extent, yes, but the ratio of transistors devoted to shaders and fixed-function rasterization hasn't shifted all that much. A 16-pipe card with 16 pixel shaders and 6 vertex shaders is fundamentally just a 'bigger' version of an 8-pipe card with 8 pixel shaders and 3 vertex shaders. The 7800GTX changes this somewhat by only having 16 ROPs for its 24 pipelines, but fundamentally the architecture remains the same.

I know that you know that this is complete and utter BS. We started off with considerably less than one quarter of a shader unit to four pixel pipes (GF DDR) to now having 30 total shader units to 16 pixel pipes- there has been a staggering shift in a relatively short period of time. The R500 moves this to another level, packing 48 (albeit simplified) shader units to 8 ROPs. This is over the course of five years- can you point to a more dramatic shift of die space in any other sector?

And if we compare against the Voodoo2, the percentage of transistors devoted to shaders has increased by (infinity * eleventy billion)%! :roll:

Compare starting with DX8.1 cards (which is what I would consider the start of the current line of programmable GPU evolution). While the balance has shifted somewhat (from maybe 2:1 against in a GF4Ti to somewhere in the neighborhood of 1:1 in the 7800GTX), we still have not moved to an architecture that heavily favors shader ops to fixed-function triangle rendering.

And which card are you referring to as having 30 shaders to 16 pixel pipes? The 7800GTX has 24 pixel and 8 vertex shaders (which is, if I'm adding right, 32), with 24 pixel pipes and 16 ROPs. The 6800GT/Ultra have 16 pixel and 6 vertex shaders (22), with 16 pipelines and 16 ROPs.

The 6800GT is (slightly) faster than the X800XL (and slower at a few things, like HL2), and has SM3.0, but is significantly more expensive.
snip -- much more detailed pricing info

I should have been more specific with my prices. I was comparing PCIe cards, using in-stock, non-MIR prices (since MIRs change on a daily/weekly basis, plus I really dislike MIRs personally). However, I will take MIRs into account and redo my analysis (all prices from Newegg, PCIe cards only):

6800GT: $319-335 (there is also a card for $346 with $50 MIR, which would be $296)
X800XL: $270-280 (there's a $20 MIR on one of the $270 ones, which would make it $250)

Without MIRs, you'd be paying around 18% ($50) more. With MIRs, it's an 18.4% increase ($46). While the 6800GT is a *little* faster than the X800XL, it's not 20% faster most of the time (and much of that gap is closed when using AA/AF).

6600GT: $165-170 (there is one for $154, but it's OOS with no arrival date given -- and I could not find a card that was $138 after MIR... link?)
X800: $175-180 (there is also one for $193 with $30 MIR)

Without MIR, the X800 costs 6% ($10) more. With MIR, the X800 is actually cheaper. The X800 is generally in the range of 10-30% faster than the 6600GT (depending on application, settings, etc.)

I stand by my previous conclusion.

In any case, my point is that SM3.0 should be evaluated only as a factor (and IMO, a minor one) when buying a videocard right now. It certainly shouldn't be the only thing driving your purchasing decision.
 

knyghtbyte

Senior member
Oct 20, 2004
918
1
0
OK, who's up for finding the OP and SM3ing his ass for starting this argument?... hehe ;-)

To be honest, games will run well on either company's cards if you buy a decent enough card. If you can only afford cheap cards then, sad to say, tough titty; you're gonna be looking at pixels in macro... lol

And to be honest, unless you play fast-reflex online FPS games, or truly massive RTS or RPG games, you really don't need a top-notch card... If you play CS:S but your average k/d is ~1.0, then get a mid-range card; the graphics ain't gonna help ya get better... lol

If you like eye candy, however, get a good job so you can afford your hobby. If not, don't whinge about which company has the better tech or which tech is useful and isn't right now...

 

dunno99

Member
Jul 15, 2005
145
0
0
I personally do graphics research using hardware acceleration. One thing I research is ray tracing on a GPU.
The fact that I can use one (pixel, as vertex is pretty much useless) shader program to do all my computations, instead of having to render all intermediate results to texture (which is a LOT of additional memory writes, on top of the required memory reads for acceleration structures and material lookups), greatly speeds up the computation. Along those lines, acceleration structures really need dynamic branching (which is not offered in SM2.0) to be efficient (inefficient workarounds for the lack of dynamic branching are possible). Hopefully, graphics chip design companies (ATi, nVidia, Matrox, etc...) will implement 64-bit/channel precision soon...
Furthermore, the shortcut that ATi decided to take with its pixel shaders at 96-bit precision (24 bits per channel/color) really hurt these calculations. With SM3.0 (and onwards), the required precision is raised to 128 bits, which forces ATi to implement 128-bit precision (in the R520, which isn't out yet).
Other considerations include the required support for multiple render targets, which reduces memory load since you can generate multiple sets of data with a smaller increase in memory access. (E.g., say you're rendering two modes on the screen from the exact same viewpoint: one mode is normal vision and the other is IR vision. With multiple render targets, you can just slap on an extra piece of code for the IR part, save it to a texture, and later paste the texture back into the framebuffer. Voila, you now have two images with just one polygon and texture read of the object you're rendering.) This technique is sometimes used, most notably in ATi demos.
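The render-to-texture overhead is essentially the memory-staging problem you'd see on a CPU, too. As a rough C analogy (the two-step arithmetic here is just a made-up placeholder for real shading work): one version writes every intermediate result to a buffer and reads it back, the other keeps the intermediate in a register:

/* CPU analogy for the render-to-texture point above (illustrative only).
 * Staging every intermediate through memory is like rendering to texture
 * under SM2.0; the fused loop keeps the intermediate in a register, like
 * one long SM3.0 shader program. */
#include <stddef.h>

#define N (1024 * 1024)
static float tmp[N];   /* the "render target" holding intermediate results */

void multi_pass(const float *in, float *out, size_t n)   /* assumes n <= N */
{
    for (size_t i = 0; i < n; i++) tmp[i] = in[i] * in[i];  /* pass 1: write all */
    for (size_t i = 0; i < n; i++) out[i] = tmp[i] + 1.0f;  /* pass 2: read back */
}

void single_pass(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float t = in[i] * in[i];   /* intermediate never touches memory */
        out[i] = t + 1.0f;
    }
}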

Quick questions:
1. There generally isn't any branch prediction in this graphics hardware, correct?
2. Where can I get a copy of the exact SM3.0 specification (more detailed than http://www.microsoft.com/whdc/winhec/partners/shadermodel30_NVIDIA.mspx )?
3. I remember there are a few sites that list all video cards' specs, such as clocks, SM implemented, shader precision, transistor counts, die size, etc... Anyone remember what those sites are? I know one of them is http://www.beyond3d.com/misc/chipcomp/ ... anyone know what others there are?
 

Ackmed

Diamond Member
Oct 1, 2003
8,498
560
126
Originally posted by: hans030390


If a 6800GT can beat an X850 in FEAR (and it seems that SM3 is starting to give bigger boosts) on the same IQ settings, I think that's evidence of some good performance increases for next-gen games. Or whatever. It's kinda hard to explain exactly what I mean.

You don't know if it's because of SM3, though. It could be from something unrelated. Or it could be unoptimized beta code that makes it this way.
 

UberL33tJarad

Member
Nov 9, 2004
40
0
0
Not completely sure if this was mentioned, but running high-dynamic-range (HDR) lighting in the PS2.0 and PS3.0 paths results in a difference. For example, Splinter Cell: Chaos Theory's latest patch introduces PS2.0 support; now soft shadows, offset mapping, and a form of HDR are possible on R3xx and R4xx series cards in the game. The lack of floating-point precision results in a washed-out image. Bit-Tech did a comparison of them both here: http://www.bit-tech.net/gaming/2005/08/05/scct_sm20_sm30/1.html

Also, Valve has stated there will be two different implementations of HDR in Lost Coast and the rest of the Source engine: a low- and a high-quality version of HDR.

I've had a 6800GT AGP and an X800XL PCIe. A year ago PS3.0 was a feature that didn't hurt to have, but it's quickly becoming a standard. Splinter Cell 4, FEAR, Serious Sam II, that Project Offset game made by three guys, Elder Scrolls 4, and a couple of others are coming this year with PS3.0. I look forward to enjoying it, whether it's through the 7800GTX or the R520. The whole idea of PS3.0 being a "scam" is worthless once the industry has moved on without you, to the point where all the latest graphics cards support it and most new games support it. It's not a scam that only aids nVidia when ATi will support it very soon.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I also don't understand your reply here -- I was assuming that both the SM2.0 and SM3.0 shaders would, as you put it, 'utilize the cumulative impact on the output of per light interaction'. How would a program structured like this "cripple your hardware" in either SM2.0 or SM3.0?

You aren't exiting the shader execution when the relevant data is calculated (which is a required SM3 feature and not supported under SM2)- if there isn't going to be a cumulative impact, using SM3 you don't calculate it- you must execute it all with SM2.

Right -- but in many cases, dropping those restrictions will produce code that runs far too slowly to be useful on any of today's hardware. You're just not going to be able to run a scene with 64 dynamic pixel-shaded light sources on a GF6, whether it takes one pass or 10. Granted, it'll be a little faster in one pass, but not enough to overcome the raw number of mathematical operations the GPU needs to do.

Not using SM2- utilizing SM3 you exit each shader routine as soon as you have computed the end results and eliminate redundant overhead.

My point is that SM3.0 does nothing to change this number, which rapidly becomes the limiting factor when trying to write longer and more complex shaders.

Your point is wrong. Try skipping light calculations on back-facing pixels relative to a light under SM2.0- not a problem on SM3.0.
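To make the back-facing case concrete, here's a rough CPU-side sketch in C, purely illustrative (the names and shading math are placeholders; real shader code would look different). It shows the two styles: branch past lights behind the surface, or evaluate every light and mask the dead ones to zero:

/* Illustrative CPU analogy (not shader code): with SM3.0-style dynamic
 * branching the per-light work is skipped for lights behind the surface;
 * an SM2.0-style straight-line shader does the work regardless. */
#include <math.h>

typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* stand-in for the expensive per-light shading math */
static float light_term(vec3 n, vec3 l) { return powf(fmaxf(dot3(n, l), 0.0f), 32.0f); }

/* SM3.0 style: branch past back-facing lights entirely */
float shade_branch(vec3 n, const vec3 *lights, int count)
{
    float c = 0.0f;
    for (int i = 0; i < count; i++) {
        if (dot3(n, lights[i]) <= 0.0f) continue;  /* light is behind: skip */
        c += light_term(n, lights[i]);
    }
    return c;
}

/* SM2.0 style: the per-light math runs for every light, then gets masked */
float shade_masked(vec3 n, const vec3 *lights, int count)
{
    float c = 0.0f;
    for (int i = 0; i < count; i++)
        c += (dot3(n, lights[i]) > 0.0f) * light_term(n, lights[i]);
    return c;
}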

Compare starting with DX8.1 cards (which is what I would consider the start of the current line of programmable GPU evolution). While the balance has shifted somewhat (from maybe 2:1 against in a GF4Ti to somewhere in the neighborhood of 1:1 in the 7800GTX), we still have not moved to an architecture that heavily favors shader ops to fixed-function triangle rendering.

Why not use the GF3? Also- the 7800GTX has 1:2 in your example- not 1:1. Even if we use the boards you have selected, that moves us from 2:1 to 1:2- a fourfold shift in a relatively short period of time. The R500 is going to be a 1:6 part- I have no idea what you are talking about with your stance that the balance has not shifted enormously.

And which card are you referring to as having 30 shaders to 16 pixel pipes? The 7800GTX has 24 pixel and 8 vertex shaders (which is, if I'm adding right, 32), with 24 pixel pipes and 16 ROPs. The 6800GT/Ultra have 16 pixel and 6 vertex shaders (22), with 16 pipelines and 16 ROPs.

Meant to type 32.

I should have been more specific with my prices. I was comparing PCIe cards, using in-stock, non-MIR prices

What day of the week?

You list off a generalization that is extremely inaccurate when certain factors shift (i.e. looking for an AGP X800 vs. 6600GT). Also- the X800 tends to be much closer to 10% faster than the 6600GT than to 30%- and sometimes the 6600GT is close to 30% faster than the X800. Overall the X800 is faster, but it certainly isn't close to the slam dunk you make it out to be.

6600GT: $165-170 (there is one for $154, but it's OOS with no arrival date given -- and I could not find a card that was $138 after MIR... link?)

In stock now. Before MIR it is $158.

In any case, my point is that SM3.0 should be evaluated only as a factor (and IMO, a minor one) when buying a videocard right now. It certainly shouldn't be the only thing driving your purchasing decision.

For now- as soon as ATi replaces their dated architecture with something packing a current feature set, will you say the same?
 

Ackmed

Diamond Member
Oct 1, 2003
8,498
560
126
That depends on whether more games actually use SM3. As of now, they don't. And I get zero benefit from it in any of the games I play. Later (probably much later) down the road, SM3 will mean more to me. As of now, it means next to nothing.
 

Blastman

Golden Member
Oct 21, 1999
1,758
0
76
Serious Sam II is only going to be SM2.0. The shaders only max out at around 30 instructions, too -- well within the 96-instruction limit of SM2.0 (this is from an SS2 developer interview at B3D).

The problem with the uptake of SM3.0 is that SM2.0 already has enough instructions for very good IQ. I don't see developers clamoring for 100+ instruction shaders even in the next few years. And you could always write a similar SM2.0 or SM2.0b shader anyway.

The big jump in IQ was getting cards that could do AF and AA at reasonable frame rates. SM2.0 was a big jump in precision (FP) and number of instructions over SM1.x (integer), which is quite limited. So the big jumps have already been made. The difference between SM2.0b (ATI's extended 2.0) and SM3.0 isn't a lot, considering dynamic branching is probably next to useless on the NV40.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: BenSkywalker
I also don't understand your reply here -- I was assuming that both the SM2.0 and SM3.0 shaders would, as you put it, 'utilize the cumulative impact on the output of per light interaction'. How would a program structured like this "cripple your hardware" in either SM2.0 or SM3.0?

You aren't exiting the shader execution when the relevant data is calculated (which is a required SM3 feature and not supported under SM2)- if there isn't going to be a cumulative impact, using SM3 you don't calculate it- you must execute it all with SM2.

There was no extra calculation being done in the example SM2.0 routine I gave. You can get around this for fixed/limited numbers of loop passes by hardcoding SM2.0 shaders for each number of loops, and choosing which shader to run dynamically.

Harder to code? Sure. Limited to some extent? Of course. Almost the same speed for the sort of shaders you can actually run on today's hardware? Hmm... gonna be a 'yes' there too.
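As a rough sketch of that workaround (in C rather than real shader code; the names and per-light math are placeholders): one hand-unrolled variant per light count, selected by the application, versus a single routine that just loops:

/* Sketch of the workaround described above (names are placeholders):
 * hand-unrolled SM2.0-style variants, one per light count, chosen by the
 * app, versus a single SM3.0-style routine that loops. */

static float light_term(int i) { return 1.0f / (float)(i + 1); } /* stand-in math */

static float shade_1light(void)  { return light_term(0); }
static float shade_2lights(void) { return light_term(0) + light_term(1); }
static float shade_3lights(void) { return light_term(0) + light_term(1) + light_term(2); }

/* SM2.0 style: the application binds the right fixed-count shader */
float shade_sm2(int nlights)
{
    static float (*const variants[])(void) = { shade_1light, shade_2lights, shade_3lights };
    return variants[nlights - 1]();   /* assumes 1 <= nlights <= 3 */
}

/* SM3.0 style: one shader loops over however many lights there are */
float shade_sm3(int nlights)
{
    float c = 0.0f;
    for (int i = 0; i < nlights; i++)
        c += light_term(i);
    return c;
}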

Right -- but in many cases, dropping those restrictions will produce code that runs far too slowly to be useful on any of today's hardware. You're just not going to be able to run a scene with 64 dynamic pixel-shaded light sources on a GF6, whether it takes one pass or 10. Granted, it'll be a little faster in one pass, but not enough to overcome the raw number of mathematical operations the GPU needs to do.

Not using SM2- utilizing SM3 you exit each shader routine as soon as you have computed the end results and eliminate redundant overhead.

What 'redundant overhead'? Unless you design your shaders incredibly poorly, there's just not that much overhead to eliminate. Please give a more detailed example of something you feel would obviously benefit significantly from SM3.0. You're still just waving your hands and saying that SM3.0 will be vastly more efficient.

My point is that SM3.0 does nothing to change this number, which rapidly becomes the limiting factor when trying to write longer and more complex shaders.

Your point is wrong. Try skipping light calculations on back-facing pixels relative to a light under SM2.0- not a problem on SM3.0.

And you can't do this in SM2.0 because...? Couldn't you do a short pass to sort out the pixels that face the light, then only do a second pass to compute the full light calculations for those pixels? It's a little less efficient, sure, but not dramatically so. Again, this seems like a case where SM3.0 allows a more elegant solution, but where SM2.0 can be made to work almost as fast if you're willing to restrict things a little and/or invest some extra effort.

Compare starting with DX8.1 cards (which is what I would consider the start of the current line of programmable GPU evolution). While the balance has shifted somewhat (from maybe 2:1 against in a GF4Ti to somewhere in the neighborhood of 1:1 in the 7800GTX), we still have not moved to an architecture that heavily favors shader ops to fixed-function triangle rendering.

Why not use the GF3? Also- the 7800GTX has 1:2 in your example- not 1:1. Even if we use the boards you have selected, that moves us from 2:1 to 1:2- a fourfold shift in a relatively short period of time. The R500 is going to be a 1:6 part- I have no idea what you are talking about with your stance that the balance has not shifted enormously.

The R500 is not out yet (although it does illustrate my point that hardware is moving in this direction). The 7800GTX has 24+8 shaders and 24 pipelines; even if you're generous and assume that a single pixel/vertex shader contains a number of transistors comparable to a complete fixed-function rendering pipeline, that's 1.5:1 at best.

Nitpicking GPU architectures and transistor counts, however, is not really productive (especially when really, really accurate numbers are not directly available). Can we both just agree that future cards will have a hell of a lot more shaders?

I should have been more specific with my prices. I was comparing PCIe cards, using in-stock, non-MIR prices

What day of the week?

You list off a generalization that is extremely inaccurate when certain factors shift (i.e. looking for an AGP X800 vs. 6600GT).

Which is why I clarified my numbers, since you used a totally different set of assumptions and got different answers because of it. AGP cards are a different story, and obviously you have to take the 'deal of the week' into account sometimes.

Also- the X800 tends to be much closer to 10% faster than the 6600GT than to 30%- and sometimes the 6600GT is close to 30% faster than the X800. Overall the X800 is faster, but it certainly isn't close to the slam dunk you make it out to be.

3Digest got much closer Doom3 numbers with more recent drivers. The 6600GT was less than 20% faster at Doom3 -- and the X800 beat it at every setting with AA/AF enabled.

The X800 is better across the board (sometimes by significant margins) at Far Cry and HL2. The Far Cry and HL2 numbers with AA/AF are just depressing; the X800 beats a 6600GT SLI setup.

Ditto for 3DMark05; it's significantly better than a 6600GT without AA/AF, and beats a 6600GT SLI with AA/AF enabled.

It's even competitive at Chronicles Of Riddick, which is a shader-heavy XBox port (it's faster in the first bench, equal or slightly slower in the second one depending on settings).

The X800 was up to 50% faster in the FEAR demo (an SM3.0 game!) -- although, to be fair, it's only getting 30FPS at 1024x768 without AA, so it's not like either card is running it very well.

Seriously, the 6600GT offers minimal advantage except in Doom3, and even that advantage is eliminated with AA/AF enabled. The 6600GT, of course, is horribly hamstrung by its 128-bit memory interface with AA/AF enabled in any demanding title (again, something that SM3.0 cannot really help with).

6600GT: $165-170 (there is one for $154, but it's OOS with no arrival date given -- and I could not find a card that was $138 after MIR... link?)

In stock now. Before MIR it is $158.

Did you miss where I said I was comparing prices on PCIe cards?

In any case, my point is that SM3.0 should be evaluated only as a factor (and IMO, a minor one) when buying a videocard right now. It certainly shouldn't be the only thing driving your purchasing decision.

For now- as soon as ATi replaces their dated architecture with something packing a current feature set, will you say the same?

Yes, unless it dramatically shifts the pricing of current parts, or the price/performance of low-end and midrange R520/GF7 parts is better than that of the current generation. Or someone puts out a game that gets more than a 5% performance improvement from SM3.0 and/or offers significant and usable IQ improvements that are only available with SM3.0.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
And you can't do this in SM2.0 because...? Couldn't you do a short pass to sort out the pixels that face the light, then only do a second pass to compute the full light calculations for those pixels?

No, you can't reasonably do that under SM2.0. You would need to do a raycast calculation per light, figuring on a visibility intersect- that would be significantly more complex than the most demanding shader scenario we have discussed (approaching radiosity levels of complexity before we render anything at all).

What 'redundant overhead'?

Running shader routines on non-visible pixels, and calculating light surfaces that have no impact on pixels by rerunning entire shader routines. There is no way around this using SM2.0- it is relatively trivial under 3.0.

The 7800GTX has 24+8 shaders and 24 pipelines;

It has sixteen traditional pixel pipes- the "24 pipelines" is from ALU shader hardware- it is only capable of drawing 16 pixels per clock. 16 pixel output per clock, 32 shader units- 1:2.

Which is why I clarified my numbers, since you used a totally different set of assumptions and got different answers because of it.

I figured it both ways- I used the AMIR numbers as those were the best prices and as you noticed that benefited the X800XL- not the 6800GT.

3Digest got much closer Doom3 numbers with more recent drivers. The 6600GT was less than 20% faster at Doom3 -- and the X800 beat it at every setting with AA/AF enabled.

The X800 is better across the board (sometimes by significant margins) at Far Cry and HL2. The Far Cry and HL2 numbers with AA/AF are just depressing; the X800 beats a 6600GT SLI setup.

Funny, the link you provided has the 6600GT at the top of the usability ratings (the X800 is second).

Did you miss where I said I was comparing prices on PCIe cards?

You said you couldn't find a card- I stated

The X800 is 18.1% more expensive- went with the lowest price on each of these. If you compare PCIe models strictly-

At which point I listed the PCIe prices which did not include the $138 6600GT.

Or someone puts out a game that gets more than a 5% performance improvement from SM3.0 and/or offers significant and usable IQ improvements that are only available with SM3.0.

The problem is that most comparisons are SM3+HDR vs. SM2- and then you have the fact that you must use the dumbed-down SM2 shaders as a starting point, due to all of the resources they waste.
 

Drayvn

Golden Member
Jun 23, 2004
1,008
0
0
Originally posted by: BenSkywalker
The 7800GTX has 24+8 shaders and 24 pipelines;

It has sixteen traditional pixel pipes- the "24 pipelines" is from ALU shader hardware- it is only capable of drawing 16 pixels per clock. 16 pixel output per clock, 32 shader units- 1:2.

It actually has 24 traditional pixel pipelines. The card can actually produce 24 pixels at a time, but because different pixels can take varying amounts of time, it can only render 16 pixels per clock because of the 16 ROPs it's got. If it had 24 ROPs, it would be able to render 24 pixels.

And each pipeline has 2 ALUs working for it, and 2 MADDs for each of those ALUs.
 

The Linuxator

Banned
Jun 13, 2005
3,121
1
0
I think the poster Prod1gy had a point but didn't know how to explain it. Let me tell you about my experience.

Right off the bat I will tell you that the best cards I have owned are ATI, so I am an ATI fanboy. But in all honesty, when I first saw nvidia in an advantageous position over ATI, I bought the 6800 Ultra through a very limited special offer for $325 after company rebates. Why? Because I don't follow names blindly; I weigh products against each other, not names.
I said they are very close in performance, but SM3.0 is a plus.
Well, days went by, and the only game that benefited from SM3.0 (that I ever saw) was Far Cry. So maybe what the original poster meant to say is that it's a scam to sell people a technology they might not use at the current time. So you know what I did? After reaching my conclusion 4 months ago, I did the same thing he did: I sold my 6800 Ultra, got myself a decent VIVO X800XL, and put the rest of the money in my pocket. If I had the chance to do it again, I would do the same thing over and over. I paid $250 for my X800XL; I think it's the best bang for the buck right now, and if it's possible to run two of them in Crossfire mode, I will get my hands on another one for sure.

So the 6800 Ultra cost me $325 and I still didn't keep it; that's how much I disliked the 6800 Ultra's AA performance.
 

Pete

Diamond Member
Oct 10, 1999
4,953
0
0
Drayvn, AFAIK: 24 fragment pipes (I don't think they qualify as either traditional or pixel pipes in "traditional" terminology), two MADD ALUs per pipe (not two MADDs per ALU).

As a general rule, trust Ben to know what he's talking about.
 

hans030390

Diamond Member
Feb 3, 2005
7,326
2
76
Hey Matthias, AnandTech has an article on the 6600GT. It compares it to an X800 Pro (not vanilla) and it only gets like 10-15fps less at either 10x7 or 12x10 (which are GREAT for games; no one really needs 16x12). Funny thing is, the 6600GT still runs above 40fps! Did you know that's REALLY playable? Most people don't need 1234309fps to play games. In fact, most people can't tell the difference between 30 and 60fps.

You know what else? Some people don't use AA/AF! OMG! Let's make a buying decision on a card because it does better with AA/AF! You know what, I KNOW I'll be using SM3 sometime; in fact, I even use it now. But AA/AF isn't something I use, just because I don't need it. I'm fine with 10x7, no AA/AF. It's just a waste of performance. Perhaps it's because I grew up without it.

Jeez. So wait, that means the X800 would be even closer in performance to the 6600GT than the X800 Pro is. So... why not sacrifice a few FPS, still have a very playable game at high settings, skip the unneeded AA/AF, and have something you will use in the future, SM3? And since SM3 boosts performance in games that use it, maybe you COULD put on that extra AA/AF if you want it and still have a playable game.

Sorry if you're the type that plays on uber 16x12 with full graphics settings and all AA/AF. Some of us really don't care about it.
 

The Linuxator

Banned
Jun 13, 2005
3,121
1
0
Originally posted by: hans030390
Hey Matthias, AnandTech has an article on the 6600GT. It compares it to an X800 Pro (not vanilla) and it only gets like 10-15fps less at either 10x7 or 12x10 (which are GREAT for games; no one really needs 16x12). Funny thing is, the 6600GT still runs above 40fps! Did you know that's REALLY playable? Most people don't need 1234309fps to play games. In fact, most people can't tell the difference between 30 and 60fps.

You know what else? Some people don't use AA/AF! OMG! Let's make a buying decision on a card because it does better with AA/AF! You know what, I KNOW I'll be using SM3 sometime; in fact, I even use it now. But AA/AF isn't something I use, just because I don't need it. I'm fine with 10x7, no AA/AF. It's just a waste of performance. Perhaps it's because I grew up without it.

Jeez. So wait, that means the X800 would be even closer in performance to the 6600GT than the X800 Pro is. So... why not sacrifice a few FPS, still have a very playable game at high settings, skip the unneeded AA/AF, and have something you will use in the future, SM3? And since SM3 boosts performance in games that use it, maybe you COULD put on that extra AA/AF if you want it and still have a playable game.

Sorry if you're the type that plays on uber 16x12 with full graphics settings and all AA/AF. Some of us really don't care about it.


lol, I don't mean to disrespect you, but this isn't a valid argument in any sense.
Graphics is about the quality/fps ratio. If something like AA/AF isn't worth having, even though the X800XL does it for a lower price and performs better with AA/AF than the nvidia 6 series or any other card, then why are you interested in SM3.0, even though you will most probably upgrade once or even twice before really using it?
If AA/AF isn't important, why is SM3.0 important? If AA/AF means nothing to you, go buy a console; you will save a lot of money on hardware upgrades, I believe. People who game mainly on a PC put up with the cost of PC gaming because they demand the best graphics quality available to date and go for the best affordable price/performance option.
 

Drayvn

Golden Member
Jun 23, 2004
1,008
0
0
Originally posted by: Pete
Drayvn, AFAIK: 24 fragment pipes (I don't think they qualify as either traditional or pixel pipes in "traditional" terminology), two MADD ALUs per pipe (not two MADDs per ALU).

As a general rule, trust Ben to know what he's talking about.

They are, I think, pixel fragment pipelines, all of them.

And the ALUs can do 2 MADD ops per clock each.

EDIT: Yup, just checked around the web, and yeah, the 7800GTX has 24 fragment pipelines. And it has 2 MADDs per ALU. And there are 2 ALUs.

 

hans030390

Diamond Member
Feb 3, 2005
7,326
2
76
I don't understand a word you say, for some reason.

Or perhaps you haven't been reading. I WILL NOT BE ABLE TO UPGRADE FOR A COUPLE YEARS! Do you understand why SM3 was important to me? BECAUSE IT WILL BE USED EXTREMELY HEAVILY IN NEXT-GEN GAMES. Even if I don't get the extra eye candy from it, it will sure as hell give me better performance than if I ran it in SM2 mode.

AA/AF isn't important, just because all it does is either smooth the screen or show textures better from far away. Would you rather have an SM2 card with poor AA/AF performance (and not use AA/AF) or an SM1.4 card with great AA/AF quality?

Look man, I'm only freakin' 15 right now; do you think I can spend uber amounts of money upgrading all the time? No. I had to have SM3 because I knew it would be used. I needed the newest features because I knew I wouldn't be upgrading.

I'm sorry if you play with AA/AF on. I'm capable of using it, but I don't. It's just a waste of performance, and the extra eye candy it gives isn't anything like what SM3 could deliver.

Whatever. I don't feel like arguing anymore 'cuz no one else gets it. Or at least from my POV.

 

Drayvn

Golden Member
Jun 23, 2004
1,008
0
0
Originally posted by: hans030390
I don't understand a word you say, for some reason.

Or perhaps you haven't been reading. I WILL NOT BE ABLE TO UPGRADE FOR A COUPLE YEARS! Do you understand why SM3 was important to me? BECAUSE IT WILL BE USED EXTREMELY HEAVILY IN NEXT-GEN GAMES. Even if I don't get the extra eye candy from it, it will sure as hell give me better performance than if I ran it in SM2 mode.

AA/AF isn't important, just because all it does is either smooth the screen or show textures better from far away. Would you rather have an SM2 card with poor AA/AF performance (and not use AA/AF) or an SM1.4 card with great AA/AF quality?

Look man, I'm only freakin' 15 right now; do you think I can spend uber amounts of money upgrading all the time? No. I had to have SM3 because I knew it would be used. I needed the newest features because I knew I wouldn't be upgrading.

I'm sorry if you play with AA/AF on. I'm capable of using it, but I don't. It's just a waste of performance, and the extra eye candy it gives isn't anything like what SM3 could deliver.

Whatever. I don't feel like arguing anymore 'cuz no one else gets it. Or at least from my POV.

You could always turn to crime!

Err.... beddybies now....

 