Does deferred rendering have a future?

Marty

Banned
Oct 11, 1999
1,534
0
0
I'm sure we all remember the days of the PowerVR, and now more recently the Dreamcast and Kyro products. Also recall the Gigapixel technology acquired by 3dfx and now (apparently) held by nVidia. Like it or not, the technology has the respect of at least some in the industry. However, it has been shunned by others, notably Tim Sweeney, lead programmer for the people who brought you Unreal Tournament.

As far as I can tell, the arguments against it deal with the fact that deferred rendering involves scene capture, where the entire scene is captured and processed (hidden surfaces are removed, etc.) prior to actual rendering, whereas traditional architectures deal with a stream of data. They end up doing more work, but can start doing it sooner.

Scene capture in itself has a few negative aspects. Firstly, it uses additional memory on the card itself. All those triangles need to be stored somewhere after all. As triangle counts increase, so will memory requirements. Secondly, the entire scene must be processed before it is rendered. Again, the problem arises with higher polygon counts, when there is more work to be done. So it is when polygon counts are higher that deferred rendering meets its bottleneck.

On the other hand, traditional architectures also have problems with high polygon counts. This slow-down is actually not assosciated with the number of polygons, but with the higher overdraw factors that usually come into play with the environments that have such high polygon counts. Where deferred renderers draw only that which is seen, traditional renderers (simply represented) draw everything.

It seems that the issue comes down to triangle rate vs. fill rate. Will traditional renderers have enough pixel-pushing power to make up for the inherent disadvantage when compared to deferred renderers? Will triangle counts grow quickly enough to rule out a deferred renderer at the high end, or will they move slowly enough to allow a deferred renderer to excel?

Up till now we have not seen a high-performance implementation of a deferred renderer. Even so, recent products have left a favorable impression on the marketplace. My question is: where do you see the future taking us?

Also feel free to discuss any other advanced 3d-graphics related developments here.

Marty
 

Noriaki

Lifer
Jun 3, 2000
13,640
1
71
I don't know that the Kyro2 has as big a problem with large polygon counts as it does with a high number of lights.

Scenes that make use of multiple lighting sources really slow it down compared to the GeForces, which have 8 hardware lights.

The Kyro3 could help TBR quite a bit by bringing lighting into hardware.

I think the real benefit of deferred rendering is that you don't need to keep increasing memory bandwidth to improve performance.

Those RAM chips are going to start getting pretty pricey...
 

thraxes

Golden Member
Nov 4, 2000
1,974
0
0
I think what Tim Sweeney was moaning about most with TBR was that he saw little chance of getting a T&L unit to work with it, which he needs for his high-polygon Unreal2. The Kyro3 is supposed to have T&L, and I read somewhere that it will have neat DX8 features like programmable pixel shaders and so on. If this is true then I'd say Mr Sweeney as a programmer is well informed about immediate mode rendering (as he should be) but not all that familiar with modern TBR systems and the developments underway. Which is understandable, since STMicro and their TBR only recently came back into discussion with their Kyro2, and before that programmers were really working with the latest and greatest of immediate mode renderers from Nvidia and ATI.

I think in the long term, a good TBR system is the way to go, since the amount of memory bandwidth needed by more and more polygons will make the resulting bandwidth-hungry IMR cards expensive simply because of the high-speed RAM they need. TBRs don't have that problem. If STMicro pulls it off and brings out the Kyro3 as a fully compliant DX8 chip without needing ultra-high-bandwidth RAM like Nvidia's GF3, while delivering similar performance, then it's bye bye Nvidia. Then economics take over... who will buy a card for $300+ when he can get similar to equal results from another technology that costs $50-100 less???
Anybody??? No? Thought so.

At any rate, I have decided to wait for Kyro3... either it will be so darn good and offer buckets of performance and features at a great price or it will just flop and hopefully bring down the price of GF3s a little. Did you hear that my faithful TNT2?? Guess you're gonna stay in that tower for a little while longer after all.
 

nam ng

Banned
Oct 9, 1999
532
0
0


<< I think the real benefit of deferred rendering is that you don't need to keep increasing memory bandwidth to improve performance. >>


There's no free lunch; what was the trade-off for the lower memory bandwidth requirement?
 

Noriaki

Lifer
Jun 3, 2000
13,640
1
71
It requires time to sort the polygons.

That's how it gets its high fillrate and low memory bandwidth requirements.

But if you have a hardware T&L unit, you should be able to sort them as you transform them.
 

Marty

Banned
Oct 11, 1999
1,534
0
0
Yes, it is the sorting that is the problem. As far as I know, however, the sorting is not a linear-time operation. In the best case it should be O(n*log n), which means that as triangle counts increase, the workload for a deferred renderer will increase more than for a traditional renderer, which has to deal with only linear time operations when it comes to triangles.

However, 3dfx gave some strong indications that there was a way around this problem and that the answer was in the Gigapixel tech. It's too bad we will never find out what it really was, and whether it really worked.

Marty
 

nam ng

Banned
Oct 9, 1999
532
0
0
Meaning the workloads causing latency were shifted up toward the front of the graphics pipeline? Does that prove it more performance-capable than a traditional IMR?
 

nam ng

Banned
Oct 9, 1999
532
0
0
Does anyone know if the KyroII's dual engine works one tile at a time, or dual-engine --> dual-tile processing?

KyroIII --> dual-tile, quad-tile processing, or only more oomph per tile?
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
The issue here is the binning part of the equation.

Visibility checks are done per pixel, which means that you can forget sorting at the transform level, at least if you are still talking about a Tile Based Renderer (TBR). It would be possible, rendering front to back, to do an early sort and eliminate some of the poly load (much as the GF3 can eliminate OD this way), but you will still need to bin geometry to perform visibility checks.

For the time being, this isn't a major issue, as the most complex game in terms of geometry on the PC side is Giants, and that peaks in the ~30K polys per frame range. However, with Unreal2 and Doom3 on the horizon this situation could very well change significantly, as they are supposed to be pushing close to 250K polys per frame.

This leaves a situation which creates two problems in regards to binning. One is that you will need more RAM. I've seen it mistakenly stated on boards many times that a 32MB Kyro2 should be compared to a 32MB traditional card, when that simply isn't the case for normal operation. The binning space allotted takes up 6MB total on the K2 right off the top; that is ample room for the games of today, but it will need to be increased for future titles. If you factor in FSAA for IMRs then the situation changes, as you don't have to deal with as large a framebuffer on TBRs, but in normal operation you are giving up a bit of RAM compared to an IMR.

Moving forward, it is reasonable to assume at the very least that a doubling of onboard RAM will be required (for the next ~eighteen months, conservatively) for binning purposes.

The amount of RAM is only part of the problem, however. Another factor is the bandwidth involved. If you integrate a T&L unit into the core rasterizer then you need to have the vertex data transmitted to the card via the AGP bus (or stored locally, which creates further bandwidth constraints), then processed via the T&L unit, then written back to RAM for binning. This swapping of data back and forth is going to increase the amount of bandwidth needed on TBRs by a decent amount.

There are ways of working around this somewhat. Using HOS you can eliminate a decent amount of OD polygons prior to binning, or it may be possible to bin the HOSs themselves and complete visibility checks on them prior to tessellation. This would save some bandwidth; however, the geometry aspects of TBRs will still require more RAM, in both amount and bandwidth, than IMRs.

Will this offset their OD advantages? Time will tell, although as it is now IMRs are closing in on the limits of display technology. At a certain point, the ability to push 1600x1200x64x4 will be a given for all rasterizers, and at that point the advantages of TBRs will certainly be in question, particularly if the looming explosion in geometry usage continues past Unreal2 and Doom3 and doesn't "settle in" for a decent length of time.
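To make the binning step concrete, here's a toy software sketch (Python just for illustration; the tile size, screen dimensions and conservative bounding-box test are my own assumptions — the real chip does this in dedicated logic):

```python
# Toy sketch of tile binning for a tile-based renderer (TBR).
# Assumes a 640x480 screen split into 32x32-pixel tiles; each triangle
# is binned into every tile its screen-space bounding box overlaps.

TILE = 32
SCREEN_W, SCREEN_H = 640, 480
TILES_X = SCREEN_W // TILE   # 20 tiles across
TILES_Y = SCREEN_H // TILE   # 15 tiles down

def bin_triangles(triangles):
    """triangles: list of ((x0,y0),(x1,y1),(x2,y2)) in screen space.
    Returns a dict mapping (tile_x, tile_y) -> list of triangle indices."""
    bins = {}
    for i, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Conservative bounding-box overlap test, clamped to the screen.
        tx0 = max(0, int(min(xs)) // TILE)
        tx1 = min(TILES_X - 1, int(max(xs)) // TILE)
        ty0 = max(0, int(min(ys)) // TILE)
        ty1 = min(TILES_Y - 1, int(max(ys)) // TILE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins

bins = bin_triangles([((5, 5), (40, 10), (10, 40)),        # straddles 4 tiles
                      ((100, 100), (110, 100), (105, 110))])  # fits in one tile
print(sorted(bins))  # [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]
```

The point being: every binned triangle has to be written to and read back from RAM, which is exactly where the extra bandwidth cost comes from.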
 

br0wn

Senior member
Jun 22, 2000
572
0
0


<<
As far as I know, however, the sorting is not a linear-time operation. In the best case it should be O(n*log n), which means that as triangle counts increase, the workload for a deferred renderer will increase more than for a traditional renderer, which has to deal with only linear time operations when it comes to triangles.
>>



Just trying to clarify some statements:

Comparison-based sorts have a lower bound of O(n log n), where n is the input size,
which means even the best comparison-based sorts can only guarantee
O(n log n) time to perform the sorting.
Examples of comparison-based sorts are bubble sort (n^2 worst case),
quicksort (n^2 worst case), merge sort (n log n) and heap sort (n log n).

However, there are some sorting techniques that are not based on comparisons and
hence are able to achieve a LINEAR (O(n), instead of O(n log n)) time bound.
This type of technique requires some knowledge of the domain (or inputs).
An example is radix sort. Let me know if you want me to post on how this radix
sort works.

 

Def

Senior member
Jan 7, 2001
765
0
0
br0wn,

I'm interested in how this type of sorting would work.

I'm familiar with all the previous ones you mentioned, but haven't heard of a method of sorting that works without comparisons. Although I have only had an intro CS course, and that's quite enough for me. So go slow for the CS simpleton.
 

Marty

Banned
Oct 11, 1999
1,534
0
0
Would the linear-type sorting be something like binning? Where every pixel on the screen has its own array of size 2^32 (for 32-bit Z), and if a triangle is encountered at a particular Z value, the bit in the array at that pixel is flipped from 0 to 1 (or something equivalent). If this is what you mean, I guess it would work, except that the memory requirements would be huge. You would need 2^29 bytes (2^32 bits) for every pixel, which is quite impossible.

Let me know if you had a different sorting algorithm in mind.

Marty
 

Noriaki

Lifer
Jun 3, 2000
13,640
1
71
Def:
Here is a description of radix sort.

As br0wn said, you are making an assumption about the input.
You can't use radix sort (with good results) on a completely random set.

This will be an O(n) sort time.

I'm not a rendering expert, but I believe the domain is predictable enough that you could use a non-comparison sort.

Thus, as I said before, binning the geometry should not be a terribly huge bottleneck.
Of course it does use a fairly large amount of RAM, but less bandwidth. I think that a larger amount of slower RAM would be more cost effective in the long run than a smaller amount of very fast RAM. Having an extra 32MB for geometry binning isn't that significant when you consider that the GF4 is likely to have 128MB of, I don't know, say 250-300MHz DDR SDRAM.

Compare that to 160MB of 183-200MHz SDRAM for a KyroIII, and I think the K3 should have the cheaper RAM cost. Or perhaps even 192MB; I'm not sure exactly how much the Kyro2 uses. But still, I think 192MB of 5.5-5ns SDRAM is fairly cheaper than 128MB of 4-3ns DDR SDRAM.

Though I could be all wrong; the domain of the geometry to be sorted could be too unpredictable to use a non-comparison sort.
Which would throw everything I just said out the window.

(Well, my observation about RAM cost would still be true, but if you have to use a comparison sort, the K3 won't be able to keep up in performance.)
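A bare-bones sketch of radix sort, assuming the keys are unsigned 32-bit Z values and processing one byte per pass (Python just for illustration):

```python
# Minimal LSD radix sort on unsigned 32-bit depth (Z) values -- the kind
# of non-comparison, O(n) sort being discussed. It makes 4 linear passes
# (one per byte, least-significant first); each pass is a counting pass
# plus a stable scatter. No comparisons between keys anywhere.

def radix_sort_u32(keys):
    for shift in (0, 8, 16, 24):            # one pass per byte, LSB first
        counts = [0] * 256
        for k in keys:                      # histogram of the current byte
            counts[(k >> shift) & 0xFF] += 1
        offsets = [0] * 256                 # prefix sums -> output positions
        total = 0
        for b in range(256):
            offsets[b] = total
            total += counts[b]
        out = [0] * len(keys)               # stable scatter into a second buffer
        for k in keys:
            b = (k >> shift) & 0xFF
            out[offsets[b]] = k
            offsets[b] += 1
        keys = out
    return keys

depths = [0xDEADBEEF, 42, 0x00FF00FF, 7, 42]
print(radix_sort_u32(depths))  # [7, 42, 42, 16711935, 3735928559]
```

Note the space trade-off br0wn mentioned: it needs a second buffer the same size as the input, which is the price you pay for the linear time.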
 

br0wn

Senior member
Jun 22, 2000
572
0
0


<<
Would the linear type sorting would be something like binning? Where every pixel on the screen has its own array of size 2^32 (for 32 bit Z), and if a triangle is encountered at a particular z value, the bit in the array at the pixel is flipped from 0 to 1 (or something equivalent). If this is what you mean, I guess it would work, except that the memory requirements would be huge. You would need 2^29 bytes for every pixel, something which is quite impossible.
>>



Yeah, it works like binning.
However, it doesn't require as much memory as you described.
Check the link posted by Noriaki; there is an animation program that shows how this radix sort works.
 

Marty

Banned
Oct 11, 1999
1,534
0
0
I don't see how that makes the worst-case memory requirements any smaller. You have an array of lesser length, but now each element in the array is an array of maximum depth equal to the length of the original array. For 32 bit Z, it still works out to 2^32. Furthermore, following the example in the link, you need a second array, which doubles your memory requirements. The example is only an improvement when you allow the memory size to be dynamic, which I don't believe it would be on a graphics card.

If you allow the memory allocated for this purpose to be dynamic in size, the average case may not be as bad, but the worst case still remains. I don't think that this algorithm lends itself to practical implementation, at least not yet, when memory sizes on graphics cards don't yet number in the gigabytes. Of course, there are no applications with triangle counts large enough to consume that much data, so the array would be mostly empty, allowing compression, but we are talking about the future, after all.

Marty
 

br0wn

Senior member
Jun 22, 2000
572
0
0


<<
I don't see how that makes the worst-case memory requirements any smaller. You have an array of lesser length, but now each element in the array is an array of maximum depth equal to the length of the original array. For 32 bit Z, it still works out to 2^32. Furthermore, following the example in the link, you need a second array, which doubles your memory requirements. The example is only an improvement when you allow the memory size to be dynamic, which I don't believe it would be on a graphics card.
>>



You might be right, as I don't have any expertise in graphics hardware and don't know for sure what the inputs to be sorted are here.


Anyway, regarding how to reduce high polygon counts: currently there is a hot topic discussing a technique called "Polygon Simplification". Basically, it uses fewer polygons for objects that are farther away from the viewpoint (changing the level of detail).
For example, take an object A with 100,000 polygons. When it is near the viewpoint, we use 100,000 polygons to render the object. When it is farther away, we can use 1,000 (or 10,000) polygons to render the same object.
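A crude sketch of how the selection could look (the distance thresholds and polygon counts here are made-up numbers, purely for illustration):

```python
# Toy sketch of distance-based level-of-detail (LOD) selection for
# polygon simplification. Assumes the object has been pre-simplified
# into a few meshes of decreasing polygon count; the thresholds are
# invented for this example.

# (max_distance, polygon_count) pairs, nearest level first
LOD_LEVELS = [(10.0, 100_000), (50.0, 10_000), (200.0, 1_000)]
FALLBACK_POLYS = 100  # beyond the last threshold

def polys_for_distance(distance):
    """Pick how many polygons to render at a given viewer distance."""
    for max_dist, polys in LOD_LEVELS:
        if distance <= max_dist:
            return polys
    return FALLBACK_POLYS

print(polys_for_distance(5.0))    # 100000 -- up close, full detail
print(polys_for_distance(120.0))  # 1000   -- far away, simplified mesh
```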

 

nam ng

Banned
Oct 9, 1999
532
0
0
Can anyone provide a link to any web sites that have analyzed the KyroII's poly-handling ability? Small repetitive data sets, large data sets, or huge and constantly varying dynamic ones?
 

Noriaki

Lifer
Jun 3, 2000
13,640
1
71
It doesn't improve the worst case.

In fact the worst case of Radix sort is horrendous.

But the point is, if you can make some assumptions about the data set, you can probably tailor an algorithm to sort that data set.

Radix sort was an example of a non-comparative sort; it's not the only one, nor did I mean it to be the example you would want to use for this application.

I don't know that much about graphics, or algorithm design beyond your regular CS theory; perhaps you can't design an algorithm for this data set.

 

br0wn

Senior member
Jun 22, 2000
572
0
0


<<
It doesn't improve the worst case.

In fact the worst case of Radix sort is horrendous.
>>



That is not entirely true.

Remember that in measuring algorithms there are several metrics.
What we have discussed here are the time and space (memory) metrics,
so you shouldn't mix them up.

Radix sort has the advantage of a linear time bound, but at the
expense of worse memory requirements.

A comparison would be a hash table versus other data structures
for searching. Using a hash table, one can search
in constant time but requires at least 2n memory for storage.
With another structure such as a sorted array, one has to
search in log(n) time but only requires n memory for storage.

There are other linear-time sorting techniques, like
counting sort and bucket sort.

All the techniques above are SOFTWARE solutions.
Since we are talking about designing new hardware,
we can even perform the sorting in hardware.
An example would be a sorting network (using bitonic sort);
then we can even break the LINEAR time bound, because we
can sort n numbers in O(log^2 n) parallel time, which is much
less than linear.
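To illustrate the sorting network idea, here is a software simulation of a bitonic network (Python just for illustration; in actual hardware all the compare-exchanges within a stage run simultaneously, which is where the O(log^2 n) parallel time comes from):

```python
# Software simulation of a bitonic sorting network (Batcher's algorithm).
# The compare-exchanges inside each (k, j) stage are independent of each
# other, so a hardware network can run a whole stage in one step:
# log2(n) * (log2(n) + 1) / 2 stages total, i.e. O(log^2 n) time.
# Input length must be a power of two.

def bitonic_sort(values):
    a = list(values)
    n = len(a)
    assert n and (n & (n - 1)) == 0, "length must be a power of two"
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:               # one stage of the network
            for i in range(n):     # these compare-exchanges are parallel in hardware
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or \
                       (not ascending and a[i] < a[partner]):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Of course, a real chip would spend its transistor budget on this only if the geometry load justified it; this is just to show that sub-linear sort time is possible in principle.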

 