Today, many sources are touting memory bandwidth as the biggest problem designers of next-generation graphics chipsets have to face. There are several techniques to combat this problem; however, all of them involve design trade-offs. For example:
- Tiling. Greatly improves fillrate efficiency by eliminating overdraw and unnecessary writes to the frame buffer. However, practical implementations have been complex (breaking the scene into small tiles is not trivial; see the binning sketch after this list), and have had low peak fillrate, which leads to issues such as sluggishness with multi-pass transparencies. Also, some programs doing tricks with the Z-buffer might have issues on accelerators using a tiling architecture.
- eDRAM. Certainly the "coolest" new technique, and the one that brought Bitboys to the scene: dedicate a large amount of on-chip memory (eDRAM) to the frame and Z-buffers. However, no implementations exist; perhaps current process technology doesn't allow for building large enough eDRAM buffers. Also, at the highest resolutions this kind of solution has to partially fall back on traditional memory to extend the frame buffer.
- Various tricks (Z-culling, guard-band clipping) to reduce unnecessary color/Z reads and writes through hidden surface removal. Certainly a good idea with very few drawbacks, but not as efficient as the aforementioned tiling at eliminating overdraw. Also, issues might arise if the Z-value range is fully utilized or insufficient (speculation warning: could this explain Radeon's weird issues in some titles with 16-bit color?)
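To make the "breaking the scene into tiles is not trivial" point concrete, here is a minimal C sketch (my own illustration, not how any shipping tiler actually works) of the binning step a tiler must run before it can render anything: each triangle's screen bounding box is tested against the tile grid and the triangle index is appended to every bin it touches. All names, sizes and the bin capacity are made up.

/* Minimal triangle-to-tile binning sketch (illustrative only; names,
   sizes and bin capacity are invented).  A real tiler also has to
   handle bin overflow, clipping and per-triangle state storage,
   which is where the complexity comes from. */

#define TILE_W      32            /* small tiles, as in classic tilers */
#define TILE_H      32
#define SCREEN_W    1024
#define SCREEN_H    768
#define TILES_X     ((SCREEN_W + TILE_W - 1) / TILE_W)
#define TILES_Y     ((SCREEN_H + TILE_H - 1) / TILE_H)
#define MAX_PER_BIN 512           /* arbitrary bin capacity            */

typedef struct { float x[3], y[3]; } Tri;

static int bins[TILES_Y][TILES_X][MAX_PER_BIN];   /* triangle indices */
static int bin_count[TILES_Y][TILES_X];

static void bin_triangle(const Tri *t, int index)
{
    /* Conservative screen-space bounding box of the triangle. */
    float minx = t->x[0], maxx = t->x[0];
    float miny = t->y[0], maxy = t->y[0];
    for (int i = 1; i < 3; i++) {
        if (t->x[i] < minx) minx = t->x[i];
        if (t->x[i] > maxx) maxx = t->x[i];
        if (t->y[i] < miny) miny = t->y[i];
        if (t->y[i] > maxy) maxy = t->y[i];
    }

    int tx0 = (int)minx / TILE_W, tx1 = (int)maxx / TILE_W;
    int ty0 = (int)miny / TILE_H, ty1 = (int)maxy / TILE_H;
    if (tx0 < 0) tx0 = 0;
    if (ty0 < 0) ty0 = 0;
    if (tx1 >= TILES_X) tx1 = TILES_X - 1;
    if (ty1 >= TILES_Y) ty1 = TILES_Y - 1;

    /* Append the triangle to every tile its bounding box overlaps. */
    for (int ty = ty0; ty <= ty1; ty++)
        for (int tx = tx0; tx <= tx1; tx++)
            if (bin_count[ty][tx] < MAX_PER_BIN)
                bins[ty][tx][bin_count[ty][tx]++] = index;
}

Bin overflow handling, the per-triangle parameter store and the second pass over every bin are what make real implementations hairy; with the huge tiles proposed below there would be only a handful of bins to manage.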
But what if we took the best of these three and put it all in one chip, this way:
- The chip has a memory subsystem consisting of a sizeable eDRAM buffer, say 1-2MB, plus fast (200MHz-ish) DDR SDRAM.
- The chip breaks the scene into very large tiles (256x256 or even bigger), as large as the eDRAM buffer can host. The area of each tile is rendered on-chip, and the result is then written out to the frame buffer residing in local DDR memory.
- The chip supports some form of crude HSR technique (sketched below).
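By "crude HSR" I mean something along the lines of a coarse Z-cull: keep one "farthest Z" value per small block of pixels and reject a whole block of an incoming triangle before any per-pixel color or Z traffic happens, whenever the triangle is provably behind everything already drawn there. A rough sketch, with the block size and names invented by me:

/* Coarse Z-cull sketch (illustrative only): one "farthest Z" value
   per 8x8 pixel block.  If everything an incoming triangle could
   write to a block is farther still, the block is skipped before
   any per-pixel color/Z reads or writes happen. */

#define BLK     8
#define BLKS_X  (1024 / BLK)
#define BLKS_Y  (768  / BLK)

static float block_far_z[BLKS_Y][BLKS_X];   /* max (farthest) Z per block */

/* Returns 1 if the whole block is provably hidden, 0 otherwise.
   Convention: larger Z = farther from the camera.  tri_nearest_z is
   the nearest Z the triangle can reach anywhere inside this block. */
static int coarse_z_reject(int bx, int by, float tri_nearest_z)
{
    return tri_nearest_z > block_far_z[by][bx];
}

Keeping block_far_z up to date as nearer pixels get written is the fiddly part, but even a conservative version removes a lot of redundant memory traffic, just not as much as a true tiler would.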
Tiling overhead would be low, since there are only a very few tiles. The rendering engine could be "traditional", something along the lines of a GeForce2 or Radeon with crude HSR support, and would thus have the high peak fillrate normally not associated with tiling renderers. Compared to a full eDRAM frame buffer such as Bitboys' Glaze3D, this chip would be easier to manufacture, since the eDRAM doesn't have to be large enough to house the whole frame buffer.
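To put rough numbers on "very few tiles" (my arithmetic, assuming 32-bit color plus 32-bit Z are kept on-chip, i.e. 8 bytes per pixel): a 2MB eDRAM holds 262,144 pixels, which is a 512x512 tile, so 1024x768 breaks into only a 2x2 grid. A quick back-of-the-envelope calculator:

/* Back-of-the-envelope: how big a tile fits in the eDRAM, and how
   many tiles some common resolutions then break into.  Assumes
   32-bit color + 32-bit Z stay on-chip for every pixel of the tile. */
#include <stdio.h>

int main(void)
{
    const int edram_bytes     = 2 * 1024 * 1024;   /* 2MB on-chip      */
    const int bytes_per_pixel = 4 + 4;             /* color + Z        */
    const int tile_side       = 512;               /* 512*512*8 = 2MB  */
    const int res[][2] = { {1024, 768}, {1280, 1024}, {1600, 1200} };

    printf("pixels that fit in eDRAM: %d (tile %dx%d)\n",
           edram_bytes / bytes_per_pixel, tile_side, tile_side);

    for (int i = 0; i < 3; i++) {
        int tx = (res[i][0] + tile_side - 1) / tile_side;
        int ty = (res[i][1] + tile_side - 1) / tile_side;
        printf("%4dx%-4d -> %d x %d = %2d tiles\n",
               res[i][0], res[i][1], tx, ty, tx * ty);
    }
    return 0;
}

Four tiles at 1024x768 and a dozen at 1600x1200, versus the hundreds or thousands a classic small-tile design juggles, which is why the binning overhead should be mostly negligible here.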
This kind of three-way compromise sounds like a winner to me; please tell me what you think. Is there a flaw in my reasoning?
Of course "the" chip would also have pixel pipelines capable of taking and gaussian-weighted-blending eight subsamples (with programmable positions) at a single pass, but that's another story and not related to memory bandwidth compromises