hehe, well I won't hold it against you. No need to do it though. If you ever have a question, I'm always willing to answer.
So how does deferred rendering work (as seen on PowerVR and Gigapixel)?
Well, like with any graphics card, the first step is your basic T&L. I won't go into that as I assume you know how that works. Next you do a coarse rasterization of the incoming triangles, sorting them into the appropriate bins and updating the graphics state (texture state, etc.) in each bin. When the frame is over, you have a binned display list which has stored the scene from the previous frame. Now you write your triangles to the tile buffer, which is a small cache that holds one tile. You iterate across the bins, processing each one separately. From here, you proceed to do what is known as ray casting. This is a massively parallel design which determines what pixels are visible in the tile. After determining pixel visibility, you know what you see, so all other data is dropped.
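To make the binning step concrete, here's a rough C++ sketch of sorting triangles into tile bins by their screen bounding boxes. The tile size and data layout are just illustrative, not how PowerVR or Gigapixel actually store things:

```cpp
#include <vector>
#include <algorithm>
#include <cassert>

// Coarse rasterization/binning sketch: each triangle's screen bounding box
// decides which tile bins get a reference to it. Tile size is an assumption.

struct Tri { float x[3], y[3]; };

constexpr int kTile = 32;  // assumed tile size in pixels

std::vector<std::vector<int>> binTriangles(const std::vector<Tri>& tris,
                                           int width, int height) {
    int tilesX = (width  + kTile - 1) / kTile;
    int tilesY = (height + kTile - 1) / kTile;
    std::vector<std::vector<int>> bins(tilesX * tilesY);
    for (int i = 0; i < (int)tris.size(); ++i) {
        const Tri& t = tris[i];
        float minX = std::min({t.x[0], t.x[1], t.x[2]});
        float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        float minY = std::min({t.y[0], t.y[1], t.y[2]});
        float maxY = std::max({t.y[0], t.y[1], t.y[2]});
        int tx0 = std::max(0, (int)minX / kTile);
        int tx1 = std::min(tilesX - 1, (int)maxX / kTile);
        int ty0 = std::max(0, (int)minY / kTile);
        int ty1 = std::min(tilesY - 1, (int)maxY / kTile);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(i);  // bin stores triangle id
    }
    return bins;
}
```

The point is that once the frame is binned, each tile can be processed independently from its own small list, which is what makes the per-tile visibility pass practical.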
Now while your data is still in the tile buffer, you can do any additional texture layers you need, as well as anti-aliasing. This reduces memory bandwidth usage.
To finish things up, you work through the rest of the tiles the same way; once you've gone through all of them, you have your final image, which is displayed.
As for NVIDIA, they work on a traditional architecture. Basically, they do their T&L. From there, they take their Z values and write them to a Z-buffer in the order they receive them. This determines visibility. From there, they texture and then output to the back-buffer. Basically, the same thing that everybody else does.
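Here's a minimal sketch of that traditional per-pixel depth test, just to show the idea (convention here is smaller Z = closer; real hardware details vary):

```cpp
#include <vector>
#include <cfloat>
#include <cassert>

struct Framebuffer {
    int w, h;
    std::vector<float>    depth;
    std::vector<unsigned> color;
    Framebuffer(int w, int h) : w(w), h(h),
        depth(w * h, FLT_MAX), color(w * h, 0) {}

    // Fragments arrive in submission order; each one is depth-tested and the
    // nearest survivor's shaded color ends up in the back buffer.
    void writeFragment(int x, int y, float z, unsigned rgba) {
        int i = y * w + x;
        if (z < depth[i]) {     // passes the Z test
            depth[i] = z;
            color[i] = rgba;    // textured/shaded result goes to back buffer
        }
    }
};
```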
Now I'm not completely sure how occlusion detection works on NV20. With early Z checks, if they do that, you basically do a Z check before you read a texture value. That saves you the texture reads that wouldn't be needed. This isn't going to make a huge difference, but everything helps.
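A sketch of what an early-Z check could look like (this is assumed behavior, not necessarily what NV20 does). The fetch counter just makes the saving visible:

```cpp
#include <vector>
#include <cfloat>
#include <cassert>

struct EarlyZBuffer {
    std::vector<float> depth;
    int texFetches = 0;
    explicit EarlyZBuffer(int n) : depth(n, FLT_MAX) {}

    // Stands in for a (costly) texture memory read.
    unsigned fetchTexel() { ++texFetches; return 0xffffffffu; }

    // Depth test runs first: fragments that fail never pay for the fetch.
    bool shade(int i, float z) {
        if (z >= depth[i]) return false;  // early Z reject, no texture read
        depth[i] = z;
        unsigned texel = fetchTexel();    // only visible fragments get here
        (void)texel;
        return true;
    }
};
```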
Occlusion detection is a bit more difficult. There are a variety of ways you can do it, both in hardware and software.
You could do a low-resolution Z-buffer check. You have to use the CPU for T&L with this, because hardware T&L lacks the flexibility. You do your transformations in software and write your Z values to system memory. From there, you check the Z value and see if it is occluded. If it is, you drop it. This works entirely on a per-triangle basis.
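As a rough sketch of that per-triangle test: keep a coarse Z buffer in system memory, record occluder depths into it, and drop a triangle only when its nearest Z is behind the stored depth everywhere its bounding box lands. The resolution and names here are illustrative, and the test is only conservative if occluders written into a cell actually cover it:

```cpp
#include <vector>
#include <algorithm>
#include <cfloat>
#include <cassert>

struct LowResZ {
    int w, h;
    std::vector<float> z;  // per-cell depth of the nearest full occluder so far
    LowResZ(int w, int h) : w(w), h(h), z(w * h, FLT_MAX) {}

    // Record an occluder over a cell range (assumed to fully cover the cells,
    // which is what keeps the visibility test conservative).
    void addOccluder(int x0, int y0, int x1, int y1, float zval) {
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                z[y * w + x] = std::min(z[y * w + x], zval);
    }

    // Per-triangle check: drop it only if its nearest Z is behind the stored
    // occluder depth in every cell its bounding box touches.
    bool occluded(int x0, int y0, int x1, int y1, float triNearZ) const {
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                if (triNearZ <= z[y * w + x]) return false;  // may be visible
        return true;
    }
};
```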
Another option requires direct hardware support. Here you break up your screen into bounding boxes. You'd likely do this with each primitive, as you receive them. From there you'd check each box for visibility, and if nothing in it is visible, nothing within the box is rendered (so basically you drop all the data).
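The box test itself could look something like this: scan the box's footprint against the Z buffer, and if not a single pixel would pass, everything inside the box gets dropped. A software sketch of the idea, not any particular chip's implementation:

```cpp
#include <vector>
#include <cassert>

// Returns true if any pixel of the box's screen footprint would pass the
// Z test at the box's nearest depth; false means the whole box is hidden
// and all geometry inside it can be dropped.
bool boxVisible(const std::vector<float>& depth, int w,
                int x0, int y0, int x1, int y1, float boxNearZ) {
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (boxNearZ < depth[y * w + x]) return true;  // a pixel passes
    return false;  // fully occluded
}
```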
Ben,
Where do you get the idea that I'm speculating? If I don't say I'm speculating I'm not speculating.
As for multi-sampling, it is the definition that most other engineers use these days, which involves the reuse of a single texel value for each sample.
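That definition can be sketched like this: several coverage samples per pixel, but the texture is fetched once per pixel and that one texel is reused for every covered sample (supersampling would instead fetch per sample). Names here are illustrative:

```cpp
#include <cassert>

struct PixelSamples { bool covered[4]; };  // 4x multisample coverage mask

// Stands in for a texture fetch; the counter shows the bandwidth cost.
unsigned sampleTexture(int& fetches) { ++fetches; return 0x00ff00ffu; }

// One texture read per pixel, shared by all covered samples.
void shadePixel(const PixelSamples& p, unsigned samples[4], int& fetches) {
    unsigned texel = sampleTexture(fetches);   // fetched once, not per sample
    for (int s = 0; s < 4; ++s)
        if (p.covered[s]) samples[s] = texel;  // same texel value reused
}
```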