The point is that the primary cause of the micro stutters is that the CPU needs significantly less time to prepare a frame than the GPU needs to render it. The more evenly the load is shared between the CPU and GPU, the less noticeable the micro stuttering will be.
The problem you describe applies equally to single GPUs (e.g. you can hit a GPU-limited scene where suddenly the CPU is ready much quicker relative to the GPU than it was the previous frame). Single GPUs have a fluctuating framerate too, but it's obviously not as bad as on AFR systems.
My theory is that micro-stutter is related to timing and synchronicity issues the driver has to deal with in order to keep multiple GPUs load balanced under AFR.
This is especially true if there are dependencies between GPUs such as render-to-texture operations done on one board that are then expected to be re-used in subsequent frames, but the other GPUs don't have them. Those kinds of dependencies are not present on a single GPU.
If you delay the CPU for about half the time it takes the GPU to render the frame (a little less, actually), the frame rate won't be affected much, but the frames are spread out much more regularly (^ representing the added delay).
Your charts aren't accurate or to scale.
To put some figures on it, say the frames complete like this:
Frame 1: 10 ms
Frame 2: 12 ms
Frame 3: 22 ms
Frame 4: 24 ms
Frame 5: 34 ms
Frame 6: 36 ms
To even that out we need 6 ms between each frame, which requires a 4 ms delay on every even-numbered frame:
Frame 1: 10 ms
Frame 2: 16 ms
Frame 3: 22 ms
Frame 4: 28 ms
Frame 5: 34 ms
Frame 6: 40 ms
The first example delivers its 6 frames by the 36 ms mark while the second takes until 40 ms, an 11% increase in total time; not huge, but the question is whether IHVs will find that acceptable.
Of course, depending on the size and frequency of the variances and how many total frames are rendered, YMMV.
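For anyone who wants to check the arithmetic, here is a small C++ sketch. Nothing in it is driver code; the frame times come from the lists above, and the two-GPU assumption and all the names are made up for illustration. It derives the 6 ms target from the per-GPU period and never lets a frame go out sooner than one target interval after the previous one:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Uneven completion times in ms from the first list: pairs of frames
    // arrive 2 ms apart, and a new pair arrives every 12 ms.
    const std::vector<double> done = {10, 12, 22, 24, 34, 36};
    const int gpus = 2;

    // Per-GPU period = time between two frames rendered on the same GPU;
    // spreading that period evenly over both GPUs gives the 6 ms target.
    const double target = (done[gpus] - done[0]) / gpus;

    std::vector<double> paced(done.size());
    paced[0] = done[0];
    for (size_t i = 1; i < done.size(); ++i) {
        // Hold each frame back until at least one target interval has
        // passed since the previous frame; the difference is the delay.
        paced[i] = std::max(done[i], paced[i - 1] + target);
        std::printf("frame %zu: %2.0f ms -> %2.0f ms (delay %.0f ms)\n",
                    i + 1, done[i], paced[i], paced[i] - done[i]);
    }
    return 0;
}
```

This reproduces the second list exactly: frames 2, 4 and 6 are held back 4 ms each, and the last frame slips from 36 ms to 40 ms.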
The difficult part is that this must be highly dynamic. Usually the delay needs to be added for one frame only (because the following frames will then fall into the rhythm), until the scene changes and the frame times become uneven once again.
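A dynamic version would have to re-estimate the target interval every frame instead of using a fixed figure. The sketch below is only my guess at how that could look (the FramePacer class, the smoothing factor and the per-GPU period estimate are all invented here, not anything a real driver is known to do): it measures the period between frames rendered on the same GPU, smooths it, and only injects a delay while incoming frames run ahead of that rhythm.

```cpp
#include <algorithm>
#include <deque>

class FramePacer {
public:
    explicit FramePacer(int gpuCount) : gpus_(gpuCount) {}

    // readyMs: time at which the new frame is ready to be presented.
    // Returns how long (in ms) to hold the frame back before presenting it.
    double DelayFor(double readyMs) {
        history_.push_back(readyMs);
        if (history_.size() > static_cast<size_t>(gpus_) + 1)
            history_.pop_front();

        double delay = 0.0;
        if (history_.size() == static_cast<size_t>(gpus_) + 1) {
            // Per-GPU period: time between this frame and the last frame
            // rendered on the same GPU, smoothed to ride out noise.
            double period = history_.back() - history_.front();
            smoothedPeriod_ = (smoothedPeriod_ <= 0.0)
                                  ? period
                                  : 0.8 * smoothedPeriod_ + 0.2 * period;
            double target = smoothedPeriod_ / gpus_;
            delay = std::max(0.0, lastPresentMs_ + target - readyMs);
        }
        lastPresentMs_ = readyMs + delay;
        return delay;
    }

private:
    int gpus_;
    std::deque<double> history_;   // recent frame-ready times
    double smoothedPeriod_ = 0.0;  // running per-GPU period estimate
    double lastPresentMs_ = -1e9;  // when the previous frame went out
};
```

Fed the times from the lists above, it starts holding the early frame of each pair back by 4 ms once the 12 ms rhythm has been observed, and because the estimate is refreshed every frame the delay shrinks or disappears as soon as the scene, and with it the rhythm, changes.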
Delays like that could also wreak havoc with input response if the game tick is still updating data while external rendering delays, which the game doesn't know about, are being put in place.
Also, the more GPUs are used, the less pronounced the micro-stuttering effect is. I.e. in a situation where you get micro-stutters with 2-way SLI, there's a very good chance that the same scene works fine with Quad-SLI under Vista.
Why would more than two GPUs make a difference? If GPUs 1 and 2 are experiencing micro-stutter (e.g. 10 ms to render on GPU 1 and 2 ms to render on GPU 2), what bearing do GPUs 3 and 4 have on the situation?
GPU 3 could render in 2 ms (which would have no extra adverse effect) or it could render in 10 ms and cause micro-stutter of its own relative to GPU 2.