Favorite Processor Paper from ISSCC 2014
My favorite paper from the ISSCC processor session (5.6) describes an adaptive clocking technique, implemented in AMD's 28nm Steamroller core, that compensates for power supply noise. Most papers in the processor sessions are overviews that emphasize broad feature sets and the scope and scale of the project. In contrast, paper 5.6 was tightly focused on a specific problem (i.e., power supply noise) and clearly articulated a solution that was implemented in the Steamroller core.
...............................................................................................................................
Future Directions
AMD's adaptive clocking system in Steamroller is quite attractive, offering a significant improvement in power at a minimal cost in area and with a negligible impact on performance. However, there are several potential avenues for improvement.
First of all, the latency of the droop detection and clock stretching could be reduced. Currently, there is a minimum 3-cycle lag before the system can begin to compensate. The droop detector is an asynchronous circuit, which creates a slight delay because its output must be synchronized before it is passed to the clock stretcher. This means that Vmin must have enough guardband to tolerate a few cycles (probably <10) of voltage droop.
Reducing the response time of the clock stretching would reduce Vmin even further, resulting in greater power savings. Certain dI/dt events may be predicted in the pipeline.
For example, the front-end could signal a hint when decoding 256-bit AVX instructions, indicating that there is likely to be a dI/dt event when those instructions are executed.
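To make the latency argument concrete, here is a toy Python sketch that counts how many cycles run below a detection threshold before clock stretching engages. Only the minimum 3-cycle lag comes from the paper. The voltage trace, the threshold, and the function name unprotected_cycles are hypothetical values chosen for illustration, and the model handles a single droop event rather than describing AMD's actual circuit.

```python
def unprotected_cycles(voltage, threshold, lag_cycles):
    """Return the cycle indices that run below `threshold` before stretching engages.

    Toy model (assumption): stretching starts `lag_cycles` after the first
    cycle in which the voltage drops below `threshold`, and stays on afterwards.
    """
    exposed = []
    stretch_starts_at = None
    for cycle, v in enumerate(voltage):
        if stretch_starts_at is None and v < threshold:
            # First cycle of the droop: the detector fires, but the
            # response only takes effect after the lag.
            stretch_starts_at = cycle + lag_cycles
        if v < threshold and cycle < stretch_starts_at:
            exposed.append(cycle)
    return exposed


# Hypothetical normalized supply voltage per cycle (not measured data).
TRACE = [1.00, 1.00, 0.95, 0.92, 0.90, 0.91, 0.94, 0.98, 1.00]
THRESHOLD = 0.96  # hypothetical detection threshold

for lag in (3, 1, 0):
    print(f"lag={lag}: unprotected cycles {unprotected_cycles(TRACE, THRESHOLD, lag)}")
```

In this sketch a shorter lag, or a lag of zero as a stand-in for a front-end hint that arrives ahead of the event, leaves fewer cycles that the Vmin guardband has to cover.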
Second, this technique could be applied to AMD's discrete and integrated GPUs, although it is hard to say how big the benefits would be. The target clock frequency for a GPU is 1GHz rather than 3GHz, and the clock domains are bigger and contain more cores. On the other hand, since GPUs are so parallel, dI/dt events may be much bigger (e.g., if all the shaders in a GPU simultaneously begin executing a floating point kernel).
Even if the benefits are just half of what is possible in a CPU, a 5-10% decrease in power is significant for a 250W GPU.
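As a quick back-of-the-envelope check, the short Python sketch below converts that percentage range into watts. The 250W board power and the 5-10% range are taken from the text above. The function name power_saved_watts is hypothetical.

```python
def power_saved_watts(board_power_w, percent_saved):
    """Absolute power saved, in watts, for a given percentage reduction."""
    return board_power_w * percent_saved / 100


GPU_POWER_W = 250  # figure used in the text

low = power_saved_watts(GPU_POWER_W, 5)
high = power_saved_watts(GPU_POWER_W, 10)
print(f"Estimated savings: {low} W to {high} W")
```

That works out to between 12.5W and 25W for a card of that class.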
Third, since adaptive clocking minimizes the impact of voltage droops, AMD could remove package decoupling capacitors or package layers to reduce the cost of the overall platform.
Fourth, adaptive clocking could be used to improve transitions between different voltage/frequency combinations by reducing their latency.
Summary
Overall, AMD's adaptive clocking paper was enlightening and enjoyable, and it stood out from the rest of the processor session. While it addresses a longstanding problem, the solution is new and takes an interesting approach to the challenges in power delivery.
The paper also demonstrated one of AMD's key differentiators, expertise in power management and clocking, which is critical for any computing platform from mobile to servers. The techniques described will first appear in AMD's Steamroller-based platforms, but are expected to roll out across other IP blocks, potentially including GPUs, ARM cores, and the Jaguar core.