The most distinctive aspect of Denver is its dynamic code optimization. The core microarchitecture is unusual in that it has an in-order pipeline but relies on special software to reorder and optimize instruction traces. While executing repetitive code sequences, the Denver CPU collects dynamic runtime information and passes it to the dynamic code optimizer, which lets the optimizer look for more efficient ways to execute the code. The CPU runs the optimizer in hidden time slices, or it can use the second core to optimize code for the active core.
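The idea of profiling repetitive code and handing hot sequences to an optimizer can be illustrated with a minimal sketch. It is not Denver's actual mechanism: the names `TraceProfiler`, `HOT_THRESHOLD`, and `drop_nops` are hypothetical, the threshold value is invented for illustration, and the stand-in optimizer only removes no-ops.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; not a documented Denver parameter


class TraceProfiler:
    """Counts how often each code sequence runs and optimizes the hot ones."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()   # trace start address -> execution count
        self.optimized = {}       # trace start address -> optimized trace

    def execute(self, start_addr, trace, optimize):
        """Return the trace to run, optimizing it once it has become hot."""
        if start_addr in self.optimized:
            # An optimized version already exists, so reuse it.
            return self.optimized[start_addr]
        self.counts[start_addr] += 1
        if self.counts[start_addr] >= self.threshold:
            self.optimized[start_addr] = optimize(trace)
            return self.optimized[start_addr]
        return trace


def drop_nops(trace):
    """Stand-in optimizer (an assumption): removes no-op instructions."""
    return [ins for ins in trace if ins != "nop"]
```

In this sketch, the first two executions of a trace return it unchanged. The third execution crosses the threshold, so that call and all later ones return the optimized version.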
The dynamic optimizer runs in its own private, protected state and is not visible to the operating system or to any user code. The signed and encrypted optimizer code is loaded at boot into a protected region of main memory. By performing reordering and register renaming in software, Denver eliminates the power-hungry out-of-order control logic yet can still achieve results comparable to those of a hardware out-of-order design.
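To make the idea of register renaming in software concrete, here is a toy sketch. It assumes a simplified three-operand instruction format `(dest, src_a, src_b)` and an unlimited pool of renamed registers `p0`, `p1`, and so on. The function name `rename_registers` and this format are illustrative assumptions, not Denver's internal representation.

```python
def rename_registers(instructions):
    """Give every write a fresh register so only true dependencies remain.

    Each instruction is a (dest, src_a, src_b) tuple of register names.
    """
    mapping = {}   # architectural register -> most recent renamed register
    renamed = []
    next_id = 0
    for dest, src_a, src_b in instructions:
        # Sources read the latest renamed version of each register, if any.
        a = mapping.get(src_a, src_a)
        b = mapping.get(src_b, src_b)
        # Every destination gets a new name, removing write-after-read and
        # write-after-write conflicts with earlier instructions.
        new_dest = f"p{next_id}"
        next_id += 1
        mapping[dest] = new_dest
        renamed.append((new_dest, a, b))
    return renamed
```

For example, in the sequence `r1 = r2 op r3`, `r4 = r1 op r5`, `r1 = r6 op r7`, the third instruction reuses `r1` and so cannot move ahead of the second. After renaming, it writes `p2` instead and is free to be reordered.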
The profiler gathers information on program flow, such as branch results (taken, not taken, strongly taken, and strongly not taken), along with other hardware statistics tables and counters. The optimizer (Figure 1) recognizes opportunities to improve execution and can then rename registers, reorder loads and stores, improve control flow, remove redundant code, hoist redundant computations, unroll loops, and apply other common optimizations. Because the optimization is performed by run-time software, the optimizer can examine a much larger instruction window than is typical of hardware out-of-order (OoO) designs: Denver can optimize over a 1,000-instruction window, while most OoO hardware is limited to a window of 192 instructions or fewer. The dynamic code optimizer continues to evaluate profile data and can perform additional optimizations on the fly.
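The four branch states mentioned above match the classic two-bit saturating counter, sketched below. Denver's actual profiling hardware is not described here, so treat this as an assumption. The class name `BranchCounter` and the numeric state encoding are hypothetical.

```python
# State encoding is an assumption: 0 is the strongest "not taken" state.
STATES = ["strongly not taken", "not taken", "taken", "strongly taken"]


class BranchCounter:
    """Two-bit saturating counter that tracks one branch's recent behavior."""

    def __init__(self, state=1):
        self.state = state  # starts at "not taken" by default

    def record(self, taken):
        """Move one step toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

    def label(self):
        """Return the human-readable name of the current state."""
        return STATES[self.state]
```

A branch that is taken three times in a row from the default state ends up at "strongly taken". A single not-taken outcome then moves it back only to "taken", so one atypical result does not flip the recorded tendency.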