Why is the report dated Oct 27, 2014?
Interesting. Seems like a different and possibly useful idea.
However, the report seems fishy. Someone correct me if I am wrong here.
Things to note.
Compared with a high-IPC processor such as Haswell, the front end is of similar complexity, but the scheduler in each physical core is much simpler, as it manages only a few function units (versus eight in Haswell). The data cache also has fewer ports and can cycle faster. Because the execution resources of both cores can apply to a single thread, however, even a two-core VISC processor can deliver total ALU operations or memory operations per cycle that match or exceed those of Haswell.
But only on a single thread. If the task can be parallelized and is already implemented with multiple threads, the performance gain is zero.
Soft Machines has designed and fabricated a test chip that implements its VISC architecture using two physical cores. It refused to discuss the number of function units or other basic microarchitecture capabilities of these cores but characterized them as A15-class CPUs. We interpret this statement to mean that each core can execute three to four operations per cycle with moderate instruction reordering.
All of the benchmark results are measured against the A15. Again, the comparison is only meaningful if the processor is similar in size and complexity to an A15.
Because it uses a pipeline with only 10 stages (including some extra stages for VISC scheduling), the CPU cannot match the high clock speeds of leading-edge x86 and ARM processors. We estimate the chip runs at several hundred megahertz. Even at this low speed, it completes some programs in less time than a low-end Haswell processor.
You may see power savings by running at a lower frequency, but IPC must be put into perspective: on its own it's useless without a high frequency, since performance is IPC times clock speed.
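To make the IPC-versus-clock point concrete, here is a rough sketch of execution time = instructions / (IPC x frequency). The IPC values are the SPEC2006 averages the report quotes for VISC and Haswell; the 500 MHz and 3.0 GHz clocks are hypothetical stand-ins (the report only estimates "several hundred megahertz"), and the one-billion-instruction workload is made up.

```python
# Sketch: time = instructions / (IPC * clock). Clock speeds are HYPOTHETICAL placeholders.

def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Throughput for a given IPC and clock frequency."""
    return ipc * clock_hz

def run_time_seconds(instructions: float, ipc: float, clock_hz: float) -> float:
    """Time needed to retire a fixed number of instructions."""
    return instructions / instructions_per_second(ipc, clock_hz)

work = 1e9  # hypothetical workload: one billion instructions

# IPC figures as quoted in the report (SPEC2006, GCC); clocks are assumptions.
visc_time = run_time_seconds(work, ipc=2.1, clock_hz=500e6)      # assumed 500 MHz
haswell_time = run_time_seconds(work, ipc=1.39, clock_hz=3.0e9)  # assumed 3.0 GHz

print(f"VISC:    {visc_time:.3f} s")
print(f"Haswell: {haswell_time:.3f} s")
```

At these assumed clocks the VISC chip takes roughly four times as long: a 1.5x IPC advantage cannot offset a 6x clock deficit.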
Despite its relatively simple design, the chip achieves spectacular performance. On the single-thread SPEC2006 test suite, the company reports an average IPC of 2.1, counting ARM instructions rather than VISC instructions. This IPC compares with 0.71 for Cortex-A15 and 1.39 for Haswell. (For consistency, Soft Machines measured the IPC on all three processors using GCC rather than Intel’s favorite compiler, ICC.) Thus, the VISC chip achieved three times the IPC of ARM’s highest-end CPU shipping today and 50% better IPC than Intel’s fastest mainstream CPU.
It is important to note that they are comparing the performance of a single A15 or Haswell core against VISC running on multiple cores (two in this case, or four simulated in the bar graph). That takes more die area and more power.
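A quick check of the headline ratios using only the IPC numbers quoted in the report, followed by a crude normalization by the test chip's two physical cores. Dividing by core count is my simplification for comparison purposes, not something the report does, and it ignores that the cores are working on the same thread rather than independently.

```python
# SPEC2006 average IPC values as quoted in the report.
REPORTED_IPC = {"VISC": 2.1, "Cortex-A15": 0.71, "Haswell": 1.39}
PHYSICAL_CORES = 2  # the VISC test chip

# Whole-chip ratios (what the report headlines).
vs_a15 = REPORTED_IPC["VISC"] / REPORTED_IPC["Cortex-A15"]
vs_haswell = REPORTED_IPC["VISC"] / REPORTED_IPC["Haswell"]

# Crude per-physical-core normalization (an assumption, see lead-in).
per_core_ipc = REPORTED_IPC["VISC"] / PHYSICAL_CORES
per_core_vs_a15 = per_core_ipc / REPORTED_IPC["Cortex-A15"]
per_core_vs_haswell = per_core_ipc / REPORTED_IPC["Haswell"]

print(f"Whole chip vs A15:     {vs_a15:.2f}x")
print(f"Whole chip vs Haswell: {vs_haswell:.2f}x")
print(f"Per core vs A15:       {per_core_vs_a15:.2f}x")
print(f"Per core vs Haswell:   {per_core_vs_haswell:.2f}x")
```

The whole-chip ratios come out to about 2.96x and 1.51x, matching the "three times" and "50% better" claims; per physical core, the figures drop to about 1.48x the A15 and 0.76x Haswell.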
Although these results are impressive, they require some caveats to put them into perspective. A shorter CPU pipeline reduces branch penalties and other pipeline hazards, thereby improving IPC compared with a longer pipeline. In addition, a low CPU speed reduces the effective latency of caches and main memory (measured in CPU cycles), again improving IPC relative to a CPU with a faster clock. The latter effect might explain why the test chip appears to perform better on SPEC2006 than on SPEC2000, which has a smaller memory footprint.
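The latency effect in the quote can be illustrated with a toy CPI model: CPI = base CPI + (stalling misses per instruction) x (miss latency in cycles), where a fixed latency in nanoseconds costs more cycles at a higher clock. Every number here (0.5 base CPI, 0.005 misses per instruction, 80 ns memory latency) is hypothetical, and the model ignores overlapping misses.

```python
# Toy CPI model with HYPOTHETICAL parameters; ignores memory-level parallelism.

def latency_cycles(latency_ns: float, clock_ghz: float) -> float:
    """A fixed wall-clock latency costs more cycles at a higher clock."""
    return latency_ns * clock_ghz

def ipc(base_cpi: float, misses_per_instr: float, latency_ns: float, clock_ghz: float) -> float:
    """IPC = 1 / (base CPI + stall cycles per instruction)."""
    cpi = base_cpi + misses_per_instr * latency_cycles(latency_ns, clock_ghz)
    return 1.0 / cpi

for ghz in (0.5, 1.0, 2.0, 3.5):
    i = ipc(base_cpi=0.5, misses_per_instr=0.005, latency_ns=80.0, clock_ghz=ghz)
    # Absolute throughput in billions of instructions per second = IPC * GHz.
    print(f"{ghz:>4} GHz: IPC = {i:.2f}, GIPS = {i * ghz:.2f}")
```

IPC falls as the clock rises (about 1.43 at 0.5 GHz versus about 0.53 at 3.5 GHz), yet absolute throughput still favors the faster clock.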
This is easy to see, especially with wider designs intended for low frequencies.
Although the test chip has only two physical cores, Soft Machines has run simulations on a four-core design. As one might expect, the performance gains diminish for the additional cores: the third core adds 20–30% to single-thread performance, and the fourth adds only 10–20%. In total, the four-core design delivers about twice the performance of a single core. The unused resources in the extra cores, however, can be devoted to additional threads. For example, a four-core design could run two threads at close to their maximum performance.
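The quoted figures imply how much the second core contributes. The report doesn't say whether the 20–30% and 10–20% gains are relative to one core (additive) or to the previous configuration (multiplicative), so this sketch computes both readings; the 2.0x total is the report's "about twice".

```python
# Back out the 2nd core's gain implied by: 3rd core +g3, 4th core +g4, 4 cores ~= 2x.
TOTAL_4_CORE = 2.0  # "about twice the performance of a single core"

def second_core_gain_additive(g3: float, g4: float) -> float:
    """Assumes gains are relative to one core: 1 + g2 + g3 + g4 = total."""
    return TOTAL_4_CORE - 1.0 - g3 - g4

def second_core_gain_multiplicative(g3: float, g4: float) -> float:
    """Assumes gains compound: (1 + g2) * (1 + g3) * (1 + g4) = total."""
    return TOTAL_4_CORE / ((1.0 + g3) * (1.0 + g4)) - 1.0

for g3, g4 in [(0.20, 0.10), (0.25, 0.15), (0.30, 0.20)]:
    add = second_core_gain_additive(g3, g4)
    mul = second_core_gain_multiplicative(g3, g4)
    print(f"g3={g3:.0%}, g4={g4:.0%}: 2nd core adds {add:.0%} (additive) "
          f"or {mul:.0%} (multiplicative)")
```

Under either reading, the second core adds well under 100%, so a two-core configuration should land somewhere between roughly 1.3x and 1.7x a single core.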
[Chart: Dual Virtual Core/A15 IPC Ratio]
This seems great, but I have to ask: how are you getting 4x IPC scaling in your tests when you say later that four cores only double IPC? There also seems to be a Dhrystone benchmark showing more than 4x the A15's IPC on a two-core design (the last test in the PowerPoint). That makes no sense unless each core is wider than an A15, because it implies better than perfect scaling. Execution-resource efficiency only goes down as you add cores, so maximum output per physical core cannot go up. Let me be frank. They are saying that two of their cores achieve 7x the IPC of a single A15 on some tests, with an average of about 3-4x. That is impossible unless each core's theoretical performance is higher than an A15's, or unless utilization of each core's theoretical resources increases dramatically, by 2x and up to nearly 3.5x.
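The argument above reduces to a simple bound. If an "A15-class" core delivers at most about one A15's IPC (that ceiling of 1.0 is my assumption, taken from the report's characterization), then N such cores on one thread can reach at most N times A15 IPC, and that only with perfect scaling. This sketch shows the minimum per-core IPC each reported ratio would require.

```python
# Perfect-scaling bound. per_core_ceiling = 1.0 is an ASSUMPTION ("A15-class" ~= A15 IPC).

def implied_per_core_ratio(reported_ratio: float, physical_cores: int) -> float:
    """Minimum IPC per physical core (relative to one A15), even with perfect scaling."""
    return reported_ratio / physical_cores

def exceeds_perfect_scaling(reported_ratio: float, physical_cores: int,
                            per_core_ceiling: float = 1.0) -> bool:
    """True if the reported ratio is impossible for cores capped at per_core_ceiling."""
    return reported_ratio > physical_cores * per_core_ceiling

for ratio in (2.0, 3.0, 4.0, 7.0):
    need = implied_per_core_ratio(ratio, physical_cores=2)
    flag = exceeds_perfect_scaling(ratio, physical_cores=2)
    print(f"{ratio:.0f}x on 2 cores -> each core needs >= {need:.1f}x A15 IPC; "
          f"beyond perfect scaling: {flag}")
```

Only a ratio of 2x or less is consistent with two A15-class cores under this assumption; 7x would require each core to sustain 3.5x an A15's IPC.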
Performance-critical applications will benefit from VISC, but the technology can also apply to low-power designs. As the test chip demonstrates, a VISC design can operate at a relatively low clock speed while achieving the same absolute performance as a traditional design operating at a higher clock speed. Thus, it should use less power, particularly if the voltage is reduced as well. Soft Machines, however, declined to reveal the power consumption of its test chip.
Other details also remain undisclosed, including die area. Details of how the processor handles privileged operations, inter-thread synchronization, traps, and interrupts could all affect how well it runs certain applications. Performance could vary widely across different workloads.
Okay. So it gets the same multithreaded performance as an equivalent chip, but when needed it can pool cores onto a single thread (reverse Hyper-Threading, in effect) and get superior IPC on a SINGLE thread.
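Here is a toy model of that "reverse HT" trade-off on the simulated four-core design. The scaling table is an assumption: 0.25 and 0.15 are midpoints of the report's 20–30% and 10–20% figures, and 0.6 for the second core is inferred so that four cores total about 2x (additive reading). Cores are split evenly between threads.

```python
# HYPOTHETICAL single-thread performance on n physical cores, relative to one core.
SCALING = {1: 1.0, 2: 1.6, 3: 1.85, 4: 2.0}

def per_thread_performance(physical_cores: int, threads: int) -> float:
    """Split cores evenly between threads; this sketch rejects uneven splits."""
    if physical_cores % threads:
        raise ValueError("this sketch only handles an even split of cores")
    return SCALING[physical_cores // threads]

for threads in (1, 2, 4):
    each = per_thread_performance(4, threads)
    print(f"{threads} thread(s): {each:.2f} each, {each * threads:.2f} total")
```

With four threads, the total matches a conventional quad core (4.0); with one thread, that thread gets about 2x; with two threads, each gets about 80% of the single-thread maximum.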
Edit: I'm also not seeing where the power savings come from. 4x is a lot, but put it in context. Adding a second core gets you an IPC gain of 50-60% but will use at least 50-60% more power (another core running at 50-60% load). Perhaps you can drop the frequency by 50-60% to save power, but I doubt this will save much. Chips tend to scale fairly linearly in perf/power within their "sweet" ranges.
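The power question comes down to whether voltage drops along with frequency. This sketch clocks a two-core setup down until it matches one baseline core's performance, then compares power under two assumptions: (a) power scales linearly with frequency, as the edit above assumes, and (b) voltage scales with frequency, so dynamic power ~ C·V²·f ~ f³ (a rough rule of thumb). Static/leakage power is ignored, and performance is assumed to track total active load.

```python
# Baseline: one core at full clock = performance 1.0, power 1.0.
# ASSUMPTIONS: performance tracks active load; leakage ignored; V ~ f in case (b).

def matched_power(extra_core_load: float, voltage_scales: bool) -> float:
    """Power of two cores clocked down to match baseline single-core performance."""
    speedup = 1.0 + extra_core_load              # 2nd core adds this much performance
    power_at_full_clock = 1.0 + extra_core_load  # and this much power
    freq_scale = 1.0 / speedup                   # slow down until performance == 1.0
    exponent = 3 if voltage_scales else 1        # f^3 if V tracks f, else linear in f
    return power_at_full_clock * freq_scale ** exponent

for load in (0.5, 0.6):
    linear = matched_power(load, voltage_scales=False)
    cubic = matched_power(load, voltage_scales=True)
    print(f"2nd core at {load:.0%}: linear -> {linear:.2f}, "
          f"with voltage scaling -> {cubic:.2f}")
```

Under the linear assumption, the savings cancel out exactly (power 1.0). Only if voltage can also be lowered does power drop substantially (about 0.44 at 50% load), which is the condition the report itself attaches ("particularly if the voltage is reduced as well").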