Maxon designed Cinebench to measure rendering performance, but we can look at how similar Cinebench is to other workloads too. It’s not representative of gaming performance, where the frontend is challenged by a ton of branches and a large instruction footprint. It’s also not completely bandwidth or compute bound, setting it apart from the likes of Y-Cruncher. Video encoding is a closer comparison, though video encoders tend to emphasize vector performance more than Cinebench does.
By itself, Cinebench 2024 is a moderate IPC benchmark with a sizeable instruction and data footprint. Code spills into L2, but the instruction stream is easier to follow than what we saw in games. Decoupled branch predictors can thus keep the frontend fed even in the face of L1i misses. On the data side, Cinebench 2024 spills out of L3 and requires a modest amount of DRAM bandwidth. High scheduler capacity across integer and FP operations helps keep more memory operations in flight in the face of DRAM latency. In that sense, Cinebench 2024 can be seen as Cinebench R23 with more emphasis on DRAM performance. When hitting the execution units, Cinebench 2024 uses scalar and 128-bit packed floating point operations, so wider vector execution units provide no benefit. Scalar integer performance plays an important role in keeping the FP execution units fed.
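The link between memory operations in flight and DRAM performance follows from Little's Law: sustained bandwidth is roughly the number of outstanding cache line requests times the line size, divided by latency. The sketch below illustrates that relationship. The 64-byte line size, 100 ns latency, and request counts are assumed values chosen for illustration, not measurements from Cinebench 2024 or any particular CPU.

```python
def achievable_bandwidth_gbps(outstanding_requests: int,
                              latency_ns: float,
                              line_bytes: int = 64) -> float:
    """Little's Law estimate of per-core bandwidth in GB/s.

    bandwidth = (requests in flight * bytes per request) / latency
    Bytes per nanosecond is numerically equal to GB/s (1e9 bytes per second).
    """
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    return outstanding_requests * line_bytes / latency_ns


# Hypothetical values: a core that can track more misses at once
# sustains proportionally more bandwidth at the same DRAM latency.
for in_flight in (10, 20, 40):
    bw = achievable_bandwidth_gbps(in_flight, latency_ns=100.0)
    print(f"{in_flight} requests in flight -> {bw:.1f} GB/s")
```

Under these assumptions, doubling the number of misses a core can keep in flight doubles the bandwidth it can pull at a fixed latency, which is why larger schedulers help a workload that spills out of L3.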
In the end, Cinebench 2024 poses decent challenges to the frontend and backend. It has a more realistic instruction footprint than SPEC2017, which has no subtest with more than 12 L1i MPKI. Maxon has also addressed Cinebench R15 and R23’s small data-side footprint, which could be mostly contained by an 8 MB last level cache. High core count systems could be constrained by memory bandwidth, as happens in many other well-threaded applications. These characteristics make Cinebench a decent benchmark. There’s room for improvement though. It could be a better stress test if it leveraged vector execution more heavily. Hopefully the next version of Cinebench is better vectorized.
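For readers who want to apply the same metrics to their own runs, here is a minimal sketch of how IPC and MPKI (misses per thousand instructions) are derived from raw performance counter totals. The counter values are hypothetical placeholders, not data from Cinebench 2024 or SPEC2017, and the function names are illustrative.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per core clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def mpki(misses: int, instructions: int) -> float:
    """Cache misses per 1000 retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000 / instructions


# Hypothetical counter totals, for illustration only.
counters = {
    "instructions": 2_000_000_000,
    "cycles": 1_600_000_000,
    "l1i_misses": 30_000_000,
}

print(f"IPC: {ipc(counters['instructions'], counters['cycles']):.2f}")
print(f"L1i MPKI: {mpki(counters['l1i_misses'], counters['instructions']):.1f}")
```

Normalizing misses by instruction count rather than by time makes the figure comparable across CPUs running at different clock speeds.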