About cpuGFN19: Data allocation is 8 MB (see earlier post with links), which means this application can easily run into a memory access bottleneck.
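To put the 8 MB figure in perspective, here is a rough working-set check. It is a minimal sketch that assumes each running task touches its full 8 MB allocation and that all tasks share one last-level cache. The 16 MB L3 size is a hypothetical example, not a measured value, so plug in your own CPU's numbers. The "fits" verdict is optimistic, because real caches also hold other data and are not perfectly usable.

```python
# Rough working-set check for cpuGFN19.
# Assumption: every task touches its whole 8 MB allocation.
TASK_DATA_MB = 8  # per-task data allocation reported for cpuGFN19

def working_set_mb(concurrent_tasks: int) -> int:
    """Combined data touched by all running cpuGFN19 tasks, in MB."""
    return concurrent_tasks * TASK_DATA_MB

def fits_in_cache(concurrent_tasks: int, cache_mb: float) -> bool:
    """True if the combined working set is no larger than the given cache."""
    return working_set_mb(concurrent_tasks) <= cache_mb

if __name__ == "__main__":
    L3_MB = 16  # hypothetical shared L3 size; replace with your CPU's value
    for tasks in (1, 2, 4):
        ws = working_set_mb(tasks)
        verdict = "fits" if fits_in_cache(tasks, L3_MB) else "spills to RAM"
        print(f"{tasks} task(s): {ws} MB working set vs {L3_MB} MB L3 -> {verdict}")
```

With these assumed numbers, one or two tasks still fit, while four concurrent tasks (32 MB) would have to stream most of their data from main memory.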
About OCLcudaGFN19: I looked through some top regulars' hosts but didn't find any that had run GFN19 recently, so I started one random task on one of my own Pascal GPUs. The following was logged:
Supported transform implementations: ocl ocl2 ocl3 ocl4 ocl5
[...]
OCL transform is past its b limit.
OCL3 transform is past its b limit.
OCL4 transform is past its b limit.
OCL5 transform is past its b limit.
Using OCL2 transform
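If you want to check which transform other hosts ended up with, the relevant lines can be pulled out of a task's stderr output. This is a minimal sketch that assumes the wording is exactly as in the log quoted above; other app versions may phrase it differently. The function name `parse_transform_log` is hypothetical.

```python
import re

# Log lines as quoted above (the "[...]" elision left out).
SAMPLE_LOG = """\
Supported transform implementations: ocl ocl2 ocl3 ocl4 ocl5
OCL transform is past its b limit.
OCL3 transform is past its b limit.
OCL4 transform is past its b limit.
OCL5 transform is past its b limit.
Using OCL2 transform
"""

SUPPORTED_RE = re.compile(r"^Supported transform implementations:\s*(.*)$")
REJECT_RE = re.compile(r"^(OCL\d*) transform is past its b limit\.$")
USING_RE = re.compile(r"^Using (OCL\d*) transform$")

def parse_transform_log(text: str):
    """Return (supported, rejected, selected) transform names, lower-cased."""
    supported, rejected, selected = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if m := SUPPORTED_RE.match(line):
            supported = m.group(1).split()
        elif m := REJECT_RE.match(line):
            rejected.append(m.group(1).lower())
        elif m := USING_RE.match(line):
            selected = m.group(1).lower()
    return supported, rejected, selected

if __name__ == "__main__":
    supported, rejected, selected = parse_transform_log(SAMPLE_LOG)
    print("supported:", supported)
    print("past b limit:", rejected)
    print("selected:", selected)
```

On the quoted log this reports ocl, ocl3, ocl4 and ocl5 as past their b limit and ocl2 as the selected transform. It needs Python 3.8 or later for the `:=` operator.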
The OCL transform is the only one that uses FP64. Since it is not usable for GFN19, server GPUs and some AMD GPUs designed for high FP64 throughput will have no particular advantage here. The OCL2 transform uses INT32 operations.
(From pschoefer's memory: OCL = FP64, OCL2 = 3*INT32, OCL3 = INT64, OCL4 = 2*signed INT32, OCL5 = unsigned INT32.)
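The FP64 argument can be spelled out by combining the transform/arithmetic list with the set of transforms that were usable in the logged task. This is a sketch only: the mapping is pschoefer's recollection as quoted, not checked against the Genefer source, and the usable set comes from that single GFN19 task, so a task with a different b could in principle allow other transforms.

```python
# Arithmetic per Genefer OpenCL transform, as recalled by pschoefer
# (not verified against the source code).
TRANSFORM_ARITHMETIC = {
    "ocl": "FP64",
    "ocl2": "3*INT32",
    "ocl3": "INT64",
    "ocl4": "2*signed INT32",
    "ocl5": "unsigned INT32",
}

def fp64_transforms(usable):
    """Return those usable transforms whose arithmetic is FP64."""
    return [t for t in usable if TRANSFORM_ARITHMETIC[t] == "FP64"]

if __name__ == "__main__":
    # Taken from the one logged GFN19 task: only ocl2 was within its b limit.
    usable_for_gfn19 = ["ocl2"]
    fp64 = fp64_transforms(usable_for_gfn19)
    print("FP64 transforms usable for GFN19:", fp64 or "none")
    print("Selected transform uses:", TRANSFORM_ARITHMETIC[usable_for_gfn19[0]])
```

Since no FP64 transform survives the b-limit check, a card's FP64 rate does not enter the picture; the integer throughput used by OCL2 does.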
Edit: On a GTX 1080 Ti (Linux, no app_config.xml), OCLcudaGFN19 ran constantly at 100 % SM utilization and 80 % memory controller(?) utilization, and at or near its board power limit, at least while I watched.