The problem is that the frequency is half of the actual memory frequency, in spite of what the OP stated in this post. Please see LClk in the AMD-produced slide below.
Using 3200 MHz RAM would give you a DF bandwidth of 32 bytes * 800,000,000 cycles per second = 25.6 GB/s. That is the maximum throughput this design has to the memory, to the PCIe bus, or between the CCX modules.
Those bottlenecks are the reason that gaming performance hits a ceiling with fast GPUs, particularly when you only have 2133 or 2400 MHz RAM installed. The IO hub expects to be getting 22.5 GB/s, but if you are using 2400 MHz RAM there is only 19.2 GB/s (32 bytes * 600 MHz) available between the CPU and the IO hub. That also assumes the CPU/GPU is not trying to access memory or storage, or swap threads, at the same time it is trying to send data to the GPU.
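To make the arithmetic above easy to check, here is a minimal Python sketch. It assumes, as this post argues, that the fabric moves 32 bytes per cycle and is clocked at half the actual memory clock, i.e. one quarter of the DDR rating. The function name and the divisor parameter are my own, purely for illustration, and not anything from AMD.

```python
def fabric_bandwidth_gbs(ddr_rating, bytes_per_cycle=32, rating_to_fabric_divisor=4):
    """Peak fabric bandwidth in GB/s for a given DDR rating (e.g. 3200).

    Assumption from this post: fabric clock = half the real memory clock,
    and the real memory clock is half the DDR rating, hence the divisor of 4.
    """
    fabric_clock_mhz = ddr_rating / rating_to_fabric_divisor
    # bytes/cycle * million cycles/s = MB/s; divide by 1000 for GB/s
    return bytes_per_cycle * fabric_clock_mhz / 1000


if __name__ == "__main__":
    for rating in (2133, 2400, 3200):
        print(f"DDR4-{rating}: {fabric_bandwidth_gbs(rating):.1f} GB/s")
```

If the fabric actually runs at a different ratio, changing rating_to_fabric_divisor shows how the ceiling moves.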
It also indicates that the AIDA64 memory benchmarks include L1, L2 and L3 cache performance in what they claim are memory performance results, inflating the scores over what can actually be written to the RAM sticks. I can only assume that it does the same with Intel chips.
If you extend the design to use 8 memory controllers (4 x the 2 that exist now), you get roughly 100 GB/s in total, but each module is still going to be connected by 32-bytes-per-cycle interconnects unless there is a way to overclock the Data Fabric to run at a higher ratio of the RAM frequency.