You need to understand that:
a) A benchmark does not claim to use the optimal algorithm for a given problem. It simply does not need to.
b) An optimal implementation of DGEMM on an i7-4770 would use AVX, most likely via Intel's Math Kernel Library (MKL). Geekbench deliberately chose not to use NEON/SSE/AVX extensions.
As you might be able to see, your argument is void.
a) No it doesn't. But it should give a "realistic" representation. That means it should be somewhere near a decent implementation. If it were at 120 GFLOPS, which is still low, I wouldn't mind, but 20 GFLOPS is just pure shit.
This is not a benchmark that measures the capabilities of a CPU, but a benchmark that measures how poor the programming skills of the Primate Labs employees are.
b) They use AES instructions in their AES benchmark and SHA instructions in their SHA benchmark. They use special instructions where those favour ARM. In the benchmarks where they would favour x86, because of its wider SIMD and FMA units, they don't use them.
They also don't include benchmarks that would favour x86, like a high-quality random number generator, where x86 has a dedicated instruction (RDRAND).
You could argue that apps written by average programmers wouldn't contain many SIMD instructions at all, and that would be true. But the algorithms Geekbench uses, like bzip2, JPEG and PNG compression and decompression, Sobel, Dijkstra, Mandelbrot, sharpen filter, blur filter, SGEMM, DGEMM, SFFT, DFFT, N-body and raytracing, are all algorithms for which high-quality implementations exist. These are also algorithms normal programmers don't implement themselves; instead they use libraries, which do support AVX & co. Many of these algorithms are not even used on mobile devices, but only on desktop, server and HPC machines, e.g. raytracing.
c) The memory performance section is the worst of all. Every benchmark in this section does the same thing: measure maximum memory bandwidth. It doesn't matter whether you add these numbers or scale them. The ALU is simply not the bottleneck, and you don't have to be a genius to see that.
There are dozens of reviews out there that tested the performance impact of DDR4 versus DDR3 and of quad-channel versus dual-channel memory. The result is always the same: increasing memory bandwidth by a large amount (high double-digit percentages) increases the performance of real-world applications only by low single-digit percentages.
In other words, maximum memory bandwidth is completely irrelevant for real-world applications (except for games, but we are testing CPU performance here).
But it still contributes heavily to the total score, again in favour of small mobile chips, where the gap between ALU throughput and memory bandwidth is not as large as in bigger CPUs.