The crypto is related to SIMD because the entire whitepaper was about the additional SIMD instructions being added, which include the crypto. ARM NEON crypto only accelerates SHA and AES I believe; SPARC X+/X accelerates AES, DES, 3DES, RSA, SHA and DSA.
You're reading the same paper I did and I'm pretty sure the crypto instructions are not related to SIMD here. It uses an entirely decoupled crypto engine which isn't really tied into the ISA. Not only can any other SoC vendor implement this, but several, including Apple, have actually been doing so on mobile platforms for a while now.
In one of my earlier posts I indicated that current NEON seemed to have parity with SSE3 plus FMA and AES-NI, so no, NEON isn't just missing an AVX equivalent; there's also SSSE3, SSE4.1 and SSE4.2.
I don't know where you got this statement about SSE3 parity or what it's supposed to mean. Many of the instructions added in SSSE3 and SSE4.1 do actually have AArch32 NEON equivalents:
PABSB/W/D are equivalent to VABS
PALIGNR is equivalent to VEXT
PSHUFB is basically equivalent to VTBX
PMULHRSW is basically equivalent to VQRDMULH.s16 (the rounding form of VQDMULH.s16)
PHADDW/D is equivalent to VPADD
PHADDSW can be synthesized with VPADDL and VQMOVN
MPSADBW has similar functionality to VABA/VABAL with some permutations (it's more horizontal)
PMULDQ is like VMULL.s32 with a different ordering
PMULLD is equivalent to VMUL.s32
The BLEND instructions have the same functionality as VBIT/VBIF/VBSL albeit with an immediate form
The PMIN/PMAX instructions are equivalent to VMIN/VMAX
The ROUND instructions are partially the same as VCVT, depending on the round modes needed
INSERT and EXTRACT are equivalent to various VMOV instructions
The sign/zero extension instructions are partially supported by VMOVL instructions (when performing a single doubling of width)
PACKUSDW is equivalent to VQMOVUN.s32
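To make "basically equivalent" concrete for one entry in the list, here is a rough scalar model of a single 16-bit lane of PMULHRSW next to VQDMULH.s16 and its rounding sibling VQRDMULH.s16. This is only a sketch of the arithmetic as I understand it from the instruction descriptions, not real intrinsics; the Python function names and the sampling grid are made up for illustration.

```python
def wrap16(x):
    """Wrap an integer to a signed 16-bit value (two's complement)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def sat16(x):
    """Saturate an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def pmulhrsw(a, b):
    # SSSE3 model: ((a*b >> 14) + 1) >> 1, low 16 bits kept, no saturation.
    return wrap16((((a * b) >> 14) + 1) >> 1)


def vqdmulh_s16(a, b):
    # NEON model: doubling multiply, high half, saturated, no rounding.
    return sat16((2 * a * b) >> 16)


def vqrdmulh_s16(a, b):
    # NEON model: same as above, but with a rounding constant added first.
    return sat16((2 * a * b + 0x8000) >> 16)


def mismatches(step=257):
    """Sampled (a, b) pairs where PMULHRSW and VQRDMULH.s16 disagree."""
    grid = range(-32768, 32768, step)
    return [(a, b) for a in grid for b in grid
            if pmulhrsw(a, b) != vqrdmulh_s16(a, b)]


if __name__ == "__main__":
    print(pmulhrsw(16384, 3), vqrdmulh_s16(16384, 3), vqdmulh_s16(16384, 3))
    print(mismatches())
```

In this model the rounding form matches PMULHRSW everywhere on the sampled grid except -32768 * -32768, where NEON saturates and SSSE3 wraps. The non-rounding form can additionally differ by one in the low bit.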
So that really just leaves a small number of instructions that don't really have equivalents (PSIGN, PMADDUBSW, PHSUB, PHMINPOSUW, DPPS/DPPD, PCMPEQQ, MOVNTDQA) and AFAIK AArch64 has packed 64-bit compares.
SSE4.2 only adds a handful of instructions, mainly CRC32, some fixed-length string instructions, and PCMPGTQ (which again should be supported by AArch64).
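For a sense of how narrow the CRC32 instruction is: it accumulates a CRC using the Castagnoli polynomial (CRC-32C), not the zlib/Ethernet one. Below is a bit-at-a-time sketch of what a byte-wide step computes, assuming the usual reflected convention. The function names are hypothetical, and the initial value and final inversion are conventions applied by software around the accumulate step, not part of the instruction.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reflected form of 0x1EDC6F41


def crc32c_step_u8(crc, byte):
    """Fold one byte into a running 32-bit CRC-32C accumulator."""
    crc ^= byte
    for _ in range(8):
        # Shift right one bit; XOR in the polynomial if a 1 bit fell off.
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc


def crc32c(data):
    """CRC-32C of a bytes object, with the conventional init and final XOR."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32c_step_u8(crc, b)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # Standard check input for CRC catalogues.
    print(hex(crc32c(b"123456789")))
```

A table-driven software version is the usual fallback on CPUs without the instruction, which is part of why its absence is not a big deal for most code.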
Meanwhile I can list various instructions in NEON that are not present in SSE4.2 ISA CPUs.
It's questionable how much either set of missing instructions is "needed" for good performance, since a lot of them are very niche and were added to satisfy a small number of developers for some fixed task.
Frankly I think you should just stop commenting on it; it's clear you aren't actually that familiar with either ISA and made statements about them that you are now trying to justify after the fact.
Apple makes expensive laptops and desktops used for creation. A lot of these creation type applications (DAW, photo editing, video editing, etc.) use these SIMD instructions. In fact I remember setting up a hackintosh years back and having to use a custom kernel to emulate SSSE3, SSE4.1 and SSE4.2 on my AMD CPU, otherwise a lot of applications would crash because they relied on these instructions.
Apple has different lines of laptops, and not all of them are heavily used for content creation, particularly the lower-end MBAs. Apple doesn't have to treat the move as all or nothing.