Floating point exists as an optional module right now, and it's as fast as current applications need; there just hasn't really been any demand for FPU performance in ARM applications. Most ARM chips have a DSP core to handle tasks that would normally fall to the FPU, like signal processing or video decoding.
That is the coolest part of ARM. If I want a chip with just basic CPU processing power, I can get that. If I want one with the basics plus Java in hardware, I can get that too. It is like a set of Lego blocks where you can design the hardware to have the features you want.
That's one of the interesting things about the ARM ecosystem, particularly compared to x86. When you need AES to go fast on an ARM system, you slap a little DMA-capable state machine next to your core. When you need AES to go fast on x86, you add AES instructions and then run AES on your branch-predicting, out-of-order core with a complex load-store unit that shuffles data through multiple levels of caches. I can't imagine x86 gets anywhere close to an external accelerator when it comes to energy efficiency. Plus, you're tying up the core doing crypto work when it could be used for something else. Does Intel put accelerators on Atom SoCs to improve energy efficiency, even when they compete with features of x86? I could imagine Intel preferring to keep as much as possible in the instruction set of the "main" core to make it harder to switch away from x86.