ASM isn't important when it comes to performance. It isn't even the 10th step when you are looking at optimizing something.
I fully agree that for 99% of applications, 99% of the time, ASM is not the solution. Lots of pain, lots of bugs, and the overall speedup may not be "enough".
Don't believe me? Then look at how little ASM is in performance critical things such as V8, Spidermonkey, the linux kernel, most VMs. Yet these things are constantly seeing performance improvements and gains.
The fact that these see performance improvements tells us nothing about how much ASM would have mattered. If some JavaScript app sees a 2x year-over-year performance improvement, does that mean that a low-level implementation could not be 10x or 20x faster?
x264 has ASM for some platforms. Are you suggesting that the developers are wasting their time? If not, then we seem to be in agreement: ASM _can_ be the right thing in a specific scenario, but usually it is not.
http://git.videolan.org/?p=x264.git;a=history;f=common/x86;hb=HEAD
ffmpeg seems to have some ASM:
http://git.videolan.org/?p=ffmpeg.g...b157e6d4f69a70148a47071fc0b34d155f216;hb=HEAD
I don't know much about the applications that you are listing. It may well be that implementing them in e.g. C makes them "fast enough" and/or "about as fast as hardware allows". It is also possible that the Linux kernel is willing to take some unknown performance hit in order to improve security and stability, make it easier to recruit developers, or gain some other nice-to-have that makes ASM a bad choice.
These are things that have performance at the top of their priority lists, yet they don't write things in ASM. Why? Because the gains are minimal/nonexistent for most application logic.
I think that is a hasty conclusion. I think that the (possible) lack of ASM in those applications reflects that:
1. Programmer time is expensive. Do any ASM, and the development cost increases.
2. Being able to run the same code across platforms is a neat way to have more customers.
3. Customers don't like their application crashing, lacking features or being 3 years late to market.
4. Many/most applications don't have a localized hotspot where the equivalent of 100 lines of C code takes >90% of the computation time.
5. User satisfaction might scale somewhat linearly with execution time (within limits). I.e. if an operation takes 50% longer in Excel, the user would only be somewhat less happy. If you are a pacemaker customer, then being 50% late with a heartbeat should make you a grumpy customer.
The (possible) speedup of ASM (or more generally: optimization techniques such as intrinsics/pragmas/compiler switches/choice of compiler/...) would have to be weighed against 1-4.
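To put rough numbers on point 4, here is a small sketch using Amdahl's law. The function name, the hotspot fractions and the 10x figure are made up for illustration and are not measurements of any real application:

```python
def overall_speedup(hotspot_fraction, hotspot_speedup):
    """Amdahl's law: whole-program speedup when only the hotspot gets faster."""
    return 1.0 / ((1.0 - hotspot_fraction) + hotspot_fraction / hotspot_speedup)

# Hypothetical numbers: assume hand-written ASM makes the hotspot 10x faster.
for fraction in (0.9, 0.5, 0.1):
    print(f"hotspot = {fraction:.0%} of runtime, 10x faster ASM -> "
          f"{overall_speedup(fraction, 10.0):.2f}x overall")
```

Under those assumptions, a hotspot that eats 90% of the runtime gives roughly a 5x overall gain, while one that eats 10% gives about 1.1x. That is the case where the extra development cost is hard to justify.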
On top of that, it excludes the application from doing compiler optimizations on the ASM block.
Well, if it is faster then it is faster. If it ain't, then it ain't.
-k