Another thing I'm curious about: does the application have to be written in a particular way to take advantage of dual- or quad-channel RAM, or is this always handled by the CPU regardless of what software is running?
The reason I ask is that I've never noticed much difference when I set up the RAM modules to run in dual channel vs. single. Maybe I did something wrong, though.
It's not so much that the application has to be written differently; it's more that the amount of data you're processing at a given time has to be too large to fit in the CPU's on-board cache before memory bandwidth starts to matter. Because accessing main memory leaves the CPU sitting idle for so long, most programmers try to avoid that scenario. As a result, most desktop software (Windows, Office applications, games, etc.) isn't really limited by memory bandwidth.
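For a sense of scale, here's a back-of-the-envelope sketch of what extra channels buy you on paper. It assumes each channel is 64 bits (8 bytes) wide and uses DDR4-3200 purely as an example speed grade; the function name is made up for illustration, and real sustained bandwidth will be lower than these peak figures.

```python
def peak_bandwidth_gb_s(transfers_per_sec, channels, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel is bus_bytes wide and all channels are
    kept busy at the same time, which is the best case.
    """
    return transfers_per_sec * bus_bytes * channels / 1e9


# DDR4-3200 performs 3200 million transfers per second (example value only).
for channels in (1, 2, 4):
    gb_s = peak_bandwidth_gb_s(3200e6, channels)
    print(f"{channels} channel(s): {gb_s:.1f} GB/s peak")
```

The doubling is real, but it only shows up in software that is actually waiting on main memory to begin with.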
Some applications, like Photoshop, do routinely process 100+ MB of data at a time, but they're also doing a lot of CPU work at the same time, so it would be hard to catch the application being bottlenecked by memory bandwidth.
Theoretically, a memory-limited application would be one that performs a computationally easy task on a very large amount of data. I worked supporting data science efforts at my last gig, and those applications (processing multi-GB datasets) ARE very much limited by memory bandwidth. Going back to pre-"Big Data" days for a real-world memory-limited application, database servers were always a big one.
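To put rough numbers on that, here's a simplified model, not a measurement. It assumes compute and memory traffic overlap perfectly, so one pass over the data takes whichever is slower, and it ignores caches and latency. The CPU throughput, dataset size, and operations-per-byte figures are all made up for illustration.

```python
def pass_time_s(data_bytes, ops_per_byte, cpu_ops_per_s, mem_bytes_per_s):
    """Time for one pass over the data: the slower of compute and memory."""
    compute_s = data_bytes * ops_per_byte / cpu_ops_per_s
    memory_s = data_bytes / mem_bytes_per_s
    return max(compute_s, memory_s)


def dual_channel_speedup(data_bytes, ops_per_byte, cpu_ops_per_s, single_bw):
    """Speedup from doubling memory bandwidth under the model above."""
    single = pass_time_s(data_bytes, ops_per_byte, cpu_ops_per_s, single_bw)
    dual = pass_time_s(data_bytes, ops_per_byte, cpu_ops_per_s, 2 * single_bw)
    return single / dual


DATASET = 4e9      # 4 GB dataset (hypothetical)
CPU = 50e9         # 50 billion simple operations per second (hypothetical)
SINGLE_BW = 25.6e9 # peak bytes per second for one DDR4-3200 channel

# Light work per byte (e.g. summing a column): memory is the bottleneck.
light = dual_channel_speedup(DATASET, 0.25, CPU, SINGLE_BW)

# Heavy work per byte (e.g. complex image filters): the CPU is the bottleneck.
heavy = dual_channel_speedup(DATASET, 20, CPU, SINGLE_BW)

print(f"light task speedup: {light:.2f}x, heavy task speedup: {heavy:.2f}x")
```

Under these assumptions the light task roughly doubles in speed with the second channel while the heavy task doesn't change at all, which matches why you'd see little difference on typical desktop workloads.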