So yeah, I'm well beyond the saturation point of the SATA 6G interface with 4 SSDs in R0.
If I want faster, I need to go NVMe or use a dedicated SAS 12G controller card.
It really depends on a few other things.
As it sits right now, MSFT's R0 probably isn't going to be the fastest anyway, given how Windows handles files.
The next thing potentially slowing things down is moving data through the DMI while competing with other devices passing data at the same time.
An HBA in a CPU-based slot would bypass the DMI bottleneck. You could go controller-based as well, but then the controller itself sometimes becomes the bottleneck, depending on how many drives you connect to it.
I run R10 on Linux though, with 5 spinners (4 + 1 hot spare), and get ~450 MB/s out of them. I didn't test them in R0, but they should be able to hit ~1125 MB/s if the theory holds, based on the disk controller throughput. At one point I played with the idea of bouncing them to a PCIe card instead of the onboard ports, but I never got the card working properly and sent it back to CN. I've thought about going HBA as well, but I don't really have the need; I'm taking the "if it isn't broke, don't fix it" approach. I break things plenty, though, when I get bored and think of things to tinker with.
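Those two numbers are consistent with ~225 MB/s per spinner: the R10 stripes across 2 mirror pairs, while R0 would stripe across all 5 disks. A quick sketch of that scaling (the per-disk figure is back-derived from the array numbers, so treat it as an assumption):

```python
# Sketch: naive linear-striping model for the R10 / R0 numbers above.
PER_DISK_MBS = 225                  # back-derived: 450 MB/s over 2 striped mirror pairs

r10_pairs = 4 // 2                  # 4 active disks in R10 -> 2 mirror pairs
r10_mbs = r10_pairs * PER_DISK_MBS  # writes stripe across the pairs -> ~450 MB/s

r0_disks = 5                        # all 5 spinners in R0, no redundancy
r0_mbs = r0_disks * PER_DISK_MBS    # -> ~1125 MB/s if striping scales linearly

print(f"R10 (2 pairs): ~{r10_mbs} MB/s")
print(f"R0  (5 disks): ~{r0_mbs} MB/s")
```

Linear scaling is the best case; real R0 on spinners usually lands a bit under it once seek patterns and the controller get involved.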
Now, if you're using an AMD CPU based system, there are some interesting options that are a lot cheaper than Intel, since you can bifurcate the slots without needing a PLX controller. You can get a ~$100 card for 4x NVMe drives, split the top x16 slot into x4/x4/x4/x4, and run them at full speed, versus the Intel options that cost ~$600 for a card with a PLX on it and then bottleneck things to 3-6 GB/s instead of full speed. The issue with this for most people is that their precious GPU wants the top slot for CPU-direct bandwidth, though in most cases putting it in the 2nd PCIe slot would be just fine for bandwidth.
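Same kind of math for the NVMe side. This is a sketch assuming PCIe 3.0 x4 drives and, for the PLX case, an x4 uplink as an example of where the 3-6 GB/s cap can come from; both are assumptions:

```python
# Bifurcated x4/x4/x4/x4 vs a PLX card behind a shared uplink (approximate).
PCIE3_LANE_MBS = 985                 # PCIe 3.0 per lane after 128b/130b encoding

per_drive_mbs = 4 * PCIE3_LANE_MBS   # each NVMe drive on its own x4 -> ~3.9 GB/s

# AMD: top x16 bifurcated to x4/x4/x4/x4, all four drives run at full speed
# straight to the CPU, no switch in the path.
bifurcated_mbs = 4 * per_drive_mbs   # ~15.8 GB/s aggregate

# PLX card: each drive still links at x4, but all four share the card's
# uplink, so the aggregate is capped by whatever the uplink provides.
plx_capped_mbs = 4 * PCIE3_LANE_MBS  # x4-uplink example -> ~3.9 GB/s total

print(f"bifurcated aggregate: ~{bifurcated_mbs} MB/s")
print(f"PLX shared-uplink cap: ~{plx_capped_mbs} MB/s")
```

The bifurcated aggregate also happens to be the full bandwidth of a PCIe 3.0 x16 slot, which is why giving the GPU the second slot instead usually costs little in practice.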