So, to prove I'm not talking out of my ass, I did some experiments.
First, I created an 8GiB file on an ext4 filesystem on my OCZ Vertex 4 128GB (the filesystem had about 30% free space):
dd if=/dev/zero of=ssdfile.bin oflag=direct bs=1M count=8192
Then I used the fio benchmark to compare sequential and random writes. The fio job file is:
[global]
filesize=8192MB
norandommap
bs=4096b
ioengine=sync
thread
direct=1
group_reporting
invalidate=0
[benchmark]
numjobs=1
rw=randwrite
filename=ssdfile.bin
runtime=30
time_based
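For reference, running it is just a matter of saving the job file and pointing fio at it; the file name ssdbench.fio is arbitrary, just what I use below:
# save the job file above as ssdbench.fio, then run it (needs access to ssdfile.bin)
fio ssdbench.fio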
By alternating rw between randwrite and write, I ran the random and sequential write experiments.
Let me describe some key properties of the experiment:
1) It uses direct I/O, so writes bypass the page cache and go straight into the Linux block layer.
2) I use the noop I/O scheduler, so there are no funny optimizations happening at the I/O scheduler level (see the command right after this list).
3) It uses a single thread doing synchronous I/O, so there is no opportunity for the software stack to merge writes at any layer (file system, block, SATA).
4) All I/O requests are 4K, which guarantees that the sequential and random experiments generate the same number of SATA commands per GB of I/O traffic.
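For completeness, the scheduler switch is a one-line sysfs write (sda is my device, substitute yours):
# check which schedulers are available (the current one is shown in brackets)
cat /sys/block/sda/queue/scheduler
# switch to noop for the duration of the experiment
echo noop > /sys/block/sda/queue/scheduler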
I did six 30-second runs, alternating sequential and random writes in this manner:
write, randwrite, write, randwrite, write, randwrite
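In practice this can be scripted with a small loop that flips the rw= line before each run; this sed-based version is just one way to do it, not anything fio requires:
# run the six passes back to back, rewriting rw= in the job file each time
for mode in write randwrite write randwrite write randwrite; do
    sed -i "s/^rw=.*/rw=$mode/" ssdbench.fio
    fio ssdbench.fio
done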
The results are:
===============================================================================
write
WRITE: io=4529.2MB, aggrb=154591KB/s, minb=154591KB/s, maxb=154591KB/s, mint=30001msec, maxt=30001msec
===============================================================================
randwrite
WRITE: io=3890.9MB, aggrb=132801KB/s, minb=132801KB/s, maxb=132801KB/s, mint=30001msec, maxt=30001msec
===============================================================================
write
WRITE: io=4476.8MB, aggrb=152800KB/s, minb=152800KB/s, maxb=152800KB/s, mint=30001msec, maxt=30001msec
===============================================================================
randwrite
WRITE: io=4217.7MB, aggrb=143958KB/s, minb=143958KB/s, maxb=143958KB/s, mint=30001msec, maxt=30001msec
===============================================================================
write
WRITE: io=4532.9MB, aggrb=154714KB/s, minb=154714KB/s, maxb=154714KB/s, mint=30001msec, maxt=30001msec
===============================================================================
randwrite
WRITE: io=4229.3MB, aggrb=144351KB/s, minb=144351KB/s, maxb=144351KB/s, mint=30001msec, maxt=30001msec
The bandwidth figure that matters is aggrb. To summarize: sequential writes are consistently faster, by roughly 9 to 21 MB/s depending on the run, even though the sequential and random workloads are interleaved back to back.
Via iostat I can add the observation that, for the sequential workloads, the device throughput was rock steady at about 150MB/s, something like this:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 2.00 0.00 39032.00 0.00 152.48 8.00 0.69 0.02 0.00 0.02 0.02 69.00
For the random workloads the throughput would fluctuate between 120MB/s and 150MB/s.
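For reference, those iostat lines come from the extended, per-second, megabyte view; roughly this invocation (the exact flags are from memory):
# extended device statistics in MB, refreshed every second
iostat -xm 1 sda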
Now to conclude: there is absolutely no reason for the random workload to be slower than the sequential one. This is a sandboxed experiment where all parameters are kept in check. The only actual difference between the two workloads is the file-system LBAs: in one case they are sequential, in the other they are random.
Now, I'm not in the mood to kill my SSD (OCZ SSDs are not famous for their lifetime), so I will not do any more write-intensive experiments. Ideally I would also run raw-device experiments (via a partition) and add other parameters to the mix, but that's it for now.
This experiment DOES NOT use TRIM, which in my opinion would further widen the performance gap between a fragmented file system and a defragmented one.
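If I did want TRIM in the picture, it would be either the discard mount option or a periodic fstrim; a sketch of what that would look like (the mount point is made up, and I did not run this):
# option 1: have ext4 send TRIM on every delete
mount -o remount,discard /mnt/ssd
# option 2: trim all free space in one batch pass
fstrim -v /mnt/ssd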
This experiment DOES NOT use large I/O requests, which a defragmented file system can issue far more often than a fragmented one, and which would again widen the performance gap.
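Testing large requests would only take a block-size change in the job file, for example something like this (128k is an arbitrary pick):
# bump the request size from 4K to 128K in the job file
sed -i 's/^bs=.*/bs=128k/' ssdbench.fio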