Should I defragment my SSD?

Feb 25, 2011
16,832
1,499
126
To me it seems rather simple, although maybe someone can explain where I'm misunderstanding.

You can look at *any* SSD benchmark, and it's clear that sequential reads are much faster than random reads. Obviously there's a major difference between a random read and a fragmented file, but regardless, there's a difference there.

I fully understand that it's orders of magnitude less of a problem on an SSD, but theoretically, fragmentation leads to lower performance, no?

They're completely different things.

Sequential access:

Computer: Controller, give me blocks 44245 - 44827 please.
SSD: Okay, here you go. *BLEEAARGH*

Random access:

Computer: Controller, give me block 1484 please.
SSD: Okay, here you go. *pfft*
Computer: Controller, give me block 44827 please.
SSD: Okay, here you go. *pfft*
Computer: Controller, give me block 11295 please.
SSD: Okay, here you go. *pfft*
Computer: Controller, give me block 90210 please.
SSD: Okay, here you go. *pfft*
Computer: Controller, give me block 49583 please.
SSD: Okay, here you go. *pfft*
Computer: Controller, give me block 60601 please.
SSD: Okay, here you go. *pfft*
Computer: Controller, give me block 48108 please.
SSD: Okay, here you go. *pfft*
Computer: Controller, give me block 10201 please.
SSD: Okay, here you go. *pfft*
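
(If you want to see that difference for yourself, here's a rough sketch in Python. It's only a sketch: the file path and sizes are placeholders, it assumes the file is bigger than the amount read, and the OS page cache will skew the numbers unless you drop it or use direct I/O.)

Code:
# Rough sketch: one big sequential read vs. the same amount of data as scattered 4K reads.
# TEST_FILE is a placeholder; point it at a large existing file on the SSD.
import os, random, time

TEST_FILE = "testfile.bin"   # hypothetical multi-GB test file
BLOCK = 4096                 # 4K per random request
TOTAL = 256 * 1024 * 1024    # read 256 MB in each test

fd = os.open(TEST_FILE, os.O_RDONLY)
size = os.fstat(fd).st_size

# Sequential: one long run of 1 MB reads from the start of the file.
t0 = time.time()
done = 0
while done < TOTAL:
    done += len(os.pread(fd, 1024 * 1024, done))
seq_secs = time.time() - t0

# Random: the same 256 MB, but requested one scattered 4K block at a time.
t0 = time.time()
for _ in range(TOTAL // BLOCK):
    offset = random.randrange(0, size - BLOCK) // BLOCK * BLOCK
    os.pread(fd, BLOCK, offset)
rand_secs = time.time() - t0

os.close(fd)
print(f"sequential: {TOTAL / seq_secs / 1e6:.0f} MB/s, random 4K: {TOTAL / rand_secs / 1e6:.0f} MB/s")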
 

mv2devnull

Golden Member
Apr 13, 2010
1,503
145
106
Your computer gives everything a block number. On an HDD, those correspond to physical locations on a platter. (Which is why mechanical optimizations like defragging are helpful.)

The SSD shows the block numbers to the computer, and then maps each block number to a particular page of flash memory.
HDDs map block numbers too. They keep a small reserve of blocks that they can take into use (remap) when original blocks become unusable. After that, some blocks in the sequence are physically less "near" than before. Overall, HDDs do less mapping, and such a remap hurts performance more (particularly because the reason for the remap, a "bad block", might mean that more will follow).


Regarding sequential vs. random on an SSD:
Prepare multiple SSDs so that each seems to contain a non-fragmented file, and perform the sequential access test on them. The 'prepare' step is that each drive's controller must map the blocks of the file differently: sequential within the same memory chip, striped over lines, sequential but in reverse, random, etc. Now the test truly measures the effect of the mapping within the drive and nothing else.


I don't defrag SSDs.
 

mikeymikec

Lifer
May 19, 2011
18,110
10,378
136
mikeymikec, sequential reads are up to 5 times faster than random reads.
I suggest you read my post further up to understand the difference between a fragmented file system and a defragmented file system.

At this point the drive geometry matters much less than for HDDs, but the FS's virtual geometry still matters as long as we linearly address the SSD (never mind how it translates those addresses - this is almost irrelevant, except when writing an FS defragmenter specifically for SSDs, where this internal address translation can be exploited to defrag with O(n) (n = number of files) writes to flash outside the addressing table).

Surely during the process of TRIMing, the OS's logical view (i.e. block information) is updated to better match the SSD's view, so that sequential reads are more common?

I'm using words like 'surely' at this point because I don't regard my knowledge of how SSDs work and how operating systems talk to them to be at the level I'd like it to be. So I'm not trying to state facts; I'm just taking what I believe to be true (based on what I've previously read) and reasoning it through logically.
 

BD2003

Lifer
Oct 9, 1999
16,815
1
76
HDDs map block numbers too. They keep a small reserve of blocks that they can take into use (remap) when original blocks become unusable. After that, some blocks in the sequence are physically less "near" than before. Overall, HDDs do less mapping, and such a remap hurts performance more (particularly because the reason for the remap, a "bad block", might mean that more will follow).


Regarding sequential vs. random on an SSD:
Prepare multiple SSDs so that each seems to contain a non-fragmented file, and perform the sequential access test on them. The 'prepare' step is that each drive's controller must map the blocks of the file differently: sequential within the same memory chip, striped over lines, sequential but in reverse, random, etc. Now the test truly measures the effect of the mapping within the drive and nothing else.


I don't defrag SSDs.


But when a benchmark does a sequential read test, the data is going to be mapped however the SSD maps it, and yet, no matter what, sequential is always faster.

So when a benchmark does a sequential read test, does it even follow that the blocks on the SSD are actually contiguous or organized in any way, and that's what leads to the higher performance? Or is it a function of the SSD knowing it's reading a large chunk ahead of time?

So if a file was hopelessly fragmented, completely randomly dispersed over the SSD from both the perspective of the OS and the internal drive mapping - if the OS sends a command to read that file, will I see performance in line with the seq. read speed, or the random read speed?

And is there *any* correlation between what the block view of a defrag program shows, and the internal mapping of an SSD?
 
Feb 25, 2011
16,832
1,499
126
Ok, so where does fragmentation fit into that?

That's the thing. On an SSD, it doesn't.

On an HDD, the *BLEARGH* becomes a bunch of *pfft*s because the HD has to go here, then go here, then go there, then go there, to get all the pieces of the file. So that's why everybody has spent the last 30 years pissing themselves about defragging their stuff.

You're mistaking controller latency for fragmentation behavior.
 

_Rick_

Diamond Member
Apr 20, 2012
3,938
69
91
That's the thing. On an SSD, it doesn't.

On an HDD, the *BLEARGH* becomes a bunch of *pfft*s because the HD has to go here, then go here, then go there, then go there, to get all the pieces of the file. So that's why everybody has spent the last 30 years pissing themselves about defragging their stuff.

You're mistaking controller latency for fragmentation behavior.

On an SSD it does, because with a fragmented file the read requests that arrive at the controller will still be a bunch of scattered addresses, not one contiguous block of addresses.
Hence you lose a factor of 5 (at least according to the latest benchmarks) in read performance, since you bombard the controller with individual requests instead of a single one.
Sure, on the flash side of things the speed is the same, but the SSD controller is quite simply slower in this "interactive mode", and that leads to real-world performance loss. A factor of five might not sound like much for an SSD, but if your log job runs for 150 minutes instead of 30 minutes, that factor of five suddenly has a massive real-world impact.
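
(As a back-of-the-envelope illustration of that factor-of-five point, with made-up numbers:)

Code:
# Back-of-the-envelope only: how a 5x read penalty turns a ~30-minute job into a ~150-minute one.
# All figures here are hypothetical.
data_gb   = 900             # log data the job has to read
seq_mb_s  = 500             # assumed sequential read speed
frag_mb_s = seq_mb_s / 5    # ~5x slower when every piece is requested separately

seq_minutes  = data_gb * 1024 / seq_mb_s  / 60
frag_minutes = data_gb * 1024 / frag_mb_s / 60
print(f"sequential: {seq_minutes:.0f} min, fragmented: {frag_minutes:.0f} min")   # ~31 vs. ~154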

BD2003: very little correlation between hardware addresses and flash cell addressing. Local relationships are probably correct up to page size, if you're lucky, but once you start using the SSD and deleting/rewriting, load balancing and GC will mess it up completely. The spare area, for example, won't show up as addressable blocks, but it's being cycled around the drive all the time.

As for sequential vs linear from reading a single file: I assume the difference comes through read-look-ahead, which is described here: https://www.mail-archive.com/forum@t13.org/msg02556.html (for the case of mechanical drives, where it's even more of a crucial function)

This would make a seek (i.e. moving to a non-sequential LBA) clean the buffer and restart the read look-ahead at the new position. The performance gains are simply from the controller already having the required block in the buffer, before the OS's command to read it actually arrives over SATA.
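
(A toy model of that read-look-ahead behaviour, purely illustrative: the drive keeps a small buffer of the LBAs following the last request, a sequential stream keeps hitting that buffer, and any seek throws it away and restarts the prefetch.)

Code:
# Toy model of drive read-look-ahead: hits come from a small prefetch window
# that follows the last requested LBA; a seek flushes it and prefetch restarts.
import random

READ_AHEAD = 32  # how many LBAs the drive speculatively buffers (made-up figure)

def simulate(lbas):
    """Count prefetch-buffer hits and misses for a stream of LBA requests."""
    buffered = set()
    hits = misses = 0
    for lba in lbas:
        if lba in buffered:
            hits += 1
        else:
            misses += 1                                       # not in the prefetch window (first request, or a seek)
        buffered = set(range(lba + 1, lba + 1 + READ_AHEAD))  # prefetch the next LBAs
    return hits, misses

print("sequential:", simulate(range(1000)))                        # 1 miss, then all hits
print("random:    ", simulate(random.sample(range(10**6), 1000)))  # almost all misses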

mikeymikec: I'm not aware of any OS that knows anything about the underlying hardware. Layering is pretty strictly enforced, and ZFS's approach of breaking the layers isn't looked on kindly by kernel devs.
TRIM merely means the FS sends the SSD a command telling it that a certain block can be erased, instead of having to be kept around as if it still held live data (disabling TRIM means you can undelete data; enabling TRIM means that anything deleted is probably gone forever very soon).
No FS I know of will do anything beyond that, and indeed cannot do anything beyond that, since it does not know what the SSD will do with the command. Sometimes TRIM is scheduled for deferred execution; all of that is up to the "hardware" end. The FS merely sees a block device with X blocks of LBA, and has its inode trees to know on which LBAs it placed which bit of which file.
A good FS will try to avoid fragmentation through the way it fills the free space in its LBA allotment (i.e. best fit or largest free space, instead of just using the first free LBA available and mercilessly fragmenting everything), but there's no magic beyond that, as far as I know.
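
(To make the "best fit" idea concrete, here's a minimal toy sketch of "first fit" vs. "best fit" over a list of free extents. It's not any particular filesystem's actual allocator, just the policy difference.)

Code:
# Toy free-space allocator: "first fit" vs. "best fit" over a free-extent list.
# Each extent is (start_lba, length). Purely illustrative, not any real FS's code.

def first_fit(free, need):
    """Take the first extent big enough; tends to chew up the big runs early."""
    for i, (start, length) in enumerate(free):
        if length >= need:
            free[i] = (start + need, length - need)
            return start
    return None  # no single extent fits; a real FS would now split (fragment) the file

def best_fit(free, need):
    """Take the smallest extent that still fits, preserving the large runs."""
    candidates = [(length, i) for i, (start, length) in enumerate(free) if length >= need]
    if not candidates:
        return None
    _, i = min(candidates)
    start, length = free[i]
    free[i] = (start + need, length - need)
    return start

free_extents = [(0, 8), (100, 64), (300, 16)]
print("first fit, 10 blocks:", first_fit(list(free_extents), 10))  # 100 -> eats into the big run
print("best fit, 10 blocks: ", best_fit(list(free_extents), 10))   # 300 -> keeps the big run intact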
 

BD2003

Lifer
Oct 9, 1999
16,815
1
76
As for sequential vs linear from reading a single file: I assume the difference comes through read-look-ahead, which is described here: https://www.mail-archive.com/forum@t13.org/msg02556.html (for the case of mechanical drives, where it's even more of a crucial function)



This would make a seek (i.e. moving to a non-sequential LBA) clean the buffer and restart the read look-ahead at the new position. The performance gains are simply from the controller already having the required block in the buffer, before the OS's command to read it actually arrives over SATA.


Isn't that inconsistent with the idea that a sequential read is a single command? Isn't look ahead there for the purpose of buffering blocks that haven't been requested? I don't see how that plays into a scenario where it's already been requested.
 

BoberFett

Lifer
Oct 9, 1999
37,563
9
81
Let's try explaining this another way.

A hard disk's method of retrieving files is physical / manual. A head has to be physically moved to the correct location on the platter (as well as the platter being in the right position for the head to read the data), kind of like a librarian sitting in a library. Ask the librarian to find a book and they'll walk to the correct location, pick the book up, come back to the front desk and give it to you.

Now let's say there's a big file, or in the librarian example, you're asking for the entire series of a particular encyclopedia. On a recently defragmented disk, or in a well-organised library, the contents of that encyclopedia (or large file) will be located in the same place (continuous block), so the librarian can just go to one location in the library with a trolley and pick up all the volumes of that encyclopedia in one go (and so the head does one data seek, finds it and reads it), rather than having to walk around the library because the library is untidy (or disk is fragmented, so the head is having to make lots of journeys). As for the disk, its read speed goes way up for a large file because it doesn't have to spend time seeking for bits all over the platter.

Defragmentation (or tidying the library) is good so that time isn't wasted seeking all the bits of a large file.

SSDs do not have a drive head. There is no movement inside the SSD. If a file has been splurged in a thousand fragments across the SSD, the retrieval time is the same as if the file was saved in a single fragment; the SSD still has to request the portions of the file in just the same way.

Defragmenting an SSD is just as pointless as hypothetically defragmenting RAM or the cache levels in the processor, except with an SSD there is a limited amount of writes it can do in its lifetime, and you're wasting those writes in an utterly useless exercise.

This. Defragging an SSD has as close to zero effect on performance as you can get.
 

_Rick_

Diamond Member
Apr 20, 2012
3,938
69
91
Isn't that inconsistent with the idea that a sequential read is a single command? Isn't look ahead there for the purpose of buffering blocks that haven't been requested? I don't see how that plays into a scenario where it's already been requested.

My cursory look over the ATA specs didn't reveal to me whether there's a sequential read command, or whether it's simply read-ahead working.
If there is, this still doesn't change much, since it's virtually the same effect - you'd be giving a linear sequence of LBA for a multi-sector read, I would assume.
 

BD2003

Lifer
Oct 9, 1999
16,815
1
76
Alright, let me try and rephrase this then...

Is the reason the random/sequential speeds differ so much a hardware issue, or a software issue? Is it because of the way the OS is sending commands to the drive, or because the drive has to work harder to pull data from random blocks?
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
It's both a hardware issue and a software issue.

Hardware
--------
By requesting each block individually you swamp the cable with individual requests, and there is a lot of latency in the SATA protocol, which means that isn't brilliantly efficient. It's much more efficient to do one nice big DMA transfer, since the channel can just stream data. If you have a detailed read of the SATA protocol you'll see there is quite a difference in how individual-block versus sequential reads are set up.

Software
---------
The program that is measuring the speed has to talk through the operating system. To do a sequential read it sets up an InputStream with the operating system and then proceeds to read from the same stream continuously, simply calling read with the same buffer over and over, probably something like 4K, although maybe it's megabytes in size.

A random 4K test, however, opens a random-access file and then seeks before each read into a smaller 4K buffer. So it seeks, then reads, then seeks and reads again, and has to keep going through that cycle all the way through the operating system.

The combination of the two dramatically decreases performance, and you can see a dramatic uptick with a queue depth of 32, which is effectively making lots of calls in parallel.
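
(Here's a rough user-space sketch of that difference in access pattern, plus what "queue depth 32" amounts to: many 4K requests in flight at once instead of one at a time. The file path is a placeholder, threads only approximate real NCQ/async queueing, and the page cache will flatter the numbers unless it's bypassed.)

Code:
# Rough sketch: serial 4K random reads (~QD1) vs. 32 workers keeping requests in flight (~QD32).
import os, random, time
from concurrent.futures import ThreadPoolExecutor

TEST_FILE = "testfile.bin"   # hypothetical large file on the SSD
BLOCK, COUNT = 4096, 20000

fd = os.open(TEST_FILE, os.O_RDONLY)
size = os.fstat(fd).st_size
offsets = [random.randrange(0, size - BLOCK) // BLOCK * BLOCK for _ in range(COUNT)]

def read_one(offset):
    return os.pread(fd, BLOCK, offset)   # seek+read collapsed into one positioned read

t0 = time.time()
for off in offsets:                      # serial: each request waits for the previous one
    read_one(off)
qd1 = COUNT / (time.time() - t0)

t0 = time.time()
with ThreadPoolExecutor(max_workers=32) as pool:   # roughly 32 requests outstanding at once
    list(pool.map(read_one, offsets))
qd32 = COUNT / (time.time() - t0)

os.close(fd)
print(f"~QD1: {qd1:.0f} IOPS, ~QD32: {qd32:.0f} IOPS")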
 

BD2003

Lifer
Oct 9, 1999
16,815
1
76
It's both a hardware issue and a software issue.

Hardware
--------
By requesting each block individually you swamp the cable with individual requests, and there is a lot of latency in the SATA protocol, which means that isn't brilliantly efficient. It's much more efficient to do one nice big DMA transfer, since the channel can just stream data. If you have a detailed read of the SATA protocol you'll see there is quite a difference in how individual-block versus sequential reads are set up.

Software
---------
The program that is measuring the speed has to talk through the operating system. To do a sequential read it sets up an InputStream with the operating system and then proceeds to read from the same stream continuously, simply calling read with the same buffer over and over, probably something like 4K, although maybe it's megabytes in size.

A random 4K test, however, opens a random-access file and then seeks before each read into a smaller 4K buffer. So it seeks, then reads, then seeks and reads again, and has to keep going through that cycle all the way through the operating system.

The combination of the two dramatically decreases performance, and you can see a dramatic uptick with a queue depth of 32, which is effectively making lots of calls in parallel.


Alright, so then most of these hangups are still at the protocol/controller level?

So then theoretically, if there were no SATA overhead, and the controller had infinite performance, would random and sequential performance be equal? Or does the organization of the data in the storage medium itself play a role, like with HDDs?
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
How the data is laid out in the flash itself doesn't currently impact it. But improving the SATA interface also isn't sufficient, because there is a lot of latency associated with basic 4K reads. You have to consider that the app makes a random seek, then requests a 4K buffer from the OS. The OS goes into the driver, that goes down to the device, and then we have to wait an eternity (for a CPU) for the data to come back slowly across the wire into RAM, interrupt the CPU, and then finally the OS can read it from the driver, pass it back up the stack and copy it into the heap of the application. That has to complete before the next one starts, so there is just no way that is going to be as efficient as issuing far fewer commands to the device and setting up a DMA transfer, because a lot of the latency disappears when the data is streamed; it's only app-to-OS on an existing channel.

So it's not fair to say it's just SATA/protocol overhead; it is a software and hardware issue combined. Replacing SATA with NVMe isn't going to solve it, and neither would putting RAM in place of the SSD. You need to change a lot of things to improve it; any one of the aspects can likely be improved.

4K QD32 is already a lot closer to sequential, within about half of sequential speed. So we can already see that the throughput is possible on 4K transfers, but it has to be parallel. It's still less efficient to read from the SSD through 32 different random-access setups than through a single sequential transfer, but that's mostly OS and protocol, not the SSD device.
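
(The queue-depth effect is basically Little's law: throughput is roughly the number of requests in flight divided by per-request latency. A toy calculation with a made-up latency figure:)

Code:
# Toy numbers only: how outstanding requests turn per-request latency into throughput.
latency_s = 200e-6   # assume ~200 us end-to-end per 4K request (hypothetical)
block = 4096

for qd in (1, 8, 32):
    iops = qd / latency_s                       # Little's law: concurrency / latency
    print(f"QD{qd:>2}: {iops:7.0f} IOPS ~ {iops * block / 1e6:5.0f} MB/s")
# QD1 crawls; QD32 approaches sequential-class numbers (real drives hit bus/controller limits first).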
 

MagnusTheBrewer

IN MEMORIAM
Jun 19, 2004
24,135
1,594
126
And yet for the same reason that people buy fast RAM, I'd like to keep my SSD operating as close as possible to its theoretical max performance. I'm not particularly concerned about SSD life; I think it's a problem that's blown way out of proportion. A defrag every now and then isn't going to destroy it.

Honestly, it's more of a theoretical thing, more of a "why not?" kind of thing, not something to obsess over.

It is a theoretical thing or matter of perspective. I'm pragmatic and believe the interesting part of technology is what you can do with it. I have little interest or patience with those who strive to match actual performance with the math describing it.
 

_Rick_

Diamond Member
Apr 20, 2012
3,938
69
91
I wonder how much of the 4k latency hits inside the SSD, and how much outside.
I suspect that some of the remaining performance difference even when "spamming" the bus with parallel requests (i.e. high qd) comes from the cache size of the controller (talking L2 equivalent here). Since if the controller has to send for the LBA-to-local address mapping from DRAM or flash, there'll be another delay in the chain. I also wonder how much SRAM these controllers generally carry on board, and whether anyone does anything else but linearly pre-streaming the next LBA in a classic read-ahead scenario.

But yeah, DMA is definitely where a lot of the magic happens when you want big throughput. When the SATA controller can write directly into an assigned memory space, and just waits for the data from the disk, things are as good as they'll get. I'd expect a fragmented file would open one DMA "session" per fragment - the FS shouldn't fall back to block-operation unless specifically asked for -- I'd imagine this is only important for synchronized concurrent R/W access (yup, DBs, unbuffered/synced logs, that kind of thing).
Of course, a good I/O subsystem will try to get a lot of reads from cache, but that only helps to read non-dirty LBA's that hit the I/O subsys frequently enough.

I think the gist of it remains: don't defragment SSDs; but if you're aware of fragmentation negatively impacting you, design around it, using partitions, placeholders or independent disks. Workloads which could lead to dramatic fragmentation are imaginable (and therefore, probably real...)
 

BD2003

Lifer
Oct 9, 1999
16,815
1
76
I wonder how much of the 4k latency hits inside the SSD, and how much outside.

I suspect that some of the remaining performance difference even when "spamming" the bus with parallel requests (i.e. high qd) comes from the cache size of the controller (talking L2 equivalent here). Since if the controller has to send for the LBA-to-local address mapping from DRAM or flash, there'll be another delay in the chain. I also wonder how much SRAM these controllers generally carry on board, and whether anyone does anything else but linearly pre-streaming the next LBA in a classic read-ahead scenario.



But yeah, DMA is definitely where a lot of the magic happens when you want big throughput. When the SATA controller can write directly into an assigned memory space, and just waits for the data from the disk, things are as good as they'll get. I'd expect a fragmented file would open one DMA "session" per fragment - the FS shouldn't fall back to block-operation unless specifically asked for -- I'd imagine this is only important for synchronized concurrent R/W access (yup, DBs, unbuffered/synced logs, that kind of thing).

Of course, a good I/O subsystem will try to get a lot of reads from cache, but that only helps to read non-dirty LBA's that hit the I/O subsys frequently enough.



I think the gist of it remains: don't defragment SSDs; but if you're aware of fragmentation negatively impacting you, design around it, using partitions, placeholders or independent disks. Workloads which could lead to dramatic fragmentation are imaginable (and therefore, probably real...)


Heh, to be honest, now I'm far more interested in how SSDs work than whether I should defrag them. Up until now I've thought of them like HDDs with super fast seek times, but clearly they're a different beast entirely.

To what extent can an SSD read in parallel? Leaving the bus out of it, just straight to its internal buffer. Can it access each NAND chip at the same time, maybe even multiple parts of the same chip at the same time? Or is it ultimately still a serial device like an HDD?
 

Elixer

Lifer
May 7, 2002
10,376
762
126
To what extent can an SSD read in parallel? Leaving the bus out of it, just straight to its internal buffer. Can it access each NAND chip at the same time, maybe even multiple parts of the same chip at the same time? Or is it ultimately still a serial device like an HDD?

Usually (depending on the controller & NAND specs), the more NAND chips, the faster the unit can be. That is why you see speed advantages going from 128GB to 512GB of total storage, for example. If they use the same kind of chips, then the 512GB version has more of them, and thus it can read from / write to more chips at once.
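
(Same idea in round numbers, with made-up per-die figures: more capacity usually means more NAND dies, and more dies means more flash the controller can keep busy at once, until the bus becomes the limit.)

Code:
# Made-up per-die figures, just to show why a 512GB model can outrun a 128GB one.
die_capacity_gb = 16     # hypothetical capacity per NAND die
die_read_mb_s   = 50     # hypothetical sustained read per die
bus_limit_mb_s  = 550    # rough SATA III ceiling

for drive_gb in (128, 256, 512):
    dies = drive_gb // die_capacity_gb
    throughput = min(dies * die_read_mb_s, bus_limit_mb_s)   # parallel dies, capped by the bus
    print(f"{drive_gb:>3} GB: {dies:>2} dies -> ~{throughput} MB/s")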
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
The bigger drives all use 8 channels. But most of the data shows they continue to improve in performance all the way up to a queue depth of 32, which is 32 outstanding IO requests at 4K to the device. It's definitely flattening out at that point, though, and there wouldn't be much value going to 64, not with the current software and hardware we have. What I wonder about is whether there are genuinely 32 requests at the SSD at the same time, or whether some of them are sitting in the OS, and hence what the real parallel request count to the device itself is. It's at most 32 and at least 8 (because the performance is a good 8x on most SSDs for high-QD 4K reads, if not more).

The other thing to note is that there is certainly a difference in performance between SSDs, even at 4K. So while the OS and everything else is in play, there is definitely a significant amount of time spent waiting on the SSD, and the make/model of SSD definitely makes a difference. None of them so far has blown away the others at that size; they are tending towards a particular number, which is probably what the CPU can do in terms of IO transactions. They aren't quite there yet, but presumably they will get there with M.2 and PCI-E based SSDs.
 
Feb 25, 2011
16,832
1,499
126
IOPS are most important for things like big eCommerce databases. If you're doing a small SAN for a bunch of video editing PCs, sequential is still more important.

It just depends on the application.
 