New File System

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Many years ago I was working on a project with a few other guys; we were building a new (theoretical) operating system. My specialty was the file system and the graphical user interface. Fun fun...

One of the conclusions I came to was to borrow some of the features of OS/2's HPFS. I think it is important that the FS always saves files as a contiguous stream (no fragmentation). Also, instead of linking each block to the next, the beginning of the file holds a block index.

So a few days ago I was thinking about file systems again... probably because my CISSP study materials mentioned NFS and I've been defragmenting laptops.

I'm now thinking that since my FS uses contiguous files, you don't need to index each block - you just need to know the size of the file. Instead, I would include a parity index, so corrupted blocks could potentially be recovered. Additionally, a checksum of each file would be kept, to indicate any kind of corruption or tampering right away.
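Roughly what I have in mind, as a toy Python sketch (the 4KB block size, the XOR parity scheme, the SHA-256 checksum, and all the function names are just placeholders I picked for illustration):

import hashlib
from functools import reduce

BLOCK_SIZE = 4096  # placeholder block size for the sketch

def split_blocks(data):
    """Split a file into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    if blocks:
        blocks[-1] = blocks[-1].ljust(BLOCK_SIZE, b"\x00")
    return blocks

def xor_parity(blocks):
    """The 'parity index': XOR of all data blocks, stored with the file."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def file_checksum(data):
    """Whole-file checksum kept in the file's metadata to flag corruption or tampering."""
    return hashlib.sha256(data).hexdigest()

def recover_block(surviving_blocks, parity):
    """Rebuild one corrupted block: XOR the surviving blocks with the parity block."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), surviving_blocks + [parity])

With a single parity block you can only rebuild one bad block per file; beyond that, the checksum can only tell you the file is bad.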

My newest idea is how files are treated. Our file systems today have fixed block sizes. What if the FS had flexible block sizes? I'm visualizing a drive: at the beginning of the drive, executable-type files are kept with very small block sizes. They don't leave much room for change, because they shouldn't change often. At the other end are the data-type files. Their blocks are extremely large, and the binary is saved in reverse. So if you have a log file or document, it grows towards the beginning of the FS.
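To make the layout concrete, here's a toy allocator for the idea (every number here - the sector count, the 1-vs-64 block sizes - is made up just to show the shape of it):

# Executables packed from the front with small blocks, data files packed
# from the back with large blocks, the two regions growing toward each other.

DISK_SECTORS = 1_000_000   # total sectors on the drive (made up)
EXEC_BLOCK = 1             # executables: very small blocks (in sectors)
DATA_BLOCK = 64            # data files: extremely large blocks (in sectors)

class TwoEndAllocator:
    def __init__(self, total=DISK_SECTORS):
        self.exec_cursor = 0        # grows upward from sector 0
        self.data_cursor = total    # grows downward from the end

    def alloc_exec(self, sectors):
        """Place an executable at the front, contiguous, small blocks."""
        need = -(-sectors // EXEC_BLOCK) * EXEC_BLOCK   # round up to block size
        if self.exec_cursor + need > self.data_cursor:
            raise OSError("regions collided: disk full")
        start = self.exec_cursor
        self.exec_cursor += need
        return (start, need)

    def alloc_data(self, sectors):
        """Place a data file at the back; it grows toward the front."""
        need = -(-sectors // DATA_BLOCK) * DATA_BLOCK
        if self.data_cursor - need < self.exec_cursor:
            raise OSError("regions collided: disk full")
        self.data_cursor -= need
        return (self.data_cursor, need)   # start sector, length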

Discuss.
 

alienal99

Member
Nov 9, 2004
153
0
0
I'm thinking of a drive like the one you have described. For some reason, when it was made, people put executables and data at opposite ends of the platter, making it ridiculously slow due to head movement between the two. Besides that, it would be an interesting idea to have variable block sizes... keep working on it!
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: SagaLore
I'm now thinking that since my FS uses contiguous files, you don't need to index each block - you just need to know the size of the file. Instead, I would include a parity index, so corrupted blocks could potentially be recovered. Additionally, a checksum of each file would be kept, to indicate any kind of corruption or tampering right away.

How do you handle fragmentation? Also, any application that actually cares about data integrity will do its own checksumming, so a filesystem-level checksum will be redundant in a lot of cases.

My newest idea is how files are treated. Our file systems today have fixed block sizes. What if the FS had flexible block sizes? I'm visualizing a drive: at the beginning of the drive, executable-type files are kept with very small block sizes. They don't leave much room for change, because they shouldn't change often. At the other end are the data-type files. Their blocks are extremely large, and the binary is saved in reverse. So if you have a log file or document, it grows towards the beginning of the FS.

Discuss.

Ugh. You could do it, but I doubt it would be worth the hassle. You'd have to index it or something to not have horrible performance when trying to figure out where a particular block is on disk, since it's no longer a linear relationship.
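Just to show what I mean about the lookup (toy Python, hypothetical block sizes): with fixed blocks the offset is a multiplication; with variable blocks you end up keeping a cumulative table per file and searching it.

import bisect

# With fixed-size blocks: disk_offset = block_number * BLOCK_SIZE.
# With variable-size blocks you need a per-file table of cumulative offsets
# and a search over it for every lookup. Sizes below are invented.

block_sizes = [512, 512, 4096, 65536, 4096]        # bytes, one entry per block
starts = [0]
for size in block_sizes[:-1]:
    starts.append(starts[-1] + size)               # cumulative start offsets

def block_for_offset(byte_offset):
    """Find which variable-size block holds a given byte offset."""
    return bisect.bisect_right(starts, byte_offset) - 1

print(block_for_offset(1000))    # -> 1: falls in the second 512-byte block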

Also, some 'data' files are going to be quite small; if you use 'extremely large' blocks, you could be wasting a lot of space. It seems like it would make more sense to create multiple partitions with (different) fixed block sizes.

Incidentally, mainframe filesystems do something sort of like this -- essentially, each 'file' (called a 'record') can be a different size (within some limitations), and you can define devices with various record types/sizes. However, except for a few special device types, all the records in one device use the same format.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: alienal99
I'm thinking of a drive like the one you have described. For some reason, when it was made, people put executables and data at opposite ends of the platter, making it ridiculously slow due to head movement between the two.

That sounds like an OS problem, not an inherently bad file structure. You should fully load your executable into memory, then access the data.

The purpose of contiguous and reverse-stream data is to prevent as much head movement as possible. If all of the blocks in a file change and need to be saved, then it can grow inward and write it all at once. It can also make intelligent choices for data growth - most likely a data file has other data files in front of and behind it. So if the one in front of it is smaller, and it only needs to change a relatively small number of blocks, then it makes more sense to move the other data file elsewhere and then grow the file where it is.
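The growth decision I'm describing is basically this (a rough Python sketch; the names and thresholds are just mine):

def plan_growth(growth_blocks, gap_blocks, neighbor_blocks, file_blocks):
    """Keep the file contiguous while it grows toward the front of the FS.

    growth_blocks:   how many new blocks need to be written
    gap_blocks:      free space already adjacent to the file
    neighbor_blocks: size of the data file sitting in front of this one
    file_blocks:     size of the file that wants to grow
    """
    if growth_blocks <= gap_blocks:
        return "grow in place"                          # nothing has to move
    if neighbor_blocks < file_blocks:
        return "move the neighbor, then grow in place"  # cheaper to move the small file
    return "relocate the growing file"                  # moving itself is the cheaper copy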
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: Matthias99
How do you handle fragmentation? Also, any application that actually cares about data integrity will do its own checksumming, so a filesystem-level checksum will be redundant in a lot of cases.

There is no fragmentation. The files are contiguous.

The FS cares about integrity, therefore it will monitor it with checksums. I think it is better than having a block link to another block which links to another block which links to another block... in newer file systems, the link occurs forwards and backwards for integrity reasons. This is in case a block is corrupted or overwritten - although that could be detrimental to the data as a whole, at least the FS can rebuild its table and pinpoint the bad block(s). However, a contiguous file doesn't need this overhead.
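To illustrate the overhead I'm talking about (a made-up layout, not any specific FS): with chained blocks every block carries link pointers that have to be read and verified, while a contiguous file is just a start, a length, and a checksum.

# Chained allocation: each block carries forward/backward links for recovery.
class ChainedBlock:
    def __init__(self, data, prev_block=None, next_block=None):
        self.data = data
        self.prev = prev_block     # backward link, used to rebuild the chain
        self.next = next_block     # forward link to the next block

# Contiguous allocation: the whole file is (start, length) plus a checksum.
class ContiguousFile:
    def __init__(self, start_block, block_count, checksum):
        self.start = start_block
        self.count = block_count
        self.checksum = checksum   # whole-file checksum, verified on read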
 

sao123

Lifer
May 27, 2002
12,650
203
106
A picture would be worth eleventy billion words...
Diagram how it works for us slower folks.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: SagaLore
Originally posted by: Matthias99
How do you handle fragmentation? Also, any application that actually cares about data integrity will do its own checksumming, so a filesystem-level checksum will be redundant in a lot of cases.

There is no fragmentation. The files are contiguous.

Are they also static? If so, it's not a problem, but filesystems for static data are generally not a very interesting area of research.

Let's say the user fills the 'data' area with 100MB files, then deletes every other one. The user now wants to create a 200MB file. The biggest free contiguous area on your disk is now only 100MB, although you have plenty of free space. Whoops.
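To put toy numbers on it (a quick Python illustration, sizes invented):

# A 1000MB data area, ten 100MB files, every other one deleted:
# half the disk is free, but no contiguous hole is bigger than 100MB.

extents = [(i * 100, 100) for i in range(10)]                # (start MB, length MB)
extents = [e for i, e in enumerate(extents) if i % 2 == 0]   # delete every other file

def largest_hole(extents, disk_size=1000):
    """Largest contiguous run of free space, in MB."""
    best, cursor = 0, 0
    for start, length in sorted(extents):
        best = max(best, start - cursor)
        cursor = max(cursor, start + length)
    return max(best, disk_size - cursor)

total_free = 1000 - sum(length for _, length in extents)
print(total_free, largest_hole(extents))   # 500MB free, but the biggest hole is only 100MB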

The FS cares about integrity, therefore it will monitor it with checksums. I think it is better than having a block link to another block which links to another block which links to another block... in newer file systems, the link occurs forwards and backwards for integrity reasons. This is in case a block is corrupted or overwritten - although that could be detrimental to the data as a whole, at least the FS can rebuild its table and pinpoint the bad block(s). However, a contiguous file doesn't need this overhead.

Nothing wrong with contiguous files per se (other than issues with fragmentation), or putting checksums on the filesystem data itself (if that's what you meant). It sounded like you wanted to put checksums on the files themselves, which is going to be redundant in many cases, and all applications will have to pay for the overhead even if they don't really need it.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: Matthias99
Originally posted by: SagaLore
Originally posted by: Matthias99
How do you handle fragmentation? Also, any application that actually cares about data integrity will do its own checksumming, so a filesystem-level checksum will be redundant in a lot of cases.

There is no fragmentation. The files are contiguous.

Are they also static? If so, it's not a problem, but filesystems for static data are generally not a very interesting area of research.

Let's say the user fills the 'data' area with 100MB files, then deletes every other one. The user now wants to create a 200MB file. The biggest free contiguous area on your disk is now only 100MB, although you have plenty of free space. Whoops.

That is solved with a defragmenter application (which btw will work better than typical fs defragmenters since all it has to do is move contiguous streams of data to get back large tracks of free space).
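The defragmenter I have in mind is basically just compaction (a toy sketch; a real one would obviously copy data on disk and update the file table):

def compact(extents):
    """Slide whole contiguous files toward the front so free space pools at the end.

    extents: list of (start, length, name). Because every file is one contiguous
    stream, 'defrag' is just a series of whole-file moves -- no block-by-block
    reassembly of scattered fragments.
    """
    cursor, new_layout = 0, []
    for start, length, name in sorted(extents):
        new_layout.append((cursor, length, name))   # move the whole stream down
        cursor += length
    return new_layout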

Originally posted by: sao123
A picture would be worth eleventy billion words...
Diagram how it works for us slower folks.

Okay, I'll try and draw something up tonight.
 

mjia

Member
Oct 8, 2004
94
0
0
Originally posted by: SagaLore
That is solved with a defragmenter application (which btw will work better than typical fs defragmenters since all it has to do is move contiguous streams of data to get back large tracks of free space).

Regardless, there will always be a large performance hit, having to defragment frequently. It will also increase the wear on the drive.

Algorithms are used to place files such that fragmentation is minimized.

The performance impact of minor fragmentation is usually pretty minimal.

One interesting file system is the Elephant file system, the file system that never forgets. Basically, it keeps previous/deleted versions of a file so you can revert back. They argued that the performance was still competitive. I don't think anyone really uses it, but I think it's an interesting idea.
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
One interesting file system is the Elephant file system, the file system that never forgets. Basically, it keeps previous/deleted versions of a file so you can revert back. They argued that the performance was still competitive. I don't think anyone really uses it, but I think it's an interesting idea.

How does that work when you write over the area of the disk where the deleted file was stored?
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: Genx87
One interesting file system is the Elephant file system, the file system that never forgets. Basically, it keeps previous/deleted versions of a file so you can revert back. They argued that the performance was still competitive. I don't think anyone really uses it, but I think it's an interesting idea.

How does that work when you write over the area of the disk where the deleted file was stored?

It sounds like versioning. It saves each changed file as many times as possible, until there is no more room. Then as new versions are saved, they overwrite the oldest versions. Kind of like circular logging.
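I haven't read how Elephant actually does it, but the circular-logging behavior I'm picturing would look something like this (purely a guess, sketched in Python):

from collections import deque

class VersionStore:
    """Keep as many old versions as space allows; overwrite the oldest first.

    This is only my guess at the behavior, like circular logging -- not what
    the Elephant paper actually specifies.
    """
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.versions = deque()             # (name, data), oldest first

    def save(self, name, data):
        while self.versions and self.used + len(data) > self.capacity:
            _, old = self.versions.popleft()    # reclaim the oldest version
            self.used -= len(old)
        self.versions.append((name, data))
        self.used += len(data)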
 

sao123

Lifer
May 27, 2002
12,650
203
106
Originally posted by: SagaLore
Originally posted by: Genx87
One interesting file system is the Elephant file system, the file system that never forgets. Basically, it keeps previous/deleted versions of a file so you can revert back. They argued that the performance was still competitive. I don't think anyone really uses it, but I think it's an interesting idea.

How does that work when you write over the area of the disk where the deleted file was stored?

It sounds like versioning. It saves each changed file as many times as possible, until there is no more room. Then as new versions are saved, they overwrite the oldest versions. Kind of like circular logging.



But if your files don't fragment... eventually your free space does, right? (Which means lots of wasted space.) That's really why I'm interested in the diagram.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: sao123
But if your files don't fragment... eventually your free space does, right? (Which means lots of wasted space.) That's really why I'm interested in the diagram.

Sure. But recovering the free space is easy to do when the files are contiguous. Partial defrag can occur when CPU usage and/or hard drive I/O is low.

Here is the diagram.
 

sao123

Lifer
May 27, 2002
12,650
203
106
Originally posted by: SagaLore
Originally posted by: sao123
But if your files don't fragment... eventually your free space does, right? (Which means lots of wasted space.) That's really why I'm interested in the diagram.

Sure. But recovering the free space is easy to do when the files are contiguous. Partial defrag can occur when CPU usage and/or hard drive I/O is low.

Here is the diagram.


How do you handle a file which is bigger than your leftover fragments, but not larger than your remaining space? Immediately defrag?
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: SagaLore
Originally posted by: Matthias99
Originally posted by: SagaLore
Originally posted by: Matthias99
How do you handle fragmentation? Also, any application that actually cares about data integrity will do its own checksumming, so a filesystem-level checksum will be redundant in a lot of cases.

There is no fragmentation. The files are contiguous.

Are they also static? If so, it's not a problem, but filesystems for static data are generally not a very interesting area of research.

Let's say the user fills the 'data' area with 100MB files, then deletes every other one. The user now wants to create a 200MB file. The biggest free contiguous area on your disk is now only 100MB, although you have plenty of free space. Whoops.

That is solved with a defragmenter application (which btw will work better than typical fs defragmenters since all it has to do is move contiguous streams of data to get back large tracks of free space).

But, in that case, you have to tell the user 'please run your defragmenter program first; the filesystem is incapable of handling your request right now'.

Even if you have it running as a daemon in the background all the time, you'll run into situations where it just can't keep up with frequent requests to delete/create -- or resize -- files. If you defrag your files so they're packed efficiently together, any request to increase the size of a data file will force you to immediately move that file somewhere else -- and also create a hole in your nicely defragmented data set.

Basically, I don't think contiguous files are a good choice for frequently-changing data. For largely static data, it has some advantages in terms of overhead.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: sao123
How do you handle a file which is bigger than your leftover fragments, but not larger than your remaining space? Immediately defrag?

Sure, you would have to. If the system really can't find a place to put the file, then it has to move some other files out of the way to free up space - it doesn't have to be a full defrag. But really, the system should already be proactively cleaning up the empty space.

These really aren't problems that I have to work out. This has already been done by HPFS.

The point of this thread is to discuss the method of putting exec files in the front and data in the back, and the flexible sizes of the blocks.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: Matthias99
But, in that case, you have to tell the user 'please run your defragmenter program first; the filesystem is incapable of handling your request right now'.

Take that same drive - let's say it is 80GB - and fill it with just enough remaining room for the file. It is already a huge performance hit to fragment that file into the leftover spaces and then try to read it back. By then the I/O is already very sluggish because of all the fragmentation, which is inevitable and almost irreversible, because there isn't enough free space left to defrag. My opinion is that taking a moment to push a few files together to open up enough free space for a new file still performs better than current implementations.

Even if you have it running as a daemon in the background all the time, you'll run into situations where it just can't keep up with frequent requests to delete/create -- or resize -- files. If you defrag your files so they're packed efficiently together, any request to increase the size of a data file will force you to immediately move that file somewhere else -- and also create a hole in your nicely defragmented data set.

Basically, I don't think contiguous files are a good choice for frequently-changing data. For largely static data, it has some advantages in terms of overhead.

You make good points which definitely need to be taken into consideration. I think the way HPFS handles it is to do what it can to proactively keep the file system non-fragmented, but then fragment the data as a last resort. I'm thinking that with the flexible block sizes and front-end classification approach, the exec files could be kept contiguous since they don't change or resize very often, but the data files could fragment first and clean themselves up later.

A good programmer would use this to their advantage for log files. If you know your log file is only supposed to be 4MB in size, it could be flagged as an exec file and filled with zeroes to start. This log file would be kept closer to the system files and contiguous.
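Something like this, roughly (the zero-fill is ordinary file I/O, but the "flag as exec" call is a hypothetical hook in my FS, not anything that exists):

LOG_SIZE = 4 * 1024 * 1024    # the 4MB log from the example above

def create_preallocated_log(path, size=LOG_SIZE):
    """Create the log at its full size up front, zero-filled, so it can sit in the
    contiguous exec region and never needs to grow or be relocated."""
    with open(path, "wb") as f:
        f.write(b"\x00" * size)    # reserve the whole 4MB immediately
    # mark_as_exec_class(path)     # hypothetical FS call to pin it near the front
    return path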
 

CrispyFried

Golden Member
May 3, 2005
1,122
0
0
Back in the dark ages I used a DEC PDP-11 with 8" disks that couldn't handle fragmented files; there was a utility named "squish" that rearranged the files so all unused space was at one end. It was very time consuming... sorry, no help in this post, just FWIW and a bit of history.

Edit: oops, that was probably a PDP-8, not an 11, and it was an OS/8 command.
 

mjia

Member
Oct 8, 2004
94
0
0
Originally posted by: Genx87
One interesting file system is the Elephant file system, the file system that never forgets. Basically, it keeps previous/deleted versions of a file so you can revert back. They argued that the performance was still competitive. I don't think anyone really uses it, but I think it's an interesting idea.

How does that work when you write over the area of the disk where the deleted file was stored?

You can find more info if you google it. Here is the actual paper proposing the idea: http://www.cs.ubc.ca/spider/feeley/papers/1999-1.pdf

(BTW, I heard of this because I am also studying at UBC)
 