How do you determine the version/release number of ZFS on Ubuntu and FreeNAS? I am trying to find out how far behind ZFS on Linux is compared to FreeBSD.
The OpenZFS website had a feature comparison, I believe. But generally, the pool features do not vary much per platform. ZFS on Linux has pretty much the same on-disk features. That doesn't mean it supports the BSD-exclusive features, though, like automatic swap ZVOLs, booting from RAID-Z, TRIM on ZFS and more.
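As for the literal version question: both platforms expose the number if you know where to look. ZFS on Linux publishes the module version under /sys, and FreeBSD/FreeNAS reports the pool (SPA) version via sysctl. A minimal Python sketch along those lines - the paths and sysctl name are the standard ones, but treat this as illustrative rather than a polished tool:

#!/usr/bin/env python3
"""Print the installed ZFS version on Linux or FreeBSD/FreeNAS."""
import os
import subprocess

def zfs_version():
    # ZFS on Linux exposes the kernel module version here.
    sysfs = "/sys/module/zfs/version"
    if os.path.exists(sysfs):
        with open(sysfs) as f:
            return "module version " + f.read().strip()
    # FreeBSD/FreeNAS: the SPA (pool) version via sysctl.
    try:
        out = subprocess.check_output(["sysctl", "-n", "vfs.zfs.version.spa"])
        return "pool version " + out.decode().strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown (is the zfs module loaded?)"

print(zfs_version())

Running 'zpool upgrade -v' on either platform additionally lists every pool version the implementation supports, which is the more useful number for a "how far behind" comparison.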
But ZFS isn't simply a program you install and then it works. It has to be integrated with the operating system, and this is where you get major differences. So you could have the same ZFS version with the same ZFS pool features, yet see very different behaviour and performance, because the memory management differs greatly from Solaris to BSD to Linux.
ZFS on Linux still has stability problems. The only people I have seen with failed ZFS pools recently were running ZFS on Linux. More often, after a regular update (Update Manager) on Ubuntu the pool cannot be imported anymore, or problems like that. But those bugs do not cause permanent data loss. It is only frustrating.
If you ask me, ZFS on Linux is where BSD was years ago. Back then it 'simply worked' in virtually all cases, but still had issues, and since then various data-loss bugs have been found and resolved. I would not advise ZoL - but it is making big progress and is starting to become usable. But to say it is mature or production quality - well, some have claimed it to be, but I remain highly sceptical. The one major danger of using ZFS is its complexity. That can lead to bugs which only occur in very specific circumstances, but have catastrophic results. Best is to stick to the mainstream, and for ZFS that is a BSD implementation.
FreeNAS has no option to display SMART output.
It doesn't? That would surprise me. SMART was one of the first features to make it into ZFSguru.
People are *still* quoting that article?
Because it is very relevant, and exposes an inherent weakness of legacy RAID and filesystems.
That is, the RAID layer and/or filesystem is not designed for disks which develop unreadable sectors. RAID is designed to work with disks that are either 100% good or completely dead. It doesn't really accept anything in between.
This is especially true for FakeRAID (like Intel/AMD/nVidia/VIA/ASMedia/Silicon Image/Promise/JMicron) and many hardware RAIDs. Those will drop a disk from the RAID array even if one tiny sector cannot be read. Most drop disks if they exceed the timeout (of 10-25 seconds). Rebooting will not help: it will update the metadata on the other disks to make this change permanent.
And this is why quite a lot of people have lost their data; not because any disk was bad, but because the legacy RAID layer decided it was time to drop the disks from the array.
We can talk about the maths used by Robin Harris, but I find that not so interesting, because the issue that the article exposes is truly inherent to the technology. RAID simply does nothing to protect against bad sectors. That is why 'everyone' is replacing legacy RAID and filesystems with new-generation filesystems like ZFS (multi-platform), Btrfs (Linux-only) and ReFS (Windows-only). Microsoft even wanted to introduce bitrot correction in DE 2.0, used in Windows Home Server (WHS), but it abandoned the project and went for ReFS instead. ReFS is the least mature of these third-generation filesystems designed to fight bitrot.
So:
legacy stuff = works great for binary failures (DEAD or ALIVE)
modern stuff = also works for disks that are fine except that they cannot read a few sectors.
And the latter is becoming much more common. As you know, the uBER specification has stayed at 10^-14, meaning consumer drives get one bad sector every day on average at 100% duty cycle. With a more normal duty cycle for consumer usage, this translates to a bad sector every 3 to 6 months.
And do not fool yourself: the uBER specification is stated in bits, but in reality one uncorrectable bit = one unreadable sector of 4KiB. The specification also assumes that unreadable sectors scale with duty cycle, which is not totally true. At a 0.001% duty cycle you get far more than 0.001% of the unreadable sectors that the 100%-duty-cycle uBER rating would predict.
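A quick back-of-the-envelope check of the "one bad sector per day" figure, in Python. The 100 MB/s sustained read rate is my assumption, not from any datasheet:

# uBER sanity check: expected unreadable sectors per day at 100% duty cycle.
UBER = 1e-14            # unrecoverable errors per bit read (consumer rating)
throughput = 100e6      # bytes/second sustained - an assumed figure
bits_per_day = throughput * 8 * 86400

print("errors/day at 100%% duty cycle: %.2f" % (bits_per_day * UBER))
# -> ~0.69, i.e. on the order of one per day.
# At a 1% duty cycle the same math predicts one error every ~145 days,
# which lines up with the "every 3 to 6 months" figure above.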
If you take the assumptions at face value (the URE rate really is 1e-14 and any single bit failure will cause a rebuild to fail), you'll calculate that the chance of surviving a RAID5 rebuild is low, and even RAID1/10 is pretty bad. I recall something about that particular article getting the math wrong, but it doesn't really matter - the assertion that the data-loss chance looks scary when calculated like that is correct. What's not mentioned is that it looks pretty scary for a single drive or RAID1/10 too.
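For concreteness, here is that face-value calculation in Python. The 2 TB drive size and 4-disk RAID5 are assumed numbers, chosen only to show the shape of the result:

# Survival chance of a rebuild if a single unreadable bit is fatal.
UBER = 1e-14                  # errors per bit read
drive_bytes = 2e12            # assumed 2 TB drives
surviving_drives = 3          # a 4-disk RAID5 rebuild reads 3 drives in full

bits_read = drive_bytes * 8 * surviving_drives
p_rebuild_ok = (1 - UBER) ** bits_read
print("P(rebuild reads every bit): %.2f" % p_rebuild_ok)   # ~0.62

# The same formula for reading a single 2 TB drive end to end gives ~0.85,
# which is why the single-drive and RAID1/10 numbers look scary too.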
In practice, though, RAID arrays don't have anywhere near that sort of total failure rate (do the vast majority of your single-drive RAID5 failures, and a good percentage of your RAID1/10 failures, result in total data loss?). Data loss happens, sure, but it is the exception rather than a near certainty on drive failure.
Why? The assumptions are wrong. At the very least, a RAID controller that refuses to rebuild the rest of the array because of a single bit error is downright malicious, and whether 1e-14 is an accurate number is certainly not established.
Well, that is what the manufacturer tells you. And virtually all ZFS users I know have already had uBER bad sectors on their consumer-grade drives. So it's not some theoretical thing; this happens all the time.
In the past, with lower data densities, uBER wasn't a big deal. Only about 10% of all unreadable sectors were due to insufficient bit correction (uBER) - the other 90% were physically damaged sectors. Today it is the other way around: 90% of unreadable sectors are due to uBER, while only 10% are physically damaged. This can be seen in the SMART data. Damaged sectors get replaced with reserve sectors (Reallocated Sector Count), while undamaged uBER sectors simply get overwritten and stay in use; only the Current Pending Sector count is decremented, removing all evidence of there ever having been an unreadable sector.
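If you want to watch those two counters for yourself, smartctl (from smartmontools) reports them. A rough Python sketch - the parsing is naive and the device path is just an example, adjust for your system:

# Show the two SMART attributes discussed above via smartctl -A.
import subprocess

def smart_counts(device="/dev/ada0"):   # example device path
    out = subprocess.check_output(["smartctl", "-A", device]).decode()
    wanted = ("Reallocated_Sector_Ct", "Current_Pending_Sector")
    for line in out.splitlines():
        fields = line.split()
        # Attribute table rows: ID, name, ..., raw value in column 10.
        if len(fields) >= 10 and fields[1] in wanted:
            print("%-24s raw=%s" % (fields[1], fields[9]))

smart_counts()

A rising Reallocated Sector Count points at physical damage; pending sectors that appear and later vanish without the reallocated count moving are the uBER case described above.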
Legacy RAID is not suitable for today's hardware. That is the rough conclusion and the whole point of that article. I would say it is very much valid, even if the math is off a bit.