ZFS Pool ZIL dying killed entire server?

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
Well, I've had my first bad(ish) experience with ZFS and I'm a bit perplexed why it went bad. First off let me say I'm operating under the assumption it's my fault. I also have the box backed up so worst case I could have blown it out and rebuilt it but I'm trying to understand what went wrong so I don't repeat it in the future.

Setup:
Solaris 11 + napp-it backing two ESXi 6.0 hosts
Crucial 128GB MX100s for boot drive and ZIL
16x Seagate SATA spinners for the pool in RAID-Z2
All drives running off an IBM M1015 (LSI controller) in IT mode

Recently I've been noticing severely decreased write performance on the VMs backed by the pool. Today it reached the point where I decided to dig into it more. vSphere shows command queuing latency, meaning it's sending commands to the storage and taking too long to get a reply. I try copying a 30GB file to the file server VM over the network. It initially starts out at ~115MB/s as expected/normal, but after about 15GB it drops to nothing. I run a dd bench on the ZFS pool and write speed is severely degraded compared to normal. I check disk stats and see the ZIL drive showing 70% busy and a wait% despite almost nothing going on with the storage. Turn sync off and speed goes back to normal. So, the ZIL drive is toast. This isn't a production environment so I didn't go with a "good" SSD, but that said, the MX100 wasn't terribly old either.
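For reference, the checks amounted to roughly this (the pool and dataset names below are placeholders, not my actual ones):

# watch per-device service times and %busy while the copy runs
iostat -xn 5
# quick-and-dirty dd write test against the pool
dd if=/dev/zero of=/tank/ddtest bs=1024k count=10000
# temporarily disable sync writes on the dataset backing the VMs
zfs set sync=disabled tank/vmstore
# put it back to normal afterwards
zfs set sync=standard tank/vmstore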

So I shut down the SAN, pull the ZIL, and hook it up to my primary PC in a USB enclosure. Crucial Toolbox says the drive is healthy (9TB written), but it is running an old firmware, so I upgrade the firmware. Since it's been running as a ZIL, I know garbage collection probably isn't working, so I do a secure erase on it to see if I can revive it to its previous speeds. This is probably my critical error, but I'm trying to understand why it caused as many problems as it did.

Put the wiped ZIL drive back in its original bay and power the SAN back on. Log into napp-it and the box is running slow as balls. Pages that normally load nearly instantly are taking 5+ minutes. The pool shows as UNAVAILABLE. Exported the pool so I could reimport it with the -m flag. The box hangs on the import. Reboot the box and it hangs on boot (stuck at the spinning wheel). At this point I'm assuming I'll be rebuilding the SAN and restoring from backup. Hard reset the box and it boots normally back into Solaris. The zpool is missing but the box is otherwise running normally. Response times seem normal. Curious, but I press on.
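The export/reimport was basically this (pool name is a placeholder):

zpool export tank
# -m allows the import to proceed with a missing log device
zpool import -m tank
zpool status tank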

Reimport the pool with -m. It now shows up as degraded (which is expected). Clear the pool, and it now lets me remove and re-add the ZIL. The pool now shows as back online, which is great. However, the ESXi hosts this storage is backing aren't seeing the LUs. I check the Solaris box and the LUs are gone. Check the drives and the LU files still exist. Import the LUs, recreate the views. The hosts now see the storage again. Power on the VMs and everything is running normally, save for the fact that write speed is still hosed. Disable sync again and performance returns to full speed. So the ZIL is still hosed.
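For the record, the recovery part boiled down to roughly this (pool name, LU file path, and GUID are placeholders):

zpool clear tank
# re-register the existing LU file with COMSTAR
stmfadm import-lu /tank/vmstore/lu0
# grab the GUID, then recreate the view so the hosts can see it again
stmfadm list-lu -v
stmfadm add-view 600144f0xxxxxxxxxxxxxxxxxxxxxxxx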

So, this leaves me two questions.

1) Why did the pool issues have a severe negative impact on OS performance? The OS isn't running off the pool.

2) Would offlining the pool, then removing the ZIL and adding its replacement, have prevented this issue?
 

gea

Senior member
Aug 3, 2014
221
12
81
I see mainly two problems here, especially since performance was fine with the ZIL disabled:

- If you need sync write with good performance, you need a REALLY fast SSD with low latency, high steady write IOPS, and powerloss protection, like an Intel S3700 or an NVMe drive like an Intel P3600 or similar. Without powerloss protection you should ask why you want a ZIL at all: it is there to protect you against data loss on a power failure, at the price of much lower write performance. An average desktop SSD can give very low sync write values.

A "desktop class" SSD is a bad ZIL; do not expect good performance from it.

- A pool built from a single RAID-Z2 vdev has the same IOPS as a single disk, around 100-200 IOPS. As VM performance is mostly IOPS limited, you should avoid such a pool. If you really need good performance for VMs, use SSD-only pools (without ZIL/L2ARC).

If you need capacity, the best compromise is a pool built from mirrors, for example 8 x mirror vdevs, as sketched below.
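A minimal sketch of such a layout, assuming 16 disks (pool and device names are just examples):

# 8 mirror vdevs give roughly 8x the IOPS of a single vdev, at half the raw capacity
zpool create tank \
  mirror c1t0d0 c1t1d0   mirror c1t2d0 c1t3d0 \
  mirror c1t4d0 c1t5d0   mirror c1t6d0 c1t7d0 \
  mirror c1t8d0 c1t9d0   mirror c1t10d0 c1t11d0 \
  mirror c1t12d0 c1t13d0 mirror c1t14d0 c1t15d0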

btw
You do not need to offline the pool to remove a ZIL; just use Disks > Remove or Disks > Replace (example below).
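From the command line this is roughly the equivalent of (pool and device names are examples):

# remove the old log device while the pool stays online
zpool remove tank c0t5000c500xxxxxxxxd0
# add the replacement SSD as the new log device
zpool add tank log c0t5000c500yyyyyyyyd0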

You should also check iostat for a single weak disk, as pool performance cannot be better than that of its weakest disk.
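For example (the pool name is an example; look for one disk with much higher service times or %b than its siblings):

iostat -xn 10
# per-vdev operations and bandwidth from the pool's point of view
zpool iostat -v tank 10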

You should avoid dd, as it is not a benchmark, only a test of basic functionality.
You can try Pools > Benchmarks > Filebench > Singlestreamwrite or Fivestreamwrite if you want a reasonably good but short-running benchmark.

OS performance is not directly affected by data pool performance, but since most operations involve the data pool, the pool's performance ends up limiting most actions.

You may also check the firmware of the LSI controller.
Firmware versions 20.00 - 20.00.04 are known for problems; the current 20.00.07 is ok.
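If the sas2flash utility from LSI is installed, something like this should show the running firmware (assuming an SAS2008-based card like the M1015):

# list adapters and their firmware/BIOS versions
sas2flash -list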
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
I've got a UPS that will do network shutdowns in the event of a power loss. Performance wise, as I stated, this is for home lab use, so I'm not looking for top tier performance. While 24TB of SSD would be awesome, that's not in my budget. The issue is that performance has drastically decreased over the last year despite no hardware changes. At the time of the above performance issues there was only a single VM running off the pool (the file server), and that VM was idle other than the single network file copy I was trying to do. Even the cheapest of SSDs can keep up with that when operating properly. I am looking to replace it with a more suitable SSD, but my point still stands that the original performance with my current setup was quite acceptable for my usage.

I've been using iostat to phase out the slower spindles.

I will double check the firmware but I'm pretty certain I'm on 20.00.07, thanks for that info.
 

Viper GTS

Lifer
Oct 13, 1999
38,107
433
136
What do the SMART stats on that MX100 look like? Any chance you've worn it out?

Viper GTS
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
That was my assumption, but the SMART stats look fine and both Crucial's tool and CrystalDiskInfo report it as healthy. 9TB written to the drive.
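If anyone wants to double-check their own drive, smartctl from smartmontools dumps everything; the wear/lifetime-used and total-host-writes attributes are the interesting ones (attribute names vary by model, and the device path below is just an example):

smartctl -a /dev/sdX
# through a USB-to-SATA bridge, SAT passthrough is often needed
smartctl -a -d sat /dev/sdX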
 