I set up ZFS on Ubuntu (via fuse) for a storage array at home and it has worked great for almost a year now, despite its 'beta' status. I log in and check the array every once in a while using:
zpool status
Which results in:
  NAME        STATE     READ WRITE CKSUM
  media       ONLINE       0     0     0
    raidz1    ONLINE       0     0     0
      sda     ONLINE       0     0     0
      sdb     ONLINE       0     0     0
      sdc     ONLINE       0     0     0
errors: No known data errors
This is all fine, but I'd like to automate a way to check every once in a while to make sure my pool is error-free.
I have Cacti and Zabbix at my disposal. I suppose I could also write a program that greps that output and sends me an email if it doesn't find the phrase "No known data errors".
However, is there an existing package that does this, or instructions on how I can get some performance metrics from this array?
zpool status -x is the preferred way to check pool status from a script. Its output is "all pools are healthy" unless there is a problem, which makes it easier to use as a test. Otherwise, once you have more than one pool, your check script and greps will get more complex. So you can set up a cron job that runs a script, checks that the output is "all pools are healthy", and sends an alert email containing the output if it is not.
You could even set up a Nagios plugin to do this for you. I'm assuming that Zabbix can be extended in the same way.
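As a rough sketch of the cron-driven check described above, something like the following could work. The ALERT_EMAIL variable, the use of mail(1), and the --run guard are assumptions for illustration, not part of any ZFS tooling.

```shell
#!/bin/sh
# Sketch of a pool health check meant to be run from cron.
# ALERT_EMAIL and the mail(1) command are assumptions; adjust to your setup.
ALERT_EMAIL="${ALERT_EMAIL:-root}"

# Succeeds only when the text is exactly the healthy message
# printed by `zpool status -x`.
pool_is_healthy() {
    [ "$1" = "all pools are healthy" ]
}

# Runs the check and mails the full output if anything else comes back.
check_pools() {
    status=$(zpool status -x 2>&1)
    if ! pool_is_healthy "$status"; then
        printf '%s\n' "$status" | mail -s "ZFS problem on $(hostname)" "$ALERT_EMAIL"
    fi
}

# Only act when asked to, so the file can also be sourced for testing.
if [ "${1:-}" = "--run" ]; then
    check_pools
fi
```

Saved under a hypothetical path such as /usr/local/bin/zfs-check.sh, it could be called from cron with the --run argument. Comparing the whole output rather than grepping for a phrase means any unexpected message, including an error from zpool itself, also triggers the email.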
Also make sure you're running zpool scrub regularly. I would set this up in a cron job as well. A scrub will detect and correct any issues it finds in the pool, including in areas of the file system that are not accessed often, so it can catch and correct problems before they result in data corruption.
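A scheduled scrub might look like this in root's crontab. The weekly schedule and the /sbin/zpool path are assumptions; "media" is the pool name from the question.

```
# m h dom mon dow  command
# Scrub the "media" pool every Sunday at 03:00 (schedule is only an example).
0 3 * * 0  /sbin/zpool scrub media
```

The scrub runs in the background, so the cron job returns right away. Any errors it finds show up in the zpool status output afterwards, which is what the health check picks up.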
To get performance metrics you can use zpool iostat [seconds between updates]. I'm not sure how to tie that into Cacti, but I'm sure it's possible.
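One possible way to feed those numbers to a grapher is to pull the per-pool line out of the zpool iostat output. The helper name pool_io_fields is made up for this sketch, and it assumes the usual seven-column layout (pool name, two capacity columns, read/write operations, read/write bandwidth), so check it against the output on your system.

```shell
#!/bin/sh
# Sketch: extract "read_ops write_ops read_bw write_bw" for one pool
# from `zpool iostat` output on stdin. Column positions are an assumption
# based on the standard seven-column layout.
pool_io_fields() {
    awk -v pool="$1" '
        # Remember the most recent line for this pool, so that with
        # several reports on stdin the last one wins.
        $1 == pool { line = $4 " " $5 " " $6 " " $7 }
        END { if (line != "") print line }
    '
}

# Example use (needs a real pool, so it is left commented out):
#   zpool iostat media 5 2 | pool_io_fields media
```

Asking for two reports and keeping the last one is deliberate: as far as I know the first report is an average since the pool was brought up, and only later ones cover the interval you asked for.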
Assuming Zabbix and GNU tools, add the following to the Zabbix agent daemon config file:
UserParameter=zpool.status,zpool status | grep -q "No known data errors" && echo 1 || echo 0
Now, in Zabbix, add an item with the key "zpool.status" and create a trigger against it (using a function like ".last(0)=0"), and you're done: the trigger will fire whenever that string is missing from the zpool status output.
This also assumes that 'zpool' is in the PATH of the zabbix user, and that this user is allowed to run zpool. If not, specify the full path and use sudo. Another catch might be a default shell that doesn't support the syntax used, in which case you can either rewrite the UserParameter or force it to use bash.
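If zpool is not in the zabbix user's PATH or needs root, the full-path and sudo variant could look roughly like this. The /sbin/zpool location and the sudoers file name are assumptions for illustration; check where zpool lives on your system before copying them.

```
# In the Zabbix agent daemon config file (the zpool path is an assumption)
UserParameter=zpool.status,sudo /sbin/zpool status | grep -q "No known data errors" && echo 1 || echo 0

# In /etc/sudoers.d/zabbix-zpool (hypothetical file name), edited with visudo
zabbix ALL=(root) NOPASSWD: /sbin/zpool status
```

Limiting the sudoers rule to the exact "zpool status" command keeps the zabbix user from running any other zpool subcommand as root.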
If possible, also run iostat -xCzn and grep for any HW or transport errors on the disks or controller.