user's questions -server

user

Asked: 2017-09-01 00:28:27 +0800 CST

What to do in response to repeat DRAM ECC error notifications for the same memory location?

3

I woke up this morning to what's a first for me: one of my systems had logged DRAM ECC error notifications. Three of them, in fact, and as far as I can tell all for the exact same memory location (obviously, the system isn't actually named localhost):

Aug 31 05:00:46 localhost kernel: [719099.816034] [Hardware Error]: CPU:0   MC4_STATUS[-|CE|MiscV|-|AddrV|-|-|CECC]: 0x9c6c40006b080a13
Aug 31 05:00:46 localhost kernel: [719099.816046] [Hardware Error]:         MC4_ADDR: 0x0000000641f49d20
Aug 31 05:00:46 localhost kernel: [719099.816051] [Hardware Error]: Northbridge Error (node 0): DRAM ECC error detected on the NB.
Aug 31 05:00:46 localhost kernel: [719099.816059] EDAC amd64 MC0: CE ERROR_ADDRESS= 0x641f49d20
Aug 31 05:00:46 localhost kernel: [719099.816070] EDAC MC0: CE page 0x641f49, offset 0xd20, grain 0, syndrome 0x6bd8, row 2, channel 0, label "": amd64_edac
Aug 31 05:00:46 localhost kernel: [719099.816075] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)

The above was followed by an identical notification at system time 05:10:46 (719699.8160) and then one more at 05:20:46 (720299.8160) which also had Over on the CPU:0 MC4_STATUS line (status 0xdc6c40006b080813). So far the system has been stable since, with no further errors logged. System activity is normal, and the system in question has been running with ECC RAM since 2014 but never logged any ECC errors.

I wouldn't be too worried about a single correctable ECC error. The almost exactly ten minutes (down to a few microseconds, in fact) between the errors being logged could simply be due to RAM scrubbing happening every ten minutes; unfortunately, on this particular system, the scrub interval is not exposed as a setting. However, the three consecutive errors in the same memory location (same value for CE ERROR_ADDRESS) do have me a little bit concerned.
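
As a cross-check on the log excerpt above, here is a minimal Python sketch that splits the reported address into page and offset, computes the spacing between the kernel timestamps, and tests the overflow flag in the two status words. It assumes 4 KiB pages (which agrees with the page/offset pair EDAC printed) and that bit 62 of the status word is the overflow flag, as in the generic x86 machine-check status layout. The helper names are made up for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def split_address(addr):
    """Return (page, offset) for a physical address."""
    return addr >> PAGE_SHIFT, addr & ((1 << PAGE_SHIFT) - 1)


def overflow_set(status):
    """True if bit 62 (assumed to be the overflow flag) is set."""
    return bool(status & (1 << 62))


page, offset = split_address(0x641f49d20)
print(hex(page), hex(offset))

# Kernel timestamps (seconds since boot) as quoted in the question.
stamps = [719099.8160, 719699.8160, 720299.8160]
intervals = [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]
print(intervals)

# First report versus third report.
print(overflow_set(0x9c6c40006b080a13), overflow_set(0xdc6c40006b080813))
```

Under those assumptions, the address decomposes into the same page and offset that EDAC reported, the reports are 600 seconds apart, and only the third status word has the overflow bit set.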

Update: The host in question has logged several more since I originally posted this question, all with the same value for CE ERROR_ADDRESS.

How seriously should I take this? What's a good response: order replacement RAM right away and schedule to install it ASAP, treat this as just a momentary glitch, or stay on my toes to replace the RAM if it happens again but take no specific action right now?

user

Asked: 2017-07-11 05:19:42 +0800 CST

systemd unit doesn't start on boot on Debian 9, but starts fine when started manually after boot and on boot on Debian 8

1

I have a systemd unit intended to establish an SSH tunnel between two servers. The server which has the unit runs Debian 9. This is what the .service file looks like, except for a few Documentation directives which I have elided here for brevity (they are not the problem, and they are parsed just fine by systemd):

# cat /etc/systemd/system/ssh-tunnel-remote1.service
[Unit]
Description=SSH tunnel for services on remote1
After=network-online.target
[Install]
WantedBy=networking.target
[Service]
Type=simple
User=ssh-remote1
Group=ssh-remote1
Environment=AUTOSSH_POLL=90
ExecStart=/usr/bin/autossh -M 0 -q -N -p 15539 -o "PubkeyAuthentication yes" -o "PreferredAuthentications publickey" -o "IdentityFile /home/ssh-remote1/.ssh/id_rsa" -L 9999:127.0.0.1:X ssh-tunnel@remote1.example.com
Restart=always
PrivateTmp=true
#

(Note: the X in the -L is a real port number.)

On the server where this service is being run, /usr/bin is on /, so it's not an issue of the file system not being mounted when the service is being started.

The After=network-online.target should be plenty enough for DNS to be available, and even if that was the problem, you'd think that systemd would restart the service when it fails.

The service itself looks like it is enabled:

# find /etc/systemd -name ssh-tunnel-remote1\*
/etc/systemd/system/networking.target.wants/ssh-tunnel-remote1.service
/etc/systemd/system/ssh-tunnel-remote1.service
#
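
The relationship between the [Install] section and the symlink found above can be sketched in Python. This is only an illustration: it assumes that enabling a unit creates `<target>.wants/<unit>` under /etc/systemd/system for each WantedBy= entry (which matches the find output), it uses configparser as a rough stand-in for systemd's own parser (configparser rejects repeated keys, for example), and the function name is hypothetical.

```python
import configparser
from pathlib import PurePosixPath

# The [Unit] and [Install] sections from the unit file quoted in the question.
UNIT_TEXT = """\
[Unit]
Description=SSH tunnel for services on remote1
After=network-online.target
[Install]
WantedBy=networking.target
"""


def expected_wants_links(unit_name, unit_text, base="/etc/systemd/system"):
    """List the .wants symlinks implied by the unit's WantedBy= setting."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key capitalisation
    parser.read_string(unit_text)
    targets = parser.get("Install", "WantedBy", fallback="").split()
    return [
        str(PurePosixPath(base) / f"{target}.wants" / unit_name)
        for target in targets
    ]


for link in expected_wants_links("ssh-tunnel-remote1.service", UNIT_TEXT):
    print(link)
```

The printed path is the same one that find reported, so the symlink is where the WantedBy= line says it should be.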

but systemctl list-units doesn't seem to know about it:

# systemctl list-units -t service --all | grep ssh-tunnel-remote1
#

I have tried various permutations of systemctl daemon-reload, systemctl reenable ssh-tunnel-remote1, systemctl enable ssh-tunnel-remote1, systemctl disable ssh-tunnel-remote1 and reboot.

Seemingly no matter what I do, after booting, the service shows up as inactive (dead):

# systemctl -o verbose  -l status ssh-tunnel-remote1
● ssh-tunnel-remote1.service - SSH tunnel for services on remote1
   Loaded: loaded (/etc/systemd/system/ssh-tunnel-remote1.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
#

However, it starts just fine if I do it manually:

# systemctl start ssh-tunnel-remote1
# systemctl status ssh-tunnel-remote1
● ssh-tunnel-remote1.service - SSH tunnel for services on remote1
   Loaded: loaded (/etc/systemd/system/ssh-tunnel-remote1.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-07-10 13:01:11 UTC; 55s ago
 Main PID: 17835 (autossh)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/ssh-tunnel-remote1.service
           ├─17835 /usr/lib/autossh/autossh -M 0 -q -N -p 15539 -o PubkeyAuthentication yes -o PreferredAuthentications publickey -o IdentityFile /home/ssh-remote1/.ssh/id_rsa -L 9999:127.0.0.1:X ssh-tunnel
           └─17838 /usr/bin/ssh -q -N -p 15539 -o PubkeyAuthentication yes -o PreferredAuthentications publickey -o IdentityFile /home/ssh-remote1/.ssh/id_rsa -L 9999:127.0.0.1:X ssh-tunnel@remote1.example.

Jul 10 13:01:11 localhost systemd[1]: Started SSH tunnel for services on remote1.
Jul 10 13:01:11 localhost autossh[17835]: port set to 0, monitoring disabled
Jul 10 13:01:11 localhost autossh[17835]: starting ssh (count 1)
Jul 10 13:01:11 localhost autossh[17835]: ssh child pid is 17838
# telnet 127.0.0.1 9999
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
<usable connection here>
Connection closed by foreign host.
#

Immediately after a reboot, journalctl -xru ssh-tunnel-remote1.service simply prints -- No entries --. Searching manually through the output of journalctl also doesn't show it anywhere at all. In contrast, after starting the service manually, the same command outputs something very similar to:

-- Logs begin at Mon 2017-07-10 12:46:14 UTC, end at Mon 2017-07-10 13:10:24 UTC. --
Jul 10 13:01:11 localhost autossh[17835]: ssh child pid is 17838
Jul 10 13:01:11 localhost autossh[17835]: starting ssh (count 1)
Jul 10 13:01:11 localhost autossh[17835]: port set to 0, monitoring disabled
Jul 10 13:01:11 localhost systemd[1]: Started SSH tunnel for services on remote1.
-- Subject: Unit ssh-tunnel-remote1.service has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- Unit ssh-tunnel-remote1.service has finished starting up.
-- 
-- The start-up result is done.

This is a homegrown .service file, but it's working fine on another server that runs Debian 8.

I have tried placing it both under /etc/systemd/system and /lib/systemd/system, with no apparent difference.

When executed from the command line as su -l ssh-remote1 -c '/usr/bin/autossh -M 0 -q ...', autossh and ssh run fine in the foreground and the tunnel is available.

I'm practically certain that I am missing some simple difference between Debian 9's systemd 232 and Debian 8's systemd 215, but what? What incantations are required to make that service start on boot on Debian 9?

user

Asked: 2016-06-22 12:12:28 +0800 CST

With IPv6, should we be assigning distinct IP addresses to each host name served over HTTP(S)?

9

With IPv4, it's pretty much a given that unless there is some specific need that warrants IP-based virtual hosting, name-based virtual hosting should be done to avoid needlessly exhausting the address space.

However, given that for IPv6 the current recommendation is that even home sites should receive multiple /64s worth of address space, is it not reasonable, absent operational practices in the specific situation which would make this difficult or prohibitive, to assign a distinct IPv6 address to each web site, even when those web sites are co-hosted on the same server?

Assuming that a good address management infrastructure of some kind is in place such that one can handle the assignment of addresses, what might be good arguments for or against giving each web site its own IPv6 address?
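
To give a sense of the scale involved, here is a small Python sketch using the standard ipaddress module. It assumes a single /64 taken from the documentation prefix 2001:db8::/32 (not a real assignment), and the host names and the sequential numbering scheme are purely hypothetical.

```python
import ipaddress


def assign_site_addresses(prefix, hostnames):
    """Give each host name its own address from the prefix, starting at ::1."""
    net = ipaddress.ip_network(prefix)
    return {name: net[i] for i, name in enumerate(hostnames, start=1)}


# Documentation prefix, used here only as a placeholder.
net = ipaddress.ip_network("2001:db8:0:1::/64")
print(net.num_addresses == 2**64)

plan = assign_site_addresses(
    "2001:db8:0:1::/64",
    ["www.example.com", "shop.example.com", "blog.example.com"],
)
for name, addr in plan.items():
    print(name, addr)
```

Even one /64 holds 2**64 addresses, so address exhaustion is not the constraint here; a real deployment would take the mapping from whatever address management system is in place rather than from a counter.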

For completeness, the relevant part of the above-referenced section from the RFC is (emphasis mine; note that this is for comparison only, and this quote does not make the question one about home networks):

At the same time, it might be tempting to give home sites a single /64, since that is already significantly more address space compared with today's IPv4 practice. However, this precludes the expectation that even home sites will grow to support multiple subnets going forward. Hence, it is strongly intended that even home sites be given multiple subnets worth of space, by default. Hence, this document still recommends giving home sites significantly more than a single /64, but does not recommend that every home site be given a /48 either.

Also for completeness: The relevant network does not yet have any IPv6 assignment, and I don't know the exact size of the assignment that might be made, but I'm hoping to get IPv6 set up and running within the next 6-12 months and would like to plan ahead a little to be ready when that happens.

user

Asked: 2016-03-08 13:07:52 +0800 CST

Do any OpenSSH 6.7 `preauth` error log entries warrant specific human attention?

2

A Linux (specifically Debian Jessie) server that needs to be exposed to the Internet is spitting out various OpenSSH 6.7 preauth errors in the logs. For example, I'm getting (timestamps elided for clarity):

error: Received disconnect from A.B.C.D: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
fatal: Unable to negotiate a key exchange method [preauth]
fatal: no matching cipher found: client ... server ... [preauth]
Received disconnect from A.B.C.D: 11: Normal Shutdown, Thank you for playing [preauth]
Received disconnect from A.B.C.D: 11: ok [preauth]

and so on.
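
For context, messages like these could be bucketed with a short Python sketch such as the following. The category names are my own invention, and the patterns are derived only from the sample messages quoted above, so this is a starting point for triage rather than a complete classifier.

```python
import re
from collections import Counter

# Hypothetical buckets; patterns come from the sample log lines above.
PATTERNS = [
    ("auth-fail", re.compile(r"Auth fail \[preauth\]")),
    ("kex-mismatch", re.compile(r"Unable to negotiate a key exchange method \[preauth\]")),
    ("cipher-mismatch", re.compile(r"no matching cipher found: .* \[preauth\]")),
    ("client-disconnect", re.compile(r"Received disconnect from \S+: 11: .* \[preauth\]")),
]


def classify(line):
    """Return the first matching bucket for a log line."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return "other-preauth" if "[preauth]" in line else "not-preauth"


def tally(lines):
    """Count how many lines fall into each bucket."""
    return Counter(classify(line) for line in lines)


SAMPLE = [
    "error: Received disconnect from A.B.C.D: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]",
    "fatal: Unable to negotiate a key exchange method [preauth]",
    "fatal: no matching cipher found: client ... server ... [preauth]",
    "Received disconnect from A.B.C.D: 11: Normal Shutdown, Thank you for playing [preauth]",
    "Received disconnect from A.B.C.D: 11: ok [preauth]",
]
for label, count in sorted(tally(SAMPLE).items()):
    print(label, count)
```

Anything that lands in the "other-preauth" bucket is a message the patterns have not seen before, which is the kind of entry this question is really about.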

I'm not terribly worried about the probes themselves; the system is kept up to date, the OpenSSH configuration is fairly well hardened according to current best practice, and there are additional protections (e.g. fail2ban) in place.

Is there any reason why any preauth OpenSSH log entries would warrant specific human attention?

The answers to the question What does “Normal Shutdown, Thank you for playing [preauth]” In SSH logs mean? indicate that the specific case in that question is safe to ignore; my question is more generic.

user

Asked: 2015-11-05 03:46:41 +0800 CST

Does the max-80%-use target suggested for ZFS for performance reasons apply to SSD-backed pools?

2

The Solaris ZFS Best Practices Guide recommends keeping ZFS pool utilization below 80% for best performance:

Keep pool space under 80% utilization to maintain pool performance. Currently, pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues. If the primary workload is immutable files (write once, never remove), then you can keep a pool in the 95-96% utilization range. Keep in mind that even with mostly static content in the 95-96% range, write, read, and resilvering performance might suffer.

A common suggestion for how to implement this seems to be to make a file system or volume that is not used to store any data, but which has a size reservation of about 20% of pool capacity.
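
The arithmetic behind that suggestion is simple; a minimal Python sketch follows. The function name and the 4 TiB pool size are hypothetical, and the result is only the raw number one might use as the size reservation on the placeholder file system or volume.

```python
def reservation_bytes(pool_size_bytes, target_percent=80):
    """Bytes to set aside so the rest of the pool tops out at target_percent."""
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return pool_size_bytes * (100 - target_percent) // 100


TIB = 2**40
GIB = 2**30
pool = 4 * TIB  # hypothetical pool size
reserve = reservation_bytes(pool)
print(f"reserve {reserve} bytes (about {reserve / GIB:.1f} GiB)")
```

Whether 80 is the right target for a given pool is exactly what this question asks; the sketch only shows how the reservation size follows from whichever target is chosen.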

I can absolutely see, with ZFS' copy-on-write behavior, how this would help with rotational storage, because rotational storage tends to be fairly heavily IOPS-constrained so giving the file system room to make large contiguous allocations makes a lot of sense (even if they wouldn't be used as such all the time).

However, I'm not sure the 80% target makes as much sense with solid state storage, which besides being a good bit more expensive per gigabyte doesn't have anywhere near the IOPS constraints of rotational storage.

Should SSD-backed ZFS pools be restricted to less than about 80% capacity utilization for performance reasons just like HDD-backed pools, or can SSD-backed pools be allowed to fill up more without significant adverse impact on I/O performance?

user

Asked: 2015-10-25 05:05:52 +0800 CST

In OpenSSH DEBUG1 output on connecting, what does the number in the `Server accepts key` line refer to?

0

When I connect to remote systems, specifically in this case using OpenSSH 6.0p1 Debian-4+deb7u2, if I use the -v switch to see what's going on one of the lines printed is:

debug1: Server accepts key: pkalg ssh-rsa blen 567

This looks to me like the server has accepted the identification public key (which is mentioned as offered on the immediately preceding line), which is great.

However, what does the 567 at the end refer to? "blen" sounds like it could be "bit length", but 567 isn't related to any bit length that I know of, even if converting bits to bytes.

Google was distinctly unhelpful, most likely because this stanza appears in more or less every ssh -v somewhere.example.com output that anyone has ever posted, but at least shows that the number varies (149, 277 and 279 are all on Google's first page of hits, when restricting to ssh-rsa exchanges).
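
One hypothesis that can be tested offline is that blen is the length in bytes of the public key blob in SSH wire format, not a bit length. The Python sketch below computes that length for an ssh-rsa key, assuming the layout from RFC 4253 (string "ssh-rsa", mpint e, mpint n) with the mpint encoding from RFC 4251. The function names are my own, and the public exponents 35 and 65537 are assumptions about what commonly used key generators emit.

```python
def mpint_len(value):
    """Encoded size of a positive mpint: 4-byte length prefix plus magnitude.

    bit_length() // 8 + 1 covers the extra leading zero byte that is needed
    whenever the top bit of the first magnitude byte would otherwise be set.
    """
    if value <= 0:
        raise ValueError("only positive integers are handled here")
    return 4 + value.bit_length() // 8 + 1


def rsa_blob_len(modulus_bits, exponent=65537):
    """Size of an ssh-rsa public key blob for a modulus of the given size."""
    modulus = 1 << (modulus_bits - 1)  # any number with exactly that many bits
    name = b"ssh-rsa"
    return (4 + len(name)) + mpint_len(exponent) + mpint_len(modulus)


for bits, exponent in [(1024, 35), (2048, 35), (2048, 65537)]:
    print(bits, exponent, rsa_blob_len(bits, exponent))
```

Under those assumptions the three cases come out as 149, 277 and 279, the same numbers that show up in the search results mentioned above, which is at least consistent with the byte-length reading.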

user

Asked: 2015-10-24 00:46:01 +0800 CST

Why is Debian Jessie systemd reloading my Apache server every morning?

7

After upgrading a web server from Debian Wheezy to Debian Jessie, the following log entries appear in the system log every morning. The times vary somewhat, but it seems to always happen at approximately the same time (plus/minus maybe 10-15 minutes at most). Nothing similar (that I can recall) happened before the upgrade.

Oct 23 06:25:02 hostname systemd[1]: Reloading LSB: Apache2 web server.
Oct 23 06:25:04 hostname apache2[1545]: Reloading web server: apache2.
Oct 23 06:25:04 hostname systemd[1]: Reloaded LSB: Apache2 web server.
Oct 23 06:29:10 hostname rsyslogd0: action 'action 17' resumed (module 'builtin:ompipe') [try http://www.rsyslog.com/e/0 ]
Oct 23 06:29:10 hostname rsyslogd-2359: action 'action 17' resumed (module 'builtin:ompipe') [try http://www.rsyslog.com/e/2359 ]

Looking at the output of service apache2 status:

● apache2.service - LSB: Apache2 web server
   Loaded: loaded (/etc/init.d/apache2)
   Active: active (running) since Fri 2015-10-09 21:33:36 UTC; 1 weeks 6 days ago
  Process: 21467 ExecStop=/etc/init.d/apache2 stop (code=exited, status=0/SUCCESS)
  Process: 1545 ExecReload=/etc/init.d/apache2 reload (code=exited, status=0/SUCCESS)
  Process: 21489 ExecStart=/etc/init.d/apache2 start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/apache2.service
           ├─ 1625 /usr/sbin/apache2 -k start
           ├─ 1626 /usr/sbin/apache2 -k start
           ├─ 4686 /usr/sbin/apache2 -k start
           ├─ 7745 /usr/sbin/apache2 -k start
           ├─ 7746 /usr/sbin/apache2 -k start
           ├─ 7747 /usr/sbin/apache2 -k start
           ├─ 7748 /usr/sbin/apache2 -k start
           ├─ 7753 /usr/sbin/apache2 -k start
           ├─ 7760 /usr/sbin/apache2 -k start
           ├─ 7771 /usr/sbin/apache2 -k start
           └─21505 /usr/sbin/apache2 -k start

Oct 21 06:25:02 hostname.fqdn systemd[1]: Reloading LSB: Apache2 web server.
Oct 21 06:25:08 hostname.fqdn apache2[32200]: Reloading web server: apache2.
Oct 21 06:25:08 hostname.fqdn systemd[1]: Reloaded LSB: Apache2 web server.
Oct 22 06:25:03 hostname.fqdn systemd[1]: Reloading LSB: Apache2 web server.
Oct 22 06:25:05 hostname.fqdn apache2[16779]: Reloading web server: apache2.
Oct 22 06:25:05 hostname.fqdn systemd[1]: Reloaded LSB: Apache2 web server.
Oct 23 06:25:02 hostname.fqdn systemd[1]: Reloading LSB: Apache2 web server.
Oct 23 06:25:04 hostname.fqdn apache2[1545]: Reloading web server: apache2.
Oct 23 06:25:04 hostname.fqdn systemd[1]: Reloaded LSB: Apache2 web server.
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
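
To put a number on how regular the reloads in the status output above are, here is a small Python sketch that pulls the time of day out of each "Reloading" line. The regular expression is written against the three lines quoted here and is not a general syslog parser.

```python
import re

# The three "Reloading" lines from the status output above.
LOG = """\
Oct 21 06:25:02 hostname.fqdn systemd[1]: Reloading LSB: Apache2 web server.
Oct 22 06:25:03 hostname.fqdn systemd[1]: Reloading LSB: Apache2 web server.
Oct 23 06:25:02 hostname.fqdn systemd[1]: Reloading LSB: Apache2 web server.
"""

RELOAD = re.compile(
    r"^(\w{3} +\d+) (\d\d):(\d\d):(\d\d) \S+ systemd\[1\]: Reloading ",
    re.MULTILINE,
)


def reload_times(text):
    """Return (day, seconds since midnight) for each reload line."""
    return [
        (day, int(h) * 3600 + int(m) * 60 + int(s))
        for day, h, m, s in RELOAD.findall(text)
    ]


times = reload_times(LOG)
seconds = [t for _, t in times]
spread = max(seconds) - min(seconds)
print(times)
print("spread in seconds:", spread)
```

For these three days the reloads begin within a second of each other, which points to something scheduled, not to a reaction to load or errors.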

I'm not sure if I'm looking in the right place; the only part of /run/systemd/generator.late/apache2.service that looks remotely relevant is the mention of the ExecReload command, which is echoed in the service apache2 status output.

Why is systemd reloading the web server with such regularity, despite nobody doing anything on the server, and how do I make it stop?

user

Asked: 2015-06-16 11:43:34 +0800 CST

Just installed LSI 9211; no drives showing up to Linux

6

I just added an LSI 9211-8i to a system running Debian Wheezy (on the Linux kernel). All software is up to date and the kernel is 3.2.65-1+deb7u2 x86_64 according to uname.

The card came straight out of the packaging and into the host after visual inspection that didn't uncover anything that was clearly wrong with the card (though I have no known good card to compare against). This, along with the fact that the kernel is speaking to the card (see below) leads me to believe that the card itself is slightly more useful than a dud.

Physically installing the card posed no problems. The card being PCIe x8 didn't need the full length of the PCIe x16 slot I had available, but as far as I can tell that should not be a problem if the host and card are speaking to each other at all. The motherboard has two PCIe x16 slots, one of which is listed as "x4 performance". Since the card is obviously being detected at some level, I do not believe anything like the graphics-card-only x16 slots is at play here.

To the 9211's internal ports I hooked up two 8087-to-4x8482 breakout cables, connecting each to two HDDs (leaving unused the other two plugs on each) with no PMP or anything similar in between. One of the two 8087 ports (in the unlikely case it makes a difference, the one farther from the PCIe slot) was slightly finicky, but the cable clicked into and locked in place without arguments once I slid it in at the right angle. I looked more closely around the area of that port but could find no evidence of physical damage to the card.

The system was noticeably noisier on boot compared to what it was before I installed these new drives, which leads me to believe that the card is, at the very least, supplying power and spinning up the drives. The drives subsequently spun down.

I expected the card to make some utterances during the boot process, and was rather surprised to get nothing of the sort (no "Press Ctrl-C to start LSI Logic Configuration Utility" prompt). I looked through the motherboard's BIOS setup, but could find no relevant switches that needed to be flipped for off-board BIOSes or HBAs. Hammering Ctrl+C during the boot process up to GRUB (to try to invoke the card's on-board configuration utility) did not produce any visible results.

The mpt2sas module was loaded automatically on boot, and seems to talk to the card just fine:

[    1.692606] mpt2sas version 10.100.00.00 loaded
[    1.698699] mpt2sas 0000:08:00.0: enabling device (0000 -> 0002)
[    1.698717] mpt2sas 0000:08:00.0: setting latency timer to 64
[    1.698721] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (32967612 kB)
[    1.698761] mpt2sas0: IO-APIC enabled: IRQ 16
[    1.698764] mpt2sas0: iomem(0x00000000d0440000), mapped(0xffffc90013ea8000), size(16384)
[    1.698766] mpt2sas0: ioport(0x0000000000001000), size(256)
[    2.139165] mpt2sas0: Allocated physical memory: size(3379 kB)
[    2.139168] mpt2sas0: Current Controller Queue Depth(1483), Max Controller Queue Depth(1720)
[    2.139170] mpt2sas0: Scatter Gather Elements per IO(128)
[    2.360461] mpt2sas0: LSISAS2008: FWVersion(20.00.00.00), ChipRevision(0x03), BiosVersion(07.27.01.00)
[    2.360464] mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[    2.360563] mpt2sas0: sending port enable !!
[    4.895613] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b00963d470), phys(8)
[   10.024028] mpt2sas0: port enable: SUCCESS

lspci shows that the card is being detected and identified:

$ lspci | grep LSI
08:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
$

However, and this is where it gets interesting, neither lsblk nor udevadm info --exportdb shows any of the new HDDs, insofar as I can tell. They are also (obviously, given udevadm) not showing up in any of the /dev/disk/by-* directories.

I tried running udevadm trigger just in case there was something iffy with the boot sequence ordering, but that did not change anything and did not add anything at all to the system log (i.e., the most recent portion of the output of dmesg was the same before and after running that command).

I am not inclined to believe that both of the brand new breakout cables are somehow broken.

Physically unplugging both of the breakout cables from the card (to remove the HDDs and cables from consideration) did not make any discernible difference.

I followed these instructions to install the most recent version of MegaRAID Storage Manager on my system. (Basically, take the rpms, use alien --scripts to convert them to debs, and then dpkg --install the debs.) After that, with the drives plugged in and /etc/init.d/vivaldiframeworkd started, running /usr/local/MegaRAID Storage Manager/StorCLI/storcli64 show all prints the following:

Status Code = 0
Status = Success
Description = None

Number of Controllers = 0
Host Name = my-host
Operating System  = Linux3.2.0-4-amd64

At this point I am somewhat running out of ideas. If there's any other information I can provide that might help answering this, just let me know. I'm almost starting to think that this is somehow a motherboard issue after all.

With the ultimate goal of using them for a ZFS pool, what incantations, magic utterances, sacrifices or other relevant rituals do I need to perform for the drives connected to the 9211 to show up in Linux?

UPDATE: After physically switching places of the graphics card and the 9211, the 9211's BIOS now shows up on boot and I was able to enter the configuration utility. It still shows no disks attached (even in the SAS Topology view), however, despite disks very definitely being attached and cables firmly seated on both ends. (I have not, however, created any RAID array using the card's configuration utility.) What's more, the card reports that it has been "disabled". At this point I'm almost willing to chalk up my initial problems to a crappy motherboard, and my current problems to IR vs IT firmware on the 9211 itself. I will try flashing the card to IT firmware later and see how that goes; I plan on using IT firmware anyway because of ZFS, so there's no harm in doing so that I can see.

user

Asked: 2015-05-24 04:28:05 +0800 CST

Are there any security benefits to deploying custom SSH DH groups to client-only systems?

17

One suggested mitigative strategy against Logjam-related attacks on SSH is to generate custom SSH Diffie-Hellman groups using something like (the below being for OpenSSH)

ssh-keygen -G moduli-2048.candidates -b 2048
ssh-keygen -T moduli-2048 -f moduli-2048.candidates

followed by replacing the system-wide moduli file with the output file moduli-2048. (ssh-keygen -G is used to generate candidate DH-GEX primes, and ssh-keygen -T to test the generated candidates for safety.)

This is pretty clearly a reasonable thing to do on SSH servers that otherwise would be using well-known groups that lend themselves well to precomputation, but are there any security benefits to deploying custom SSH DH groups onto client-only systems? (That is, systems that connect to SSH servers, but never act as an SSH server themselves.)

I am primarily interested in answers relating to OpenSSH on Linux, but more generic answers would be appreciated as well.

user

Asked: 2015-03-14 04:42:01 +0800 CST

Does the LSI 9211-8i add any data structures of its own when used in pure HBA (JBOD) mode?

1

I'm looking at adding some disks to one of my systems, for which I need to add an offboard HBA. Looking around, I came across the LSI MegaRAID SAS 9211-8i (LSI part number LSI00194) which looks quite interesting. LSI also seem to be popular in general, including right here.

Since I run ZFS, I plan on using the HBA itself as just a dumb controller, letting ZFS handle everything related to storage-level redundancy and recovery. The host OS is Linux (Debian/Linux to be precise).

While I doubt LSI will stop producing these cards any time soon, it would still be nice to know: does the 9211-8i add any data structures of its own to the disks when used in JBOD mode? In other words, can I unplug a disk from the 9211-8i, plug it into an alternate HBA, and everything "just keeps working"? Or will the OS then see some form of garbage on the disk that is not exposed through the LSI, which might interfere with ZFS' usage of the volume?

user

Asked: 2014-11-02 04:24:15 +0800 CST

Why is ZFS not doing anything with my disk's duff sector?

8

I was under the impression that if an I/O error occurs during a read from a ZFS pool, two things will happen:

1. The failure will be recorded in either the READ or CKSUM statistic of the relevant device, propagating upwards toward the pool level.
2. Either:
   - Redundant data will be used to reconstruct the requested block, return the requested block to the caller and if the duff drive is still functional rewrite the block to it, OR
   - An I/O error will be returned if redundant data is not available to correct for the read error.

It appears that one of the disks in my mirror setup has developed a bad sector. That by itself is not alarming; such things happen, and that's exactly why I have redundancy (a two-way mirror, to be exact). Every time I scrub the pool or read through the files in a particular directory (I haven't bothered yet to determine exactly which file is at fault), the following pops up in dmesg, obviously with varying timestamps:

Nov  1 09:54:26 yeono kernel: [302621.236549] ata6.00: exception Emask 0x0 SAct 0x9c10 SErr 0x0 action 0x0
Nov  1 09:54:26 yeono kernel: [302621.236557] ata6.00: irq_stat 0x40000008
Nov  1 09:54:26 yeono kernel: [302621.236566] ata6.00: failed command: READ FPDMA QUEUED
Nov  1 09:54:26 yeono kernel: [302621.236578] ata6.00: cmd 60/a8:78:18:5a:12/00:00:5c:01:00/40 tag 15 ncq 86016 in
Nov  1 09:54:26 yeono kernel: [302621.236580]          res 41/40:a8:18:5a:12/00:00:5c:01:00/00 Emask 0x409 (media error) <F>
Nov  1 09:54:26 yeono kernel: [302621.236585] ata6.00: status: { DRDY ERR }
Nov  1 09:54:26 yeono kernel: [302621.236589] ata6.00: error: { UNC }
Nov  1 09:54:26 yeono kernel: [302621.238214] ata6.00: configured for UDMA/133
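
For reference, the cmd field in the log excerpt above can be decoded to find which sectors the failed read covered. The Python sketch below assumes the field layout libata uses when printing taskfiles (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count travels in the feature fields and the tag in the upper bits of nsect. The function name is hypothetical, and 512-byte logical sectors are an assumption.

```python
def decode_ncq_cmd(cmd_field):
    """Decode the 'cmd' field of a libata error line for an NCQ command."""
    command, low, high, device = cmd_field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # 48-bit LBA: the three "hob" bytes are the high-order half.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": int(command, 16),
        "lba": lba,
        "sectors": (hob_feature << 8) | feature,  # NCQ: count is in feature
        "tag": nsect >> 3,                        # NCQ: tag is in nsect[7:3]
    }


decoded = decode_ncq_cmd("60/a8:78:18:5a:12/00:00:5c:01:00/40")
print(decoded)
print("bytes requested:", decoded["sectors"] * 512)  # assumes 512-byte sectors
```

The decoded tag and byte count agree with the "tag 15 ncq 86016 in" part of the same log line, which suggests the assumed layout is right; the request starts at LBA 0x15c125a18 and spans 168 sectors.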

This is a fairly up to date Debian Wheezy, kernel 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64, ZoL 0.6.3. Package versions are current at debian-zfs=7~wheezy, libzfs2=0.6.3-1~wheezy, zfs-dkms=0.6.3-1~wheezy, zfs-initramfs=0.6.3-1~wheezy, zfsutils=0.6.3-1~wheezy, zfsonlinux=3~wheezy, linux-image-amd64=3.2+46, linux-image-3.2.0-4-amd64=3.2.63-2. The only package pinning that I know of is for ZoL, for which I have (as provided by the zfsonlinux package):

Package: *
Pin: release o=archive.zfsonlinux.org
Pin-Priority: 1001

Running hdparm -R on the drive reports that Write-Read-Verify is turned on (this is a Seagate, so it has that feature, and I use it as an extra safety net; the additional write latency is not a problem since my interactive use pattern is very read-heavy):

/dev/disk/by-id/ata-ST4000NM0033-9ZM170_XXXXXXXX:
 write-read-verify =  2

Even given the clear indication that something is amiss, zpool status claims that there is no problem with the pool:

  pool: akita
 state: ONLINE
  scan: scrub repaired 0 in 8h16m with 0 errors on Sat Nov  1 10:46:03 2014
config:

        NAME                        STATE     READ WRITE CKSUM
        akita                       ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000c50065e8414a  ONLINE       0     0     0
            wwn-0x5000c500645b0fec  ONLINE       0     0     0

errors: No known data errors

This error has been showing up in the logs regularly for the last several days now (since Oct 27) so I'm not terribly inclined to write it off as merely a fluke. I run the disks with quite short SCTERC timeouts; 1.5 seconds read (to recover quickly from read errors), 10 seconds write. I have confirmed that these values are active on the drive in question.

smartd keeps pestering me (which in itself is a good thing!) about the fact that the ATA error count is climbing:

The following warning/error was logged by the smartd daemon:

Device: /dev/disk/by-id/ata-ST4000NM0033-9ZM170_XXXXXXXX [SAT], ATA error count increased from 4 to 5

For details see host's SYSLOG.

Running smartctl --attributes on the drive in question yields the following:

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   076   063   044    Pre-fail  Always       -       48910012
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       97
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   060   030    Pre-fail  Always       -       1698336160
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9887
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       98
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   095   095   000    Old_age   Always       -       5
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       10
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   058   052   045    Old_age   Always       -       42 (Min/Max 20/45)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       61
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       492
194 Temperature_Celsius     0x0022   042   048   000    Old_age   Always       -       42 (0 11 0 0)
195 Hardware_ECC_Recovered  0x001a   052   008   000    Old_age   Always       -       48910012
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

Nothing glaringly out of the ordinary there. Note that this is an enterprise drive, so it has a five-year warranty and is rated for 24x7 operation (meaning it's meant to be reliable for over 40,000 hours of operation, compared to the just under 10,000 hours under its belt so far). Notice the number 5 in attribute 187 Reported_Uncorrect; that's where the problem is. Also note the fairly low Start_Stop_Count and Power_Cycle_Count values of under 100 each.

Not that I think it's relevant in this case, but yes, the system does have ECC RAM.

Non-default properties of the root file system on the pool are:

NAME   PROPERTY              VALUE                  SOURCE
akita  type                  filesystem             -
akita  creation              Thu Sep 12 18:03 2013  -
akita  used                  3,14T                  -
akita  available             434G                   -
akita  referenced            136K                   -
akita  compressratio         1.04x                  -
akita  mounted               no                     -
akita  mountpoint            none                   local
akita  version               5                      -
akita  utf8only              off                    -
akita  normalization         none                   -
akita  casesensitivity       sensitive              -
akita  usedbysnapshots       0                      -
akita  usedbydataset         136K                   -
akita  usedbychildren        3,14T                  -
akita  usedbyrefreservation  0                      -
akita  sync                  standard               local
akita  refcompressratio      1.00x                  -
akita  written               0                      -
akita  logicalused           2,32T                  -
akita  logicalreferenced     15K                    -

and correspondingly for the pool itself:

NAME   PROPERTY               VALUE                  SOURCE
akita  size                   3,62T                  -
akita  capacity               62%                    -
akita  health                 ONLINE                 -
akita  dedupratio             1.00x                  -
akita  free                   1,36T                  -
akita  allocated              2,27T                  -
akita  readonly               off                    -
akita  ashift                 12                     local
akita  expandsize             0                      -
akita  feature@async_destroy  enabled                local
akita  feature@empty_bpobj    active                 local
akita  feature@lz4_compress   active                 local

These lists were obtained by running {zfs,zpool} get all akita | grep -v default.
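The brace notation there is shorthand for two separate commands; typed literally into bash it would expand to a single "zfs zpool get all akita". Written out, the two invocations look roughly like the sketch below. The function name show_non_default is made up for illustration. Note that grep -v default drops any row containing the word "default" anywhere, not only in the SOURCE column.

```sh
#!/bin/sh
# Show dataset and pool properties for one pool, hiding rows that mention
# "default". This is a plain text filter, not a match on the SOURCE column.
show_non_default() {
    pool=$1
    for cmd in zfs zpool; do
        "$cmd" get all "$pool" | grep -v default
    done
}

# Usage:
#   show_non_default akita
```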

Now for the questions:

Why isn't ZFS reporting anything about the read problem? It's clearly recovering from it.
Why isn't ZFS automatically rewriting the duff sector that the drive is clearly having trouble reading, in turn hopefully triggering a relocation by the drive, given that sufficient redundancy exists for automatic repair in the read request path?

user

Asked: 2014-05-18 07:12:12 +0800 CST

Why did rebooting cause one side of my ZFS mirror to become UNAVAIL?

14

I just recently migrated a bulk data storage pool (ZFS On Linux 0.6.2, Debian Wheezy) from a single-device vdev configuration to a two-way mirror vdev configuration.

The previous pool configuration was:

    NAME                     STATE     READ WRITE CKSUM
    akita                    ONLINE       0     0     0
      ST4000NM0033-Z1Z1A0LQ  ONLINE       0     0     0

Everything was fine after the resilver completed (I initiated a scrub after the resilver completed, just to have the system go over everything once again and make sure it was all good):

  pool: akita
 state: ONLINE
  scan: scrub repaired 0 in 6h26m with 0 errors on Sat May 17 06:16:06 2014
config:

        NAME                       STATE     READ WRITE CKSUM
        akita                      ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            ST4000NM0033-Z1Z1A0LQ  ONLINE       0     0     0
            ST4000NM0033-Z1Z333ZA  ONLINE       0     0     0

errors: No known data errors

However, after rebooting I got an email notifying me of the fact that the pool was not fine and dandy. I had a look and this is what I saw:

   pool: akita
  state: DEGRADED
 status: One or more devices could not be used because the label is missing or
         invalid.  Sufficient replicas exist for the pool to continue
         functioning in a degraded state.
 action: Replace the device using 'zpool replace'.
    see: http://zfsonlinux.org/msg/ZFS-8000-4J
   scan: scrub in progress since Sat May 17 14:20:15 2014
     316G scanned out of 1,80T at 77,5M/s, 5h36m to go
     0 repaired, 17,17% done
 config:

         NAME                       STATE     READ WRITE CKSUM
         akita                      DEGRADED     0     0     0
           mirror-0                 DEGRADED     0     0     0
             ST4000NM0033-Z1Z1A0LQ  ONLINE       0     0     0
             ST4000NM0033-Z1Z333ZA  UNAVAIL      0     0     0

 errors: No known data errors

The scrub is expected; there is a cron job set up to initiate a full system scrub on reboot. However, I definitely wasn't expecting the new HDD to fall out of the mirror.

I define aliases that map to the /dev/disk/by-id/wwn-* names, and in the case of both these disks I have given ZFS free rein to use the full disk, including handling partitioning:

# zpool history akita | grep ST4000NM0033
2013-09-12.18:03:06 zpool create -f -o ashift=12 -o autoreplace=off -m none akita ST4000NM0033-Z1Z1A0LQ
2014-05-15.15:30:59 zpool attach -o ashift=12 -f akita ST4000NM0033-Z1Z1A0LQ ST4000NM0033-Z1Z333ZA
#

These are the relevant lines from /etc/zfs/vdev_id.conf (I do notice now that the Z1Z333ZA uses a tab character for separation whereas the Z1Z1A0LQ line uses only spaces, but I honestly don't see how that could be relevant here):

alias ST4000NM0033-Z1Z1A0LQ             /dev/disk/by-id/wwn-0x5000c500645b0fec
alias ST4000NM0033-Z1Z333ZA     /dev/disk/by-id/wwn-0x5000c50065e8414a

When I looked, /dev/disk/by-id/wwn-0x5000c50065e8414a* were there as expected, but /dev/disk/by-vdev/ST4000NM0033-Z1Z333ZA* were not.

Issuing sudo udevadm trigger caused the symlinks to show up in /dev/disk/by-vdev. However, ZFS doesn't seem to just realize that they are there (Z1Z333ZA still shows as UNAVAIL). That much I suppose can be expected.
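A check like the one I did by hand could be scripted roughly as follows. This is a sketch only: the function name missing_vdev_links is made up, and the second argument exists purely so the directory can be overridden. awk's default field splitting treats tabs and spaces alike, which says nothing about how the vdev_id helper itself parses the file.

```sh
#!/bin/sh
# Print every alias defined in a vdev_id.conf-style file that has no
# symlink of the same name in the by-vdev directory.
#   $1: path to the configuration file
#   $2: directory to check (defaults to /dev/disk/by-vdev)
missing_vdev_links() {
    conf=$1
    dir=${2:-/dev/disk/by-vdev}
    awk '$1 == "alias" { print $2 }' "$conf" |
    while read -r name; do
        # -L is true for a symlink even if its target is missing
        [ -L "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Usage:
#   missing_vdev_links /etc/zfs/vdev_id.conf
```

No output means every alias has a corresponding symlink; in the situation described above, it would have printed ST4000NM0033-Z1Z333ZA.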

I tried replacing the relevant device, but had no real luck:

# zpool replace akita ST4000NM0033-Z1Z333ZA
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-vdev/ST4000NM0033-Z1Z333ZA-part1 is part of active pool 'akita'
#

Both disks are detected during the boot process (dmesg log output showing the relevant drives):

[    2.936065] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.936137] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.937446] ata4.00: ATA-9: ST4000NM0033-9ZM170, SN03, max UDMA/133
[    2.937453] ata4.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    2.938516] ata4.00: configured for UDMA/133
[    2.992080] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    3.104533] ata6.00: ATA-9: ST4000NM0033-9ZM170, SN03, max UDMA/133
[    3.104540] ata6.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    3.105584] ata6.00: configured for UDMA/133
[    3.105792] scsi 5:0:0:0: Direct-Access     ATA      ST4000NM0033-9ZM SN03 PQ: 0 ANSI: 5
[    3.121245] sd 3:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
[    3.121372] sd 3:0:0:0: [sdb] Write Protect is off
[    3.121379] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    3.121426] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.122070] sd 5:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
[    3.122176] sd 5:0:0:0: [sdc] Write Protect is off
[    3.122183] sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    3.122235] sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Both drives are connected directly to the motherboard; there is no off-board controller involved.

On impulse, I did:

# zpool online akita ST4000NM0033-Z1Z333ZA

which appears to have worked; Z1Z333ZA is now at least ONLINE and resilvering. At about an hour into the resilver it's scanned 180G and resilvered 24G with 9.77% done, which points to it not doing a full resilver but rather only transferring the dataset delta.
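As a sanity check on those numbers: if the percentage tracks the amount scanned, then 180G at 9.77% implies a total of roughly 1842G, or about 1.80T, which matches the 1,80T total in the scrub output above. Only 24G of that has actually been written to the disk. The helper below is just that division; the name estimate_total is made up for illustration.

```sh
#!/bin/sh
# Estimate the total amount to scan from the amount scanned so far and the
# reported completion percentage: total = scanned / (percent / 100).
estimate_total() {
    awk -v scanned="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", scanned / (pct / 100) }'
}

# Usage (values in G, as reported by zpool status):
#   estimate_total 180 9.77
```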

I'm honestly not sure if this issue is related to ZFS On Linux or to udev (it smells a bit like udev, but then why would one drive be detected just fine but not the other), but my question is how do I make sure the same thing doesn't happen again on the next reboot?

I'll be happy to provide more data on the setup if necessary; just let me know what's needed.

user

Asked: 2013-11-16 15:15:03 +0800 CST

How to add rules in local fail2ban filter definition?

10

I have installed fail2ban as packaged by Debian on a server under my control. Since I have some failregexes from before, I'm putting those into the local filter definition file so they will be considered as well. Hence, I end up with e.g. /etc/fail2ban/filter.d/sshd.conf and /etc/fail2ban/filter.d/sshd.local. This is the way it is recommended to be set up and it appears to be working just fine for what it is.

However, in the .local file, I'm actually replacing the whole list of failregexes from the .conf file. The documentation doesn't seem to indicate there is any other way of doing it, and to get it to work, I've simply copied the distribution-supplied .conf file to a .local file and made some additions.

It would be really nice if I could simply amend the list, benefiting from the work of the upstream and Debian maintainers in staying abreast of changes to the distribution-maintained log entry filter regexes.

The only real workaround I can think of is to actually create two jails, one using the distribution-provided configuration and one using my own. This would appear to have the (fairly significant) downside that they are treated as independent jails (which you'd expect with such a setup).

Surely I can't be the only one wanting to just add a few failregexes of my own to an already existing collection, with a minimum of maintenance hassle.

Is it possible to amend the lists of failregex and ignoreregex within a fail2ban filter definition through a site-local or host-local file, without making any changes to the corresponding global or distribution-supplied file? If it is, then how to do it?

user

Asked: 2013-09-15 11:43:11 +0800 CST

How to run a command once a ZFS scrub completes?

12

I would like to use cron to schedule periodic scrubs of my ZFS pool, and at some reasonably short time after the scrub finishes, email a status report to myself. The purpose of this is to catch any problems without having to manually look for them (push rather than pull).

The first part is easy: just set up a cron job to run zpool scrub $POOL as root at whatever interval is reasonable in my particular situation.

The second part, I'm not quite so sure how to do. zpool scrub returns immediately and then the scrub is run in the background by the system (which is certainly desirable behavior if the scrub is initiated by an administrator from a terminal). zpool status gives me a status report and exits (with exit code 0 while the scrub is running; it hasn't finished yet so I don't know if the exit status changes once it's done, but I doubt it). The only parameter documented for zpool scrub is -s for "stop scrubbing".

The main problem is detecting the change of status from scrubbing to finished scrubbing. Given that, the rest should fall into place.

Ideally, I'd want to tell zpool scrub to not return until the scrub finishes, but I don't see any way to make it do that. (It would make it almost too easy to simply cron zpool scrub --wait-until-done $POOL; zpool status $POOL.)

Failing that, I'd like to ask the system whether a scrub is currently in progress, preferably in a way that isn't likely to break with an upgrade or configuration change, so that I can act on whether or not a previously running scrub has finished (by executing a zpool status when the scrub status goes from scrubbing to not scrubbing).
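To make the fallback concrete, the crude version of what I have in mind is a polling loop along these lines. It is a sketch, not something I'm happy with: it matches the human-readable "scrub in progress" text that zpool status prints while a scrub is running, which is exactly the kind of thing that could break with an upgrade. The function names and the 300-second default interval are arbitrary choices for illustration.

```sh
#!/bin/sh
# Exit 0 if the zpool status text on stdin reports a running scrub.
# Assumption: a running scrub shows up as "scrub in progress" in the output.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Poll until the scrub on the given pool is no longer running, then print
# the pool status (for cron to mail, or to pipe into a mailer).
#   $1: pool name
#   $2: seconds between polls (defaults to 300)
wait_for_scrub() {
    pool=$1
    interval=${2:-300}
    while zpool status "$pool" | scrub_in_progress; do
        sleep "$interval"
    done
    zpool status "$pool"
}

# Usage, e.g. from a root cron job:
#   zpool scrub akita && wait_for_scrub akita
```

The obvious downsides are that the job stays alive for the whole duration of the scrub, and that a scrub stopped with -s looks the same as one that finished.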

This particular setup is for a workstation system, so while a monitoring tool such as Nagios probably has add-ins that would solve the problem, it feels rather overkill to install such a tool for just this one task. Can someone suggest a lower-tech solution to the problem?

user

Asked: 2013-08-21 01:57:48 +0800 CST

What are the performance implications of running VMs on a ZFS host?

11

I'm considering migrating from ext3 to ZFS for data storage on my Debian Linux host, using ZFS on Linux. One killer feature of ZFS that I really want is its data integrity guarantees. The ability to trivially grow storage as my storage needs increase is also something I'd look forward to.

However, I also run a few VMs on the same host. (Though normally, in my case only one VM is running on the host at any one time.)

Considering ZFS's data checksumming and copy-on-write behavior, together with the fact that the VM disk images are comparatively huge files (my main VM's disk image file currently sits at 31 GB), what are the performance implications inside the VM guest of such a migration? What steps can I take to reduce the possible negative performance impact?

I can live with fewer data integrity guarantees on the VM disk images if necessary (I don't do anything really critical inside any of the VMs) and can easily separate them from the rest of the filesystem, but it would be nice if I don't have to (even selectively) turn off pretty much the feature that most makes me want to migrate to a different file system.

The hardware is pretty beefy for a workstation-class system, but won't hold much of a candle to a high-end server (32 GB RAM with rarely >10 GB in use, 6-core 3.3 GHz CPU, currently 2.6 TB usable disk space according to df and a total of about 1.1 TB free; migrating to ZFS will likely add some more free space) and I'm not planning on running data deduplication (as turning on dedup just wouldn't add much in my situation). The plan is to start with a JBOD configuration (obviously with good backups) but I may move to a two-way mirror setup eventually if conditions warrant.

user

Asked: 2012-01-20 02:46:03 +0800 CST

Make Tomcat 7.0 flush logs to disk?

3

How do I get Apache Tomcat 7.0, running as a service on Windows Server 2008, to flush its logs to disk without restarting the service daemon?

I did find the question "delay on tomcat writing to logs in windows server 2008", but logging.properties contains no bufferSize directive at all.
