Mikhail T.'s questions -server

Mikhail T.

Asked: 2021-04-22 07:23:49 +0800 CST

Systemd and Disaster-Recovery Stand-By systems

2

We're using systemd to run various services in production. (Duh...)

We're building out a matching "disaster-recovery" site, which will have the same application installed -- with the same systemd-units to bring up its various components in case of a disaster.

This DR-environment is "hot", ready to take over within a short time (the shorter the better) -- thus becoming production itself. Then, when the "disaster" is resolved, the other environment is to become the DR.

My question is, how to keep those systemd-services ready-to-start, but not actually starting until a certain condition becomes true?

In order to conclude, that a particular site is currently the primary (production), a command (amIthePrimary) needs to run and exit with a 0 exit-code. The check is easy and fast -- and can be performed as often as once a minute. But, because it requires running a command, there is no Condition for it provided by systemd.

Do I put that command into every unit's ExecStartPre, or will that become a noisy error, needlessly annoying the administrators? Or do I put it into a unit of its own, with all other services Require-ing it?

Separately, once the condition is true -- and the services start -- how do I continue checking it, so they will all shut down, should it become false again?
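
To make the "keep checking" part concrete, here is a minimal sketch of a watcher that could be run once a minute. It assumes all the application's services are grouped under a single target, so that starting or stopping the target starts or stops them together. The target name app-primary.target is hypothetical, and amIthePrimary is assumed to be on the PATH; only the systemctl subcommands (start, stop, is-active) are real.

```python
import subprocess

CHECK_CMD = ["amIthePrimary"]    # the site-check command from the question
TARGET = "app-primary.target"    # hypothetical target grouping the services


def next_action(is_primary, is_active):
    """Decide what to do with the target: 'start', 'stop', or None."""
    if is_primary and not is_active:
        return "start"
    if not is_primary and is_active:
        return "stop"
    return None


def succeeds(cmd):
    """Return True if the command runs and exits with code 0."""
    try:
        return subprocess.run(cmd, check=False).returncode == 0
    except OSError:
        return False  # a command that cannot be run counts as a failure


def reconcile_once():
    """Run the check once and bring the target in line with its result."""
    action = next_action(
        succeeds(CHECK_CMD),
        succeeds(["systemctl", "is-active", "--quiet", TARGET]),
    )
    if action is not None:
        succeeds(["systemctl", action, TARGET])
    return action
```

Calling reconcile_once() from something that fires every minute would cover both directions: the services come up when the site becomes primary and go down when it stops being primary. Keeping the decision in next_action() separate from the subprocess calls makes the policy easy to test without systemd.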

Mikhail T.

Asked: 2020-01-05 16:19:26 +0800 CST

Recovering a concatenated disk (ccd)

2

I need to recover data from an old filesystem, which was located on two small (by today's standards) drives -- using a concatenated disk driver (known as ccd in FreeBSD).

I have the drives online (no disk errors) and have already dumped their images.

My problem is, I do not remember the interleave factor I used 10 years ago... How would I recover that? The scan_ffs finds some old superblocks on each image, but those aren't usable until I properly interleave them together again.

Are there any tell-tale signs of ccd, that would help me detect the interleave value I used?

Update: Using some tools I was able to "recover" some files from one of the drives -- including MPEG-files, which, predictably, have regular holes in them -- ffplay is doing its best to play them, but complains about lots of errors and the streams constantly stutter.

I think, by measuring the sizes of the holes, I'll be able to figure out the interleave. Can they be measured, though?
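
As a rough illustration of how the holes might be measured, the sketch below scans a recovered file for runs of all-zero blocks and counts how often each run length occurs. It rests on assumptions: that the recovery tool filled the missing stretches with zeros, and that a 512-byte block is the right granularity. Since a file need not be laid out contiguously on disk, the most frequent run length would only be a hint at the interleave, not proof of it.

```python
from collections import Counter


def zero_runs(data, block=512):
    """Yield (offset, length) for each run of all-zero blocks in data."""
    zero = bytes(block)
    start = None
    for off in range(0, len(data), block):
        chunk = data[off:off + block]
        if chunk == zero[:len(chunk)]:
            if start is None:
                start = off          # a new run of zero blocks begins here
        elif start is not None:
            yield start, off - start
            start = None
    if start is not None:            # the data ended inside a zero run
        yield start, len(data) - start


def run_length_histogram(data, block=512):
    """Count how many zero runs there are of each length (in bytes)."""
    return Counter(length for _, length in zero_runs(data, block))
```

For a file of modest size, the bytes could come from open(path, "rb").read(); a multi-gigabyte image would need to be read in pieces instead.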

Mikhail T.

Asked: 2019-04-08 14:17:54 +0800 CST

Can Jenkins utilize the user's Kerberos ticket?

3

I'm setting up a new Jenkins server. It will authenticate users against the corporate AD. Most of the tasks we have in mind require logging-in to other hosts (via ssh).

Can Jenkins be configured to, upon a user's login:

1. Obtain a Kerberos ticket (kinit).
2. Make that ticket available (as file, location set by an environment variable) to any Jenkins job run by that user -- so that access to the other hosts can still be controlled via .k5users/.k5login.

What add-ons/plugins should I look at?

Mikhail T.

Asked: 2018-08-04 06:37:50 +0800 CST

How should I use the new Optane drive?

2

I have a zpool consisting of 7 2TB HDDs of different vintage in a raidz setup. Currently, there is neither ZIL nor L2ARC device configured. The server has 12GB of RAM and no swap.

The different filesystems on the pool include /var/spool/imap and /var/db/pgsql. The users aren't many -- just family-members, but sometimes the usage can be heavy, such as when the anti-spam database is retrained (reading the "spam" folders under Cyrus-IMAP and feeding the PostgreSQL DB), or when a free-text search runs through all of the IMAP-messages.

I got a good deal on a new Intel Optane 32GB device, and am wondering, how to best use it. One obvious thing is to add a Separate Intent Log (SLOG) device. But 32GB seems too much for an infrequently-used pool of 12TB.

The general opinion seems to be, I don't have enough RAM for a meaningful L2ARC. The current ARC-stats are:

ARC: 1680M Total, 441M MFU, 1113M MRU, 32K Anon, 31M Header, 95M Other
     1125M Compressed, 1942M Uncompressed, 1.73:1 Ratio

Should I split my new device into a smaller ZIL (4GB?), and use the rest for, say, Cyrus' indices and ccache?

Mikhail T.

Asked: 2017-10-31 15:20:32 +0800 CST

Can a service-check be limited to hosts with a custom attribute defined?

2

For those of our hosts, which have a public IP-address, we define a custom attribute in the host-definition: _PADDR:

define host {
        ...
        address 10....
        _paddr  53....
}

Can we then -- without creating specific groups or other entries -- limit a service check only to those of the hosts, which have the custom attribute defined?

Using Icinga-1.13.3.

Mikhail T.

Asked: 2017-10-19 09:56:08 +0800 CST

How to access inventory_hostname from a filter_plugin?

3

My custom filter plugin uses ansible.utils.display to inject a warning in Ansible's output in some circumstances:

    if 'hold' in mydetails:
        display.warning('%s can be upgraded from %s to %s, but is on hold' %
                        (myp, mydetails['version'], latest))
        continue

This works and the warning is displayed nicely, but lacks the hostname -- which could get rather confusing, when one's inventory is large. I could insert the hostname there myself, but I want to use exactly the string already known as inventory_hostname. How would I access that variable?
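
For comparison, the "insert the hostname myself" route could look like the sketch below, where the template passes inventory_hostname to the filter as an ordinary argument. The filter name held_upgrades, the shape of the packages dictionary and the 'latest' key are all hypothetical, and the warn parameter merely stands in for display.warning so that the example runs without Ansible installed.

```python
def held_upgrades(packages, hostname, warn=print):
    """Return the upgradable package names, warning about held ones.

    packages maps a package name to a dict of details (hypothetical shape);
    hostname is whatever the template passes in, e.g. inventory_hostname.
    """
    upgradable = []
    for name, details in sorted(packages.items()):
        if 'hold' in details:
            warn('%s: %s can be upgraded from %s to %s, but is on hold' %
                 (hostname, name, details['version'], details['latest']))
            continue
        upgradable.append(name)
    return upgradable


class FilterModule(object):
    """Ansible looks for this class and calls filters() to register them."""

    def filters(self):
        return {'held_upgrades': held_upgrades}
```

A template or task would then call it as {{ pkgs | held_upgrades(inventory_hostname) }}, which uses exactly the string Ansible knows the host by, at the cost of having to pass it explicitly on every call.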

Mikhail T.

Asked: 2017-10-09 02:23:57 +0800 CST

How to generate entries for different mechanisms with saslpasswd2?

2

I'm struggling to create a satisfactory set of records in sasldb2.db. If I use the regular

saslpasswd2 -c user

I get exactly one record, according to sasldblistusers2:

user@my.example.com:    userPassword

whereas this page leads me to believe, there ought to be a line for each mechanism (DIGEST-MD5, CRAM-MD5, and so on).

If I add -n to avoid storing the plain-text (I only really need the CRAM-MD5):

saslpasswd2 -n -c user

then sasldblistusers2 finds no records to list at all. My saslpasswd.conf consists of two lines:

mech_list:      cram-md5 digest-md5 ntlm plain
log_level:      9

I tried this on FreeBSD using cyrus-sasl-2.1.26_12 and Ubuntu with 2.1.25... What am I doing wrong?

I need CRAM-MD5 because, without further reconfiguring, my sendmail only lists that and DIGEST-MD5 as the acceptable AUTH-mechanisms. And the iPhones, apparently, do not support DIGEST-MD5. And I'm only doing all of this for the sake of a couple of iPhones -- the normal computers already authenticate themselves with the client SSL-certificates issued by my own authority.

Ok, apparently, the CRAM-MD5 authentication has been succeeding all along -- despite not being listed by sasldblistusers2. I created a new question -- why does sendmail refuse relaying despite authentication's success.

Mikhail T.

Asked: 2017-09-09 09:39:12 +0800 CST

How to trigger a service-check by a change in a host-status?

2

We have an array of servers, any of which could go down generating a medium-priority notification:

define host {
        host_name       foo1
        contacts        medium-priority
        use     default-host
}
...

However, we'd like a higher-priority notification whenever more than two such servers are in trouble. To that end, we've set up a separate service-definition using Nagios'/Icinga's check_cluster-utility:

define service {
        service_description     foo-cluster
        servicegroups   cluster-checks
        display_name    Foo Cluster
        check_command   check_cluster_host!Foo Cluster!0!3!$HOSTSTATEID:foo1$,$HOSTSTATEID:foo2$,...,$HOSTSTATEID:fooN$
        contacts        high-priority
        hostgroup_name  clusters
        notes   Check, that no more than 2 hosts in group foo are in trouble
        use     default-service
}

The above will probably work, but I'd like for this service-check to be triggered not by time, but only by a change in the status of any of the "underlying" hosts...

We generate Icinga's config-files with Ansible and so can construct complex dependencies programmatically -- but can such triggering be implemented at all?

Mikhail T.

Asked: 2017-09-08 09:01:25 +0800 CST

Why would I need/want idomod, ido2db, etc.?

2

I'm reworking an Icinga setup I inherited. One of the things it perpetually complains about is ido2db not running (and idomod unable to connect to it).

Before figuring out, how to configure/fix it, I'm trying to find out, if we even need it in the first place. Unfortunately, all of the documentation I am able to find online talks about how to configure the functionality, not why do (or not do) it...

In particular, here is the most recent error on the subject in the log:

[1504809535] idomod: Still unable to connect to data sink.  83915 items lost, 5000 queued items to flush. Is ido2db running and processing data?

I'm guessing, the "items" are the check-results, etc. What feature(s) are we not benefiting from because of the cited losses?

Mikhail T.

Asked: 2017-08-30 06:52:35 +0800 CST

How to trigger a custom error from inside a Jinja template?

5

Though Ansible itself has a way for triggering a custom error, I can not find anything similar for Jinja.

My current method uses a syntax error:

{%  if 'ansible_mounts' in hostvars[host] %}
# {{ host }} knows its mount-points
{% else %}
# {% error!! No ansible_mounts listed for host - fact-gathering must've failed %}
{% endif %}

but those are rendered poorly at run-time -- one needs to look inside the template-file and search for the error (the rendering does not even include the line-number!).

Is there a way to output a neat failure message from inside Jinja-template?

Mikhail T.

Asked: 2017-08-08 12:45:49 +0800 CST

Can sendmail forward e-mail immediately instead of queueing?

3

Some of the e-mail passing through my server is forwarded to external accounts.

Unfortunately, my upstream SMTP-server is very picky about spam -- and rejects some of the legitimate messages as such. When this happens to the forwarded mail, I get the bounces (as the postmaster) -- not the originators.

I understand, that this is because sendmail queues the messages locally, disconnects from the relay, and only then proceeds to forward them further. If the further forwarding breaks for any reason -- such as because the next relay misidentifies the message as spam -- my sendmail is left to hold the pieces.

Can things be configured so that the forwarding begins immediately instead (as soon as the forwarding destination is determined)? The status -- success or failure -- can then be communicated directly to the previous relay still on the line...

If sendmail can not do it, can any other MTAs? Thanks!

Mikhail T.

Asked: 2017-07-27 19:42:37 +0800 CST

How to make sendmail respect +-notation for virtual users?

3

I have different forwarding needs for different domains, which all point to my mail-server:

user1@example.com   foo@example.org
user2@example.com   bar@example.net
@example.com        mylocalaccount

This all works... However, some of these users wish to use the +-notation to give different vendors different addresses, such as user1+vendor@example.com. And this part is not working -- all such e-mails end up delivered to the catchall mylocalaccount instead of being forwarded properly.

How do I make user+foo@example.com be forwarded to the same destination as user@example.com?

I tried adding entries like

user1+*@example.com    foo+%2@example.org

but that didn't fix the problem...

Here are the debug-traces:

Without the detail:

% sendmail -d60.5 -bv g@example.com
map_lookup(dequote, me, %0=me) => NOT FOUND (0)
map_lookup(dequote, g, %0=g) => NOT FOUND (0)
map_lookup(virtuser, g@example.com, %0=g@example.com, %1=g) => gexample@example.net (0)

... works.

With the detail:

% sendmail -d60.5 -bv g+meow@example.com
map_lookup(dequote, me, %0=me) => NOT FOUND (0)
map_lookup(dequote, g+meow, %0=g+meow) => NOT FOUND (0)
map_lookup(virtuser, g+meow@example.com, %0=g+meow@example.com, %1=g+meow) => NOT FOUND (0)
map_lookup(virtuser, @example.com, %0=@example.com, %1=g+meow) => me (0)
map_lookup(dequote, me, %0=me) => NOT FOUND (0)
map_lookup(user, me, %0=me) => me<> (0)
g+meow@example.com... deliverable: mailer local, user me

... does not work -- comes to the catch-all local account "me".

Mikhail T.

Asked: 2017-02-02 10:00:42 +0800 CST

How to tell Apache to reply with 403 instead of 401?

2

We have some rules for a subtree of Locations, which involve Require-ing ldap-group and expr-s.

The user is duly challenged to supply login-credentials, which are verified.

However, even when the credentials are correct and the access is denied due to other reasons (such as belonging to a wrong group or coming from an incorrect IP-address), the server's response is always 401 -- instead of 403.

As a result, the browsers keep prompting users to "try again"... Can I tell Apache (2.4) to use 403, if the information supplied in the Authorization-header checks-out, and it is some other rule, that rejects the request?

Again, I know, why, after the authentication succeeds, the authorization is denied for some of the users -- it is supposed to. I just need to communicate to such users, that: "Yes, we believe you are who you say you are, but you aren't allowed to access this location."

It appears, mod_rewrite is the only method to induce a 403-response -- can a mod_rewrite expression check membership of an LDAP-group or forcibly change the status from 401 to 403?

I asked this question on the WebMaster's site, but got no answers -- folks there seem more content-oriented.

Here is the relevant snippet of my current config:

<Location /foo>
         Require ldap-group CN=foo,OU=Groups,DC=example,DC=net
</Location>

When the supplied username/password are verified, but the requirement is not satisfied, I need to return a 403... 401 is being returned currently.

Mikhail T.

Asked: 2015-01-29 12:44:44 +0800 CST

How to reliably list ephemeral storage of an AWS instance?

3

How can I list the actual block-device name(s) of ephemeral storage available in my EC2 instance?

After some trials and errors, it appears that such devices are connected as /dev/xvdn (and /dev/xvdm if there are two) -- is there some way to reliably list them from inside the instance?

fdisk -l lists all devices -- without anything obviously distinctive about /dev/xvdn. Same goes for output of lsblk. (We aren't using Amazon's own AMI-instances, so there is no -p flag for lsblk...)

Request for http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0 returns sdj, but there is no /dev/sdj, so that seems useless... Is there anything better?

The minor device-number seems to be 208 -- can one rely on that?

Mikhail T.

Asked: 2014-11-25 16:02:47 +0800 CST

Why would Puppet revoke a client's certificate?

3

We started getting an error from one of the Puppet-agents:

Could not send report: SSL_connect returned=1 errno=0 state=SSLv3 read finished A: sslv3 alert certificate revoked

Indeed, according to puppet cert list $h on the server, the certificate was revoked. I cleaned it on the master, deleted the /var/lib/puppet/ssl on the client and all was fine.

I then ran puppet cert list --all | grep revoked -- and found over 20 other clients listed as "revoked" too. Spot checking the list I found, that puppet-agent did not have a problem on any of these others.

My questions:

1. What would cause Puppet to "revoke" a particular client's certificate? It certainly was not done by a human administrator...
2. Why would such revocations not break things for most clients -- but only for some?

Using puppet-2.7.25 on the clients (RHEL6) and 2.7.18 on the server (RHEL5). Thanks!

Mikhail T.

Asked: 2014-10-29 12:03:02 +0800 CST

Why did git stop working after server disabled SSLv3?

5

Like most others, our repository server needs to disable SSLv3 (and v2) ASAP.

However, doing so seems to break our git-clients -- at least, on RHEL5 (connections from my FreeBSD desktop work fine). Even the most recent git (2.1.2) fails, and upgrading OpenSSL libraries to the latest from the vendor did not help.

However! The same git-client works just fine against github.com -- and github.com already has SSLv3 disabled too. By trial and error, I set our server's (Apache) SSL-configuration to match that of github:

SSLProtocol     ALL -SSLv2 -SSLv3
SSLHonorCipherOrder On
SSLCipherSuite  "AES128-SHA AES256-SHA RC4-SHA"

By running sslscan against our server and github, I get the identical list of ciphers accepted and rejected. But git continues to fail:

    % git clone https://git.example.net/git/puppet-hiera
    Cloning into 'puppet-hiera'...
    * Couldn't find host git.example.net in the .netrc file, using defaults
    * About to connect() to git.example.net port 443
    *   Trying 10.89.8.27... * connected
    * Connected to git.example.net (10.89.8.27) port 443
    * successfully set certificate verify locations:
    *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
      CApath: none
    * Unknown SSL protocol error in connection to git.example.net:443
    * Closing connection #0
    fatal: unable to access 'https://git.example.net/git/puppet-hiera/': Unknown SSL protocol error in connection to git.example.net:443

Now, the only perceptible difference remaining between our server's SSL and GitHub's is that sslscan is able to output details of GitHub's certificate, but fails to obtain those from our server.

When I connect to our git-server from my FreeBSD desktop, the same git clone command works. Instead of failing, after outputting CApath: none, I see:

      CApath: none
    * SSL connection using AES128-SHA
    * Server certificate:
             subject: C=US; postalCode= ............

and the cloning succeeds. How do I configure our server so that git works with it even from the old RHEL5-systems -- as it does against GitHub-servers?

Update: trying to access our server with simply curl, I got a similar error over SSL-compatibility. However, I was able to overcome it by invoking curl with an explicit --tlsv1 option (also known as -1). So, the software on RHEL5 systems is capable of the necessary protocols and ciphers -- how do I make it use them by default instead of trying the old ones and failing?

Mikhail T.

Asked: 2014-09-09 15:25:09 +0800 CST

Why is GlusterFS so slow here?

5

We've set up a mirroring pair of GlusterFS servers. No special tuning, whatever came "out of the box" with GlusterFS-3.5.1 in the official RHEL6 RPM, that's what we have.

The cluster works, but the performance is pretty awful. For example, extracting a large tarball (firefox-31.0.source.tar.bz2) via GlusterFS on localhost takes a whopping 44 minutes here. Extracting the same file directly -- on the same disk -- takes less than 2. There is a similar disparity in removing the created trees (takes 10 minutes via gluster)...

Of course, it is to be expected that, with the mirroring and so on, a network-using filesystem will be slower -- but 30 times slower? Simply copying the large file over is fast -- so it is not the bandwidth we are lacking. While the untar-ing is running, I see both the glusterfs (client) and the glusterfsd (server) processes consuming a lot of CPU (about 10% each), but the system remains about 70% idle -- both gluster-processes are a lot busier than the extracting bzip2 and tar are... What are they doing?

Is there some tuning I can do to dramatically improve performance here? Or should I try ceph (or gfarm?) instead of gluster? Or are they all terrible with a large number of small files? Thank you!

Mikhail T.

Asked: 2014-09-06 05:11:01 +0800 CST

How to monitor GlusterFS clients?

4

We are doing Ok (we'd like to think) monitoring our GlusterFS servers via Icinga. We'd like to monitor the clients too.

Other than making sure, there is a glusterfs process running for each glusterfs-entry in /etc/fstab, what else can be done? We'd like to avoid superficial reads/writes on the mounted volumes, if possible -- can the health of a mount be monitored without adding additional loads, however small?

Any other thoughts? Thanks!
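
One check that adds no load on the volumes themselves is to compare the glusterfs entries in /etc/fstab with the kernel's mount table. The sketch below does that on the text of the two files. It assumes that a mounted GlusterFS volume appears in /proc/mounts with the filesystem type fuse.glusterfs, and it can only tell whether a mount is present, not whether it is hung.

```python
def glusterfs_mountpoints(fstab_text):
    """Return the mount points of glusterfs entries in fstab-format text."""
    points = []
    for line in fstab_text.splitlines():
        fields = line.split('#', 1)[0].split()   # drop comments, then split
        if len(fields) >= 3 and fields[2] == 'glusterfs':
            points.append(fields[1])
    return points


def missing_mounts(fstab_text, mounts_text):
    """Return the fstab glusterfs mount points absent from the mount table."""
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Assumption: the FUSE client is listed with type 'fuse.glusterfs'.
        if len(fields) >= 3 and fields[2] == 'fuse.glusterfs':
            mounted.add(fields[1])
    return [p for p in glusterfs_mountpoints(fstab_text) if p not in mounted]
```

A monitoring plugin could feed it the contents of /etc/fstab and /proc/mounts and report a critical state whenever the returned list is not empty.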

Mikhail T.

Asked: 2014-07-29 13:20:57 +0800 CST

How can I require an array of resources in puppet?

10

In my Puppet-manifest I need to exec a command, but only after an array-driven collection of another exec finished. Like this (pseudo-code):

  define foo() {
    exec { "touch $name": }
  }

....
  {
    $bars = [ "a", "b", "c" ]
    foo { $bars : }
    exec { "echo Done" :
        require => [ Foo["a"], Foo["b"], Foo["c"] ]
    }
  }

How do I implement the same dependency as given above without repeating each element of list $bars by hand?

Mikhail T.

Asked: 2013-08-14 10:17:14 +0800 CST

What is the Mean Time to Failure (MTTF) of a RAID5?

3

Given the MTTF T of an individual drive (say, 100000 hours) and the average time r it takes the operator to replace a failed drive and the array-controller to rebuild the array (say, 10 hours), how long will it take, on average, for a second drive to fail while the earlier failure is still being replaced thus dooming the entire N-drive RAID5?

In my own calculations I keep coming up with results of many centuries -- even for large values of N and r, which means, using "hot spares" to reduce the recovery time is a waste... Yet, so many people choose to dedicate a slot in a RAID-enclosure to hot spare (instead of increasing capacity), it baffles me...
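
For reference, the usual back-of-the-envelope model behind such numbers can be written down in a few lines. It assumes independent drives with constant failure rates: the first failure among N drives arrives on average after T/N hours, and the chance that one of the remaining N-1 drives fails during the repair window r is roughly (N-1)r/T, which gives a mean time to data loss of about T^2 / (N (N-1) r). The function name is hypothetical, and the model ignores correlated failures and read errors during the rebuild.

```python
HOURS_PER_YEAR = 365.25 * 24


def raid5_mttdl_hours(t, r, n):
    """Approximate mean time to data loss of an n-drive RAID5, in hours.

    t: MTTF of a single drive, in hours
    r: time to replace a failed drive and rebuild the array, in hours
    n: number of drives in the array

    Uses the approximation t**2 / (n * (n - 1) * r), valid while r << t.
    """
    if n < 2:
        raise ValueError("a RAID5 needs at least two drives for this model")
    return t ** 2 / (n * (n - 1) * r)


def raid5_mttdl_years(t, r, n):
    """Same estimate as raid5_mttdl_hours(), converted to years."""
    return raid5_mttdl_hours(t, r, n) / HOURS_PER_YEAR
```

With the figures from the question (T = 100000, r = 10) and, say, N = 8, this comes out at roughly 2,000 years. Because the estimate is inversely proportional to r, halving the recovery time doubles the result under this model.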
