Something that confuses me about configuration management tools such as Puppet is that they're great and straightforward for setting up a server from scratch, but what every piece of documentation neglects is the case where I have existing data (e.g. a PostgreSQL database) that I want to use on the newly provisioned server.
The perfect example is a migration between hosting providers. Sure, Puppet will set up the new server with (almost) the same configuration as the previous one, but the setup code (the Puppet manifests) assumes that this new database server is at its genesis, whereas there is precious business data that needs to be restored.
Setting up infrastructure for the first time is surely a task worth automating, but accidents happen, and when the need to restore a server arises, I'm left with a semi-manual setup:
- Puppet: Set up the server up to the point just before data directories are initialized and services are started
- Me at a shell: Restore data from backups to their proper places
- Puppet: Continue with the rest, i.e. start the services
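To make the three steps concrete, here is a rough sketch of how I imagine this could look in a manifest. The class name, script path, and guard file are made up for illustration, not an established pattern:

```puppet
# Hypothetical sketch: stage the restore between package install and service start.
# /usr/local/sbin/restore_pgdata.sh is an assumed script that pulls the backup
# and is expected to create the marker file on success.
class profile::postgresql {
  package { 'postgresql':
    ensure => installed,
  }

  # Restore only if it has never been done on this node.
  exec { 'restore-postgresql-data':
    command => '/usr/local/sbin/restore_pgdata.sh',
    creates => '/var/lib/postgresql/.restored',
    require => Package['postgresql'],
  }

  service { 'postgresql':
    ensure  => running,
    enable  => true,
    require => Exec['restore-postgresql-data'],
  }
}
```

The `creates` guard makes the `exec` idempotent, so subsequent Puppet runs skip it, but writing and testing the restore script is still the hard part.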
This can't be that uncommon. Setting up a server for the first time with no existing data is a one-time event, but recovering it can happen many times, and I expect it to happen at least once (that's why we back up precious data).
What's the common practice when writing Puppet manifests to address the scenario where a node is set up for a second time and needs its data restored?
How do you deal with the migration of servers managed by Puppet?
Googling this is hopeless. You mostly find pages describing how to restore the Puppet DB/server itself, but that piece of infrastructure is usually not what your business generates profit from, right?
I realize that the Puppet server has a file server, but keeping the backup data available there and setting up proper access control feels very tedious to maintain compared to managing configuration with Puppet manifests.
I assume enterprises use full network backup solutions like Bacula or Amanda, or perhaps custom backup solutions based on Borg, Restic, Burp, Duplicati or plain rsync. Do you integrate these into your Puppet manifests? How?
If you do this for fully automatic disaster recovery, it might be worthwhile. Otherwise, think about whether you actually save anything by investing the significant amount of time needed to reliably (!) automate what should be a very rare one-off operation (if you have to do this all the time, something is very wrong in your environment). Having Puppet doesn't mean you have to automate everything at all costs...
That said: if you need it, it's not difficult. Just use a backup solution that can be driven from Puppet - this support can be as simple as running a script, inside a Puppet run, that does what you would otherwise do manually (just make sure it runs only once...).
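The "run it only once" part can be expressed with Puppet's own `exec` guards. A minimal sketch, assuming a restore script and a marker file that the script touches on success (both names are hypothetical):

```puppet
# Hypothetical restore hook: the script encapsulates whatever you would
# otherwise type manually (fetch the backup, stop the service, restore data,
# touch the marker file /var/lib/backup/.restore_done when finished).
exec { 'one-off-restore':
  command => '/usr/local/sbin/restore_from_backup.sh',
  unless  => '/usr/bin/test -f /var/lib/backup/.restore_done',
  timeout => 3600,  # restores can take a while; the default 300s may be too short
}
```

As long as the marker file exists, every subsequent Puppet run treats the resource as already in its desired state.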
Using configuration management (CM) tooling such as Puppet, Ansible, Chef or Salt requires a fundamental change in the way you think about configuration.
If you think about configuration as a one-time event, you will never see (or realise) any benefit from using CM.
The main power of CM is the ability to express your desired configuration as code. Once you have that ability, your server configuration is:
Testable. You can write unit tests to make sure you're not introducing any bugs, all without restarting a service or provisioning an entire test/staging environment. This saves a lot of time and money.
Reproducible. You can create one server, two servers, ten servers or a hundred servers using the same base configuration, with no overhead. You write the code once, and apply it many times. If you have a standard set of configuration options (for example, SSH daemon settings, hardening steps) CM can automate those away.
Versionable. Rolling out a change is as simple as making a Git commit. Rolling back a change is as simple as `git revert`. You have a verifiable, auditable data source (your Git history). You can branch your CM codebase, make changes, get peer review and merge those changes into your `master` branch.
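The "configuration as code" idea above can be illustrated with a minimal, hypothetical manifest (the class name and file source path are made up) that describes desired state rather than a sequence of commands, using the SSH daemon settings mentioned earlier:

```puppet
# Hypothetical example: declarative desired state. Applying it once or a
# hundred times, on one node or many, converges to the same result.
class profile::ssh {
  package { 'openssh-server':
    ensure => installed,
  }

  file { '/etc/ssh/sshd_config':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0600',
    source  => 'puppet:///modules/profile/sshd_config',
    require => Package['openssh-server'],
  }

  service { 'sshd':
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/ssh/sshd_config'],  # restart on config change
  }
}
```

This class can be unit-tested with a tool like rspec-puppet, applied to any number of nodes, and versioned in Git, which is exactly the testable/reproducible/versionable trio described above.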