In January 2012 Randal Schwartz gave a talk called Introduction to Git that I very much enjoyed, but I was left confused about his admonition not to use git to track individual files with unrelated or separate history.
Page 4 of his slides contains the following bullet points. . .
But not for...
- Tracking file permissions and ownership
- Tracking individual files with separate history
- Making things painful
. . . and here is what he says, around 4 minutes in:
It is not optimized for any kind of metadata about the files. It does not, it does not, track file permissions or file ownership. That is not its job. It's managing source files, and sources don't have owners, sources don't have permissions. Sources are used in recipes to build the real thing that you are going to deploy. So git does not have technology about that. People have tried to build structures on top of it to do that, with some degree of success, but really let's look at basically what git is meant for and not what people are building on top of it.
It's also not meant for tracking individual files with unrelated or separate history. For example, you may think, "Oh, I really like this. I want to track all my et cetera with it." But et cetera is really a bunch of separate, unrelated files. You wouldn't be doing branching and merging. . . add this change to that change and eventually want to back out both of them at the same time. It doesn't really work that way. So it's not good for individual files.
I still use RCS to track individual files in my /etc. It's really fast, it's cheap, and I can get back to the data I need to. It's also not optimized for making things painful. Ok? It's designed to be easy and stuff.
Personally, I have no plans to track my /etc with git, but I recently started using git to track /var/named (BIND configs). Given what Randal has to say above, should I stop using git for this purpose? Is there a downside, a problem I'm not anticipating, a gotcha? So far everything is working just as I expect, and I've had no problems, but I actually hesitated to start using git to track /var/named because of the warning above.
I'm influenced by the scm_track_enabled feature of Cobbler which says "enables a trigger which version controls all changes to /var/lib/cobbler when add, edit, or sync events are performed. This can be used to revert to previous database versions, generate RSS feeds, or for other auditing or backup purposes. git is the recommend SCM for use with this feature." My use of git with /var/named is very similar to this.
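To make "very similar" concrete, here is a minimal sketch of that kind of commit-on-change trigger. It is not Cobbler's implementation; the function names `has_changes`, `commit_commands`, and `snapshot` are made up for illustration, and it assumes `git init` has already been run in the directory and that a git user name and email are configured.

```python
import subprocess
from pathlib import Path


def has_changes(repo_dir):
    """Return True if the working tree differs from the last commit."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    # --porcelain prints one line per modified or untracked path, nothing if clean.
    return bool(out.strip())


def commit_commands(message):
    """Build the git commands for one snapshot commit (hypothetical helper)."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "--quiet", "-m", message],
    ]


def snapshot(repo_dir, message):
    """Commit every pending change in repo_dir; return True if a commit was made."""
    repo_dir = Path(repo_dir)
    if not has_changes(repo_dir):
        # Skip the commit: git exits non-zero when there is nothing to commit.
        return False
    for cmd in commit_commands(message):
        subprocess.run(cmd, cwd=repo_dir, check=True)
    return True
```

Something like `snapshot("/var/named", "edit example.com zone")` after each edit would give roughly the audit trail the Cobbler feature describes.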
(As a side note, I've never used RCS and can't imagine bothering to learn it, even if it is easy, when I already have a decent understanding of git.)
Specifically, I'm wondering about using git to track /var/named, but I welcome answers about best practices for using git to track directories like it.
This is straying into "quite subjective" territory. But...
I see nothing wrong with using git to track this sort of thing. Yes, it was built for tracking large, complex collections of inter-related source files. That doesn't mean it isn't just as suited to tracking small collections of non-related files.
Personally, I use etckeeper to track /etc on my servers, using the git back-end. Works great. When installed from Ubuntu's apt repo, it comes with nice hooks into apt, so it'll automatically kick off a commit coincident with each new package install. Pretty handy.
Anyway, give it a try, it's not going to hurt anything, and if you end up wanting to switch to another version control system in the future, there are plenty of migration tools available.
Subjective territory indeed. I dislike etckeeper: it creates a very messy history of /etc. For /var/named (I assume this is where your zones are defined) it might be better, especially if you only have a few zones, but it would still be an ugly hack.
I find that the "correct" way of doing it is to use Puppet or similar configuration management software. Instead of configuring your servers, you write Puppet code that configures your servers. I use mercurial (but you may use git) to manage the repository of Puppet code. It's useful to branch and merge (e.g. you are working on a new configuration that you are not certain is tested enough, and then you must urgently fix a problem on a production server; in this case you branch from the stable version and merge later). The combination of Puppet+DVCS is very useful and very clean, but Puppet is a small beast that needs a little time to tame, and bind is particularly tricky to do right with Puppet because of the zone serial numbers.
Update:
I have been thinking about it and I tend to conclude that there is nothing wrong with using git to keep track of /var/named, and that it's not an ugly hack. Yes, the files are related. Yes, you might want to clone in order to make a serious revision of your configuration, branch in order to maintain your production configuration while revising, and merge when you finish the revision (although all this might be unusual).
A bit off topic, but there are two things about bind's configuration that I dislike. First, in order to add a zone I need to add/modify two files: named.conf, and the zone file. This is unnecessarily complicated. I'd prefer a setup similar to apache's, with zone files in /etc/bind/zones-available, symlinked from /etc/bind/zones-enabled, without the need to touch named.conf. Second, the serial numbers are a nuisance. We mostly automate their maintenance; in Puppet-managed machines I have a shell script that fixes them, and in others I use a vim plugin. The fact that we automate them means they could well be absent. File timestamps could be used instead. The serial numbers will create some ugliness in the repository history, but this is a problem of bind, and not a problem of git, or of the practice of using git for tracking the bind configuration.
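For what it's worth, the serial automation is not much code. The following is a rough Python sketch, not the shell script or vim plugin mentioned above. It assumes the common YYYYMMDDnn serial convention and a zone file where the serial is a 10-digit number followed by a `; serial` comment. Both are assumptions about your zone files, and the names `next_serial` and `bump_serial` are made up for illustration.

```python
import re
from datetime import date

# Assumed layout: a 10-digit serial followed by a "; serial" comment.
SERIAL_RE = re.compile(r"(\d{10})(\s*;\s*serial)", re.IGNORECASE)


def next_serial(current, today):
    """Return the next YYYYMMDDnn serial after `current`."""
    base = int(today.strftime("%Y%m%d")) * 100
    # Use today's first serial, or increment if the current one is already at or past it.
    return max(base, current + 1)


def bump_serial(zone_text, today=None):
    """Rewrite the first serial found in zone_text and return the new text."""
    today = today or date.today()

    def repl(match):
        return f"{next_serial(int(match.group(1)), today)}{match.group(2)}"

    new_text, count = SERIAL_RE.subn(repl, zone_text, count=1)
    if count == 0:
        raise ValueError("no serial line found")
    return new_text
```

Run before each commit, this keeps the serial strictly increasing, so every zone edit shows up in the history as the real change plus a one-line serial diff.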
Ask yourself...
Would I ever "revert" to a particular commit in git, if it covered all of /etc?
Would I ever "merge" a commit to my current state, if it covered all of /etc?
I'm guessing the answer is NO, NO, NO.
If so, git is the wrong tool. Git is tracking a tree of files, not a single file. This is why I argue for RCS for single files.