I'm about to deploy ~25 servers running Debian. The machines will have different roles: web servers, Java appservers, proxies, MySQL boxes. The environment will probably not grow much in the future, maybe 2-5 more servers in the next 2 years.
I'll probably use FAI for system installation, but I'm unsure whether it's worth also adding centralized configuration management with cfengine or puppet at such a small scale.
Does configuration management make sense for an environment this size?
I would recommend using a mixture of Debian pre-seeding, where you give the installer a text file that answers all the questions it would ask, and Puppet.
The reason for using preseeding rather than FAI is that you don't have to set up an image first and deal with keeping it up to date. You will end up with an install very similar to what you would have if you did them all by hand. When you come to install a new release, you will only have to update a config file with the changes, rather than rebuild a new image.
A configuration management tool is particularly useful where you have several servers performing the same role and you want them to be identical, e.g. a webserver cluster. However, it can also be useful for configuring the base install of all servers. You're going to want to install particular packages on all your servers, like ntpd and an MTA. You're going to want to change a config file on all your servers. An additional benefit is that you can keep your manifests in something like Subversion and keep a record of what changed on a server, who did it and why. Configuration management can also be a life saver when a server fails and you need to rebuild it quickly. Install the OS (using FAI or preseeding), install puppet and away it goes, built back exactly as it was before. Obviously you'll still need to keep backups of data.
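To make the "base packages on every server, plus extras per role" idea concrete, here is a minimal Python sketch. It is not Puppet code and not how Puppet works internally; the package and role names are made up for illustration. It only shows the declarative idea: describe what a host should have, compare that with what it has, and act on the difference.

```python
# Hypothetical package lists; the names are illustrative assumptions only.
BASE_PACKAGES = {"ntp", "postfix", "openssh-server"}
ROLE_PACKAGES = {
    "web": {"apache2"},
    "db": {"mysql-server"},
}


def desired_packages(roles):
    """Return the full package set for a host with the given roles."""
    wanted = set(BASE_PACKAGES)
    for role in roles:
        wanted |= ROLE_PACKAGES[role]
    return wanted


def missing_packages(roles, installed):
    """Return the packages that still need installing, sorted for stable output."""
    return sorted(desired_packages(roles) - set(installed))


if __name__ == "__main__":
    # A web server that so far only has part of the base set installed.
    print(missing_packages(["web"], ["ntp", "openssh-server"]))
```

A real tool does the same comparison for files, services and users as well, and runs it on every host on a schedule.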
Configuration management requires dedication to make sure you only make changes using it and will have an upfront cost setting things up, but once you have a working setup you won't regret it.
Puppet is the more modern of the two tools you've mentioned. I really recommend it to anyone. The configuration is written in a declarative language, and it is easy to build up higher level constructs. There is also a very large community around it, and there are always people willing to help on the mailing list or the IRC channel.
I'd recommend CFengine for any environment which is more than 2-3 boxes and where you have some concept of 'templates' or servers performing specific roles.
Why? Simply put, it reduces mistakes. You have a tool which will ensure file/directory permissions are correct everywhere in the environment, and when you come to roll out more servers, the tool applies the same configuration every time without forgetting a step.
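As a rough illustration of what "ensure permissions are correct" means in practice, here is a small Python sketch of the check-then-fix pattern. This is not CFengine syntax, and the function name `ensure_mode` is my own invention; it just shows the idea of converging on a desired state and doing nothing when the state is already right.

```python
import os
import stat
import tempfile


def ensure_mode(path, wanted_mode):
    """Set the permission bits of path to wanted_mode.

    Returns True if a change was made, False if the file was already correct.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == wanted_mode:
        return False
    os.chmod(path, wanted_mode)
    return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        os.chmod(tmp.name, 0o666)            # simulate a file that has drifted
        print(ensure_mode(tmp.name, 0o640))  # first run repairs the drift
        print(ensure_mode(tmp.name, 0o640))  # second run finds nothing to do
```

Because the second run is a no-op, the same check can be run on every box as often as you like, which is what makes this approach safe to automate.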
Contrast with even a skilled System Administrator rolling out a web server at the end of a twelve hour shift when things already went wrong.... Are they likely to remember that nasty little configuration file which needs to go in /etc/random/location/foo/bar otherwise the application will silently fail to do something rather important, like bill customers? :)
Tools like CFengine are also a great way to perform environment-wide security updates. Dropping a Nagios configuration (NRPE) onto all boxes is also a doddle. Whether you're dealing with five boxes or five hundred boxes you will save time with CFengine.
It is probably worth noting that my environment is a little larger. However, I've also deployed CFengine for smaller environments than the one you describe, hence the recommendation!
Your next question will probably be CFengine vs Puppet. That's a more difficult decision. I've always gone with CFengine because, in the early days, Puppet was somewhat immature, particularly around error logging. These days I'm really not sure, so have a play and see. My specific issues with Puppet were SSL certificate related: I still painfully recall spending 3 hours diagnosing server <-> client connectivity issues in irc.freenode.net/#puppet, with some hefty RTFM and RTFS, only to find an error that was not being logged. Luke said, "Ah that's really difficult to fix" and never did. :(
In addition to cfengine and puppet, there's also chef. I would strongly suggest using one of these tools, as things always grow in unexpected directions. It lets you manage things from a centralized location.
The important thing to recognize is that chances are you won't get everything but if you can at least get 90% there, it's a start. Besides, it's fun and will make your life easier in the long run. Lastly, it's a good skill to have going forward.
I've been using cfengine for 5 years to install Debian (from woody up to lenny nowadays). With etch I built a custom debian-installer. Thanks to preseeding, only one question comes up: "What's the hostname?" After this, cfengine configures the whole server (dns+dhcp with dnssec, samba, ntpd, default (Samba) users and passwords, ssh, openvpn, apache vHosts, backup with rsnapshot on LVM, custom Webmin modules etc).
Even when I install just one server I use cfengine-scripts from my toolbox like this:
I like cfengine, because the cf2-scripts are somewhat human readable.
So it's definitely worth working with tools for automatic configuration management.
/thorsten
It's got to be worth it even for a small site. It's all about consistency as you grow. And you know that your site is going to grow. Best to start while you're still small. Cfengine is awesome, especially version 3, which can handle all the package managers across the field. It's really lightweight and secure, and it "just works". Puppet just didn't deliver what it claimed. Haven't tried Chef.
The advantage of cfengine over the others is that it's ultra lightweight but actually has more capabilities. Its security is like ssh, rather than the web certificates used by puppet. When I told my boss about cfengine he thought it was science fiction :) If you're looking for something futuristic, try reading some of Mark Burgess's research papers. Cool stuff.
The number one tool I wish I had when running a small site is 'push-button' builds. It makes patching, updates, and rebuilds easier, which can address a myriad of other problems in the future.
Is ssh not properly installed on all boxes? No curl/wget/vim either? What about other in-house tools you'd like to have on each box?
Having central management of your servers is one of the first tools you should have working to make future efforts much easier.
I agree with everyone here. You should start to learn and set up a working infrastructure while you are not too large, because then you are prepared when you grow.
Depending on what you want to run, I would go for FAI, cfengine and pre-seeding for Debian/Ubuntu. FAI can work with many different tools, so it is a good start for any Debian-like distribution. With the class-controlled configuration of FAI (and cfengine), you can easily divide your installations into small modules and then select which ones to use for each of your machines. This way it is useful even if you have many different machines. It is actually more useful then, as these scripts document your installation. And when you install a new machine, you will not forget anything.
Yes, you SHOULD have some machines to test on before you deploy your changes to a live installation. But with configuration scripts like these, you will not forget any step in the live installation.
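The class idea can be sketched in a few lines of Python. This is not FAI or cfengine syntax; the hostname patterns, class names and module names below are all hypothetical. It only illustrates the principle: a host is assigned to classes, and each class pulls in a small set of modules.

```python
from fnmatch import fnmatch

# Hypothetical hostname patterns and class names, for illustration only.
CLASS_RULES = [
    ("*", "BASE"),
    ("web*", "WEBSERVER"),
    ("db*", "MYSQL"),
]

# Hypothetical modules attached to each class.
CLASS_MODULES = {
    "BASE": ["ntp", "ssh"],
    "WEBSERVER": ["apache"],
    "MYSQL": ["mysql"],
}


def classes_for(hostname):
    """Return the classes a host belongs to, in rule order."""
    return [cls for pattern, cls in CLASS_RULES if fnmatch(hostname, pattern)]


def modules_for(hostname):
    """Return the modules to apply to a host, without duplicates, in class order."""
    modules = []
    for cls in classes_for(hostname):
        for module in CLASS_MODULES[cls]:
            if module not in modules:
                modules.append(module)
    return modules


if __name__ == "__main__":
    for host in ("web01", "db3", "proxy1"):
        print(host, classes_for(host), modules_for(host))
```

Adding a new kind of machine then means adding one rule and one module list, rather than writing a new install procedure from scratch.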