I am looking at automated deployment solutions for my team and have been playing with Chef for the past few days. I've been able to get a simple web app up and running from a base Red Hat VM using chef-solo.
Our end goal is to use Chef (or another system) to automatically deploy application topologies to the cloud as we run builds. Our process would basically run like this:
- Our web app code, dependencies, and chef cookbooks are stored in SCM
- A build is executed and creates a single package for images to acquire and test against
- The build engine then deploys new cloud images that run a chef client to get packages installed.
- The images acquire the cookbooks from SCM or the Chef server and install everything to get up and running
What are the benefits and/or use cases for getting a Chef Server running?
Are there any major benefits to having a Chef Server hold the cookbooks (acquired from SCM) vs. using chef-solo with a script that pulls the cookbooks from SCM?
I am going to orient this answer as if the question was "what are the advantages of chef-solo" because that's the best way I know to cover the differences between the approaches.
My summary recommendation is in line with others: use a chef-server if you need to manage a dynamic, virtualized environment where you will be adding and removing nodes often. A chef server is also a good CMDB, if you need one. Use chef-solo if you have a less dynamic environment where the nodes won't change too often but the roles and recipes will. The size and complexity of your environment are more or less irrelevant. Both approaches scale very well.
If you deploy chef-solo, use a cronjob with rsync, 'git pull', or some other idempotent file transfer mechanism to maintain a full copy of the chef repository on each node. The cronjob should be easily configurable to (a) not run at all and (b) run, but without syncing the local repository. Add a nodes/ directory in your chef repository with a JSON file for each node. Your cronjob can be as sophisticated as you wish in terms of identifying the right node file (though I would recommend simply $(hostname -s).json). You may also want to create an Opscode account and configure a client with Hosted Chef, if for no other reason than to be able to use knife to download community cookbooks and create skeletons.
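The cronjob described above could be sketched roughly as follows. Treat this as an illustration under stated assumptions rather than a tested deployment script: the repository path /var/chef-repo, the solo.rb file name, and the two flag files used to switch off the run or the sync are all invented for the example. The only Chef-specific pieces are the chef-solo options -c (config file) and -j (JSON attributes file). By default the script only prints what it would do; pass --execute to actually run the commands.

```python
#!/usr/bin/env python3
"""Sketch of a cron-driven chef-solo run (paths and flag files are hypothetical)."""
import os
import socket
import subprocess
import sys

REPO = "/var/chef-repo"                  # assumed location of the local checkout
DISABLE_RUN = "/etc/chef/disable-run"    # hypothetical flag file: skip the run entirely
DISABLE_SYNC = "/etc/chef/disable-sync"  # hypothetical flag file: run without syncing


def short_hostname():
    """Rough equivalent of `hostname -s`: the host name up to the first dot."""
    return socket.gethostname().split(".")[0]


def plan_commands(repo, host, run_enabled=True, sync_enabled=True):
    """Return the commands the cronjob would execute, in order."""
    if not run_enabled:
        return []
    commands = []
    if sync_enabled:
        # --ff-only makes the pull fail instead of merging if the checkout has diverged.
        commands.append(["git", "-C", repo, "pull", "--ff-only"])
    node_file = os.path.join(repo, "nodes", host + ".json")
    commands.append(["chef-solo", "-c", os.path.join(repo, "solo.rb"), "-j", node_file])
    return commands


def main(argv):
    commands = plan_commands(
        REPO,
        short_hostname(),
        run_enabled=not os.path.exists(DISABLE_RUN),
        sync_enabled=not os.path.exists(DISABLE_SYNC),
    )
    execute = "--execute" in argv
    for command in commands:
        if execute:
            subprocess.run(command, check=True)  # stop at the first failing step
        else:
            print("would run:", " ".join(command))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the "what to run" decision in plan_commands, separate from the code that runs it, makes the two switches (no run at all, run without sync) easy to check without touching a real node.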
There are several advantages to this approach, besides the obvious "not having to administer a server". Your source control will be the final arbiter of all configuration changes, the repository will include all nodes and runlists, and each server being fully independent facilitates some convenient testing scenarios.
Chef-server introduces a hole where you use "knife cookbook upload" to update a cookbook, and you must patch this hole yourself (such as with a post-commit hook) or risk site changes being overwritten silently by someone who uploads an obsolete recipe from an outdated local repository on their laptop. This is less likely to happen with chef-solo, as all changes will be synced to servers directly from the master repository. The issue here is discipline and number of collaborators. If you're a solo developer or a very small team, uploading cookbooks via the API is not very risky. In a larger team it can be if you don't put good controls in place.
Additionally, with chef-solo you can store all your nodes' roles, custom attributes and runlists as node.json files in your main chef repository. With chef-server, roles and runlists are modified on the fly using the API. With chef-solo, you can track this information in revision control. This is where the conflict between static and dynamic environments can be clearly seen. If your list of nodes (no matter how long it might be) doesn't change often, having this data in revision control is very useful. On the other hand, if you're frequently spawning new nodes and destroying old ones (never to see their hostname or fqdn again), keeping it all in revision control is just an unnecessary hassle, and having an API to make changes is very convenient. Chef-server also has a whole set of features geared towards managing dynamic cloud environments, like the --node-name option on "knife bootstrap", which lets you replace fqdn as the default way to identify a node. But in a static environment those features are of limited value, especially compared to having the roles and runlists in revision control with everything else.
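To make the "node file in revision control" idea concrete, here is a small sketch that writes one such file. The host name, role, recipe and attribute are made-up examples; the only thing taken from chef-solo itself is that the JSON file passed with -j carries the node's attributes together with a "run_list" key.

```python
import json
import os
import tempfile


def node_document(run_list, attributes=None):
    """Build the structure chef-solo reads via -j: attributes plus a run_list."""
    document = dict(attributes or {})
    document["run_list"] = list(run_list)
    return document


def write_node_file(repo, host, document):
    """Write nodes/<host>.json inside the repository and return its path."""
    nodes_dir = os.path.join(repo, "nodes")
    os.makedirs(nodes_dir, exist_ok=True)
    path = os.path.join(nodes_dir, host + ".json")
    with open(path, "w") as handle:
        # Sorted keys and a trailing newline keep diffs in revision control small.
        json.dump(document, handle, indent=2, sort_keys=True)
        handle.write("\n")
    return path


if __name__ == "__main__":
    repo = tempfile.mkdtemp()  # stand-in for a real chef repository checkout
    # Hypothetical node: the role, recipe and attribute names are examples only.
    doc = node_document(["role[webserver]", "recipe[ntp]"], {"myapp": {"port": 8080}})
    with open(write_node_file(repo, "web1", doc)) as handle:
        print(handle.read())
```

Once a file like this is committed, changing a node's runlist becomes an ordinary reviewed commit instead of an API call.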
Finally, recipe test environments can be set up on the fly for almost no extra work. You can disable the cronjobs running on a server and make changes directly to its local repository. You can test the changes by running chef-solo and you will see exactly how the server will configure itself in production. Once everything is tested, you can check in the changes and re-enable the local cronjobs. When writing recipes, though, you won't be able to use the "Search" API, meaning that if you want to write dynamic recipes (e.g. for load balancers) you will have to hack around this limitation by gathering the data from the JSON files in your nodes/ directory, which is likely to be less convenient and will lack some of the data available in the full CMDB. Once again, more dynamic environments will favor the database-driven approach, while less dynamic environments will be fine with JSON files on local disk. In a server environment where a chef run must make API calls to a central database, you will be dependent on managing all testing environments within that database.
This last technique can also be used in emergencies. If you are troubleshooting a critical issue on production servers and solve it with a configuration change, you can make the change immediately in the server's local repository, then push it upstream to the master.
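As a rough idea of what "gathering the data from the JSON files in your nodes/ directory" might look like, the sketch below lists the hosts whose node file names a given role. It is written as a standalone Python helper purely for illustration (inside a recipe you would do the equivalent in Ruby), and it assumes one file per node named after the host, with roles listed directly in "run_list". It will not see roles nested inside other roles or anything a server would learn from the node itself, such as IP addresses.

```python
import glob
import json
import os


def nodes_with_role(repo, role):
    """Return sorted host names whose node file lists role[<role>] in its run_list."""
    wanted = "role[%s]" % role
    hosts = []
    for path in glob.glob(os.path.join(repo, "nodes", "*.json")):
        with open(path) as handle:
            node = json.load(handle)
        if wanted in node.get("run_list", []):
            # The host name is taken from the file name, e.g. nodes/web1.json -> web1.
            hosts.append(os.path.splitext(os.path.basename(path))[0])
    return sorted(hosts)


if __name__ == "__main__":
    # Assumes the current directory is a chef repository with a nodes/ directory.
    print(nodes_with_role(".", "webserver"))
```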
Those are the primary advantages of chef-solo. There are some others, like not having to administer a server or pay for hosted chef, but those are relatively minor concerns.
To sum up: If you are dynamic and highly virtualized, chef-server provides a number of great features (covered elsewhere) and most of the chef-solo advantages will be less noticeable. However, there are some definite, often unmentioned advantages to chef-solo, especially in more traditional environments. Note that being deployed on the cloud doesn't necessarily mean you have a dynamic environment. If you can't, for example, add more nodes to your system without releasing a new version of your software, you probably aren't dynamic. Finally, from a high-level perspective, a CMDB can be useful for any number of things only tangentially related to system administration and configuration, such as accounting and information-sharing between teams. Using chef-server might be worth it for that feature alone.
Disclosure: I work for Opscode.
The major benefit of Chef Server over Solo is the ability to use search with your infrastructure. The classic example is a load balancer with web servers. The load balancer can automatically update its configuration as web servers are added to and removed from the infrastructure, simply by searching for them. Solo is just that, single machines, while Chef Server brings the ability to query for things like "all machines with more than 4 gigs of RAM" or "the database master".
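Conceptually, the load balancer example works like the sketch below. This is not the Chef API: in a real recipe you would call Chef's search (for example search(:node, "role:webserver")) and feed the results to a template. Here the inventory is a hard-coded list of invented nodes, and the output lines are loosely modelled on HAProxy "server" entries, just to show how a query result turns into configuration.

```python
def search(nodes, predicate):
    """Return the nodes matching predicate, ordered by name for stable output."""
    return sorted((n for n in nodes if predicate(n)), key=lambda n: n["name"])


def backend_lines(web_nodes, port=80):
    """Render one load balancer backend line per web server."""
    return ["server %s %s:%d" % (n["name"], n["ipaddress"], port) for n in web_nodes]


# Hypothetical inventory; a Chef Server would hold this data for real nodes.
NODES = [
    {"name": "web1", "roles": ["webserver"], "ipaddress": "10.0.0.11", "memory_gb": 8},
    {"name": "web2", "roles": ["webserver"], "ipaddress": "10.0.0.12", "memory_gb": 2},
    {"name": "db1", "roles": ["database_master"], "ipaddress": "10.0.0.21", "memory_gb": 16},
]

if __name__ == "__main__":
    # "All web servers", rendered as backend entries.
    web = search(NODES, lambda n: "webserver" in n["roles"])
    print("\n".join(backend_lines(web)))
    # "All machines with more than 4 gigs of RAM".
    big = search(NODES, lambda n: n["memory_gb"] > 4)
    print([n["name"] for n in big])
```

Adding a third web server to the inventory changes the rendered backend list on the next run without anyone editing the load balancer's recipe, which is the point of the search feature.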
Chef Server also gives you the ability to manage your infrastructure without copying tarballs around, to visualize your infrastructure with the management console, and to manage which machines are running which versions of cookbooks with Environments. There are other benefits, but those are the ones off the top of my head. If you want to try out the Chef Server without installing it, just sign up for an Opscode Hosted Chef account; the first 5 nodes are free.
chef-server manages cookbooks and configuration data for your nodes.
You add cookbooks to chef-server using the knife tool and then give each node a run list of recipes so they grab the cookbooks they need when the node sets itself up using chef-client. Since chef-client runs in the background, your nodes will check periodically if there are updated cookbooks in your chef-server.
chef-server also holds your configuration data & variables so you can change settings for your nodes to pick up automatically. This is good for more dynamic settings that shouldn't or can't be hardcoded in your recipes.