I'm changing the way that our DHCP/DNS stuff works at work. Currently we've got 3 DNS servers, and a DHCP box. All of them are VMs.
There's a circular dependency where stuff booting requires NFS, which requires DNS. So when we reboot stuff, things might come back subtly broken until DNS is up and we've restarted some services.
What I want to do is have a few low power servers, probably dual core Atoms or similar, running from SSDs, so that they boot damn fast. I want to make the whole thing boot as near to instantaneously as possible.
Ideally I'd like to use Ubuntu 11.10 or Debian 6 as the OS. I'm not interested in Gentoo or compiling my own kernel. This needs to be reasonably supportable by me.
Besides SSDs, what optimization steps can I take to improve boot speed?
Isn't this a situation where you should engineer around the circular dependencies? Set power-on delays in the server BIOS. You have multiple DNS servers, so that's a plus. DNS caching? Would this be as simple as using IP addresses or hosts files for your NFS or storage network? You didn't mention the particular virtualization technology, but it's possible to set VM boot priority in VMware, for instance... Is this across multiple host servers?
Otherwise, SSD-based boot drives can help. Use a distro with Upstart boot processes. Trim down daemons.
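If you go the "use IP addresses for NFS" route, it helps to know which mounts still depend on DNS. Here is a minimal sketch, assuming your NFS mounts live in /etc/fstab in the usual `server:/export` form. The function name and the sample entries (`filer01`, `192.0.2.10`) are made up for illustration, and it does not look at autofs maps.

```python
import ipaddress

NFS_TYPES = {"nfs", "nfs4"}


def nfs_mounts_using_hostnames(fstab_text):
    """Return (server, mount_point) pairs for NFS entries that name the
    server by hostname, i.e. mounts that need name resolution at boot."""
    flagged = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 3 or fields[2] not in NFS_TYPES:
            continue
        device, mount_point = fields[0], fields[1]
        # "server:/export" -> "server"; brackets are used for IPv6 literals.
        server = device.partition(":/")[0].strip("[]")
        try:
            ipaddress.ip_address(server)
        except ValueError:
            # Not an IP literal, so this mount depends on DNS or /etc/hosts.
            flagged.append((server, mount_point))
    return flagged


# Hypothetical fstab contents, for illustration only.
SAMPLE = """\
# /etc/fstab
UUID=abcd / ext4 defaults 0 1
filer01:/export/home /home nfs hard,intr 0 0
192.0.2.10:/export/data /data nfs4 ro 0 0
"""

if __name__ == "__main__":
    for server, mount_point in nfs_mounts_using_hostnames(SAMPLE):
        print(f"{mount_point} is mounted from hostname {server}")
```

Anything it flags either needs an entry in /etc/hosts or an IP address in fstab before it stops depending on the DNS servers being up.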
Depending on your UPS status, this could be one of the few use-cases where an ACPI hibernate may be a good idea. Generally restore-from-hibernate beats out a boot-from-scratch, especially in the case of low-RAM SSD-based systems. If you have the ability, the 'shutdown' step for your UPS software can be set to hibernate the DNS server.
I can recommend a very tiny NetBSD system on SSDs, but if you have your heart set on Linux there are two options that spring immediately to mind:
There's also the option of really tiny custom/embedded solutions like this one (a $99 ARM-based system on a module with a 1-second(ish) bootup time). It isn't commodity hardware, but it could be tucked away in a quiet corner of a datacenter and left to just run forever...
In most setups DNS is the most important infrastructure service. If it breaks everything else will break, too. The conclusion is that the DNS-server(s) should not depend on other servers.
If you really need NFS for booting, make your DNS-servers those NFS-servers (this is breaking a rule, too), but make sure to export ro only and make sure you can't expose your NFS-servers to a DoS-attack.
Probably the better solution is a different (HA) approach for providing the needed NFS-service for booting, thus breaking the circular dependency (nscd may help on the NFS-servers as well).
Update 2011-11-17 on NFS: From one of your comments I see that NFS is being used for /home-dirs. Local technical users should not have those. Anything else should be mounted via autofs with bg,hard,intr.
You might want to use bootchart to see where the boot-time hotspots are.
There's also readahead: https://fedorahosted.org/readahead/ , which I haven't tried.