I'm planning to set up a small Hadoop cluster where the slave nodes boot and run from a central PXE server, to simplify deployment and updates, and to enable all of the disks on the slaves to be (almost) monopolized by HDFS.
However, I suppose I'll still have to reserve some space on the slave nodes for /tmp and /var/log. I don't want to just put these in a ramdisk, because I'd like them to still be there for debugging after crashes (and because RAM is scarcer than disk).
So the machines might boot off the remote PXE server, mount their / read-only from there, then mount /dev/sda1 through sdd1 for the HDFS data partitions, /dev/sda2 for /tmp, and /dev/sdb2 for /var/log.
My question is: assuming we get Hadoop etc. to log into /var/log, are there any other directories that will need to be writable?
(And is this a sensible architecture in general?)
EDIT: don't worry about swap. I'm planning to make these machines swapless, since the OOM killer is preferable to thrashing.
You can study the live CD layout of your distro, but you will likely need all of /var to be writable rather than just /var/log, and in some distros there are files in /etc that must be writable. /home needs to be writable as well, unless you put home directories elsewhere.
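One way to check this empirically is to look at what a conventionally installed node actually writes to. The following is only a rough sketch: the function name list_written_dirs and the idea of using a reference timestamp file are assumptions for illustration, not something taken from a particular distro.

```shell
# Hypothetical helper: print every directory under ROOT that contains a
# regular file modified more recently than the reference file REF.
# -xdev keeps find on ROOT's own filesystem, so /proc, /sys and other
# mounts are skipped.
list_written_dirs() {
    ref=$1
    root=$2
    find "$root" -xdev -type f -newer "$ref" -exec dirname {} \; 2>/dev/null | sort -u
}
```

For example, on a test node with a normal read-write root, you could touch a reference file right after boot, run a typical Hadoop workload for a while, and then call list_written_dirs with that file and / to see which directories were written to. Files that are only read, or written and deleted again, will not show up.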
"(And is this a sensible architecture in general?)...."
I wouldn't say that your idea is wrong; it is interesting. In a nutshell, you're setting up a diskless architecture but still using the local disks. To me, you're adding extra hoops to jump through.
"to simplify deployment and updates .. "
If your goal is to make the cluster more centralized and manageable, I would use some sort of automated distribution engine. In my case, I use Puppet. The code is already available on GitHub; just customize it to your needs. That should take care of simplification and manageability. I built a couple of clusters in no time using my Puppet manifests.
Here is a simple solution for Slackware 14.2 (BSD init, not systemd) booting via PXE with the root filesystem served read-only over NFS.
I simply modified /etc/rc.d/rc.S to copy the folders that need to be writable (/etc and /var) into a tmpfs filesystem, mount temporary filesystems over the original folders, and move the copies into the temporary filesystems.

This code was inserted at the top of rc.S, preceded only by the line that sets the PATH variable. It runs at the very start of init, prior to any other partitions being mounted or services starting.

Any changes to these in-memory copies are simply discarded on reboot.
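The steps described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the snippet actually used: the function name stage_writable, the choice of /mnt as the staging mount point, and the use of mount -n (to avoid writing /etc/mtab while the root is still read-only) are all guesses made for illustration. It also copies the staged files rather than moving them, so that hidden files are included.

```shell
# Hypothetical helper (name and arguments are assumptions).
# Usage: stage_writable STAGE DIR...
#   STAGE is an existing, empty directory used as a temporary mount point.
#   Each DIR is replaced by a tmpfs holding a copy of its original contents.
stage_writable() {
    stage=$1
    shift
    # A tmpfs is needed for staging because the root itself is read-only.
    mount -n -t tmpfs tmpfs "$stage" || return 1
    for dir in "$@"; do
        # 1. Copy the read-only directory into the staging tmpfs.
        mkdir -p "$stage$dir" && cp -a "$dir/." "$stage$dir/" || return 1
        # 2. Mount a fresh tmpfs over the original directory.
        mount -n -t tmpfs tmpfs "$dir" || return 1
        # 3. Populate the new tmpfs from the staged copy.
        cp -a "$stage$dir/." "$dir/" || return 1
    done
    # Unmounting the staging tmpfs discards the staged copies.
    umount -n "$stage"
}

# Near the top of rc.S, after PATH is set, one would then call (as root):
#   stage_writable /mnt /etc /var
```

Keep in mind that everything under /var then lives in RAM, so this suits small directories better than large logs.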