I'm putting together a Linux box that will act as a continuous integration build server; we'll mostly build Java stuff, but I think this question applies to any compiled language.
What filesystem and configuration settings should I use? (For example, I know I won't need atime for this!) The build server will spend a lot of time reading and writing small files, and scanning directories to see which files have been modified.
UPDATE: Data integrity is a low priority in this case; it's just a build machine ... the final artifacts will be zipped up and archived elsewhere. If the filesystem on the build machine gets corrupted and loses all data, we can just wipe and re-image; builds will continue running as before.
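To illustrate the kind of scan I mean, here's a rough sketch (all paths and filenames are made up): a stamp file records when the last build ran, and `find` lists everything modified since.

```shell
# A stamp file records the time of the last build; find then lists
# every source file modified after that point. Paths are made up.
dir=$(mktemp -d)
mkdir "$dir/src"
touch -d '1 hour ago' "$dir/src/Old.java"   # untouched since the last build
touch "$dir/stamp"                          # moment the last build ran
sleep 1
touch "$dir/src/New.java"                   # modified since then
find "$dir/src" -name '*.java' -newer "$dir/stamp"
```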
Fastest filesystem? tmpfs mounted out of available RAM, with
noatime
set. This is only viable if you have a procedure for checking out everything needed to build your source tree (since the contents of a tmpfs filesystem go away when you reboot), and if source and objects fit into a reasonable corner of your available RAM (with enough left over to run your compiler and linker without swapping). That said, you can't beat working out of RAM for speed.
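A minimal sketch of that setup (the size and the mountpoint are assumptions, and this needs root):

```shell
# Mount an 8 GiB tmpfs build area with noatime; both the size and the
# /build mountpoint are placeholder choices.
mkdir -p /build
mount -t tmpfs -o size=8g,noatime tmpfs /build

# Or the equivalent fstab line, so it comes back on every boot
# (empty, of course, until your checkout procedure repopulates it):
# tmpfs  /build  tmpfs  size=8g,noatime  0  0
```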
Use ext4fs as the base filesystem with a few speedup options like
noatime,data=writeback,nobh,barrier=0,commit=300
Then union mount a tmpfs ramdisk on top of that so that files written during the builds get the benefits of the ramdisk. Either change the build procedure to move the resulting binaries off the tmpfs at the end of the build, or merge the tmpfs back into the ext4fs before unmounting.
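One way to sketch the union-mount idea on a current kernel is overlayfs; all device names and paths below are placeholders:

```shell
# ext4 base (with the speedup options above) plus a tmpfs layer on top;
# files written during the build land in the tmpfs upper layer.
mount -o noatime,data=writeback,nobh,barrier=0,commit=300 /dev/sdb1 /srv/base
mount -t tmpfs tmpfs /srv/fast
mkdir -p /srv/fast/upper /srv/fast/work /srv/build
mount -t overlay overlay \
  -o lowerdir=/srv/base,upperdir=/srv/fast/upper,workdir=/srv/fast/work \
  /srv/build
# At the end of the build, copy artifacts out of /srv/fast/upper
# (the tmpfs layer) before it disappears on reboot.
```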
To Michael Dillon's answer I can add that you can create the ext4 filesystem with a few extra options:
-i 8096 gives you a lower bytes-per-inode ratio, and therefore more inodes for a given size; useful because build environments create a lot of small files.
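For example (the device name is a placeholder):

```shell
# One inode per 8096 bytes of space instead of the ext4 default of
# 16384, roughly doubling the inode count; /dev/sdb1 is a placeholder.
mkfs.ext4 -i 8096 /dev/sdb1
```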
For sources it'd be preferable to have on-the-fly compression support, which means Reiser4 or Btrfs. Both are "not for production" yet, although I have heard of people using both filesystems heavily and happily. :-)
The next choice (the one I usually make) is Reiser3, not Ext3. Ext3 can be a bit faster nowadays, but Reiser3 doesn't have format-time i-node limits and supports on-line changing of the "data=" option. It has "tail" support, allowing tighter packing of tiny files, but if you're concerned about speed, mount with "notail".
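A hedged example of such a mount (device and mountpoint are placeholders):

```shell
# Reiser3 with atime updates off and tail-packing disabled for speed;
# /dev/sdb1 and /build are placeholder names.
mount -t reiserfs -o noatime,notail /dev/sdb1 /build
```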
Both XFS and JFS would be a pain for the "lots of small files" case, especially if you need to rm them.
(I forgot to mention EXT4: yes, it's even faster than EXT3, but all of EXT3's limitations mentioned above are EXT4's too.)
The operations you describe give some key hints as to what the ideal file-system needs to be able to do: fast reads of lots of small files, fast writes of lots of small files, fast directory scans, and fast stat() calls to check modification times.
Btrfs and Ext4 do three of the above well, and the fourth is questionable. Ext4 is probably mature enough for that, but btrfs isn't done baking yet.
noatime
helps make the meta-data operations more efficient, but when you're creating a bunch of new files, you still need meta-data ops to be screamingly fast. That's when the underlying storage starts becoming a factor. XFS meta-data operations tend to concentrate in a few blocks, which can strain those operations. The Ext-style filesystems are better about keeping the meta-data close to the data it describes. However, if your storage is sufficiently abstract (you're running in a VPS, or attached to a SAN), it doesn't matter significantly.
Each filesystem has little speedups that can be done to eke out a few more percentage points. How performant the underlying storage is will greatly impact how much gain you'll see.
In storage parlance, if you have enough I/O-operation headroom in your storage, filesystem inefficiencies start to not matter so much. If you use an SSD for your build partition, filesystem choice is less important than what you're comfortable working with.
For lots of small files, I'd recommend Reiser over ext3, xfs, jfs..., although I've heard that ext4 is a lot better for this pattern of access than its previous incarnations were (i.e. the opposite of what poise says).
Reiser pushes a lot of the file structure up the inode tree, so it works really well when dealing with small files.
However the differences in behaviour between the leading filesystems is relatively small compared to the benefits you'll get by having enough physical memory to cache/buffer effectively.
Scanning directories to see which files have been modified is a crappy way to solve that problem, even though it's relatively simple. If it matters that much, think about writing an inotify handler to index the modifications.
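A rough sketch of that inotify approach, assuming the inotify-tools package is installed (the watched path and index file are placeholders):

```shell
# Keep a running index of changed files instead of rescanning the tree;
# /srv/build/src and the index location are placeholder names.
inotifywait -m -r -e modify,create,delete --format '%w%f' /srv/build/src \
  >> /var/tmp/modified-files.index &
```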
On the other hand, if you're using a flash SSD (which will give you very low seek times), I'd recommend using a filesystem that distributes writes more effectively, for longevity reasons, e.g. JFFS2.