I have two 1TB disks that are not in any RAID configuration. I'd like each file I store to be placed on one of the disks depending on how much free space each has, and when accessing a file I suppose I'd need to locate it via a database containing a file map, or by using a hash. Are there any Linux utilities that provide this, or should I just write a PHP script?
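If you did write the script yourself, the logic is small. Here is a minimal sketch in Python (rather than PHP), assuming the disks are mounted at the hypothetical paths /mnt/disk1 and /mnt/disk2: each new file goes to whichever disk currently has the most free space (via `shutil.disk_usage`), and an SQLite table serves as the file map for lookups. The names `pick_disk`, `store` and `FileMap` are illustrative only, not part of any existing tool.

```python
import os
import shutil
import sqlite3

# Hypothetical mount points for the two standalone 1TB disks.
DISKS = ["/mnt/disk1", "/mnt/disk2"]


def free_bytes(path):
    """Free space on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def pick_disk(disks, free=free_bytes):
    """Return the disk with the most free space right now."""
    return max(disks, key=free)


class FileMap:
    """Tiny SQLite-backed map from file name to the disk holding it."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, disk TEXT NOT NULL)"
        )

    def record(self, name, disk):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO files (name, disk) VALUES (?, ?)", (name, disk)
            )

    def locate(self, name):
        """Full path of a stored file, or None if it is not in the map."""
        row = self.db.execute("SELECT disk FROM files WHERE name = ?", (name,)).fetchone()
        return None if row is None else os.path.join(row[0], name)


def store(src, name, file_map, disks=DISKS, free=free_bytes):
    """Copy `src` onto the emptiest disk under `name` and record where it went."""
    disk = pick_disk(disks, free)
    dest = os.path.join(disk, name)
    os.makedirs(os.path.dirname(dest), exist_ok=True)  # create subfolders if needed
    shutil.copy2(src, dest)
    file_map.record(name, disk)
    return disk
```

For real use, pass a file path as `db_path` (the in-memory default is only for trying it out) and keep that database backed up, since losing it means losing the index of where everything lives.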
Thanks
Considering how cheap a 1TB disk is, buy a third one and create a RAID5 array. That gives you redundancy and storage in one.
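RAID5 gets redundancy without mirroring every disk by using parity: for each stripe, the parity block is the bitwise XOR of the data blocks, so any single missing block can be rebuilt by XOR-ing the surviving ones. The Python sketch below only illustrates that arithmetic and the usable-capacity rule (n - 1 disks' worth for n equal disks); real implementations such as Linux md also rotate the parity block across disks, which is not modeled here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (the RAID5 parity operation)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks, all the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks):
    """Recover the one missing block (data or parity) from all the others."""
    return xor_blocks(surviving_blocks)


def raid5_usable(n_disks, disk_size):
    """Usable capacity of RAID5 over n equal disks: one disk's worth goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (n_disks - 1) * disk_size


# One stripe across three disks: two data blocks plus their parity.
d0, d1 = b"\x01\x02\x03", b"\x10\x20\x30"
parity = xor_blocks([d0, d1])
assert rebuild([d1, parity]) == d0  # the disk holding d0 died; its data comes back
print(raid5_usable(3, 1), "TB usable from three 1TB disks")
```

So three 1TB disks in RAID5 give the same 2TB you have now, but the array survives the loss of any one disk.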
Three years late, but still relevant. I had the same problem, and FakeRaid was simply out of the question. Use AUFS: it presents the drives as a single drive. The mfs setting puts new files on the drive with the most free space. There is also rr, which is round robin, and pmfs, which puts files on a drive that already has the folder and, among those, the one with the most free space. I personally use pmfs. My setup works like so.
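To make the difference between those create policies concrete, here is a small Python simulation of how each would choose a drive for a new file. It models the behavior described above and is not aufs code; the branch paths, free-space numbers and folder names are made up, and falling back to all drives in pmfs when none has the folder yet is this sketch's own choice, not necessarily what aufs does.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Branch:
    path: str                               # hypothetical mount point, e.g. /mnt/disk1
    free: int                               # free space on that drive (arbitrary units)
    dirs: set = field(default_factory=set)  # folders already present on it


def mfs(branches, parent):
    """Most free space: ignore folders, pick the emptiest drive."""
    return max(branches, key=lambda b: b.free)


def pmfs(branches, parent):
    """Prefer drives that already hold the parent folder, then most free space.

    Falling back to every drive when none has the folder is an assumption here.
    """
    candidates = [b for b in branches if parent in b.dirs]
    return max(candidates or branches, key=lambda b: b.free)


def make_rr(branches):
    """Round robin: hand out drives in turn, regardless of space or folders."""
    turn = itertools.cycle(branches)
    return lambda _branches, _parent: next(turn)


disks = [
    Branch("/mnt/disk1", free=100, dirs={"Archive/Movies"}),
    Branch("/mnt/disk2", free=500),
]
policies = {"mfs": mfs, "pmfs": pmfs, "rr": make_rr(disks)}
for name, policy in policies.items():
    print(name, "->", policy(disks, "Archive/Movies").path)
```

With these numbers, mfs sends a new movie to disk2 because it has more space, while pmfs keeps it on disk1 next to the rest of Archive/Movies, which is why pmfs keeps related files together on one drive.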
The fstab:
I added this init.d script (because the drives mounted too slowly for the aufs mount to keep up):
That gives me 10 mounts under /mnt. I like it this way because I use SnapRAID, which you'll have to download and compile (there are guides for it). I use this on a Samba server, so the only thing everyone else sees is the Archive folder. Make sure to create the directory first, otherwise you'll get a mount error.
Greyhole will distribute your files across multiple drives. It will also allow you to specify redundancy, so that certain files have redundant copies stored on multiple drives. It is targeted at home servers and workstations, not at production enterprise use.
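Conceptually, keeping N copies of a file means choosing N distinct drives to hold them. The Python sketch below picks the drives with the most free space for each copy, driven by a per-share copy count; the share names, counts and selection rule are illustrative assumptions, not Greyhole's actual configuration format or placement algorithm.

```python
import heapq

# Hypothetical policy: how many copies each share should keep.
COPIES_PER_SHARE = {"Photos": 2, "Videos": 1}


def copy_targets(free_by_drive, copies):
    """Pick `copies` distinct drives, most free space first."""
    if copies > len(free_by_drive):
        raise ValueError("not enough drives for the requested number of copies")
    return heapq.nlargest(copies, free_by_drive, key=free_by_drive.get)


def targets_for(share, free_by_drive, default_copies=1):
    """Drives that should hold a file belonging to `share`."""
    return copy_targets(free_by_drive, COPIES_PER_SHARE.get(share, default_copies))


free = {"/mnt/disk1": 300, "/mnt/disk2": 700}
print("Photos ->", targets_for("Photos", free))
print("Videos ->", targets_for("Videos", free))
```

On two 1TB drives, every share kept at two copies costs double its size, so this is a trade of capacity for safety chosen per share rather than for the whole pool.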
It sounds like all you care about is being able to use all 2TB of storage without having to manually place files on one drive or the other. Either LVM or RAID0 can solve this problem for you, at the expense of an increased risk of failure. For LVM, you would make each 1TB drive an LVM physical volume and put them both in a single volume group. After that you could create logical volumes up to 2TB in size. For RAID0, you'd just create the RAID device.
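The two differ in how an offset in the combined device lands on a physical disk: a linear LVM volume (the default) is essentially a concatenation that fills one physical volume and then continues onto the next, while RAID0 alternates fixed-size chunks between the disks. The Python sketch below shows both mappings and why the risk grows. It uses toy sizes, ignores LVM extents and metadata, and assumes drives fail independently with the same probability p over some period, with any single failure treated as losing the volume.

```python
def linear_map(offset, pv_sizes):
    """Linear (concatenated) volume: fill PV 0, then continue on PV 1, ..."""
    for index, size in enumerate(pv_sizes):
        if offset < size:
            return index, offset
        offset -= size
    raise ValueError("offset is past the end of the volume")


def raid0_map(offset, n_disks, chunk):
    """RAID0: consecutive chunks alternate across the disks."""
    chunk_no, within = divmod(offset, chunk)
    return chunk_no % n_disks, (chunk_no // n_disks) * chunk + within


def p_volume_loss(p_disk, n_disks):
    """Chance that at least one of n independent disks fails (treated as volume loss)."""
    return 1 - (1 - p_disk) ** n_disks


# Toy volume: two "disks" of 8 units each, RAID0 chunk size of 2 units.
for off in (0, 2, 5, 9):
    print(off, "linear ->", linear_map(off, [8, 8]), " raid0 ->", raid0_map(off, 2, 2))
print("p(loss) 1 disk:", p_volume_loss(0.05, 1), " 2 disks:", p_volume_loss(0.05, 2))
```

With p = 5% per drive, the two-drive volume's loss probability is 1 - 0.95² = 9.75%, nearly double that of a single drive.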
I don't know of a way to transparently merge separate filesystems into a single storage pool. This sort of sharding isn't uncommon; it's just typically implemented at the application layer rather than the storage layer. Engineyard has a paper describing filesystem sharding tactics and processes.
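Application-level sharding often derives a file's location from its name, which is the "use a hash" option from the question: no lookup table is needed. A minimal Python sketch, assuming the hypothetical mount points /mnt/disk1 and /mnt/disk2, hashes the name with SHA-256 and takes the result modulo the number of disks.

```python
import hashlib
import os

# Hypothetical mount points; their order defines the shard numbers.
SHARDS = ["/mnt/disk1", "/mnt/disk2"]


def shard_for(name, shards=SHARDS):
    """Deterministically map a file name to one of the shards."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def path_for(name, shards=SHARDS):
    """Where the file lives: same answer on every call, no database needed."""
    return os.path.join(shard_for(name, shards), name)


print(path_for("photos/2011/beach.jpg"))
```

The trade-offs: placement ignores how full each disk is (a good hash spreads files roughly evenly by count, not by size), and adding a third disk changes `% len(shards)` for most names, so most files would have to move. Consistent hashing is the usual fix for the latter.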
The right way is to use LVM.
Personally I just put most of my media collections on one disk, other things on the other disk.
If you plan to scale in the future, say to 10 hard drives across multiple servers, consider using a clustered filesystem such as GlusterFS: http://en.wikipedia.org/wiki/GlusterFS