I'm debating about using LVM for a media/file server because I would like to combine multiple physical hard disks into one volume. I do not wish to use any RAID in my LVM so my question is:
If one of the hard disks in my volume were to fail, would I lose all my data, or would I lose only the data that was stored on that individual disk?
Also, if I were to just lose the data on the individual disk, would it be as simple as replacing that disk and restoring what was on it from a backup to recover?
No, you would lose the data stored on the whole LVM volume.
No, it isn't that simple.
You can read a similar question here: LVM and disaster recovery
Simple: You are looking for mhddfs.
It pretends to be one large filesystem, writes to the disks in the order they were mentioned and, if necessary, moves large files to a different device when the first one becomes too full. It can also use subfolders on the disks instead of whole disks, with the same functionality.
The individual disks have to be mounted first and remain accessible. It does not alter the filesystems at all and does not care which filesystem is in place (as long as free space is correctly reported by the filesystem). In case a disk is lost, you'll have to remount your mhddfs again (on the fly) and the data on that disk is gone.
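To make the placement behaviour described above concrete, here is a minimal sketch in Python. It is only a simplified model, not mhddfs's actual code: the function names, the sizes and the "first disk in list order with enough room" rule are assumptions made for illustration.

```python
def place_files(free_space, file_sizes):
    """Assign each file to the first disk (in list order) that has room.

    free_space: free units per disk, in the order the disks were listed.
    file_sizes: sizes of the files to store, in arrival order.
    Returns a list with the disk index chosen for each file.
    """
    free = list(free_space)  # work on a copy, leave the caller's list alone
    placement = []
    for size in file_sizes:
        for index, available in enumerate(free):
            if size <= available:
                free[index] -= size
                placement.append(index)
                break
        else:
            # No disk in the pool can hold this file.
            raise OSError("no disk has enough free space")
    return placement


def files_left_after_losing(lost_disk, placement):
    """Return the indices of the files that were NOT on the lost disk."""
    return [i for i, disk in enumerate(placement) if disk != lost_disk]
```

Because every file lives whole on exactly one disk, losing a disk in this model only removes the files that were placed on it; the rest stay readable.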
Usage:
or in /etc/fstab:
Complex&Powerful: You want unionfs.
While mhddfs is nice and extremely simple, I've had problems with file permissions when granting others access via SSH. I couldn't find any solution, but found unionfs.
Unionfs also allows you to mount several folders across different filesystems into one, but does its magic on permissions. You can merge several read-only folders and one writable one together, so that they appear as one. People you have shared your merged folder with can then write to a read-only folder, as it appears to them, but the files end up in the single writable one. Linux boot CDs work like this; there the writable disk is a ramdisk. People can even delete files in read-only folders, which does not really delete the file, but creates a hidden whiteout file in their write directory. If you use all the options, you can basically use your filesystem as a poor man's SVN.
If you use the SVN-like options too much, you might not notice data existing twice (improbable in your scenario, but possible), while your writable folder fills up with tiny, hidden whiteout files. Other than that, it keeps your disks clean and individually usable. What happens if a file is too large for a disk, I don't know yet.
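The layering described above can be modelled in a few lines of Python. This is a toy sketch under my own assumptions (a class called UnionView, dictionaries standing in for folders, a set standing in for the hidden marker files); it is not how unionfs is implemented.

```python
class UnionView:
    """Toy model of one read-only branch merged with one writable branch."""

    def __init__(self, read_only):
        self.read_only = dict(read_only)  # lower branch, never modified
        self.writable = {}                # upper branch, receives all writes
        self.whiteouts = set()            # names hidden by a delete

    def write(self, name, data):
        # Writes always land in the writable branch, even when the file
        # also exists in the read-only branch (the data then exists twice).
        self.writable[name] = data
        self.whiteouts.discard(name)

    def delete(self, name):
        # The read-only copy is not touched; a marker hides it instead.
        self.writable.pop(name, None)
        if name in self.read_only:
            self.whiteouts.add(name)

    def read(self, name):
        if name in self.whiteouts:
            raise FileNotFoundError(name)
        if name in self.writable:
            return self.writable[name]
        if name in self.read_only:
            return self.read_only[name]
        raise FileNotFoundError(name)

    def listing(self):
        names = set(self.writable) | set(self.read_only)
        return sorted(names - self.whiteouts)
```

Overwriting a file from the read-only branch leaves the old copy in place underneath, which is the "data existing twice" effect, and every delete of a read-only file adds one more marker to the writable side.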
Usage:
where =rw makes the folder readable and writable and =ro makes it read-only, even if the permissions would state otherwise. In /etc/fstab this is:
If you're just connecting multiple devices together, there wouldn't be any redundancy, so you could lose the data. But if you're using a media/file server for a business, you shouldn't lose anything because you have everything backed up to a backup server/tape drive.
Why are you avoiding RAID? The point of RAID is availability; if you don't want to lose time due to disk failure, you can use a RAID 1 configuration, which can also speed up your reads. RAID cards are not too expensive and pay for themselves the first time you have a disk failure. If you REALLY want to avoid paying for a card, you can set up Linux to use software RAID, although it takes a little more care in setup and troubleshooting to make sure you replace the correct drive.
Otherwise you'd have to jump through some hoops to try to recover what data you can from the remaining disks. It would be possible, but you're kinda' asking for a lot more trouble than you should have. Get a good backup in place, and reconsider the RAID.
If you are using one file system spanning all LVM physical volumes, the whole file system will be damaged, because the file system doesn't know about the underlying physical volumes and won't align its structures to them. It may be possible to rescue some of the data on the working disks, but there is no guarantee of that.
And just recovering the files of the damaged disk won't work either for the same reason.
I think a much simpler route would be to configure mdadm for your media partition. If you don't have the hardware for "real RAID", going the mdadm route would be considerably easier, and it seems to meet your requirements for redundancy and simple disk replacement.
For more information: http://en.wikipedia.org/wiki/Mdadm
If you use mdadm and RAID 5, you'd be able to lose one drive and still have a functional array, albeit with degraded performance.
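As a rough illustration of why a RAID 5 array keeps working with one drive missing, here is a small Python sketch of XOR parity. It is a simplification: the block contents are made up, and a real array rotates the parity across all drives rather than keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per drive, plus a parity block on a fourth drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] has died: XOR-ing the surviving
# blocks with the parity block gives the missing block back.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The degraded performance comes from exactly this step: every read that touches the missing drive has to be recomputed from all the other drives.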
I think the important thing to understand, which hasn't been mentioned, is that a file in a filesystem is not necessarily sitting in one spot on a disk. It's broken up into blocks which may reside anywhere inside the filesystem. The first 4K of your file might be on disk1, the next on disk2, etc. You can imagine the mess of trying to recover anything if you lost a chunk of the filesystem.
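A small Python sketch of that idea, assuming the disks are simply concatenated one after another (a linear layout); the disk sizes and block numbers are invented for illustration.

```python
def disk_for_block(block_number, disk_sizes_in_blocks):
    """Map a logical block number to (disk index, offset on that disk)."""
    start = 0
    for index, size in enumerate(disk_sizes_in_blocks):
        if block_number < start + size:
            return index, block_number - start
        start += size
    raise ValueError("block is beyond the end of the volume")


def disks_touched(file_blocks, disk_sizes_in_blocks):
    """Return the set of disks that hold at least one block of the file."""
    return {disk_for_block(b, disk_sizes_in_blocks)[0] for b in file_blocks}


# Two disks of 100 blocks each, joined into one 200-block volume.
sizes = [100, 100]
scattered_file = [10, 150, 20]  # the filesystem placed these blocks freely
```

Here disks_touched(scattered_file, sizes) contains both disks, so losing either one leaves that file with holes, even though most of its blocks are on the surviving disk.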
Btrfs is a good choice here; you can have metadata resilient to the loss of one disk (the "raid1" chunk profile); the data on the other disks will still be reachable (just so we're clear, that translates to files full of holes wherever the missing disk is referenced). This is done by running btrfs balance with a filter: