According to this paper on Facebook's Haystack:
"Because of how the NAS appliances manage directory metadata, placing thousands of files in a directory was extremely inefficient as the directory’s blockmap was too large to be cached effectively by the appliance. Consequently it was common to incur more than 10 disk operations to retrieve a single image. After reducing directory sizes to hundreds of images per directory, the resulting system would still generally incur 3 disk operations to fetch an image: one to read the directory metadata into memory, a second to load the inode into memory, and a third to read the file contents."
I had assumed the OS would always cache filesystem directory metadata and inodes in RAM, so that a file read would usually require just 1 disk I/O.
Is this "multiple disk I/Os to read a single file" problem outlined in the paper unique to NAS appliances, or does Linux have the same problem too?
I'm planning to run a Linux server for serving images. Is there any way I can minimize the number of disk I/Os — ideally by making sure the OS keeps all the directory and inode data cached in RAM, so that each file read requires no more than 1 disk I/O?
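For context, here is the kind of thing I've been experimenting with: a cache-warming sketch that `stat`s every file under the image root so the kernel populates its dentry and inode caches up front, plus a look at `vm.vfs_cache_pressure`, which controls how aggressively the kernel reclaims those caches under memory pressure. The `/srv/images` path is just a placeholder for my actual image directory.

```shell
#!/bin/sh
# Warm the dentry/inode caches for the image tree before traffic arrives.
# IMG_DIR is a hypothetical path -- point it at the real image root.
IMG_DIR=${IMG_DIR:-/srv/images}

# stat every file; the output is discarded, the useful side effect is
# that the kernel caches each directory entry and inode it touches.
find "$IMG_DIR" -type f -exec stat --format='%n' {} + > /dev/null

# Show the current cache-pressure setting (default is 100).
# Lowering it, e.g. `sysctl -w vm.vfs_cache_pressure=10` as root,
# makes the kernel prefer keeping dentries/inodes cached over
# reclaiming them; 0 disables reclaim entirely and can exhaust RAM.
cat /proc/sys/vm/vfs_cache_pressure
```

My understanding is that this only warms the metadata caches and biases the kernel toward retaining them; whether they stay resident still depends on total memory pressure, so I'm not sure it guarantees the 1-I/O-per-read behavior I'm after.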