We have a huge directory tree, many levels deep and with a (very) large number of small files at each level.
From time to time we have large file data changes where parts of the tree are replaced and the permissions for these parts will need to be reset. It is not possible to foresee which parts will change next time.
The files currently reside on a Windows NTFS partition.
Resetting permissions needs to be done recursively from the root. This takes the better part of a working day, where the actual requirement is near instant change (or the business suffers).
I have tried the GUI. I have tried robocopy. I have tried PowerShell. I have tried a Go library (a wrapper for the Windows API), as Go has a reputation for being fast, but it turned out little was gained.
I have contemplated letting applications work through symlinks, where the symlinks have the restrictive permissions and the data has (very) permissive permissions. But the root problem would remain: after a data replacement we would still need to trawl through the tree and set these permissive permissions.
Setting group ACLs is not a solution; we use that already. The replacement data arrives with a different set of permissions, and these need to be replaced.
Windows is not a requirement, we also run Linux. If yet other OS platforms could solve the task of setting massive file permissions and making the files accessible through common file sharing protocols (http, smb etc.), such an OS could be considered.
So my question is: Using Windows NTFS as a baseline, are there file system/OS combinations which provide significantly faster recursive file system operations (setting permissions), whilst still serving the files through a common file sharing protocol?
Procedural suggestions are also welcome - have I and my colleagues just missed some obviously more simple solution than replacing the file system or OS?
EDIT based on constructive comments: yes we have a dev team inhouse and can potentially leverage anything between sysadmin hacks (me) to well designed code (devs).
EDIT2 Answering questions from @GregAskew (approximations, as I am on holiday until Monday)
How many ACEs are in the ACL?
- About 8 ACEs per ACL.
Is the file system optimized for performance (short file names disabled, last access time disabled)?
- No, I was not aware of these optimizations and will try them out.
What is the maximum number of files in a directory?
- Will have to measure.
What is the state of file system fragmentation and directory index fragmentation before setting the ACL?
- Unknown, will investigate.
What is the allocation unit size of the volume? What operating system version is hosting the volume?
- Currently a VMware vSAN hosting a Windows 2008 R2 Std, upgrade imminent to Win2016 Std.
Are you setting the permissions locally (based on another comment, it seems like you are doing this over the network)?
- We are setting the permissions locally in the VM, but then let them DFS-replicate for redundancy (excruciating and this will be redesigned). We have full control over design, it is just the implementation of the initial file delivery which is out of our control (but based on a comment we may try to change this). The question is just about changing the local file permissions (I realize the underlying SAN is network connected but will gladly take constructive suggestions along those lines all the same).
The standard (since the 1990s at least) process for avoiding frequent recursive changes to ACL is to use “access groups” to assign permissions.
So users get put into role groups, and role groups are put into access groups. Permissions are actually granted using the access group, and never granted directly to users or role groups.
In your case, you would want an “access group” for each level of your folder structure as far down as you need unique permissions.
When new data comes in, you create the groups “AccessReadFolderX” and “AccessWriteFolderX”, then set permissions on a new folder (blocking inheritance from its parents). You then copy the new data into this folder.
With this method, you never change the ACLs on the filesystem. Instead you just modify the nesting of groups and users in AD.
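To make the nesting idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not a real AD API: the Directory class, the user "alice" and the role group "RoleAnalysts" are made up for illustration, and only the access group names come from the text above. The point it tries to show is that the folder ACL is written once and access then changes purely through group membership.

```python
from collections import defaultdict


class Directory:
    """Toy stand-in for AD group nesting (hypothetical, not a real AD API)."""

    def __init__(self):
        # principal (user or group) -> groups it is a direct member of
        self.member_of = defaultdict(set)

    def add_member(self, group, member):
        self.member_of[member].add(group)

    def remove_member(self, group, member):
        self.member_of[member].discard(group)

    def effective_groups(self, principal):
        """Return every group reachable from principal through nesting."""
        seen = set()
        stack = [principal]
        while stack:
            current = stack.pop()
            for group in self.member_of.get(current, ()):
                if group not in seen:
                    seen.add(group)
                    stack.append(group)
        return seen


# The ACL on the folder names only access groups and is set once.
FOLDER_ACL = {
    "FolderX": {"read": "AccessReadFolderX", "write": "AccessWriteFolderX"},
}


def can(directory, user, right, folder):
    """True if user reaches the access group that grants right on folder."""
    return FOLDER_ACL[folder][right] in directory.effective_groups(user)
```

Granting or revoking access is then an add_member or remove_member call on the directory side, for example nesting "RoleAnalysts" into "AccessReadFolderX", with no write to the filesystem at all.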
TL;DR answer: use faster disks.
Seriously, that's the simple answer. If you want to have "a huge directory tree, many levels deep and with a (very) large number of small files at each level", in order to reset permissions on any part of the directory tree, you need to perform IO operations on each file. That takes time, and if you're using something like S-L-O-W 5400-rpm SATA disks for storage, that takes even more time. Slow disks like that are limited to about 40-50 IO operations per second, and there's nothing you can do to improve that. If you have to update millions of files at about 15-20 files per second per disk, that's going to take time. The filesystem really doesn't matter much when that's the job you have to do.
Good 7,200 RPM SATA drives can get about 70 IO operations per second, and really good, fast SAS drives can get 200-300 IO operations per second. SSDs can do thousands.
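As a rough back-of-the-envelope check on those numbers, the arithmetic can be sketched like this. The figure of 3 IO operations per file is my assumption (it is roughly what turns 40-50 IOPS into 15-20 files per second), and the file count of one million is only an example, not a measurement of your tree.

```python
def estimated_hours(file_count, iops_per_disk, ios_per_file=3, disks=1):
    """Rough wall-clock estimate for touching every file once.

    ios_per_file is an assumed average number of IO operations needed
    to read and rewrite one file's permissions; it is not a measured value.
    """
    files_per_second = iops_per_disk * disks / ios_per_file
    return file_count / files_per_second / 3600


if __name__ == "__main__":
    # Example: one million files on the disk classes mentioned above.
    for label, iops in [("5400 rpm SATA", 45), ("7200 rpm SATA", 70), ("fast SAS", 250)]:
        print(f"{label}: about {estimated_hours(1_000_000, iops):.1f} hours")
```

The estimate scales linearly with the file count and inversely with IOPS, which is why faster storage is the only knob that moves it much.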
And because file metadata tends to be spread all over the disk on just about all filesystems, there's not much any filesystem can do to improve the performance, unless you get into expensive, complex filesystems such as IBM's GPFS or Oracle's QFS - whatever they're calling them now. HP's Ibrix also might work for you, if they're still selling it. But those filesystems are expensive, and require significant expertise to administer.
You can try to limit the IO operations NTFS makes by setting
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\NtfsDisableLastAccessUpdate
to 1 if it is still 0 (0 was the default on older Windows versions; newer ones may already ship with last-access updates disabled, so check the current value first). That will at least disable access time updates when you're trawling through the directory tree updating permissions. It might help a bit.
A better answer is to design a system that doesn't require massive changes to a massive data store. Because that's a really, really bad design when "the actual requirement is near instant change (or the business suffers)".
Assuming that all files below a given root need the same permissions, it may help to write your own "ACL setter" in C++ or C#. It should first check whether the permission on each file is already correct before writing it, and it should spread the work over multiple asynchronous threads.
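A minimal sketch of that check-before-write idea, in Python rather than C++ or C# so it stays short. It uses POSIX mode bits as a stand-in for NTFS ACLs, which is an assumption on my part: on Windows you would compare and set the real security descriptor instead (for example through the Win32 GetNamedSecurityInfo and SetNamedSecurityInfo calls). The function names and the worker count are made up for illustration, and it only touches regular files, not directories.

```python
import os
import stat
from concurrent.futures import ThreadPoolExecutor


def _fix_mode(path, wanted_mode):
    """Return True if the mode had to be changed, False if it was left alone."""
    if os.path.islink(path):
        return False  # skip symlinks so we never chmod the link target
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == wanted_mode:
        return False  # already correct: a read only, no write
    os.chmod(path, wanted_mode)
    return True


def reset_file_modes(root, wanted_mode, workers=8):
    """Walk root and fix file modes in parallel; return how many were changed."""
    paths = (
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: _fix_mode(p, wanted_mode), paths))
```

Something like reset_file_modes("/srv/data", 0o644) would then only write to the files that actually differ, so a second run over an unchanged tree does reads only. Whether the threads help depends on the storage; on a single slow disk they may not.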