In one of our compute clusters, we have systems with unique hardware resources to which access is controlled by device-file permissions. Each node has two or four of these, and multiple CPU cores. We'd like to be able to schedule different users' jobs on the same node and restrict access to the properly-assigned resources. (Some queues might even be CPU-only, with no access.)
For a while, we were running with a "hey, pay attention and play nice" policy, but that's hard for everyone to keep straight even with the best intentions. So instead we just schedule the entire node for a given user at a time. This is wasteful for single-threaded, single-process tasks.
With Torque, one can run a prologue script as root before the job starts. This could be made to set the device permissions appropriately. But we're running (née Sun) Grid Engine. That has per-queue prolog scripts, but they run as the user to whom the job belongs (like Torque's prologue.user), which is no help here.
Is there something obvious I'm missing (I hope), or an alternate way to approach this? I realize that I have the source code and therefore can do anything, but I'm hoping there's a standard way I'm just missing.
Thanks!
The prolog script can actually be run as any user.
According to man queue_conf, the prolog path accepts an optional user@ prefix naming the user the script should run as. So setting
prolog root@/path/to/prolog
should have it execute as root.
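For illustration, here is a minimal sketch of what such a root prolog might do once it runs with sufficient privileges. It assumes the job owner's login name and the device paths arrive as command-line arguments. queue_conf describes pseudo-variables such as $job_owner that can be placed on the prolog command line, but check the man page for your version. The function name assign_device and the choice of mode 0600 are hypothetical, and how a particular device gets picked for a job is not covered here.

```python
#!/usr/bin/env python3
import os
import pwd
import sys


def assign_device(device_path, owner):
    """Give `owner` exclusive read/write access to one device file."""
    uid = pwd.getpwnam(owner).pw_uid   # look up the numeric uid for the login name
    os.chown(device_path, uid, -1)     # change the owner, leave the group unchanged
    os.chmod(device_path, 0o600)       # owner read/write only, nobody else


# Hypothetical invocation: prolog <job_owner> <device> [<device> ...]
# Does nothing when called without an owner and at least one device.
if __name__ == "__main__" and len(sys.argv) > 2:
    job_owner = sys.argv[1]
    for device in sys.argv[2:]:
        assign_device(device, job_owner)
```

A matching epilog, also run as root, would presumably hand the device back (for example by chowning it to root) so the next job starts from a clean state.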