My supercomputing center recently moved from SGE to PBS/Torque. Now, when I submit my array jobs, only half of the jobs in the array get scheduled. When those finish, the other half get scheduled. This happens even though the nodes are largely underutilized.
For example, I just submitted an array with 10 jobs. Here is the qstat output 10 minutes later:
```
[myuserna@sub ~]$ qstat -t
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
3100[1].systemm2          ...-to-work.sh-1 myuserna        00:07:40 R short
3100[2].systemm2          ...-to-work.sh-2 myuserna        00:07:32 R short
3100[3].systemm2          ...-to-work.sh-3 myuserna        00:09:55 R short
3100[4].systemm2          ...-to-work.sh-4 myuserna        00:09:44 R short
3100[5].systemm2          ...-to-work.sh-5 myuserna        00:09:07 R short
3100[6].systemm2          ...-to-work.sh-6 myuserna               0 Q short
3100[7].systemm2          ...-to-work.sh-7 myuserna               0 Q short
3100[8].systemm2          ...-to-work.sh-8 myuserna               0 Q short
3100[9].systemm2          ...-to-work.sh-9 myuserna               0 Q short
3100[10].systemm2         ...to-work.sh-10 myuserna               0 Q short
[myuserna@sub ~]$
```
Any clues on how to fix the scheduler?
Here is the relevant portion of the server configuration (queue and server attributes):
```
create queue short
set queue short queue_type = Execution
set queue short Priority = 10000
set queue short max_user_queuable = 500
set queue short max_running = 200
set queue short resources_max.walltime = 24:00:00
set queue short resources_default.nodes = 1
set queue short max_user_run = 50
set queue short enabled = True
set queue short started = True
#
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = systemm2
set server acl_roots = root@*
set server managers = [email protected]
set server operators = [email protected]
set server default_queue = route
set server log_events = 511
set server mail_from = adm
set server resources_default.walltime = 01:00:00
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server submit_hosts = submit-1
set server submit_hosts += submit-0
set server auto_node_np = True
set server next_job_number = 6217
set server max_job_array_size = 512
set server max_slot_limit = 5
```
Check with your administrator. Torque can limit the number of job slots a user may have in use in each queue.
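Update: now that you've updated the question to show `set server max_slot_limit = 5`, I'm pretty sure that answers it. In Torque, max_slot_limit caps how many jobs of any single array may run at the same time, which is why exactly 5 of your 10 array jobs are running while the rest wait in the queue. Raising it is a server-level change, so your administrator has to make it.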
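To check which limit is actually in effect, the values can be pulled out of qmgr's output. The sketch below is a hypothetical helper, not a Torque command: it assumes that `qmgr -c 'print server'` and `qmgr -c 'print queue short'` emit `set ... <attr> = <value>` lines in the same layout as the configuration quoted in the question, and that your site lets ordinary users run them. The commented-out commands show how a manager could raise the cap and how a user can request an explicit per-array slot limit with qsub's `%` syntax; the value 50 and the script name `my-array-job.sh` are purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): read qmgr "print" output on
# stdin and print the value of the attribute named in $1.
# It matches lines shaped like:  set server <attr> = <value>
#                           or:  set queue <name> <attr> = <value>
get_limit() {
  # NF >= 5 skips blank lines, "#" comments and "create queue" lines before
  # $(NF - 2) is evaluated; "+=" lines are skipped because $(NF - 1) must be "=".
  awk -v attr="$1" 'NF >= 5 && $1 == "set" && $(NF - 2) == attr && $(NF - 1) == "=" { print $NF }'
}

# Server-wide cap on concurrently running jobs per array:
#   qmgr -c 'print server' | get_limit max_slot_limit
# Per-user running-job limit in the "short" queue:
#   qmgr -c 'print queue short' | get_limit max_user_run

# A Torque manager could raise the cap (illustrative value):
#   qmgr -c 'set server max_slot_limit = 50'
# A user can request an explicit slot limit for one array, no higher than
# the server cap (hypothetical script name):
#   qsub -t 1-10%5 my-array-job.sh
```

Against the configuration in the question, the first lookup would print 5, matching the five tasks shown in state R, while max_user_run is 50 and so is not what is holding the array back.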