I'm currently facing the problem of integrating GPU servers into an existing SGE environment. Searching with Google, I found some examples of clusters where this has been set up, but no information on how it was actually done.
Is there a howto or tutorial on this anywhere? It doesn't have to be ultra-verbose, but it should contain enough information to get a "cuda queue" up and running...
Thanks in advance...
Edit: To set up a load sensor that reports how many GPUs in a node are free, I've done the following:
- set the compute mode of the GPUs to exclusive
- set the GPUs to persistence mode
- add the following script to the cluster configuration as a load sensor (and set its reporting interval to 1 second); a sketch of the surrounding commands follows the script below
    #!/bin/sh

    hostname=`uname -n`

    while [ 1 ]; do
        # SGE talks to the load sensor via stdin/stdout: wait for the next request
        read input
        result=$?
        if [ $result != 0 ]; then
            exit 1
        fi
        if [ "$input" = "quit" ]; then
            exit 0
        fi

        # if nvidia-smi is not available, report zero free GPUs
        smitool=`which nvidia-smi`
        result=$?
        if [ $result != 0 ]; then
            gpusavail=0
        else
            # free GPUs = total GPUs minus GPUs with a running compute process
            # (reliable because the cards are in exclusive compute mode)
            gpustotal=`nvidia-smi -L | wc -l`
            gpusused=`nvidia-smi | grep "Process name" -A 6 | grep -v +- | grep -v \|= | grep -v Usage | grep -v "No running" | wc -l`
            gpusavail=`echo $gpustotal-$gpusused | bc`
        fi

        echo begin
        echo "$hostname:gpu:$gpusavail"
        echo end
    done

    exit 0
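For completeness, here is a rough sketch of the surrounding setup commands I used. The host name gpunode01, the sensor path /opt/sge/gpu_sensor.sh, and the count of 2 GPUs per node are placeholders, and the exact complex/configuration attributes may differ between Grid Engine versions:

    # put each GPU into exclusive compute mode and enable persistence mode (run as root;
    # on older drivers the compute mode may have to be given numerically, e.g. "-c 3")
    nvidia-smi -c EXCLUSIVE_PROCESS
    nvidia-smi -pm 1

    # define a "gpu" complex so the load sensor values can be used for scheduling
    # (qconf -mc opens an editor; add a line similar to the following)
    #   gpu    gpu    INT    <=    YES    YES    0    0
    qconf -mc

    # optionally pin the number of GPUs per host as a consumable resource
    qconf -aattr exechost complex_values gpu=2 gpunode01

    # register the load sensor and shorten the load report interval for the host
    # (qconf -mconf opens the host configuration in an editor)
    #   load_sensor       /opt/sge/gpu_sensor.sh
    #   load_report_time  00:00:01
    qconf -mconf gpunode01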
Note: This obviously works only for NVIDIA GPUs.
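With that in place, a job can request a free GPU at submission time; the queue name cuda.q and the job script name are placeholders:

    qsub -q cuda.q -l gpu=1 my_cuda_job.sh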