I'm using Ubuntu 10.04 and trying to install Sun Grid Engine from the Ubuntu repository. It works on a single machine: I can submit jobs and so on. But I can't get it working with any other machine. I added another execution host and installed gridengine-client, gridengine-common and gridengine-exec on it, but it somehow can't communicate with the master. I even turned off all firewalls to make sure they aren't causing the problem.
When I run qstat -f on the master node I get:
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
standard@neuron1               BIP   0/0/2          0.04     lx26-amd64
---------------------------------------------------------------------------------
standard@neuron2               BIP   0/0/2          -NA-     -NA-          au
When I restart the daemon on the neuron2 node I get:
error: can't find connection
error: can't get configuration from qmaster -- backgrounding
When I try to run qstat -f from the n2 (neuron2) node I get:
error: commlib error: access denied (server host resolves destination host "n1" as "neuron1")
error: unable to contact qmaster using port 6444 on host "n1"
I have two hostnames for this machine and it looks like the first error has something to do with that, but it would be strange if that were causing this kind of problem. I tried telnet n1 6444 and it connects.
Does anybody know what is going on here? Am I missing something?
OK, the problem was indeed the duplicate hostnames. When I removed one, it started working. I will dig into it and try to find out why it behaves that way.
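For anyone hitting the same "resolves destination host ... as ..." error, here is a rough way to spot this kind of mismatch on each node. It is only a sketch: the function names canonical_name and resolves_to_itself are made up for illustration, and it assumes the mismatch shows up as the reverse lookup returning a different primary name than the one you used (as with "n1" versus "neuron1" above). I have not verified that Grid Engine does its comparison exactly this way.

```python
import socket


def canonical_name(host, resolve=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Resolve host to an IP, then return the primary name for that IP.

    resolve and reverse are parameters only so the lookups can be
    replaced with stand-ins when no real resolver is available.
    """
    ip = resolve(host)
    primary, _aliases, _addresses = reverse(ip)
    return primary


def resolves_to_itself(host, resolve=socket.gethostbyname,
                       reverse=socket.gethostbyaddr):
    """True if the name survives a forward and reverse lookup unchanged."""
    return canonical_name(host, resolve, reverse).lower() == host.lower()
```

Running resolves_to_itself("n1") and resolves_to_itself("neuron1") on both the master and the execution host shows which of the two names comes back unchanged. A name that comes back as something else is a likely candidate for the "access denied" message, even though a plain TCP connection to port 6444 still succeeds.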