We have an automation tool that logs in over SSH and sends commands, which works fine when the server is running. While the server is booting, the tool checks whether the SSH port (22) is open and, if it is, tries to connect to the server and send commands.
However, during the boot sequence, when the tool finds port 22 open and tries to connect with the ssh client, either the server rejects the connection or the ssh client returns the error "ssh port is not open".
We investigated this with telnet and saw that, during boot, sshd starts, opens port 22 and begins listening, but then closes the port and reopens it a little later. That is exactly when our automation tool tries to log in.
My question is: how can we make sure that the SSH port is successfully open and ready to take commands?
Thank you for taking the time to answer. Best regards
First, it seems that the automation tool is not verifying the exit status of ssh. I would try to fix the problem there.
One solution is to file a bug with the team that created the tool.
Another solution would be to wrap the ssh command in a script that would do this transparently. E.g. create a script in /opt/myproject/ssh_wraper.sh
Here you can have something like:
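A minimal sketch of what such a wrapper might contain, assuming that a connection-level failure shows up as ssh exit status 255 and that retrying a few times is acceptable. The names ssh_with_retry, MAX_TRIES and RETRY_DELAY and their default values are illustrative, not taken from any existing tool:

```shell
#!/bin/sh
# Hypothetical wrapper: retry ssh while it reports a connection-level
# failure (exit status 255). MAX_TRIES and RETRY_DELAY are assumed values.
MAX_TRIES=${MAX_TRIES:-10}
RETRY_DELAY=${RETRY_DELAY:-5}

ssh_with_retry() {
    try=1
    while :; do
        status=0
        ssh "$@" || status=$?
        # Any status other than 255 comes from the remote command: pass it on.
        if [ "$status" -ne 255 ]; then
            return "$status"
        fi
        # Still failing to connect after the last allowed attempt: give up.
        if [ "$try" -ge "$MAX_TRIES" ]; then
            return 255
        fi
        try=$((try + 1))
        sleep "$RETRY_DELAY"
    done
}

# When run as a script with arguments, behave like ssh with retries.
if [ "$#" -gt 0 ]; then
    ssh_with_retry "$@"
fi
```

The automation tool would then call the wrapper with the same arguments it passes to ssh. Note that ssh also returns 255 when a connection drops mid-session, so a command that is not safe to repeat could be run twice.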
You could try experimenting with the exit status you get from something like
ssh user@host "echo 0 > /dev/zero"
If the command completes successfully, you would get a 0 (indicating that the system was ready). A failed attempt would result in an exit code of 255. You might want to consider using -o ConnectTimeout= and -o ConnectionAttempts=, too.
I'd agree with Steve, too, though. Maybe just wait a little longer. Depending on how aggressively your tool probes for the port, increase the delay before attempting a login.
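The exit-status check described in this answer could be sketched as a small probe function. The function name ssh_probe, the option values, and the use of BatchMode (so that ssh fails rather than prompting for a password) are assumptions for illustration:

```shell
# Probe once and report what the ssh exit status says about the server.
# The timeout and attempt counts below are illustrative values.
ssh_probe() {
    status=0
    ssh -o BatchMode=yes -o ConnectTimeout=5 -o ConnectionAttempts=3 \
        "$1" true || status=$?
    case "$status" in
        0)   echo "ready" ;;
        255) echo "not ready (ssh could not connect or authenticate)" ;;
        *)   echo "connected, but the remote command exited with $status" ;;
    esac
    return "$status"
}
```

For example, `ssh_probe user@host` prints "ready" and returns 0 only once sshd accepts the login and runs the remote command.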
You could put a loop ahead of the login to wait until the port is open.
If you don't want to risk an endless loop when the condition never becomes true, you could add an 'or' condition somehow. Exercise is left to the reader. :)
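One way such a bounded loop might look, as a sketch only. It assumes bash and the coreutils timeout command are installed; the function names port_open and wait_for_port and the default retry values are made up for this example:

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Poll until the port opens, giving up after max_tries attempts so the
# loop cannot run forever. The defaults are illustrative.
wait_for_port() {
    host=$1 port=${2:-22} max_tries=${3:-60} delay=${4:-5}
    try=1
    until port_open "$host" "$port"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "gave up waiting for $host:$port" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 0
}
```

A login step could then be guarded with something like `wait_for_port myserver 22 && ssh user@myserver ...`, where myserver is a placeholder. An open port alone does not prove sshd is ready, which is why the retry count is only an upper bound on the wait.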
Would the easiest solution be to have the automation tool attempt to log in and, if it fails, wait x minutes and try again?
All the while the server is booting, odd things like this can happen.
Something you can try is to add a script to the boot sequence of your server (in /etc/rc.local, for example) that turns off the firewall for port 22. This script (as stated in the comments of /etc/rc.local) will be executed after all the other init scripts. So as long as your server hasn't finished its boot sequence, port 22 is still unreachable behind the firewall. This has the advantage of leaving the automation tool unmodified.
This is based on a RHEL6 OS; the init scripts may be different on your distribution.
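As a rough sketch of that idea, assuming the firewall is iptables and that the saved rules loaded at boot do not accept port 22 (both are assumptions, as is the function name open_ssh_port), the end of /etc/rc.local might contain:

```shell
# Hypothetical fragment for the end of /etc/rc.local.
# Assumes the iptables rules restored at boot do NOT accept port 22,
# so SSH stays unreachable until this rule is inserted.
open_ssh_port() {
    iptables -I INPUT -p tcp --dport 22 -j ACCEPT
}

# Call it as the very last step of /etc/rc.local:
# open_ssh_port
```

Because the rule is added at runtime and not saved, the port would be closed again on the next reboot until rc.local runs.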
This is what I do when I start a server in AWS and wait for it to become available over SSH. I am using bash to do this:
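A sketch of one way to do such a wait in bash, not necessarily the script this poster used. Rather than only testing that port 22 accepts connections, it waits for the identification line that an SSH server sends on connect (it starts with "SSH-"). The function names ssh_banner and wait_for_sshd and the retry values are assumptions:

```shell
# Print the first line the server sends on connect (e.g. "SSH-2.0-...").
# Relies on bash's /dev/tcp and coreutils' timeout; gives up after 5 seconds.
ssh_banner() {
    timeout 5 bash -c \
        'exec 3<>"/dev/tcp/$1/$2" && IFS= read -r line <&3 && printf "%s\n" "$line"' \
        _ "$1" "${2:-22}" 2>/dev/null
}

# Wait until the banner starts with "SSH-", or fail after max_tries attempts.
wait_for_sshd() {
    host=$1 max_tries=${2:-30} delay=${3:-10}
    try=1
    while :; do
        case "$(ssh_banner "$host" 22)" in
            SSH-*) return 0 ;;
        esac
        if [ "$try" -ge "$max_tries" ]; then
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

Typical use would be `wait_for_sshd "$instance_ip" && ssh user@"$instance_ip" ...`, where instance_ip is a placeholder for the new server's address. Even after the banner appears, a login can still fail briefly (for example while keys are being installed), so checking the ssh exit status afterwards remains worthwhile.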