I'm running Ansible 2.0 on SLES 11 SP4 against about 430 machines, and it is very slow. I can't tell why, but it goes much faster if I limit the number of machines in the inventory. It took about 7 hours to run a 3-task playbook (including gathering facts), and the 3rd task was a local action. When I run with the full inventory of 430, gathering facts for just 2 machines takes about as long as it does to fully process 6 machines.
And it uses 99.9% of the CPU right off the bat:
root 11646 99.8 0.4 220188 61016 pts/1 Rl+ 07:24 6:41 \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root 11651 0.1 0.4 187396 58828 pts/1 Sl+ 07:24 0:00 \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root 11652 0.1 0.4 187812 59216 pts/1 Sl+ 07:24 0:00 \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root 11653 0.1 0.4 188052 59428 pts/1 Sl+ 07:24 0:00 \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root 11654 0.1 0.4 186148 57496 pts/1 Sl+ 07:24 0:00 \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root 11655 0.1 0.4 186552 57924 pts/1 Sl+ 07:24 0:00 \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root 11656 0.4 0.2 154948 25828 pts/1 Sl+ 07:24 0:01 \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
That is scary, since I was really hoping this would speed up our serialized ssh processes. Instead it looks like it's just going to suck up all the resources.
When I strace the main PID, it just appears to be running stat on the inventory file over and over again.
I'm keeping all my host vars in one inventory file that I generate from a database. I tried using a dynamic inventory, but that took too long to even initialize (I'm guessing it's running the SQL query over and over again).
So, is there a trick to running it against lots of machines?
I have already tried all the tricks in https://www.ansible.com/blog/ansible-performance-tuning
I've also tried breaking it up by putting the host_vars for each host in its own file, since I figured strace was telling me that Ansible was constantly re-parsing my 500k inventory file. But that didn't help much.
I switched my playbook to just echo hello, with no fact gathering.
When I run it with an inventory file containing only 3 hosts, I get:
real 0m1.996s
user 0m0.400s
sys 0m0.112s
When I run it with an inventory file containing all 430 hosts and limit it to just the first 3, it finishes in (note: these are different hosts, but the same make of machine):
real 0m11.989s
user 0m13.693s
sys 0m0.552s
And when I run it with an inventory file containing all 430 hosts and no limit (and Ctrl-C after the 3rd one), I get:
real 2m50.961s
user 2m56.495s
sys 0m0.764s
So, it makes me think that not a lot is really going on behind the scenes and something is intensely blocking.
First of all, you need to consider caching the facts.
Take a look here for how to:
http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching
You should see a big improvement in fact-gathering time, even when caching to a file.
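Since you already generate your inventory from a script, here is a rough sketch of generating the fact-caching settings the same way. It assumes Python 3 and the jsonfile cache plugin; the render_ansible_cfg() helper, the /tmp/ansible_fact_cache directory and the 86400-second timeout are illustrative choices of mine, not recommended values. The output would go into the [defaults] section of your ansible.cfg.

```python
import configparser
import io


def render_ansible_cfg(cache_dir="/tmp/ansible_fact_cache", timeout=86400):
    """Return ansible.cfg text that enables file-based fact caching.

    cache_dir and timeout are placeholder values; adjust them to taste.
    """
    cfg = configparser.ConfigParser()
    cfg["defaults"] = {
        # "smart" gathers facts only for hosts that are not already cached
        "gathering": "smart",
        # store facts as one JSON file per host
        "fact_caching": "jsonfile",
        # for the jsonfile plugin this is the directory holding the cache
        "fact_caching_connection": cache_dir,
        # seconds before cached facts are considered stale
        "fact_caching_timeout": str(timeout),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_ansible_cfg())
```

With gathering set to smart, the first run still pays the full fact-gathering cost; later runs within the timeout should be able to skip it for cached hosts.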
Then consider raising the level of parallelism with -f to something larger than the default of 5.
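On the dynamic inventory that was too slow to initialize: this is a guess, not a diagnosis, but if the script's --list output has no top-level _meta key, Ansible calls the script again with --host for every single host, which would mean one query per machine. Below is a minimal sketch of a script that does one lookup and returns every host's vars under _meta. The fetch_hosts() function, the hostnames, the sles11 group and the rack variable are all made-up placeholders for your real database query and data.

```python
#!/usr/bin/env python
"""Dynamic inventory sketch: one lookup, all host vars returned in _meta."""
import json
import sys


def fetch_hosts():
    # Hypothetical stand-in for the single SQL query; replace with a real lookup.
    return [
        {"name": "host001.example.com", "group": "sles11", "vars": {"rack": "a1"}},
        {"name": "host002.example.com", "group": "sles11", "vars": {"rack": "a2"}},
    ]


def build_inventory(rows):
    # _meta.hostvars lets Ansible read every host's vars from the --list output
    inventory = {"_meta": {"hostvars": {}}}
    for row in rows:
        group = inventory.setdefault(row["group"], {"hosts": []})
        group["hosts"].append(row["name"])
        inventory["_meta"]["hostvars"][row["name"]] = row["vars"]
    return inventory


def main(argv):
    if len(argv) > 1 and argv[1] == "--host":
        # Not expected to be called when _meta is present; answer cheaply anyway.
        print(json.dumps({}))
    else:
        print(json.dumps(build_inventory(fetch_hosts()), indent=2))


if __name__ == "__main__":
    main(sys.argv)
```

Make the script executable and pass it with -i in place of the static inventory file. The database is then queried once per run, no matter how many hosts there are.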