I'm not sure whether the problem is on the client or the server. All my machines run Ubuntu 12.04 x64. I'm on Digital Ocean, and I have to configure iptables to "survive".
Server Story
My Chef Server is on a 2GB RAM machine with no swap. It's the latest Chef 11 from omnibus. I tried accessing the Web UI, and it was fine, but after a few hours it became very slow. Something likely crashed, and runit probably restarted whatever it was. I discovered that my RAM was all used up; only 77MB was free.
I tried reconfiguring Chef Server like this:
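A note on reading that "free" number: on Linux, memory used for buffers and page cache is counted as used but can be reclaimed, so a low free figure alone does not prove the box is out of memory. Below is a minimal sketch, assuming the usual /proc/meminfo layout with values in kB. The function name reclaimable_mb and the SAMPLE figures are made up for illustration and are not taken from the server in question.

```python
def reclaimable_mb(meminfo_text):
    """Return (free_mb, free_plus_cache_mb) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    free_kb = fields.get("MemFree", 0)
    # Buffers and page cache can be given back to applications under pressure.
    cache_kb = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return free_kb // 1024, (free_kb + cache_kb) // 1024


# Illustrative numbers only (hypothetical, not measured on the real server).
SAMPLE = """MemTotal: 2050000 kB
MemFree: 78848 kB
Buffers: 20480 kB
Cached: 204800 kB
"""

free_mb, usable_mb = reclaimable_mb(SAMPLE)
print("free: %d MB, free + buffers/cache: %d MB" % (free_mb, usable_mb))
```

On the server itself the same function could be fed the contents of /proc/meminfo, for example reclaimable_mb(open("/proc/meminfo").read()). If the second number is also small, the machine really is short on memory.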
# this is /etc/chef-server/chef-server.rb
topology 'standalone'
api_fqdn 'chef.[mydomain]'
lb['fqdn'] = 'chef.[mydomain]'
nginx['server_name'] = 'chef.[mydomain]'
nginx['url'] = 'https://chef.[mydomain]'
# default was 25%
chef_solr['heap_size'] = 300
# default was 200 connections
postgresql['max_connections'] = 30
# default was 25%
postgresql['shared_buffers'] = '128MB'
# default was 50%
postgresql['effective_cache_size'] = '256MB'
That didn't help much. I now have 200-300MB of RAM free, but Chef Server still has problems responding after a while. I also see that my chef client (a daemon running every 30 minutes) eventually stops reporting, and I find statuses like "last reported 5 hours ago".
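As a rough sanity check of those numbers, the arithmetic can be written out. This is only a sketch: the default percentages are the ones quoted in the comments of chef-server.rb above, heap_size is assumed to be in MB, and effective_cache_size is left out because in PostgreSQL it is a planner estimate and does not reserve memory.

```python
TOTAL_RAM_MB = 2048  # the 2GB server described in the post

# Defaults as fractions of RAM, as stated in the chef-server.rb comments.
defaults_mb = {
    "chef_solr heap_size": 0.25 * TOTAL_RAM_MB,
    "postgresql shared_buffers": 0.25 * TOTAL_RAM_MB,
}

# Values set in chef-server.rb (heap_size assumed to be in MB).
tuned_mb = {
    "chef_solr heap_size": 300,
    "postgresql shared_buffers": 128,
}


def total(budget):
    """Sum a name -> MB mapping."""
    return sum(budget.values())


saved_mb = total(defaults_mb) - total(tuned_mb)
print("defaults: %d MB, tuned: %d MB, difference: %d MB"
      % (total(defaults_mb), total(tuned_mb), saved_mb))
```

The difference is an upper bound on what the change can free up, since neither the Solr heap nor shared_buffers is necessarily fully used at any given moment.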
Client Story
In my client log I find errors like this:
Error connecting to https://chef.[mydomain]/nodes/db.[mydomain] - Connection timed out - connect(2)
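A timeout on connect(2) usually means packets are being dropped somewhere (a firewall or the network), whereas a refused connection means the host answered but nothing is listening. A small probe run from the client can tell the two apart. This is a sketch under assumptions: the function name probe and the default port 443 are illustrative choices, and the hostname has to be filled in by hand.

```python
import socket


def probe(host, port=443, timeout=5.0):
    """Classify a TCP connection attempt: 'open', 'timeout', 'refused' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are probably being dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host replied with a reset: reachable, but nothing listens on the port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks and similar problems.
        return "error: %s" % exc
```

Calling probe with the Chef server's hostname every minute or so and logging the result would show whether the timeouts line up with the periods when the server is slow or happen independently of it.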
My client has iptables configured as follows, just in case this might be a problem.
% iptables -L -v
Chain INPUT (policy DROP 6 packets, 676 bytes)
pkts bytes target prot opt in out source destination
3155 3260K system all -- any any anywhere anywhere
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 1419 packets, 178K bytes)
pkts bytes target prot opt in out source destination
Chain system (1 references)
pkts bytes target prot opt in out source destination
0 0 DROP tcp -- any any anywhere anywhere tcpflags: FIN,SYN,RST,PSH,ACK,URG/NONE
0 0 DROP tcp -- any any anywhere anywhere tcpflags:! FIN,SYN,RST,ACK/SYN state NEW
0 0 DROP tcp -- any any anywhere anywhere tcpflags: FIN,SYN,RST,PSH,ACK,URG/FIN,SYN,RST,PSH,ACK,URG
0 0 ACCEPT all -- lo any anywhere anywhere
22 1848 ACCEPT icmp -- any any anywhere anywhere
361 37759 ACCEPT tcp -- any any anywhere anywhere tcp dpt:ssh
2766 3219K ACCEPT all -- any any anywhere anywhere state RELATED,ESTABLISHED
Bootstrapping New Node
I tried bootstrapping a new node, and it failed twice. The first attempt simply timed out, just like the other client. The second time I got this:
SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
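The two bootstrap failures happen at different layers: a timeout means the TCP connection never came up, while a certificate verify failure means the connection worked and the TLS handshake then rejected the server's certificate. The sketch below separates the two cases. The names make_context and tls_check are hypothetical, and the optional ca_file argument assumes the server's certificate has been saved locally as a PEM file (relevant if the server uses a self-signed certificate).

```python
import socket
import ssl


def make_context(ca_file=None):
    """Build a verifying TLS context; ca_file may point at a PEM file to trust."""
    return ssl.create_default_context(cafile=ca_file)


def tls_check(host, port=443, timeout=5.0, ca_file=None):
    """Return 'verified', 'verify failed: ...' or 'connect or handshake failed: ...'."""
    ctx = make_context(ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return "verified"
    except ssl.SSLCertVerificationError as exc:
        # TCP worked; the certificate chain or hostname did not validate.
        return "verify failed: %s" % exc.verify_message
    except OSError as exc:
        # Timeouts, refused connections and other handshake errors end up here.
        return "connect or handshake failed: %s" % exc
```

A "verify failed" result points at certificate trust on the client, which is a separate question from the timeouts.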
So why are my clients timing out?
I've been on this for 3 days, and would appreciate some help.
This looks like a result of network issues with Digital Ocean, and not Chef's fault.