I've noticed a very peculiar problem with Apache. I have a very high number of virtual hosts set up - it's around 501.
Problems started occuring after vhost number 493. First 493 vhosts work as expected, but as soon as I add vhost number 494, PHP stops communicating with memcache, and it times out on every read/write access.
Actually, I'm using memcache as a backend sessions store, so, php function:
session_start();
simply times out after 30 sec.
If I delete the random one of the 494 vhosts and restart apache, it starts working again.
I've set ulimit really high (65k) but it doesn't help. I've tried switching off ulimits completely, but no luck.
Do you guys have any ideas what more could I try?
I've tried stracing the httpd process to which I'm connected to, after I hit the enter in browser and after 30 sec wait commences.
This is the strace output:
select(1170, [1024 1169], [], NULL, {1, 0}) = 2 (in [1024 1169], left {0, 999998})
select(1170, [1024 1169], [], NULL, {1, 0}) = 2 (in [1024 1169], left {0, 999998})
select(1170, [1024 1169], [], NULL, {1, 0}) = 2 (in [1024 1169], left {0, 999998})
select(1170, [1024 1169], [], NULL, {1, 0}) = 2 (in [1024 1169], left {0, 999998})
select(1170, [1024 1169], [], NULL, {1, 0}) = 2 (in [1024 1169], left {0, 999998})
So basically apache is stuck on select(), and that's it, it repeats the select() system call indefinitely.
Next thing I have come up with is tcpdump, to see if the package actually goes through from apache, and it really does:
22:11:28.366677 IP6 ::1.51404 > ::1.11914: Flags [S], seq 2899674987, win 32752, options [mss 16376,sackOK,TS val 1384759049 ecr 0,nop,wscale 9], length 0
22:11:28.366697 IP6 ::1.11914 > ::1.51404: Flags [S.], seq 2034630080, ack 2899674988, win 32728, options [mss 16376,sackOK,TS val 1384759049 ecr 1384759049,nop,wscale 9], length 0
22:11:28.366709 IP6 ::1.51404 > ::1.11914: Flags [.], ack 1, win 64, options [nop,nop,TS val 1384759049 ecr 1384759049], length 0
22:11:28.366752 IP6 ::1.51404 > ::1.11914: Flags [P.], seq 1:41, ack 1, win 64, options [nop,nop,TS val 1384759049 ecr 1384759049], length 40
22:11:28.366758 IP6 ::1.11914 > ::1.51404: Flags [.], ack 41, win 64, options [nop,nop,TS val 1384759049 ecr 1384759049], length 0
22:11:28.366768 IP6 ::1.51404 > ::1.11914: Flags [P.], seq 41:90, ack 1, win 64, options [nop,nop,TS val 1384759050 ecr 1384759049], length 49
22:11:28.366772 IP6 ::1.11914 > ::1.51404: Flags [.], ack 90, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 0
22:11:28.366779 IP6 ::1.51404 > ::1.11914: Flags [P.], seq 90:122, ack 1, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 32
22:11:28.366783 IP6 ::1.11914 > ::1.51404: Flags [.], ack 122, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 0
22:11:28.367063 IP6 ::1.11914 > ::1.51404: Flags [P.], seq 1:12, ack 122, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 11
22:11:28.367070 IP6 ::1.51404 > ::1.11914: Flags [.], ack 12, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 0
22:11:28.367266 IP6 ::1.11914 > ::1.51404: Flags [P.], seq 12:20, ack 122, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 8
22:11:28.367275 IP6 ::1.51404 > ::1.11914: Flags [.], ack 20, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 0
22:11:28.367477 IP6 ::1.11914 > ::1.51404: Flags [P.], seq 20:25, ack 122, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 5
22:11:28.367489 IP6 ::1.51404 > ::1.11914: Flags [.], ack 25, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 0
22:11:28.367629 IP6 ::1.51404 > ::1.11914: Flags [P.], seq 122:181, ack 25, win 64, options [nop,nop,TS val 1384759050 ecr 1384759050], length 59
22:11:28.367859 IP6 ::1.11914 > ::1.51404: Flags [P.], seq 25:33, ack 181, win 64, options [nop,nop,TS val 1384759051 ecr 1384759050], length 8
22:11:28.367869 IP6 ::1.51404 > ::1.11914: Flags [P.], seq 181:230, ack 33, win 64, options [nop,nop,TS val 1384759051 ecr 1384759051], length 49
22:11:28.368102 IP6 ::1.11914 > ::1.51404: Flags [P.], seq 33:41, ack 230, win 64, options [nop,nop,TS val 1384759051 ecr 1384759051], length 8
22:11:28.368138 IP6 ::1.51404 > ::1.11914: Flags [F.], seq 230, ack 41, win 64, options [nop,nop,TS val 1384759051 ecr 1384759051], length 0
22:11:28.368195 IP6 ::1.11914 > ::1.51404: Flags [F.], seq 41, ack 231, win 64, options [nop,nop,TS val 1384759051 ecr 1384759051], length 0
22:11:28.368206 IP6 ::1.51404 > ::1.11914: Flags [.], ack 42, win 64, options [nop,nop,TS val 1384759051 ecr 1384759051], length 0
Next thing I did is GDB of Apache process when I issue a curl call to a page containing session_start(), and this is the output:
232 *(*new)->local_addr = *sock->local_addr;
241 if (sock->local_addr->sa.sin.sin_family == AF_INET) {
238 (*new)->local_addr->pool = connection_context;
241 if (sock->local_addr->sa.sin.sin_family == AF_INET) {
238 (*new)->local_addr->pool = connection_context;
241 if (sock->local_addr->sa.sin.sin_family == AF_INET) {
245 else if (sock->local_addr->sa.sin.sin_family == AF_INET6) {
246 (*new)->local_addr->ipaddr_ptr = &(*new)->local_addr->sa.sin6.sin6_addr;
249 (*new)->remote_addr->port = ntohs((*new)->remote_addr->sa.sin.sin_port);
250 if (sock->local_port_unknown) {
256 if (apr_is_option_set(sock, APR_TCP_NODELAY) == 1) {
257 apr_set_option(*new, APR_TCP_NODELAY, 1);
266 if (sock->local_interface_unknown ||
267 !memcmp(sock->local_addr->ipaddr_ptr,
266 if (sock->local_interface_unknown ||
276 (*new)->local_interface_unknown = 1;
293 apr_pool_cleanup_register((*new)->pool, (void *)(*new), socket_cleanup,
292 (*new)->inherit = 0;
293 apr_pool_cleanup_register((*new)->pool, (void *)(*new), socket_cleanup,
296 }
unixd_accept (accepted=0x7fff14ecddf0, lr=0x7fe93a905aa8, ptrans=<value optimized out>) at /usr/src/debug/httpd-2.2.15/os/unix/unixd.c:507
507 if (status == APR_SUCCESS) {
508 *accepted = csd;
649 }
child_main (child_num_arg=<value optimized out>) at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:650
650 SAFE_ACCEPT(accept_mutex_off()); /* unlock after "accept" */
652 if (status == APR_EGENERAL) {
656 else if (status != APR_SUCCESS) {
665 current_conn = ap_run_create_connection(ptrans, ap_server_conf, csd, my_child_num, sbh, bucket_alloc);
666 if (current_conn) {
667 ap_process_connection(current_conn, csd);
At this location there's a big pause (~30 sec), until php times out. After that, I got this:
668 ap_lingering_close(current_conn);
676 if (ap_mpm_pod_check(pod) == APR_SUCCESS) { /* selected as idle? */
680 ap_scoreboard_image->global->running_generation) { /* restart? */
679 else if (ap_my_generation !=
680 ap_scoreboard_image->global->running_generation) { /* restart? */
679 else if (ap_my_generation !=
551 while (!die_now && !shutdown_pending) {
559 apr_pool_clear(ptrans);
562 && requests_this_child++ >= ap_max_requests_per_child)) {
561 if ((ap_max_requests_per_child > 0
562 && requests_this_child++ >= ap_max_requests_per_child)) {
561 if ((ap_max_requests_per_child > 0
562 && requests_this_child++ >= ap_max_requests_per_child)) {
561 if ((ap_max_requests_per_child > 0
566 (void) ap_update_child_status(sbh, SERVER_READY, (request_rec *) NULL);
573 SAFE_ACCEPT(accept_mutex_on());
575 if (num_listensocks == 1) {
The weirdest thing is I cannot reproduce this on another machine. Same OS, same packages, same config (puppet) same kernel, different HW.
After couple of more weeks of debugging and watching out for problems, I finally stumbled upon a message:
I will try this fix, but boy oh boy, why do PHP guys do this? This is so ugly, hardcoding nofile limit is utterly broken design. Not to mention that if this is the solution, forcing me to recompile each PHP minor version and security patch and maintain my own packages is a big big hustle.
EDIT: after more extensive debugging, it seems it's not only PHP that's 'broken by design', there's also a bunch of issues with memcache extension itself.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=629896
https://bugs.php.net/bug.php?id=59876
Bugs are open for quite a while now, and nothing happens. I guess one should just dump memcache extension and find a solution independent from it :-/