Description of the problem
Cron PHP processes regularly crash on our production server, which results in mails with the following body:
PHP Fatal error: PHP Startup: apc_mmap: mmap failed: in Unknown on line 0 Segmentation fault (core dumped)
I think the Segmentation fault (core dumped) should result in core files being handled by apport and then written to /var/crash, but the files I can see there have been there since yesterday, although the last crash occurred today:
-rw-r----- 1 root whoopsie 1138528 mai 22 04:09 _usr_bin_php5.0.crash
-rw-r----- 1 frontoffice whoopsie 1166373 mai 20 18:00 _usr_bin_php5.1005.crash
-rw-r----- 1 frontoffice whoopsie 81622658 mai 22 00:05 _usr_sbin_php5-fpm.1005.crash
I tried to download the last one anyway and ran gdb /usr/sbin/php5-fpm /tmp/_usr_sbin_php5-fpm.1005.crash, only to be told that the file is not a core file (its format was not recognized).
Here is the server's APC configuration:
cat /etc/php5/cli/conf.d/20-apc.ini
extension=apc.so
apc.shm_size=512M
apc.ttl=3600
apc.user_ttl=3600
apc.enable_cli=1
I'm mostly worried about apc.shm_size: is it too high or too low? I understand it has to do with the size of memory segments.
Question(s)
- What could be the problem?
- How can I troubleshoot it, and in particular how can I get a valid core file?
System information
free
total used free shared buffers cached
Mem: 5081296 4354684 726612 0 374744 959968
-/+ buffers/cache: 3019972 2061324
Swap: 522236 516888 5348
cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS"
php -v
PHP 5.4.17-1~precise+1 (cli) (built: Jul 17 2013 18:14:06)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies
php -i
excerpt:
Configuration
apc
APC Support => enabled
Version => 3.1.13
APC Debugging => Disabled
MMAP Support => Enabled
MMAP File Mask =>
Locking type => pthread mutex Locks
Serialization Support => php
Revision => $Revision: 327136 $
Build Date => Nov 20 2012 18:41:36
Directive => Local Value => Master Value
apc.cache_by_default => On => On
apc.canonicalize => On => On
apc.coredump_unmap => Off => Off
apc.enable_cli => On => On
apc.enabled => On => On
apc.file_md5 => Off => Off
apc.file_update_protection => 2 => 2
apc.filters => no value => no value
apc.gc_ttl => 3600 => 3600
apc.include_once_override => Off => Off
apc.lazy_classes => Off => Off
apc.lazy_functions => Off => Off
apc.max_file_size => 1M => 1M
apc.mmap_file_mask => no value => no value
apc.num_files_hint => 1000 => 1000
apc.preload_path => no value => no value
apc.report_autofilter => Off => Off
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => default => default
apc.shm_segments => 1 => 1
apc.shm_size => 512M => 512M
apc.shm_strings_buffer => 4M => 4M
apc.slam_defense => On => On
apc.stat => On => On
apc.stat_ctime => Off => Off
apc.ttl => 3600 => 3600
apc.use_request_time => On => On
apc.user_entries_hint => 4096 => 4096
apc.user_ttl => 3600 => 3600
apc.write_lock => On => On
php -m
[PHP Modules]
apc
bcmath
bz2
calendar
Core
ctype
curl
date
dba
dom
ereg
exif
fileinfo
filter
ftp
gd
gettext
hash
iconv
imagick
intl
json
ldap
libxml
mbstring
memcache
memcached
mhash
mysql
mysqli
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_pgsql
pdo_sqlite
pgsql
Phar
posix
Reflection
session
shmop
SimpleXML
soap
sockets
SPL
sqlite3
standard
sysvmsg
sysvsem
sysvshm
tidy
tokenizer
wddx
xml
xmlreader
xmlwriter
zip
zlib
[Zend Modules]
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 39531
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 39531
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This should really be a comment, but it's a bit long
If you don't know how, then how should we? You haven't told us how much RAM and swap there is, how much is used for other stuff. You haven't told us how much of the APC memory is used before the system crashes.
Have you checked the ulimit? Most likely the file has been truncated. Regardless, a segmentation fault suggests an issue within PHP itself (or APC, or another extension). Were you planning on fixing it yourself? Don't get me wrong - the guys who write the stuff will welcome well-researched and documented bug reports - but the first thing you should be looking at (and including in your question here) is the version of PHP, the extensions installed and the version of APC.
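To make the ulimit check concrete: the `ulimit -a` output in the question reports a core file size of 0 for that shell. Below is a minimal sketch, assuming a POSIX shell; `core_limit_status` is a hypothetical helper name, not part of any tool. How this limit interacts with apport depends on the system's core pattern setup, so treat it as one thing to rule out rather than a confirmed cause.

```shell
#!/bin/sh
# Classify the string that `ulimit -c` prints for the current shell.
# core_limit_status is a hypothetical helper used only for illustration.
core_limit_status() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "capped" ;;
    esac
}

# Report the soft limit that processes started from this shell inherit.
echo "core file limit: $(core_limit_status "$(ulimit -c)")"

# In a cron wrapper script, the soft limit could be raised before PHP starts.
# This fails if the hard limit is lower, hence the fallback message.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file limit"
```

The limit is per process and inherited, so it has to be checked or raised in the environment that actually launches the cron job, not in an interactive login shell.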
The core file needs to be read on a system that is at least very similar to the one where the crash happened. In particular, you need the same versions of the binary and of all the libraries involved in order for the pointers to line up. Usually it's easiest to run gdb on the machine where the crash happened. You'll also need versions of the binary and libraries that carry the symbolic data needed to identify the locations in the source files where things happened. That might mean the dev versions of the various libraries, but it depends on which Linux distribution you run.
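One likely reason gdb rejected the file in the question: the `.crash` files that apport writes are text reports with the core dump embedded in encoded form, not raw ELF core files. The sketch below guesses the kind of a file from its first bytes. `dump_kind` is a hypothetical helper, and the `ProblemType:` check assumes the report starts with that field, which is typical of apport reports but worth verifying on your own files.

```shell
#!/bin/sh
# Guess whether a file is an ELF file (such as a raw core) or an apport report.
# dump_kind is a hypothetical helper used only for illustration.
dump_kind() {
    # ELF files start with the bytes 0x7f 'E' 'L' 'F'.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "7f454c46" ]; then
        echo "elf"
        return
    fi
    # Assumption: apport reports begin with a "ProblemType:" field.
    first=$(head -c 12 "$1" | tr -d '\000')
    if [ "$first" = "ProblemType:" ]; then
        echo "apport-report"
    else
        echo "unknown"
    fi
}

# Usage, with the file name from the question (/tmp/php5-fpm-crash is a
# hypothetical output directory):
#   dump_kind /tmp/_usr_sbin_php5-fpm.1005.crash
# If that prints "apport-report", unpack it and point gdb at the extracted core:
#   apport-unpack /tmp/_usr_sbin_php5-fpm.1005.crash /tmp/php5-fpm-crash
#   gdb /usr/sbin/php5-fpm /tmp/php5-fpm-crash/CoreDump
```

The extracted core is still only useful next to the same binary and library versions, as described above.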
Are you sure you have the right version of APC installed? For example, installing the right version solved this person's problem: https://stackoverflow.com/questions/14756385/php-fatal-error-php-startup-apc-mmap-mmap-failed-in-unknown-on-line-0
Is APC failing for web processes as well as command line ones? If it only fails for one of those, then check that both php packages are the correct versions to work with your version of APC.
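A rough way to compare what the command-line and FPM binaries report, as a sketch only: `php_version_from_banner` and `same_version` are hypothetical helpers, and the `php5-fpm` binary name is taken from the crash file names in the question.

```shell
#!/bin/sh
# Pull the bare x.y.z version out of a `php -v` style banner read from stdin.
# Both helpers are hypothetical names used only for illustration.
php_version_from_banner() {
    sed -n 's/^PHP \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Say whether two version strings are identical.
same_version() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch: $1 vs $2"
    fi
}

# Usage on the server:
#   cli=$(php -v | php_version_from_banner)
#   fpm=$(php5-fpm -v | php_version_from_banner)
#   same_version "$cli" "$fpm"
# The APC version loaded by the CLI can be printed with:
#   php -r 'echo phpversion("apc"), "\n";'
```

A match here does not prove that APC was built against that PHP version, but a mismatch between the two binaries is a quick hint about where to look.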
The first two dump files you listed look very small to me, at just over 1 MB. PHP would usually grow bigger than that before it gets as far as running any of your code. That is consistent with a failure before any code is loaded though, which seems likely given that APC is involved. The fpm one is a web process, not a cron job (unless your cron calls PHP via the web interface?).
Setting apc.shm_size to 512MB may or may not be optimal for efficiency, but I wouldn't expect it to be the cause of a segfault. Corrupt data in your APC cache could conceivably be the problem though, so I suggest you clear the cache. The normal process is to use an apc.php file, which is likely distributed with APC. Vendor distributions vary on that, but it is included with the upstream source code, so you should be able to get a copy easily enough. That gives you a web interface for looking at the state of your cache and for clearing it. If APC is failing to the point where that doesn't work, I'm not sure what the process is. Probably locate the cache, delete it, and reinstall APC if needed to rebuild it (a rather dirty approach, but low effort if you can afford a brief outage).