I'm using celery 2.5.1 with Django on a micro EC2 instance with 613MB of memory, so I have to keep memory consumption down.
Currently I'm using it only for the scheduler, "celery beat", as a web interface to cron, though I hope to use it for more in the future. I've noticed it is the biggest consumer of memory on my micro machine, even though I have configured the number of workers to one. I don't have many other options set in settings.py:
import djcelery
djcelery.setup_loader()
BROKER_BACKEND = 'djkombu.transport.DatabaseTransport'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_RESULT_BACKEND = 'database'
BROKER_POOL_LIMIT = 2
CELERYD_CONCURRENCY = 1
CELERY_DISABLE_RATE_LIMITS = True
CELERYD_MAX_TASKS_PER_CHILD = 20
CELERYD_SOFT_TASK_TIME_LIMIT = 5 * 60
CELERYD_TASK_TIME_LIMIT = 6 * 60
Here are the details via top:
PID USER NI CPU% VIRT SHR RES MEM% Command
1065 wuser 10 0.0 283M 4548 85m 14.3 python manage_prod.py celeryd --beat
1025 wuser 10 1.0 577M 6368 67m 11.2 python manage_prod.py celeryd --beat
1071 wuser 10 0.0 578M 2384 62m 10.6 python manage_prod.py celeryd --beat
That's about 214MB of memory (and not much of it shared) just to run a cron job occasionally. Have I done anything wrong, or can this be reduced roughly ten-fold somehow? ;)
Update: here's my upstart config:
description "Celery Daemon"
start on (net-device-up and local-filesystems)
stop on runlevel [016]
nice 10
respawn
respawn limit 5 10
chdir /home/wuser/wuser/
env CELERYD_OPTS=--concurrency=1
exec sudo -u wuser -H /usr/bin/python manage_prod.py celeryd --beat --concurrency=1 --loglevel info --logfile /var/tmp/celeryd.log
Update 2:
I notice there is one root process, one user child process, and two grandchildren of that, so I don't think it's a matter of duplicate startup.
root 34580 1556 sudo -u wuser -H /usr/bin/python manage_prod.py celeryd
wuser 577M 67548 └─ python manage_prod.py celeryd --beat --concurrency=1
wuser 578M 63784 ├─ python manage_prod.py celeryd --beat --concurrency=1
wuser 271M 76260 └─ python manage_prod.py celeryd --beat --concurrency=1
You can make sure that celery is only importing the bare minimum of your code (I've seen celery configured to import people's entire web applications... not pretty). However, at the end of the day, you're looking at a very large chunk of Python, which is going to chew up a lot of memory by its very nature.
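For instance, a minimal sketch of pinning the worker to one small task module instead of letting it pull in every app (myapp.tasks_minimal is a made-up module name for illustration, not something from your project):

# settings.py
# Hypothetical example: restrict what the worker imports so it doesn't load
# every app's models and views. 'myapp.tasks_minimal' is a placeholder name.
CELERY_IMPORTS = ('myapp.tasks_minimal',)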
If you want a low-memory task scheduling tool, I'd suggest real, honest-to-goodness cron.
A colleague shared a trick a couple of years after I asked this question.
Basically, you factor the application task out into a separate script and have celery run it via a subprocess. That way the worker reclaims all of the application's memory whenever the subprocess exits, much as cron would.
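A minimal sketch of the idea, with a hypothetical script name (heavy_job.py and run_heavy_job are made up for illustration; the point is that the real work lives in its own process):

import subprocess
from celery.task import task  # celery 2.x style task decorator

@task
def run_heavy_job():
    # heavy_job.py is a stand-in for whatever standalone script does the real work.
    # Because it runs as a child process, every byte it allocates is returned to
    # the OS as soon as it exits, so the long-lived worker stays small.
    subprocess.check_call(['/usr/bin/python', '/home/wuser/wuser/heavy_job.py'])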
Sorry I haven't mentioned it until now; the site just reminded me the question exists. ;-)