Celery task queue backed up — diagnosis and resolution
TL;DR
How to diagnose and resolve a Celery task queue growing unboundedly due to insufficient workers, slow tasks, or broker memory exhaustion.
Key facts
- Topic
- Production error triage
- Stack
- Python / Linux
TL;DR
A Celery task queue backing up means tasks are being published faster than workers can consume them. The queue grows unboundedly until the broker (Redis or RabbitMQ) runs out of memory, at which point new task publishes fail and the entire async processing pipeline collapses. This is distinct from workers not processing at all — workers are running, but throughput cannot keep pace with inflow.
Common causes
- Insufficient workers — the number of worker processes multiplied by their concurrency is lower than the sustained task arrival rate
- Long-running tasks blocking prefetch — a single task taking 10 minutes blocks one prefetch slot. With
prefetch_multiplier=4and 4 workers, just 4 slow tasks can block all 16 prefetch slots - ETA and countdown tasks accumulating — tasks with
etaorcountdownsit in the queue consuming broker memory until their scheduled time, inflating apparent queue depth - Missing task routing — all tasks land on the
defaultqueue regardless of priority, so a flood of low-priority tasks blocks high-priority ones - Broker memory exhaustion — Redis's
maxmemoryis hit, causingOOMrejections onLPUSH, or RabbitMQ memory alarms halt all publishers
Diagnosis workflow
Check queue depth at the broker level:
# Redis
redis-cli llen celery
redis-cli llen high_priority
redis-cli info memory | grep used_memory_human
# RabbitMQ
rabbitmqctl list_queues name messages consumers
Check active worker throughput with Celery's inspect commands:
celery -A myapp inspect active
celery -A myapp inspect stats
Launch Flower for a real-time dashboard:
celery -A myapp flower --port=5555
Flower shows task success/failure rates, worker pool utilisation, and queue depth trends over time.
Fix: task routing and priority queues
Separate fast and slow tasks onto different queues:
app.conf.task_routes = {
'myapp.tasks.send_email': {'queue': 'high_priority'},
'myapp.tasks.generate_report': {'queue': 'slow'},
'myapp.tasks.process_webhook': {'queue': 'high_priority'},
}
Run dedicated workers per queue:
celery -A myapp worker -Q high_priority -c 8 --max-tasks-per-child=200
celery -A myapp worker -Q slow -c 2 --max-tasks-per-child=50 --time-limit=600
Fix: rate limiting
Prevent a single task type from flooding the queue:
@app.task(rate_limit='100/m')
def send_notification(user_id):
pass
Fix: prefetch tuning
Reduce the prefetch multiplier so long tasks do not hoard queue slots:
app.conf.worker_prefetch_multiplier = 1
app.conf.task_acks_late = True
Setting prefetch_multiplier=1 means each worker only fetches one task at a time. Combined with task_acks_late=True, if a worker dies mid-task, the task returns to the queue instead of being lost.
Broker memory protection
For Redis, set a memory limit with an eviction policy that rejects new writes rather than silently dropping keys:
maxmemory 2gb
maxmemory-policy noeviction
For RabbitMQ, configure memory and disk watermarks:
vm_memory_high_watermark.relative = 0.6
disk_free_limit.relative = 1.5
Where Reflex helps
Reflex monitors Celery queue depths, worker throughput, and broker memory usage continuously. When queue depth grows while worker consumption flattens, Reflex detects the divergence, alerts your team, and can scale workers or pause low-priority task publishers to prevent broker memory exhaustion. See How it works.