Skip to main content

Celery task queue backed up — diagnosis and resolution

TL;DR

How to diagnose and resolve a Celery task queue growing unboundedly due to insufficient workers, slow tasks, or broker memory exhaustion.

Key facts

Topic
Production error triage
Stack
Python / Linux

TL;DR

A Celery task queue backing up means tasks are being published faster than workers can consume them. The queue grows unboundedly until the broker (Redis or RabbitMQ) runs out of memory, at which point new task publishes fail and the entire async processing pipeline collapses. This is distinct from workers not processing at all — workers are running, but throughput cannot keep pace with inflow.

Common causes

  • Insufficient workers — the number of worker processes multiplied by their concurrency is lower than the sustained task arrival rate
  • Long-running tasks blocking prefetch — a single task taking 10 minutes blocks one prefetch slot. With prefetch_multiplier=4 and 4 workers, just 4 slow tasks can block all 16 prefetch slots
  • ETA and countdown tasks accumulating — tasks with eta or countdown sit in the queue consuming broker memory until their scheduled time, inflating apparent queue depth
  • Missing task routing — all tasks land on the default queue regardless of priority, so a flood of low-priority tasks blocks high-priority ones
  • Broker memory exhaustion — Redis's maxmemory is hit, causing OOM rejections on LPUSH, or RabbitMQ memory alarms halt all publishers

Diagnosis workflow

Check queue depth at the broker level:

# Redis
redis-cli llen celery
redis-cli llen high_priority
redis-cli info memory | grep used_memory_human

# RabbitMQ
rabbitmqctl list_queues name messages consumers

Check active worker throughput with Celery's inspect commands:

celery -A myapp inspect active
celery -A myapp inspect stats

Launch Flower for a real-time dashboard:

celery -A myapp flower --port=5555

Flower shows task success/failure rates, worker pool utilisation, and queue depth trends over time.

Fix: task routing and priority queues

Separate fast and slow tasks onto different queues:

app.conf.task_routes = {
    'myapp.tasks.send_email': {'queue': 'high_priority'},
    'myapp.tasks.generate_report': {'queue': 'slow'},
    'myapp.tasks.process_webhook': {'queue': 'high_priority'},
}

Run dedicated workers per queue:

celery -A myapp worker -Q high_priority -c 8 --max-tasks-per-child=200
celery -A myapp worker -Q slow -c 2 --max-tasks-per-child=50 --time-limit=600

Fix: rate limiting

Prevent a single task type from flooding the queue:

@app.task(rate_limit='100/m')
def send_notification(user_id):
    pass

Fix: prefetch tuning

Reduce the prefetch multiplier so long tasks do not hoard queue slots:

app.conf.worker_prefetch_multiplier = 1
app.conf.task_acks_late = True

Setting prefetch_multiplier=1 means each worker only fetches one task at a time. Combined with task_acks_late=True, if a worker dies mid-task, the task returns to the queue instead of being lost.

Broker memory protection

For Redis, set a memory limit with an eviction policy that rejects new writes rather than silently dropping keys:

maxmemory 2gb
maxmemory-policy noeviction

For RabbitMQ, configure memory and disk watermarks:

vm_memory_high_watermark.relative = 0.6
disk_free_limit.relative = 1.5

Where Reflex helps

Reflex monitors Celery queue depths, worker throughput, and broker memory usage continuously. When queue depth grows while worker consumption flattens, Reflex detects the divergence, alerts your team, and can scale workers or pause low-priority task publishers to prevent broker memory exhaustion. See How it works.