Python out of memory error on server — diagnosis and fix

TL;DR

How to diagnose and fix Python MemoryError and OOM kills on production Linux servers.

Key facts

Topic: Production error triage
Stack: Python / Linux

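Python raises MemoryError when an allocation request fails, and the Linux OOM killer sends SIGKILL to a process when the system (or its cgroup, such as a container memory limit) runs out of memory. Either way, the application needed more memory than was available to it. CPython frees most objects promptly through reference counting, with a cyclic garbage collector for reference cycles, but certain patterns, such as loading large datasets, building massive dictionaries, or holding references longer than needed, can exhaust memory quickly on constrained servers.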
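The two failure modes look different to whatever launched the process. A minimal sketch, assuming a Linux host: the child script caps its own address space with resource.setrlimit(RLIMIT_AS, ...), so an oversized allocation fails inside Python and raises a catchable MemoryError. A process removed by the OOM killer never reaches an except block; it simply exits with SIGKILL. The 512 MiB cap, the 1 GiB allocation, and the run_child helper are illustrative assumptions.

```python
import signal
import subprocess
import sys

# Hypothetical child script: cap the virtual address space at 512 MiB, then
# request 1 GiB. The allocation fails inside the interpreter, so Python raises
# MemoryError instead of the kernel killing the process.
CHILD = """
import resource
cap = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (cap, cap))
try:
    buf = bytearray(1024 * 1024 * 1024)
except MemoryError:
    print("caught MemoryError")
"""


def run_child(code):
    """Run code in a fresh interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    if proc.returncode == -signal.SIGKILL:
        # An OOM kill arrives as SIGKILL: no traceback, no except block runs.
        return "killed by SIGKILL (possibly the OOM killer; check dmesg)"
    return proc.stdout.strip() or f"exited with code {proc.returncode}"


if __name__ == "__main__":
    print(run_child(CHILD))
```

A supervisor that sees a negative return code of -9 with no traceback in the logs should go straight to the kernel log rather than the application log.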

Common causes

  • Loading entire database tables or CSV files into memory with Pandas or raw queries
  • Building large dictionaries or lists that grow unboundedly during request processing
  • Image or PDF processing without streaming (Pillow loading full images into memory)
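  • Circular references, which reference counting alone cannot free; CPython reclaims them only when its cyclic garbage collector runs (and not at all if the collector has been disabled with gc.disable())
  • Memory not returned to the OS after objects are freed, because fragmentation in CPython's allocator (pymalloc arenas) and in the C library's malloc keeps pages in use, so RSS stays high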
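The reference-cycle case is easy to reproduce. A minimal sketch: two objects that point at each other keep each other's refcount above zero, so they outlive the function that created them and are reclaimed only when the cyclic collector runs. Automatic collection is disabled here purely to make the outcome deterministic; the Node class and the make_cycle and demo helpers are hypothetical.

```python
import gc
import weakref


class Node:
    """Hypothetical object that holds a reference to another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a: each keeps the other's refcount at 1
    return weakref.ref(a)    # a weak reference does not keep `a` alive


def demo():
    gc.disable()  # stop automatic collection so the result is deterministic
    try:
        ref = make_cycle()
        survived_refcounting = ref() is not None  # locals are gone, the cycle remains
        gc.collect()                              # run the cyclic collector explicitly
        survived_gc = ref() is not None
    finally:
        gc.enable()
    return survived_refcounting, survived_gc


print(demo())  # (True, False)
```

In a long-running server the collector does eventually run, so cycles usually delay memory release rather than leak it; memory that never comes back is more often held by a live reference such as a module-level cache.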

Diagnosis

Check if the OOM killer terminated the process:

dmesg | grep -i "killed process"
journalctl -k | grep -i "out of memory"
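When the kernel log is long, or you want to alert on OOM kills from a script, the same lines can be parsed in Python. A minimal sketch; the regular expression targets the usual "Killed process <pid> (<name>) ... anon-rss:<n>kB" wording, which varies slightly between kernel versions, and parse_oom_kills and read_kernel_log are hypothetical helpers.

```python
import re
import subprocess

# Matches kernel log lines such as:
# "Out of memory: Killed process 2592 (python3) total-vm:7564064kB, anon-rss:7172220kB, ..."
# The exact wording differs between kernel versions; adjust the pattern if needed.
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\).*?anon-rss:(\d+)kB")


def parse_oom_kills(log_text):
    """Return (pid, command, anon_rss_kib) for every OOM kill found in log_text."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_RE.search(line)
        if match:
            pid, comm, rss = match.groups()
            kills.append((int(pid), comm, int(rss)))
    return kills


def read_kernel_log():
    """Fetch the kernel log via journalctl; may need root or the systemd-journal group."""
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling parse_oom_kills(read_kernel_log()) lists every kill still in the journal; the anon-rss figure is roughly how much anonymous memory the process held when it was killed.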

Profile memory usage with tracemalloc:

import tracemalloc
tracemalloc.start()
# ... run your code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)
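A single snapshot shows where memory is allocated; comparing two snapshots shows where it grows, which is usually what matters when hunting a leak. A minimal sketch using Snapshot.compare_to and tracemalloc.get_traced_memory; the find_growth helper and the sample workload are hypothetical.

```python
import tracemalloc


def find_growth(workload, limit=5):
    """Run workload() under tracemalloc; return (result, top growth stats, peak bytes)."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = workload()  # keep the result alive so its memory shows up as growth
        after = tracemalloc.take_snapshot()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # compare_to() sorts by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")[:limit]
    return result, stats, peak


if __name__ == "__main__":
    data, stats, peak = find_growth(lambda: [str(i) * 10 for i in range(100_000)])
    for stat in stats:
        print(stat)
    print(f"peak traced memory: {peak / 1024 / 1024:.1f} MiB")
```

tracemalloc adds noticeable overhead, so this suits a staging reproduction or a short diagnostic window rather than always-on production tracing.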

Monitor memory from outside the process:

ps aux --sort=-%mem | head -10
# pgrep -d, joins multiple PIDs with commas, the list format ps -p expects; RSS is in KiB
watch -n 1 'ps -o pid,rss,comm -p $(pgrep -d, -f uvicorn)'
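The RSS figure that ps reports can also be read directly from /proc/<pid>/status on Linux, which is handy for logging memory from inside a worker or polling another process from a small script. A minimal, Linux-only sketch; rss_kib and sample_rss are hypothetical helpers.

```python
import time


def rss_kib(pid="self"):
    """Return the resident set size of a process in KiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # Line looks like "VmRSS:     123456 kB"
                return int(line.split()[1])
    raise ValueError(f"no VmRSS line for pid {pid} (kernel thread?)")


def sample_rss(pid="self", count=5, interval=1.0):
    """Take `count` RSS samples, `interval` seconds apart, similar to the watch loop."""
    samples = []
    for i in range(count):
        samples.append(rss_kib(pid))
        if i < count - 1:
            time.sleep(interval)
    return samples


if __name__ == "__main__":
    print(f"this interpreter: {rss_kib()} KiB resident")
```

Passing a worker's PID instead of "self" polls that process, subject to the usual /proc permission rules.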

Fixes

Process data in chunks instead of loading everything at once:

# Stream rows in batches of 500 instead of caching the whole queryset (Django ORM)
for obj in MyModel.objects.iterator(chunk_size=500):
    process(obj)
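Outside Django, the same batching works with any DB-API 2.0 driver through cursor.fetchmany(). A minimal sketch using the standard-library sqlite3 module and an in-memory table so it runs anywhere; the events table, the iter_batches and total_amount helpers, and the batch size are illustrative. Whether rows are actually streamed from the database depends on the driver: psycopg2's default client-side cursor, for example, still transfers the full result set on execute, so large PostgreSQL reads need a named (server-side) cursor.

```python
import sqlite3


def iter_batches(cursor, size=500):
    """Yield lists of at most `size` rows until the result set is exhausted."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows


def total_amount(conn):
    cur = conn.execute("SELECT amount FROM events")
    total = 0
    for batch in iter_batches(cur, size=500):
        # Only one batch of rows is held in Python at a time.
        total += sum(amount for (amount,) in batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (amount INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?)", ((i,) for i in range(10_000)))
    print(total_amount(conn))  # 49995000
```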

For Pandas, use the chunksize parameter:

import pandas as pd

# Each chunk is a DataFrame of at most 10,000 rows
for chunk in pd.read_csv('large.csv', chunksize=10000):
    process(chunk)
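If a job only needs a running aggregate, the standard-library csv module streams rows without Pandas at all. A minimal sketch that mirrors the chunksize pattern by batching rows with itertools.islice and keeps only per-key totals in memory; the region and amount column names and the batches and totals_by_key helpers are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items (Python 3.12+ has itertools.batched, which yields tuples)."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def totals_by_key(lines, key="region", value="amount", chunk_rows=10_000):
    """Sum `value` per `key` while holding at most `chunk_rows` parsed rows at once."""
    totals = defaultdict(float)
    reader = csv.DictReader(lines)
    for chunk in batches(reader, chunk_rows):
        for row in chunk:
            totals[row[key]] += float(row[value])
    return dict(totals)


if __name__ == "__main__":
    sample = io.StringIO("region,amount\neu,1.5\nus,2\neu,3\n")
    print(totals_by_key(sample, chunk_rows=2))  # {'eu': 4.5, 'us': 2.0}
```

For a real file, pass an open handle: with open('large.csv', newline='') as f: totals_by_key(f). Batching only matters when each step wants a group of rows, such as a bulk insert; otherwise iterating the reader row by row is already streaming.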

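Add swap space as a safety net. Swap gives the kernel somewhere to page out memory before the OOM killer fires, but heavy swapping slows the server sharply, so treat it as a buffer rather than a fix:

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Keep the swap file enabled across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab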
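After swapon, it is worth confirming that the kernel actually sees the swap file and exposing the figure to your own monitoring. A minimal, Linux-only sketch that parses /proc/meminfo; meminfo_kib and swap_summary are hypothetical helpers, and swapon --show reports the same information from the shell.

```python
def meminfo_kib(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: value}; most values are in KiB (Linux only)."""
    info = {}
    with open(path) as meminfo:
        for line in meminfo:
            # Lines look like "SwapTotal:       2097148 kB"
            name, _, rest = line.partition(":")
            fields = rest.split()
            if fields:
                info[name] = int(fields[0])
    return info


def swap_summary(info):
    """Describe swap usage from a parsed meminfo dict."""
    total, free = info.get("SwapTotal", 0), info.get("SwapFree", 0)
    if total == 0:
        return "no swap configured"
    return f"swap: {free // 1024} MiB free of {total // 1024} MiB"


if __name__ == "__main__":
    print(swap_summary(meminfo_kib()))
```

A steadily shrinking SwapFree on a server that should fit in RAM is itself an early warning that some process is growing.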

Where Reflex helps

Reflex tracks RSS memory across your Python processes and detects upward trends indicating a leak or unbounded growth. It can execute a graceful restart before the OOM killer fires, preserving request availability, and alert your team with the memory timeline and the process that triggered the threshold.
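The general idea of trend-based detection can be sketched independently of any product. A minimal sketch, not Reflex's implementation: fit a least-squares slope to periodic (seconds, RSS in KiB) samples and flag a process only when its memory is both close to a limit and still climbing. The headroom and min_slope thresholds and the slope_kib_per_s and should_restart helpers are hypothetical.

```python
def slope_kib_per_s(samples):
    """Least-squares slope of (seconds, rss_kib) samples, in KiB per second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def should_restart(samples, limit_kib, headroom=0.85, min_slope=1.0):
    """True if the latest RSS is above `headroom` of the limit and still growing."""
    if not samples:
        return False
    latest = samples[-1][1]
    return latest >= headroom * limit_kib and slope_kib_per_s(samples) >= min_slope


if __name__ == "__main__":
    # Synthetic samples: RSS climbing about 5 MB per second toward a ~1 GB limit (KiB).
    growing = [(t, 800_000 + 5_000 * t) for t in range(10)]
    print(should_restart(growing, limit_kib=1_000_000, headroom=0.8))  # True
```

Requiring both conditions avoids restarting a process that is large but stable, or one that is growing slowly from a low base; the samples themselves could come from reading VmRSS in /proc every few seconds.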