TL;DR

Emergency recovery guide for Linux servers with 100% disk usage causing cascading application failures.

Topic: Production error triage
Stack: Linux

TL;DR

A full disk on Linux causes cascading failures: databases refuse writes, logs stop recording, deployments fail, and applications crash. This is one of the most common — and most preventable — production incidents.

Emergency triage

Check disk usage immediately:

df -h

Identify the largest directories:

du -sh /* 2>/dev/null | sort -rh | head -10
du -sh /var/* | sort -rh | head -10

Or use ncdu for an interactive view:

ncdu /var

Quick wins to reclaim space

Clear systemd journal logs (often the single largest offender):

journalctl --disk-usage
sudo journalctl --vacuum-size=200M

Remove old apt packages and caches:

sudo apt autoremove -y && sudo apt clean

Clear Docker waste (if Docker is installed):

docker system prune -af --volumes

Find and truncate large log files:

find /var/log -name "*.log" -size +100M -exec ls -lh {} \;
sudo truncate -s 0 /var/log/large-app.log

Remove old deployment releases:

ls -lt /var/www/releases/ | tail -n +6 | awk '{print $NF}' | xargs rm -rf

Prevention

Configure logrotate for all application logs:

/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}

Set up monitoring alerts at 80% disk usage — by the time you reach 95%, cascading failures have often already begun. Add a cron job to clean temporary files:

0 3 * * * find /tmp -type f -mtime +7 -delete

Where Reflex helps

Reflex monitors disk utilisation across all mounted volumes. When usage crosses a warning threshold, it can execute cleanup playbooks — vacuuming logs, purging old releases, clearing caches — and verify free space increases, often resolving the issue before it causes downtime. See How it works.

Disk full on Linux server — emergency recovery guide

TL;DR

Emergency triage

Quick wins to reclaim space

Prevention

Where Reflex helps