Disk full on Linux server — emergency recovery guide
TL;DR
Emergency recovery guide for Linux servers with 100% disk usage causing cascading application failures.
Key facts
- Topic: Production error triage
- Stack: Linux
A full disk on Linux causes cascading failures: databases refuse writes, logs stop recording, deployments fail, and applications crash. This is one of the most common — and most preventable — production incidents.
Emergency triage
Check disk usage immediately:
df -h
Identify the largest directories:
sudo du -sh /* 2>/dev/null | sort -rh | head -10
sudo du -sh /var/* 2>/dev/null | sort -rh | head -10
Or use ncdu for an interactive view:
ncdu /var
Quick wins to reclaim space
Clear systemd journal logs (often the single largest offender):
journalctl --disk-usage
sudo journalctl --vacuum-size=200M
Remove unused apt packages and clear the package cache (Debian/Ubuntu):
sudo apt autoremove -y && sudo apt clean
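Clear Docker waste (if Docker is installed). Caution: -a removes every image not used by a container, and --volumes removes unused volumes, which may hold data you still need: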
docker system prune -af --volumes
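Find and truncate large log files. Truncate rather than delete: a deleted file that a process still holds open keeps occupying disk space until that process closes it: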
find /var/log -name "*.log" -size +100M -exec ls -lh {} \;
sudo truncate -s 0 /var/log/large-app.log
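The find-then-truncate step above can be combined into one helper. This is a minimal sketch, not a hardened tool: the function name truncate_large_logs and its --apply flag are made up for illustration, and it assumes a find that accepts the M size suffix (GNU find does). By default it only lists what it would truncate.

```shell
# Truncate every *.log file under DIR that is larger than SIZE_MB mebibytes.
# Usage: truncate_large_logs DIR SIZE_MB [--apply]
# Without --apply the function is a dry run and changes nothing.
# (Hypothetical helper; name and flag are illustrative assumptions.)
truncate_large_logs() {
    dir=$1
    size_mb=$2
    mode=${3:-}
    find "$dir" -type f -name '*.log' -size +"${size_mb}"M |
    while IFS= read -r f; do
        if [ "$mode" = "--apply" ]; then
            # Redirecting nothing into the file empties it in place,
            # like `truncate -s 0`, so open file handles stay valid.
            : > "$f"
            echo "truncated $f"
        else
            echo "would truncate $f"
        fi
    done
}
```

Run it first as truncate_large_logs /var/log 100 to review the list, then repeat with --apply (under sudo or as root if the files require it). File names containing newlines are not handled.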
Remove old deployment releases, keeping the five most recent (assumes release names contain no whitespace):
cd /var/www/releases && ls -1t | tail -n +6 | xargs -r rm -rf --
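Because this step pipes names into rm -rf, a dry run is worth having. The sketch below wraps the same idea in a function; the name prune_releases and the --apply flag are illustrative assumptions, and the releases directory is whatever path your deployment tool uses. It removes entries by full path, so it does not depend on the current working directory.

```shell
# Delete all but the newest KEEP entries in a releases directory.
# Usage: prune_releases DIR KEEP [--apply]
# Without --apply the function only prints what it would remove.
# (Hypothetical helper; name and flag are illustrative assumptions.)
prune_releases() {
    dir=$1
    keep=$2
    mode=${3:-}
    # ls -1t prints one name per line, newest first;
    # tail -n +N starts at line N, skipping the KEEP newest entries.
    ( cd "$dir" && ls -1t | tail -n +"$((keep + 1))" ) |
    while IFS= read -r name; do
        if [ "$mode" = "--apply" ]; then
            # ${dir:?} aborts if dir is empty, so this can never expand to "/name".
            rm -rf -- "${dir:?}/$name"
            echo "removed $dir/$name"
        else
            echo "would remove $dir/$name"
        fi
    done
}
```

For example, prune_releases /var/www/releases 5 lists the candidates, and adding --apply deletes them. Names containing newlines are not handled.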
Prevention
Configure logrotate for all application logs:
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
Set up monitoring alerts at 80% disk usage. By the time usage reaches 95%, cascading failures have often already begun. Add a cron job to delete files in /tmp that have not been modified for more than seven days:
0 3 * * * find /tmp -type f -mtime +7 -delete
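If no monitoring system is in place yet, the 80% alert can be approximated with a small check run from cron. The following is a rough sketch under stated assumptions: the function name over_threshold is made up, it reads POSIX-format df -P output on standard input, and it assumes mount points contain no spaces.

```shell
# Print every filesystem whose usage is at or above LIMIT percent.
# Reads `df -P` output on stdin: column 5 is capacity, column 6 the mount point.
# Usage: df -P | over_threshold 80
# (Hypothetical helper; the name is an illustrative assumption.)
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)          # strip the trailing percent sign
        if (pct + 0 >= limit + 0)  # force numeric comparison
            print $6 " is at " pct "%"
    }'
}
```

The function prints nothing when every filesystem is below the limit, so a wrapper script can send a notification only when the output is non-empty. How the alert is delivered (mail, chat webhook, pager) is left open here.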
Where Reflex helps
Reflex monitors disk utilisation across all mounted volumes. When usage crosses a warning threshold, it can execute cleanup playbooks — vacuuming logs, purging old releases, clearing caches — and verify free space increases, often resolving the issue before it causes downtime. See How it works.