Spring Boot application crash recovery — Linux guide
TL;DR
How to diagnose and recover a Spring Boot application that crashes on Linux with systemd, Actuator health checks, and log analysis.
Key facts
- Topic: Production error triage
- Stack: Java / Linux
A Spring Boot application crashing on Linux typically exits with a non-zero status, leaving the systemd service in the failed state. Users see connection-refused errors, or 502s from the nginx reverse proxy in front of the application, until the process restarts. Spring Boot startup failures are particularly frustrating because the JVM starts, logs appear, and then the process dies, sometimes before the embedded Tomcat binds the port.
Common crash causes
- Port already in use: "java.net.BindException: Address already in use" means another process (often a previous instance that never exited) is holding the port. Check with "ss -tlnp | grep :8080" and stop the stale process.
- Bean creation failure: a @Bean method throws during ApplicationContext initialisation. The stack trace in journalctl names the failing bean. Common triggers: missing environment variables, an unreachable database at startup, incompatible library versions after an upgrade.
- Database connection refused: Spring Boot's auto-configured data access typically opens a connection at startup. If PostgreSQL or MySQL is down, the context fails to load.
- Insufficient memory: the JVM exits immediately if it cannot reserve the requested heap (-Xms / -Xmx). The kernel OOM killer can also terminate the process after startup.
- Missing configuration: application-production.yml is missing or a required @Value property is not set, so placeholder resolution fails with an IllegalArgumentException during context refresh.
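The causes above can often be told apart by a few characteristic strings in the log. The following is a minimal sketch, not a complete classifier: classify_crash is a hypothetical helper name, and the match patterns are assumptions based on typical JVM and Spring messages, so adjust them to what your application actually logs.

```shell
# Hypothetical helper: guess the crash cause from log text on stdin.
# Patterns are assumptions; more specific causes are checked before the
# generic bean-creation wrapper, because Spring wraps most startup
# failures in a BeanCreationException.
classify_crash() {
  log=$(cat)
  case "$log" in
    *BindException*)                        echo "port-in-use" ;;
    *"Could not reserve enough space"*|*OutOfMemoryError*)
                                            echo "memory" ;;
    *"Could not resolve placeholder"*)      echo "missing-config" ;;
    *"Connection refused"*)                 echo "connection-refused" ;;
    *BeanCreationException*|*UnsatisfiedDependencyException*)
                                            echo "bean-creation" ;;
    *)                                      echo "unknown" ;;
  esac
}

# Example, using the service name from this guide:
#   journalctl -u springapp --since "30 minutes ago" --no-pager | classify_crash
```

The result is only a first hint for triage; the full stack trace in the journal remains the authoritative source.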
Diagnosis workflow
Check systemd status and recent logs:
systemctl status springapp
journalctl -u springapp --since "30 minutes ago" --no-pager
Look for the Spring Boot banner followed by the exception. The exit code matters:
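- Exit 1: application error (bean failure, missing config)
- Exit 137: SIGKILL (128 + 9), typically the OOM killer; confirm with dmesg | grep -i "killed process". systemctl status reports this as code=killed, signal=KILL rather than status=137.
- Exit 143: SIGTERM (128 + 15), a normal shutdown, expected during restarts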
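The exit-code convention above (128 plus the signal number for signal deaths) can be sketched as a small lookup. explain_exit_code is a hypothetical helper name, and the wording of its output is illustrative only.

```shell
# Hypothetical helper: translate a shell-style exit status into a hint.
# Statuses above 128 follow the 128 + signal-number convention.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "clean exit" ;;
    1)   echo "application error" ;;
    137) echo "SIGKILL (possibly the OOM killer)" ;;
    143) echo "SIGTERM (normal shutdown)" ;;
    *)
      # The redirect hides the error if $code is not a number.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "exit status $code"
      fi
      ;;
  esac
}

# Example after a foreground run (path taken from this guide):
#   sudo -u springapp /usr/bin/java -jar /opt/springapp/app.jar; explain_exit_code $?
```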
Test the application in the foreground for a clearer error:
sudo -u springapp /usr/bin/java -jar /opt/springapp/app.jar \
--spring.profiles.active=production
Check for port conflicts:
ss -tlnp | grep :8080
If a stale process holds the port:
sudo kill $(sudo ss -tlnp | grep :8080 | awk '{print $NF}' | cut -d= -f2 | cut -d, -f1)
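The one-liner above depends on the exact field layout of the ss output. A slightly more defensive sketch extracts every pid=N token instead, so you can review the PIDs before killing anything. pids_from_ss is a hypothetical helper name.

```shell
# Hypothetical helper: print the unique PIDs found in `ss -tlnp` output
# read from stdin, one per line. Prints nothing if no process is listed.
pids_from_ss() {
  grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -u
}

# Example (root is needed to see processes owned by other users):
#   sudo ss -tlnp 'sport = :8080' | pids_from_ss
# Inspect the result with `ps -fp <pid>` before sending a signal.
```

Filtering with 'sport = :8080' rather than grep avoids accidental matches on ports such as 18080.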
Systemd configuration for auto-recovery
A robust systemd service file with automatic restart and restart rate limiting:
[Unit]
Description=Spring Boot Application
After=network.target postgresql.service
# Start rate limiting is a [Unit] setting; it is ignored in [Service]
StartLimitIntervalSec=300
StartLimitBurst=5
[Service]
User=springapp
Group=springapp
WorkingDirectory=/opt/springapp
ExecStart=/usr/bin/java -Xms512m -Xmx1024m \
    -XX:+UseG1GC \
    -XX:+HeapDumpOnOutOfMemoryError \
    -XX:HeapDumpPath=/opt/springapp/dumps/ \
    -Dspring.profiles.active=production \
    -jar /opt/springapp/app.jar
Restart=always
RestartSec=10
# The JVM exits with 143 after handling SIGTERM; treat that as success
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
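The StartLimitBurst=5 and StartLimitIntervalSec=300 settings prevent infinite restart loops: if the service is started more than 5 times within 300 seconds, systemd stops restarting it and leaves the unit in the failed state. systemd does not send alerts itself, so alerting on the failed unit has to come from your monitoring.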
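Once the start limit has been hit, the rate counter has to be cleared with systemctl reset-failed before the unit can be started again. A minimal sketch of that check follows; hit_start_limit is a hypothetical helper name, and it assumes a systemd version that supports systemctl show --value.

```shell
# Hypothetical helper: succeed if the given Result value indicates that
# systemd gave up because of the start rate limit.
hit_start_limit() {
  [ "$1" = "start-limit-hit" ]
}

# Example, using the service name from this guide:
#   result=$(systemctl show springapp -p Result --value)
#   if hit_start_limit "$result"; then
#     sudo systemctl reset-failed springapp
#     sudo systemctl start springapp
#   fi
```

Fix the underlying cause first; otherwise the unit will simply run into the limit again.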
Actuator health endpoint
Enable Spring Boot Actuator for production health monitoring:
# application-production.yml
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, info
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true
  health:
    db:
      enabled: true
    diskspace:
      enabled: true
Verify health after a restart:
curl -s http://localhost:8080/actuator/health | python3 -m json.tool
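For scripting, it is usually enough to pull out the overall status field rather than pretty-printing the whole document. The sketch below assumes the top-level "status" appears before any nested component statuses, which matches the usual Actuator response shape. health_status is a hypothetical helper name.

```shell
# Hypothetical helper: print the first "status" value (e.g. UP, DOWN,
# OUT_OF_SERVICE) from Actuator health JSON read on stdin.
health_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
    | head -n 1 \
    | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Example:
#   curl -s http://localhost:8080/actuator/health | health_status
```

This is a text match, not a JSON parser; for anything beyond a quick check, a real JSON tool is the safer choice.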
Quick recovery checklist
sudo systemctl restart springapp
sleep 5
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/actuator/health
journalctl -u springapp --since "1 minute ago" --no-pager | tail -20
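A fixed sleep 5 can be too short for a slow startup and needlessly long for a fast one. A polling loop is a common alternative; the sketch below is one way to write it, with retry_until as a hypothetical helper name and the attempt count and delay in the example chosen arbitrarily.

```shell
# Hypothetical helper: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Only wait if another attempt is coming.
    if [ "$attempt" -le "$max" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: poll the health endpoint for up to about 60 seconds.
# curl -f makes HTTP errors (such as 503 while DOWN) count as failures.
#   retry_until 30 2 curl -sf -o /dev/null http://localhost:8080/actuator/health
```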
Where Reflex helps
Reflex monitors your Spring Boot process state, restart frequency, and Actuator health endpoint continuously. When a crash is detected, Reflex can diagnose the cause from journal logs, restart the service, verify the health endpoint responds, and correlate the crash with recent deployments or dependency outages, providing a full incident timeline for your team.