Spring Boot application crash recovery — Linux guide

TL;DR

How to diagnose and recover a Spring Boot application that crashes on Linux with systemd, Actuator health checks, and log analysis.

Key facts

Topic: Production error triage
Stack: Java / Linux

A Spring Boot application that crashes on Linux typically exits with a non-zero status, leaving the systemd service in a failed state. Users see connection-refused errors or 502s from the nginx reverse proxy in front of it until the process restarts. Spring Boot startup failures are particularly frustrating because the JVM starts, logs appear, and then the process dies, sometimes before the embedded Tomcat binds the port.

Common crash causes

  • Port already in use: the error java.net.BindException: Address already in use means another process (often a stale instance left over from the previous run) is holding the port. Check with ss -tlnp | grep :8080 and kill the stale process.
  • Bean creation failure: a @Bean method throws during ApplicationContext initialisation. The stack trace in journalctl names the exact bean. Common triggers: missing environment variables, unreachable database at startup, incompatible library versions after an upgrade.
  • Database connection refused: Spring Boot's auto-configuration typically opens a database connection at startup. If PostgreSQL or MySQL is down, the context fails to load.
  • Insufficient memory: the JVM exits immediately if it cannot reserve the requested heap (-Xms/-Xmx). The kernel OOM killer can also terminate the process after startup.
  • Missing configuration: if application-production.yml is not found or a required @Value property is not set, placeholder resolution fails with an IllegalArgumentException during context refresh.
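
The causes above can usually be told apart by the first recognisable message in the log. The following is a minimal sketch of that triage, assuming the log text is piped in on stdin. The function name classify_crash and the category labels are hypothetical, and the patterns are assumptions based on common JVM and Spring messages, so adjust them to what your own logs contain.

```shell
# Classify a crash from log text on stdin.
# Patterns are assumptions; tune them to your own log output.
classify_crash() {
  log=$(cat)
  # Check specific causes first: most of them surface wrapped in a
  # BeanCreationException, so that pattern is tested last.
  if printf '%s\n' "$log" | grep -qE 'Address already in use|was already in use'; then
    echo "port-in-use"
  elif printf '%s\n' "$log" | grep -qE 'Could not reserve enough space|OutOfMemoryError'; then
    echo "memory"
  elif printf '%s\n' "$log" | grep -qE 'Could not resolve placeholder'; then
    echo "missing-config"
  elif printf '%s\n' "$log" | grep -qE 'Connection( to [^ ]+)? refused|Communications link failure'; then
    echo "connection-refused"
  elif printf '%s\n' "$log" | grep -qE 'BeanCreationException|UnsatisfiedDependencyException'; then
    echo "bean-creation"
  else
    echo "unknown"
  fi
}
```

One possible use is journalctl -u springapp --since "30 minutes ago" --no-pager | classify_crash, which prints a single label as a starting point for the diagnosis steps that follow.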

Diagnosis workflow

Check systemd status and recent logs:

systemctl status springapp
journalctl -u springapp --since "30 minutes ago" --no-pager

Look for the Spring Boot banner followed by the exception. The exit code matters:

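  • Exit 1: application error (bean failure, missing config)
  • Exit 137: SIGKILL (128 + 9), most often the kernel OOM killer; confirm with dmesg | grep -i "killed process". systemctl status reports this as code=killed, signal=KILL.
  • Exit 143: SIGTERM (128 + 15), a normal shutdown that is expected during restarts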
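
The mapping above follows the shell convention that a process killed by signal N is reported as 128 + N. A small sketch of that lookup, with the hypothetical function name describe_exit:

```shell
# Translate a shell exit status into a short description.
describe_exit() {
  code=$1
  case "$code" in
    0)   echo "clean exit" ;;
    1)   echo "application error" ;;
    137) echo "SIGKILL (possible OOM kill)" ;;
    143) echo "SIGTERM (normal shutdown)" ;;
    *)
      # Statuses above 128 conventionally mean "killed by signal code-128".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "exit status $code"
      fi
      ;;
  esac
}
```

This is meant for a status taken from the shell, for example describe_exit "$?" straight after a foreground run of the JAR. It is not meant for systemd's own properties, which report the signal number separately from the exit status.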

Test the application in the foreground for a clearer error:

sudo -u springapp /usr/bin/java -jar /opt/springapp/app.jar \
  --spring.profiles.active=production

Check for port conflicts:

ss -tlnp | grep :8080

If a stale process holds the port:

sudo kill $(sudo ss -tlnp | grep :8080 | awk '{print $NF}' | cut -d= -f2 | cut -d, -f1)
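
The one-liner above relies on grep matching :8080 anywhere in the line, so it would also match a listener on :18080. A slightly stricter sketch, assuming the usual ss -tlnp column layout with the local address in column 4, reads the ss output from stdin and prints only the PIDs bound to the exact port. The function name listener_pids is hypothetical.

```shell
# Read `ss -tlnp` output on stdin and print the unique PIDs of
# listeners whose local address ends in the given port.
listener_pids() {
  port=$1
  awk -v port="$port" '
    # Column 4 is Local Address:Port. Anchor the match at the end so
    # that :8080 does not also match :18080 or :80801.
    $4 ~ (":" port "$") {
      line = $0
      # A socket can be shared by several processes; collect every pid=.
      while (match(line, /pid=[0-9]+/)) {
        print substr(line, RSTART + 4, RLENGTH - 4)
        line = substr(line, RSTART + RLENGTH)
      }
    }
  ' | sort -u
}
```

With GNU xargs this could be used as sudo ss -tlnp | listener_pids 8080 | xargs -r sudo kill. It is worth reviewing the printed PIDs before piping them to kill.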

Systemd configuration for auto-recovery

A systemd service file with automatic restart and restart rate limiting:

[Unit]
Description=Spring Boot Application
After=network.target postgresql.service
# Restart rate limiting is a [Unit] setting; systemd ignores
# StartLimitIntervalSec= if it is placed under [Service].
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
User=springapp
Group=springapp
WorkingDirectory=/opt/springapp
ExecStart=/usr/bin/java -Xms512m -Xmx1024m \
    -XX:+UseG1GC \
    -XX:+HeapDumpOnOutOfMemoryError \
    -XX:HeapDumpPath=/opt/springapp/dumps/ \
    -Dspring.profiles.active=production \
    -jar app.jar
Restart=always
RestartSec=10
# The JVM exits with 143 after a SIGTERM; treat that as a clean stop.
SuccessExitStatus=143
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

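The StartLimitBurst=5 and StartLimitIntervalSec=300 settings prevent infinite restart loops: once the service has been started 5 times within 5 minutes, systemd refuses further restarts and leaves the unit in the failed state until you intervene (for example with systemctl reset-failed springapp followed by a start). systemd does not send alerts itself, so pair this with an OnFailure= unit or external monitoring.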
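
Whether the limit ever trips depends on how quickly the service crashes. As a simplified model (an assumption, not systemd's exact accounting), each crash cycle lasts RestartSec plus the time the process runs before dying, and the limit is only reached if StartLimitBurst such cycles fit inside StartLimitIntervalSec. The function name restart_limit_trips is hypothetical.

```shell
# Rough check: will the start limit stop a crash loop?
#   $1 = StartLimitBurst        $2 = StartLimitIntervalSec
#   $3 = RestartSec             $4 = seconds the process runs before
#                                    crashing (an estimate you supply)
restart_limit_trips() {
  cycle=$(( $3 + $4 ))
  # The start after the last allowed one happens roughly burst * cycle
  # seconds after the first; it is refused only if that is still
  # inside the interval.
  if [ $(( $1 * cycle )) -lt "$2" ]; then
    echo yes
  else
    echo no
  fi
}
```

With the values from the unit file, restart_limit_trips 5 300 10 5 prints yes. A process that takes 60 seconds to fail, for instance while waiting on a database timeout, gives restart_limit_trips 5 300 10 60, which prints no: under this model the service would keep restarting indefinitely.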

Actuator health endpoint

Enable Spring Boot Actuator for production health monitoring (this requires the spring-boot-starter-actuator dependency on the classpath):

# application-production.yml
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, info
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true
  health:
    db:
      enabled: true
    diskSpace:
      enabled: true

Verify health after a restart:

curl -s http://localhost:8080/actuator/health | python3 -m json.tool
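
For scripts it is often enough to pull out the overall status string rather than pretty-print the whole document. The sketch below is a text heuristic, not a JSON parser: it assumes the overall "status" field appears before any component statuses, as in typical Actuator responses. The function name health_status is hypothetical.

```shell
# Print the first "status" value found in health JSON read from stdin.
health_status() {
  sed -n 's/^[^"]*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}
```

For example, curl -s http://localhost:8080/actuator/health | health_status would print UP for a healthy instance. For anything beyond a quick check, a real JSON tool is the safer choice.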

Quick recovery checklist

sudo systemctl restart springapp
sleep 5
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/actuator/health
journalctl -u springapp --since "1 minute ago" --no-pager | tail -20
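
A fixed sleep 5 may be shorter than the application's startup time. One alternative, sketched here under the assumption that the health endpoint returns HTTP 200 when the application is up, is to poll until it responds or a retry budget runs out. The function names probe_health and wait_for_health and the default retry values are hypothetical.

```shell
# Print the HTTP status code of the URL (000 if the connection fails).
probe_health() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$1" || true
}

# Poll the URL until it returns 200.
#   $1 = URL, $2 = max attempts (default 30), $3 = delay in seconds (default 2)
wait_for_health() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$(probe_health "$url")
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts (last HTTP code: $code)"
  return 1
}
```

In the checklist this could replace the sleep and the curl line: sudo systemctl restart springapp, then wait_for_health http://localhost:8080/actuator/health, then the journalctl tail if it returns non-zero.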

Where Reflex helps

Reflex continuously monitors your Spring Boot process state, restart frequency, and Actuator health endpoint. When a crash is detected, Reflex can diagnose the cause from journal logs, restart the service, verify that the health endpoint responds, and correlate the crash with recent deployments or dependency outages, giving your team a full incident timeline.