Rails Puma worker crash — production recovery
TL;DR
How to diagnose and recover from Puma worker crashes in production Rails applications.
Key facts
- Topic: Production error triage
- Stack: Ruby / Linux
TL;DR
Puma worker crashes in production cause dropped requests and 502 errors from the nginx reverse proxy in front of Puma. Unlike a clean shutdown, a crashed worker exits abnormally: the Puma master process detects the missing worker and forks a replacement, but in-flight requests on that worker are lost. If multiple workers crash simultaneously (common with OOM kills on memory-constrained servers), the entire application becomes unavailable until replacement workers have booted.
Crash types
Segfaults from native extensions
A SIGSEGV (segmentation fault) in a Puma worker almost always originates from a native C extension — not from Ruby code. Common culprits:
- nokogiri — parsing malformed HTML/XML
- mysql2 / pg — driver-level crashes on unusual query results or connection state corruption
- sassc / libsass — CSS compilation failures
- image processing gems (mini_magick, vips) — corrupted image inputs
Diagnose from system logs:
dmesg | grep -i segfault
journalctl -u puma --since "1 hour ago" | grep -i "signal\|segfault\|abort"
Update the offending gem and its underlying native library. If the crash is reproducible, isolate it:
bundle exec ruby -e "require 'nokogiri'; Nokogiri::HTML('<broken')"
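When segfaults recur, it can help to tally which shared object the kernel blames. The following is a minimal sketch, assuming the usual Linux kernel log format for segfault reports; the `SEGFAULT_LINE` pattern and the `segfaults_by_object` helper are hypothetical names introduced for illustration, and the pattern may need adjusting for your kernel version.

```ruby
# Matches kernel lines such as:
#   ruby[4242]: segfault at 0 ip 00007f1c... sp 00007ffd... error 4 in libxml2.so.2.9.10[7f1c...+15a000]
# (assumed format; verify against your own dmesg output)
SEGFAULT_LINE = /(?<proc>[\w.\-]+)\[(?<pid>\d+)\]: segfault at \h+ .* in (?<object>[^\[\s]+)\[/

# Count segfault reports per shared object (for example libxml2 for nokogiri).
def segfaults_by_object(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    match = SEGFAULT_LINE.match(line)
    counts[match[:object]] += 1 if match
  end
end
```

Feeding it the lines of `dmesg` output (for example `segfaults_by_object(File.readlines("dmesg.txt"))`, where the file name is only an example) gives a quick view of whether one native library dominates the crashes.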
OOM kills
The kernel OOM killer sends SIGKILL to the highest-scoring process — usually the largest Puma worker:
dmesg | grep -i "killed process"
See the Rails OOM error guide for detailed memory diagnosis and jemalloc configuration.
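To see which processes the OOM killer chose and how much memory they held, the kernel log lines can be parsed. This is a rough sketch, assuming the common "Killed process PID (name) ... anon-rss:NkB" layout; the `OOM_LINE` pattern, the `OomKill` struct, and `parse_oom_kills` are hypothetical names, and the exact log wording varies between kernel versions.

```ruby
# Assumed kernel log layout, e.g.:
#   Out of memory: Killed process 31337 (ruby) total-vm:3145728kB, anon-rss:1572864kB, ...
OOM_LINE = /Killed process (?<pid>\d+) \((?<name>[^)]+)\).*?anon-rss:(?<rss_kb>\d+)kB/

OomKill = Struct.new(:pid, :name, :rss_mb)

# Return one OomKill per matching line; non-matching lines are skipped.
def parse_oom_kills(lines)
  lines.filter_map do |line|
    match = OOM_LINE.match(line)
    next unless match

    OomKill.new(match[:pid].to_i, match[:name], match[:rss_kb].to_i / 1024)
  end
end
```

Comparing the reported resident size against the memory available per worker indicates whether the worker count or thread count is too high for the server.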
Deadlocks in threaded mode
Puma runs multiple threads per worker. Deadlocks cause workers to hang (not crash), resulting in timeouts:
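# SIGINFO prints thread backtraces, but it exists only on BSD/macOS.
# On Linux, Puma reserves SIGURG for reforking in fork_worker mode, so ask the control server instead
# (requires activate_control_app in config/puma.rb):
bundle exec pumactl -S tmp/pids/puma.state thread-backtraces
Check the output for thread backtraces showing where each thread is blocked.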
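The idea behind a thread backtrace dump can be sketched in plain Ruby. This is an illustration of the technique, not Puma's own implementation; `thread_report` is a hypothetical helper that could be called from a console or a diagnostic endpoint inside the affected process.

```ruby
# Collect name, status, and backtrace for every live thread in this process.
def thread_report
  Thread.list.map do |thread|
    {
      name: thread.name || "unnamed",
      status: thread.status,             # "sleep" for a thread blocked on a lock, queue, or I/O
      backtrace: thread.backtrace || []  # backtrace can be nil if the thread has just died
    }
  end
end
```

Several threads reporting "sleep" with backtraces that end in the same mutex or connection-pool checkout are a typical sign of a deadlock or pool exhaustion.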
Configure Puma for resilience
lowlevel_error_handler
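Catch Ruby exceptions that escape the Rails middleware stack. The handler runs inside the worker, so it cannot intercept a SIGSEGV or an OOM SIGKILL; it only covers errors raised as exceptions: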
# config/puma.rb
lowlevel_error_handler do |e, env, status|
  Rails.logger.error("Puma lowlevel error: #{e.class} - #{e.message}")
  Rails.logger.error(e.backtrace&.first(10)&.join("\n"))
  [500, { "Content-Type" => "text/plain" }, ["Internal Server Error\n"]]
end
Phased restart for recovery
Puma's phased restart (SIGUSR1) replaces workers one at a time without dropping the listening socket. It requires cluster mode and is not available when preload_app! is enabled:
# Restart workers without downtime
kill -USR1 $(cat /var/www/myapp/tmp/pids/server.pid)
# Or via pumactl
bundle exec pumactl -S tmp/pids/puma.state phased-restart
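When scripting a phased restart, it is worth confirming that the PID file points at a live process before signalling it. A minimal sketch, assuming the PID file contains only the master PID; `read_pid`, `process_alive?`, and `phased_restart` are hypothetical helper names, not part of Puma.

```ruby
# Read the master PID from a PID file; returns nil if the file is missing or malformed.
def read_pid(pid_file)
  Integer(File.read(pid_file).strip)
rescue Errno::ENOENT, ArgumentError
  nil
end

# Signal 0 checks that the process exists without delivering a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Send SIGUSR1 to the master only if it is actually running.
# Returns true if the signal was sent, false otherwise.
def phased_restart(pid_file)
  pid = read_pid(pid_file)
  return false unless pid && process_alive?(pid)

  Process.kill("USR1", pid)
  true
end
```

A false return value usually means the master itself is gone, in which case a phased restart cannot help and the service manager has to start Puma again.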
Systemd service configuration
[Unit]
Description=Puma Rails Server
After=network.target postgresql.service
[Service]
User=deploy
Group=deploy
WorkingDirectory=/var/www/myapp
ExecStart=/home/deploy/.rbenv/shims/bundle exec puma -C config/puma.rb
ExecReload=/bin/kill -USR1 $MAINPID
Restart=always
RestartSec=5
Environment=RAILS_ENV=production
EnvironmentFile=/var/www/myapp/.env
KillMode=mixed
TimeoutStopSec=30
[Install]
WantedBy=multi-user.target
The ExecReload directive maps systemctl reload puma to a phased restart. With KillMode=mixed, systemd sends SIGTERM only to the master on stop; any process in the unit still running once TimeoutStopSec (30 seconds here) expires receives SIGKILL.
Monitor with puma-status
Install the puma-status gem for a quick health overview. It reads the state file, so Puma needs a state path and the control app enabled:
gem install puma-status
puma-status tmp/pids/puma.state
This shows per-worker thread utilisation, request backlog, and memory usage — essential for diagnosing whether crashes correlate with load spikes.
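The same per-worker figures can be derived from the JSON that Puma's stats command returns in cluster mode. The sketch below assumes the `worker_status` / `last_status` layout with `backlog`, `pool_capacity`, and `max_threads` fields; `worker_utilisation` is a hypothetical helper, and field names should be checked against the Puma version in use.

```ruby
require "json"

# Summarise thread utilisation per worker from a Puma cluster stats JSON string.
def worker_utilisation(stats_json)
  stats = JSON.parse(stats_json)
  stats.fetch("worker_status", []).map do |worker|
    status = worker.fetch("last_status", {})
    max_threads = status.fetch("max_threads", 0)
    idle = status.fetch("pool_capacity", 0) # threads still able to accept a request
    busy = max_threads - idle
    {
      pid: worker["pid"],
      backlog: status.fetch("backlog", 0),
      busy_threads: busy,
      utilisation: max_threads.zero? ? 0.0 : busy.fdiv(max_threads).round(2)
    }
  end
end
```

A worker whose utilisation stays near 1.0 with a growing backlog shortly before a crash suggests the crash is load-related rather than tied to a specific input.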
Quick recovery checklist
systemctl status puma
journalctl -u puma --since "10 minutes ago" --no-pager | tail -50
systemctl reload puma # phased restart
sleep 5
curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/up
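A fixed sleep followed by a single curl can report a false failure if workers are still booting. One possible alternative is to poll the health endpoint with retries. This is a sketch only: `wait_until_healthy` is a hypothetical helper, and the attempt count, delay, and two-second timeouts are assumed values to tune for your application.

```ruby
require "net/http"
require "uri"

# Poll a health URL until it returns a 2xx response or the attempts run out.
def wait_until_healthy(url, attempts: 5, delay: 1.0)
  uri = URI(url)
  attempts.times do |i|
    begin
      http = Net::HTTP.new(uri.host, uri.port, nil) # nil: do not route a local check through a proxy
      http.open_timeout = 2
      http.read_timeout = 2
      response = http.get(uri.request_uri)
      return true if response.is_a?(Net::HTTPSuccess)
    rescue SystemCallError, SocketError, Timeout::Error, EOFError
      # Connection refused, reset, or timed out: workers may still be booting.
    end
    sleep delay unless i == attempts - 1
  end
  false
end
```

For example, `wait_until_healthy("http://localhost:3000/up", attempts: 10, delay: 2)` waits up to roughly twenty seconds (plus request time) for the endpoint used in the checklist to respond.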
Where Reflex helps
Reflex monitors Puma master and worker process health, restart frequency, and request error rates. When workers crash, Reflex can trigger a phased restart, verify the application responds to health checks, and correlate crashes with recent deployments or traffic spikes — providing your team with a full incident timeline and diagnostic context. See How it works.