7 server incidents every PHP developer will recognise
The Reflex Team · 12 min · March 2026
You never forget your first 3am page: the site is "up", but every fifth request is a 502 and the queue is quietly full of poisoned jobs. This is the field guide we wish we'd had years ago: symptoms that look like noise in one tool but are obvious in another, why they happen, what humans usually do, and how Reflex thinks about them.
We are not claiming magic. Some fixes are policy decisions. Some need more disk. Some need you to stop deploying on Friday. Here is the map.
1. PHP-FPM OOM / worker exhaustion
Symptoms: Spiky 502/504 from nginx, connect() to unix:/run/php/php8.2-fpm.sock failed (11: Resource temporarily unavailable) in error logs, or sudden silence as workers disappear.
Kernel side: dmesg shows Out of memory: Killed process php-fpm8.2 or similar. RSS climbed until the OOM killer chose your children.
Why it happens: pm.max_children is set from vibes, not arithmetic. A Laravel app that legitimately needs 180 MB per worker at peak traffic will eat a 2 GiB box for lunch if you allow thirty children: 30 × 180 MB is roughly 5.4 GB, more than twice the RAM on the box.
Human playbook: Lower pm.max_children, add RAM, fix the leak, or cap queue concurrency. Set pm.max_requests so each worker is respawned after a fixed number of requests, before a slow leak can grow it too large.
Reflex angle: Correlate FPM pool pressure with per-request memory signals so you are not guessing whether it is traffic or a regression.
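The sizing arithmetic behind pm.max_children fits in a few lines. This is a minimal Python sketch, assuming you have already measured peak worker RSS (for example with ps during a busy period); the function name is hypothetical, and the 512 MiB reserved for the OS, nginx and everything else is a placeholder you would replace with your own number. For simplicity it treats the article's 180 MB as 180 MiB.

```python
def max_children(ram_mib: int, reserved_mib: int, worker_rss_mib: int) -> int:
    """Largest pm.max_children that fits in the memory left over for PHP-FPM.

    ram_mib:        total memory on the box
    reserved_mib:   memory kept back for the OS, nginx, Redis, queue workers...
    worker_rss_mib: measured peak RSS of a single FPM worker
    """
    budget = ram_mib - reserved_mib
    if budget <= 0 or worker_rss_mib <= 0:
        return 0
    # Floor division: a partial worker's worth of memory is not a worker.
    return budget // worker_rss_mib


# Thirty 180 MiB children want 5400 MiB on a 2048 MiB box.
print(30 * 180)                      # 5400
print(max_children(2048, 512, 180))  # 8
```

Dividing by peak rather than average RSS is deliberate: the OOM killer acts on the worst moment, not on the mean.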
2. Nginx upstream 502 cascades
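Symptoms: upstream prematurely closed connection and recv() failed (104: Connection reset by peer) in the nginx error log, while the FPM logs show segfaults or timeouts.
Why: The upstream died mid-request, through a crash, a timeout, or a worker being recycled under load.
Human playbook: Set fastcgi_read_timeout from the real duration of your slowest legitimate requests and keep it consistent with FPM's request_terminate_timeout, fix the crash, stabilise the pools.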
Reflex angle: Treat upstream health as first-class: detect reset storms and tie them to recent deploy markers when Pipeline is recording releases.
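One crude way to spot a reset storm is to bucket matching nginx error log lines per minute. This is a minimal Python sketch, assuming nginx's default error log timestamp prefix (YYYY/MM/DD HH:MM:SS); the threshold of 20 resets per minute is an arbitrary placeholder, the sample lines are made up, and correlating spikes with deploy markers is left out.

```python
import re
from collections import Counter

# Messages nginx logs when the FastCGI upstream goes away mid-request.
RESET_MARKERS = (
    "upstream prematurely closed connection",
    "(104: Connection reset by peer)",
)

# nginx error log lines begin "YYYY/MM/DD HH:MM:SS [level]"; capture up to the minute.
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[")


def reset_storms(lines, threshold=20):
    """Return {minute: count} for minutes with at least `threshold` upstream resets."""
    per_minute = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and any(marker in line for marker in RESET_MARKERS):
            per_minute[match.group(1)] += 1
    return {minute: n for minute, n in per_minute.items() if n >= threshold}


# Made-up sample lines for illustration.
sample = [
    "2026/03/12 03:14:07 [error] 811#811: *9 recv() failed (104: Connection reset by peer) while reading response header from upstream",
    "2026/03/12 03:14:09 [error] 811#811: *12 upstream prematurely closed connection while reading response header from upstream",
    "2026/03/12 03:15:01 [warn] 811#811: *20 an upstream response is buffered to a temporary file",
]
print(reset_storms(sample, threshold=2))  # {'2026/03/12 03:14': 2}
```

Counting per minute rather than alerting on every line is what separates a storm from the occasional client that hangs up.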
3. Dead or stalled queue workers
Symptoms: Jobs sit in pending, Horizon shows "inactive", or systemd restarts queue-worker every few minutes.
Why: Worker stuck on blocking I/O, deadlock, poison job that replays forever, or Redis connection limits.
Human playbook: queue:restart, drain failed jobs, fix the job payload, raise Redis maxclients carefully.
Reflex angle: Supervisor/systemd visibility plus sane alerts when workers stop heartbeating—before the backlog becomes a DDoS against your own database.
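Heartbeat alerting boils down to comparing each worker's last check-in against a maximum age. This is a minimal Python sketch, assuming each worker periodically records a Unix timestamp somewhere shared (Redis, for example); the worker names, timestamps and the 90-second threshold are hypothetical.

```python
import time


def stale_workers(heartbeats, max_age_s=90.0, now=None):
    """Names of workers whose last heartbeat is older than max_age_s seconds.

    heartbeats: mapping of worker name -> Unix timestamp of its last heartbeat.
    A worker blocked on I/O or deadlocked keeps its process alive but stops
    updating this timestamp, which a plain "is the process running?" check misses.
    """
    now = time.time() if now is None else now
    return sorted(name for name, last in heartbeats.items() if now - last > max_age_s)


beats = {"worker-1": 1_050.0, "worker-2": 1_080.0, "worker-3": 900.0}
print(stale_workers(beats, max_age_s=90, now=1_100.0))  # ['worker-3']
```

Passing `now` explicitly keeps the function deterministic and easy to test; in production it defaults to the current time.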
4. Disk full on /var or logs
Symptoms: writes fail with No space left on device, sessions break, uploads return 500, apt complains.
Why: Laravel logs that are rotated badly or not at all, huge queue payloads, leftover temp exports, or Docker layers on the wrong volume.
Human playbook: ncdu, fix logrotate, ship logs off-box, separate /var from /.
Reflex angle: Disk pressure is boring until it is not—we watch free space trends so "suddenly full" is not your first signal.
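Watching the trend rather than a fixed percentage can be as simple as fitting a line through recent usage samples and extrapolating. This is a minimal Python sketch of that idea, not Reflex's implementation; the function name and the sample numbers are hypothetical.

```python
def hours_until_full(samples, capacity_bytes):
    """Estimate hours until a volume fills up.

    samples: list of (hours_since_start, used_bytes), oldest first.
    Fits a least-squares line through the samples and extrapolates from the
    latest reading. Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Slope of the fitted line, in bytes per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    _, latest_used = samples[-1]
    return (capacity_bytes - latest_used) / slope


GIB = 1024 ** 3
samples = [(0, 40 * GIB), (1, 42 * GIB), (2, 44 * GIB)]
print(hours_until_full(samples, 50 * GIB))  # 3.0
```

On a real box the samples could come from `shutil.disk_usage("/var").used` collected on a schedule. Alerting on "full within N hours" catches a runaway log while there is still time to act, even when the volume is only 60% used.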
5. MySQL connection pool exhaustion
Symptoms: SQLSTATE[HY000] [1040] Too many connections, random 500s under burst.
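Why: Too many FPM children across too many servers, each holding a connection (and keeping it open between requests when persistent connections are on), or a runaway while(true) in tinker on prod (we have seen it).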
Human playbook: Lower persistence where unnecessary, raise max_connections with headroom math, fix connection leaks.
Reflex angle: Surface ORM-level connection churn alongside FPM counts so you do not tune the wrong knob.
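The "headroom math" for max_connections is the worst case where every PHP process holds a connection at once, plus a margin. This is a minimal Python sketch, assuming one connection per FPM child or queue worker; the fleet size, the 10 "other" clients and the 20% headroom are hypothetical numbers.

```python
import math


def required_max_connections(web_servers, fpm_max_children, queue_workers,
                             other_clients=10, headroom=0.2):
    """Worst-case MySQL connection count, padded by a headroom fraction.

    Assumes each PHP process (FPM child or queue worker) holds at most one
    connection at a time; scale up if your app talks to several databases.
    other_clients covers cron, migrations, admin sessions and monitoring.
    """
    worst_case = web_servers * fpm_max_children + queue_workers + other_clients
    return math.ceil(worst_case * (1 + headroom))


# Hypothetical fleet: 3 web boxes x 40 children, 16 queue workers.
print(required_max_connections(3, 40, 16))  # 176
```

176 is already above MySQL's default max_connections of 151, which is how a modest fleet can hit error 1040 during a burst with no leak anywhere.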
6. Cache permission failures
Symptoms: file_put_contents(...storage/framework/cache...) Permission denied after a deploy or user switch.
Why: Deploy user ≠ FPM user, umask surprises, NFS ACLs.
Human playbook: Normalise ownership, give the deploy and FPM users a shared group, fix the deploy script.
Reflex angle: Filesystem checks around deploy paths—cheap wins that prevent mystifying 500s.
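A deploy-time check for the "deploy user ≠ FPM user" case can evaluate plain POSIX mode bits for the user FPM runs as. This is a minimal Python sketch that deliberately ignores root, ACLs, NFS root squashing and read-only mounts; the function name and the uid/gid values are hypothetical.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Would a process with this uid and these group ids pass the POSIX
    permission-bit check for writing to a file, or creating files in a
    directory, with the given st_mode / st_uid / st_gid?

    Deliberately simple: ignores root, ACLs, NFS root squashing and
    read-only mounts.
    """
    # Only one class applies: owner first, then group, then other.
    if uid == owner_uid:
        w, x = stat.S_IWUSR, stat.S_IXUSR
    elif owner_gid in gids:
        w, x = stat.S_IWGRP, stat.S_IXGRP
    else:
        w, x = stat.S_IWOTH, stat.S_IXOTH
    # Creating files inside a directory also needs search (x) permission.
    needed = (w | x) if stat.S_ISDIR(mode) else w
    return (mode & needed) == needed


# Hypothetical ids: deploy user 1001:1001, FPM user 33:33.
print(can_write(stat.S_IFDIR | 0o755, 1001, 1001, 33, {33}))  # False
print(can_write(stat.S_IFDIR | 0o775, 1001, 33, 33, {33}))    # True
```

On a server the inputs would come from `os.stat(path)` (`st_mode`, `st_uid`, `st_gid`) and, for the FPM user, `pwd.getpwnam()` plus `os.getgrouplist()`. Running it over storage/ and bootstrap/cache right after a deploy is cheap.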
7. Deploy-induced regressions
Symptoms: Everything was fine until twelve minutes ago—now errors cluster around one release SHA.
Why: Config cache pointing at old paths, migration half-applied, queue workers still on old code while web is new.
Human playbook: Rolling restarts, coordinated queue:restart, health gates.
Reflex angle: Pipeline records deploy markers; the Brain can reason about "new errors after marker X" instead of treating the world as static.
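"New errors after marker X" can be illustrated as a set difference between error fingerprints seen before and after a deploy timestamp. This is a minimal Python sketch of the idea, not how the Brain is implemented; the fingerprint format (exception class plus file:line) and the events are hypothetical.

```python
from collections import Counter


def new_after_marker(events, marker_ts):
    """Error fingerprints that appear after a deploy marker but never before it.

    events: iterable of (unix_ts, fingerprint); a fingerprint might be the
    exception class plus file:line. Returns {fingerprint: count_after_marker}.
    """
    before = set()
    after = Counter()
    for ts, fingerprint in events:
        if ts < marker_ts:
            before.add(fingerprint)
        else:
            after[fingerprint] += 1
    return {fp: n for fp, n in after.items() if fp not in before}


events = [
    (100, "QueryException@OrderRepository.php:88"),
    (150, "ConnectionException@PaymentClient.php:41"),
    (210, "ConnectionException@PaymentClient.php:41"),
    (220, "ErrorException@config/services.php:12"),
    (230, "ErrorException@config/services.php:12"),
]
print(new_after_marker(events, marker_ts=200))
# {'ErrorException@config/services.php:12': 2}
```

A pure set difference only catches brand-new errors; a fuller version would also flag existing fingerprints whose rate jumps after the marker.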
Closing
If this list felt like a greatest-hits album, good—that means you have lived it. We expand playbooks every sprint; the changelog is the source of truth for what actually shipped. When you are ready for fewer 3am déjà vu moments, that is the problem Reflex is built to chip away at—one boring, verifiable repair at a time.