NestJS application crash recovery — production guide
TL;DR
How to diagnose and recover from NestJS application crashes in production caused by unhandled exceptions, DI failures, and transport disconnections.
Key facts
- Topic: Production error triage
- Stack: Node.js / Linux
A NestJS application crash in production typically means an exception escaped the framework's exception zone, bypassing exception filters and terminating the underlying Node.js process. NestJS runs each request through a layered execution pipeline (guards, interceptors, pipes, filters), and an error thrown outside this pipeline (for example in a timer callback, an event listener, or a promise that nothing awaits) is never seen by a filter and kills the process outright.
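One way to keep errors from escaping the pipeline is to wrap callbacks that run outside a request. The following is a minimal sketch, not part of NestJS: the helper name guarded and the ErrorSink type are hypothetical, and the error sink would typically be your own logger.

```typescript
// Hypothetical helper (not a NestJS API): wraps a callback that runs outside
// a request, such as a timer or an event listener, so that neither a
// synchronous throw nor a rejected promise reaches the process level.
type ErrorSink = (error: unknown) => void;

function guarded<A extends unknown[]>(
  fn: (...args: A) => unknown,
  onError: ErrorSink,
): (...args: A) => void {
  return (...args: A): void => {
    try {
      const result = fn(...args);
      // If the callback returned a promise, attach a rejection handler so it
      // never becomes an unhandled rejection.
      if (result instanceof Promise) {
        result.catch(onError);
      }
    } catch (error) {
      // Synchronous throw: report it instead of letting it propagate.
      onError(error);
    }
  };
}
```

A wrapped callback can then be passed to setInterval or an event emitter's on() in place of the raw function, so a failure is logged rather than fatal.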
Common causes
- Unhandled exceptions originating in interceptors or guards: errors thrown directly by a guard or interceptor are routed to exception filters, but errors in detached work they start (un-awaited promises, timers, event callbacks) never reach a @Catch() filter
- Dependency injection failures: a provider throwing during onModuleInit() or onApplicationBootstrap() crashes the entire startup sequence
- Microservice transport disconnections: a Redis, NATS, or RabbitMQ transport losing its connection without reconnect logic configured
- Circular dependency resolution failures: forwardRef() missing or incorrectly applied, causing a runtime undefined injection
- Memory leaks in WebSocket gateways: subscription handlers accumulating without cleanup on disconnect
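The WebSocket leak in the last item can be avoided by tracking what each client subscribed to and releasing it on disconnect. The sketch below is one possible shape under stated assumptions: SubscriptionRegistry and its methods are hypothetical names, not a NestJS API, and client IDs are assumed to be strings.

```typescript
// Hypothetical registry (not a NestJS API): tracks teardown callbacks per
// connected client so a gateway can release them all when the client leaves.
type Teardown = () => void;

class SubscriptionRegistry {
  private readonly byClient = new Map<string, Teardown[]>();

  // Record a teardown callback, e.g. one that removes an event listener.
  track(clientId: string, teardown: Teardown): void {
    const list = this.byClient.get(clientId) ?? [];
    list.push(teardown);
    this.byClient.set(clientId, list);
  }

  // Run every teardown registered for this client and drop the entry.
  // Returns how many callbacks ran.
  release(clientId: string): number {
    const list = this.byClient.get(clientId) ?? [];
    this.byClient.delete(clientId);
    for (const teardown of list) {
      teardown();
    }
    return list.length;
  }

  // Number of clients that still hold subscriptions; a value that only ever
  // grows suggests a leak.
  get activeClients(): number {
    return this.byClient.size;
  }
}
```

In a gateway that implements OnGatewayDisconnect, release() would be called from handleDisconnect() with the disconnecting client's ID.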
Diagnosis workflow
Check your process manager logs first:
pm2 logs app-name --lines 200
# or
journalctl -u nestjs-app --since "30 minutes ago"
NestJS's built-in logger prefixes output with the context class. Look for [ExceptionHandler], [InstanceLoader], or [NestMicroservice] to identify the failure layer.
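When scanning a large log dump, the bracketed context can be extracted mechanically. This is a rough sketch: the function name classifyLogLine and the layer labels are illustrative assumptions, and only the three contexts named above are mapped.

```typescript
// Layer labels are illustrative names chosen for this sketch.
type FailureLayer =
  | 'exception-zone'
  | 'provider-instantiation'
  | 'microservice-transport'
  | 'unknown';

// Logger contexts mentioned in this guide, mapped to a rough failure layer.
const LAYER_BY_CONTEXT = new Map<string, FailureLayer>([
  ['ExceptionHandler', 'exception-zone'],
  ['InstanceLoader', 'provider-instantiation'],
  ['NestMicroservice', 'microservice-transport'],
]);

function classifyLogLine(line: string): FailureLayer {
  // Collect every bracketed token without whitespace, e.g. "[Nest]" or
  // "[InstanceLoader]", and return the first one with a known mapping.
  const tokens = line.match(/\[[^\]\s]+\]/g) ?? [];
  for (const token of tokens) {
    const layer = LAYER_BY_CONTEXT.get(token.slice(1, -1));
    if (layer !== undefined) {
      return layer;
    }
  }
  return 'unknown';
}
```

Counting the results over the last few hundred log lines gives a quick indication of which layer to investigate first.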
Test the application startup in isolation:
node dist/main.js 2>&1 | head -50
If the crash happens on startup, the error is almost always a DI or configuration issue. Check for missing environment variables or unavailable service dependencies.
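A startup check for required environment variables turns an opaque DI crash into an explicit message. The helper below is a minimal sketch; findMissingEnv is a hypothetical name, and the variable names you pass in are whatever your application actually needs.

```typescript
// Hypothetical helper: returns the names of required variables that are
// undefined or blank in the given environment object.
function findMissingEnv(
  required: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}
```

Calling it with process.env at the top of main.ts, before NestFactory.create(), lets you log the missing names and exit with a non-zero code instead of failing somewhere inside provider initialization.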
Register a global exception filter
A global filter logs every exception raised while handling an HTTP request and returns a structured error response instead of letting the failure go unreported:
import { Catch, ArgumentsHost, HttpException, HttpStatus } from '@nestjs/common';
import { BaseExceptionFilter } from '@nestjs/core';

// An empty @Catch() matches every exception type.
@Catch()
export class AllExceptionsFilter extends BaseExceptionFilter {
  catch(exception: unknown, host: ArgumentsHost) {
    // Anything that is not an HttpException is reported as a 500.
    const status = exception instanceof HttpException
      ? exception.getStatus()
      : HttpStatus.INTERNAL_SERVER_ERROR;
    console.error(`Unhandled exception (status ${status}):`, exception);
    // Delegate to the base filter, which writes the HTTP response.
    super.catch(exception, host);
  }
}
Register it in main.ts:
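import { HttpAdapterHost } from '@nestjs/core';
// Place after the app has been created with NestFactory.create().
const { httpAdapter } = app.get(HttpAdapterHost);
app.useGlobalFilters(new AllExceptionsFilter(httpAdapter));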
Microservice reconnection
For transport-based microservices, enable automatic reconnection:
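import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';

const app = await NestFactory.createMicroservice<MicroserviceOptions>(AppModule, {
  transport: Transport.REDIS,
  options: {
    host: 'localhost',
    port: 6379,
    retryAttempts: 10, // maximum number of reconnection attempts
    retryDelay: 3000,  // delay between attempts, in milliseconds
  },
});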
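To judge whether these values are large enough, it helps to estimate how long an outage they cover. The arithmetic below is a back-of-the-envelope sketch that assumes a fixed delay between attempts and ignores the time each connection attempt itself takes; the function names are hypothetical, and actual behavior depends on the NestJS version and the underlying client library.

```typescript
// Approximate time spent waiting between attempts before the transport
// gives up, assuming a fixed delay after every failed attempt.
function retryWindowMs(retryAttempts: number, retryDelayMs: number): number {
  return retryAttempts * retryDelayMs;
}

// True if an outage of the given length ends before the retry budget is spent.
function coversOutage(
  outageMs: number,
  retryAttempts: number,
  retryDelayMs: number,
): boolean {
  return outageMs <= retryWindowMs(retryAttempts, retryDelayMs);
}
```

With the values shown above (10 attempts, 3000 ms delay), the window is roughly 30 seconds, so a broker restart that takes a minute would exhaust the retries under this estimate.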
PM2 graceful shutdown
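PM2 sends SIGINT when it stops or restarts a process. For clean restarts, NestJS must react to that signal, so enable shutdown hooks before the application starts listening: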
app.enableShutdownHooks();
In your PM2 ecosystem file, set kill_timeout high enough for in-flight requests to drain:
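module.exports = {
  apps: [{
    name: 'nestjs-app',
    script: 'dist/main.js',
    instances: 'max',
    exec_mode: 'cluster',
    kill_timeout: 10000,         // ms to wait after SIGINT before PM2 sends SIGKILL
    listen_timeout: 8000,        // ms to wait for the app to start listening
    max_memory_restart: '1500M', // restart the process above this memory usage
  }],
};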
Where Reflex helps
Reflex monitors NestJS process health, restart frequency, and crash logs continuously. When a crash is detected, Reflex can restart the process via PM2, verify the health endpoint responds, and correlate the crash with recent deployments or dependency outages — providing a full incident timeline for your team. See How it works.