NestJS application crash recovery: production guide

How to diagnose and recover from NestJS application crashes in production caused by unhandled exceptions, DI failures, and transport disconnections.

Topic: Production error triage
Stack: Node.js / Linux

TL;DR

A NestJS application crash in production typically means an unhandled exception escaped the framework's exception zone, bypassing filters and crashing the underlying Node.js process. NestJS wraps every request in a layered execution pipeline (guards, interceptors, pipes, filters), and an error thrown outside this pipeline, for example in a timer callback or a promise that nobody awaits, kills the process outright.

Common causes

Exceptions raised outside the request pipeline: errors in timer callbacks, event-emitter listeners, or un-awaited promises never reach the exception filter layer, so @Catch() filters cannot handle them. Exceptions thrown inside guards, interceptors, and pipes do reach the filters and do not crash the process on their own.
Dependency injection failures: a provider throwing during onModuleInit() or onApplicationBootstrap() aborts the entire startup sequence
Microservice transport disconnections: a Redis, NATS, or RabbitMQ transport losing its connection without reconnect logic configured
Circular dependency resolution failures: forwardRef() missing or incorrectly applied, causing an undefined injection at runtime
Memory leaks in WebSocket gateways: subscription handlers accumulating without cleanup on disconnect
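
For errors that never reach an exception filter, such as a rejected promise nobody awaited or a throw inside a timer callback, a process-level safety net can at least record the cause before the process exits. The following is a minimal sketch, not part of NestJS: installCrashHandlers and the ProcessLike interface are hypothetical names, and the only real APIs assumed are Node's 'uncaughtException' and 'unhandledRejection' process events.

```typescript
// Minimal shape of the parts of `process` used here, so the helper can be
// exercised with a fake object as well as the real `process`.
interface ProcessLike {
  on(event: string, listener: (...args: any[]) => void): unknown;
  exit(code?: number): void;
}

// Hypothetical helper: log fatal errors that escaped the framework, then
// exit non-zero so the process manager (PM2, systemd) restarts the app.
function installCrashHandlers(
  proc: ProcessLike,
  log: (message: string, error: unknown) => void = console.error,
): void {
  proc.on('uncaughtException', (error: unknown) => {
    log('uncaughtException, exiting:', error);
    proc.exit(1);
  });
  proc.on('unhandledRejection', (reason: unknown) => {
    log('unhandledRejection, exiting:', reason);
    proc.exit(1);
  });
}

// In main.ts, before NestFactory.create(...):
// installCrashHandlers(process);
```

Exiting after logging is a deliberate choice in this sketch: once an exception has escaped, application state may be inconsistent, so a restart by the process manager is usually safer than continuing.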

Diagnosis workflow

Check your process manager logs first:

pm2 logs app-name --lines 200
# or
journalctl -u nestjs-app --since "30 minutes ago"

NestJS's built-in logger prefixes output with the context class. Look for [ExceptionHandler], [InstanceLoader], or [NestMicroservice] to identify the failure layer.
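
When scanning a long log, the three context tags can be mapped to a failure layer mechanically. The sketch below assumes only that the tag appears in square brackets somewhere in the line; classifyLogLine and the layer descriptions are hypothetical, and the sample line is made up for illustration.

```typescript
// Context tags named above, mapped to the layer they point to.
const FAILURE_LAYERS: Record<string, string> = {
  ExceptionHandler: 'unhandled exception caught by the framework',
  InstanceLoader: 'dependency injection / module initialization',
  NestMicroservice: 'microservice transport',
};

// Hypothetical helper: return the failure layer for a log line, or null
// if the line carries none of the known context tags.
function classifyLogLine(line: string): string | null {
  const match = /\[(ExceptionHandler|InstanceLoader|NestMicroservice)\]/.exec(line);
  return match ? FAILURE_LAYERS[match[1]] ?? null : null;
}

// Illustrative line only; real output depends on your logger configuration.
const sampleLine =
  "[Nest] 4123  - 01/15/2024, 10:02:11 AM   ERROR [ExceptionHandler] Nest can't resolve dependencies of the UsersService (?).";
console.log(classifyLogLine(sampleLine));
```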

Test the application startup in isolation:

node dist/main.js 2>&1 | head -50

If the crash happens on startup, the error is almost always a DI or configuration issue. Check for missing environment variables or unavailable service dependencies.
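
A missing environment variable is easier to diagnose when the application fails fast with an explicit message instead of crashing somewhere inside a provider. A minimal sketch follows; findMissingEnv is a hypothetical helper, and the variable names in the usage comment are placeholders rather than anything this guide prescribes.

```typescript
// Hypothetical helper: return the names of required variables that are
// missing or blank in the given environment object.
function findMissingEnv(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}

// In main.ts, before NestFactory.create(...). Replace the placeholder
// names with whatever your application actually needs.
// const missing = findMissingEnv(['DATABASE_URL', 'REDIS_HOST'], process.env);
// if (missing.length > 0) {
//   console.error(`Missing environment variables: ${missing.join(', ')}`);
//   process.exit(1);
// }
```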

Register a global exception filter

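A global filter logs every exception that reaches the HTTP layer and returns a structured error response. It cannot catch errors raised outside the request pipeline, which still need process-level handling: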

import { Catch, ArgumentsHost, HttpException, HttpStatus } from '@nestjs/common';
import { BaseExceptionFilter, HttpAdapterHost } from '@nestjs/core';

@Catch()
export class AllExceptionsFilter extends BaseExceptionFilter {
  catch(exception: unknown, host: ArgumentsHost) {
    // HttpException keeps its own status; anything else is reported as 500.
    const status = exception instanceof HttpException
      ? exception.getStatus()
      : HttpStatus.INTERNAL_SERVER_ERROR;
    console.error(`Unhandled exception (status ${status}):`, exception);
    // Delegate to the default handler so the client still receives a response.
    super.catch(exception, host);
  }
}

// In main.ts, after the app has been created:
const { httpAdapter } = app.get(HttpAdapterHost);
app.useGlobalFilters(new AllExceptionsFilter(httpAdapter));

Microservice reconnection

For transport-based microservices, enable automatic reconnection:

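import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { AppModule } from './app.module'; // path assumed; adjust to your project

const app = await NestFactory.createMicroservice<MicroserviceOptions>(AppModule, {
  transport: Transport.REDIS,
  options: {
    host: 'localhost',
    port: 6379,
    retryAttempts: 10, // number of retries before giving up
    retryDelay: 3000,  // milliseconds between retries
  },
});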
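
To make the two retry options concrete, the sketch below shows how a bounded retry with a fixed delay generally behaves. It is not NestJS's internal implementation: connectWithRetry, the connect callback, and the injectable sleep function are all hypothetical, and only the option names are borrowed from the configuration above.

```typescript
interface RetryOptions {
  retryAttempts: number; // retries allowed after the first failed attempt
  retryDelay: number;    // milliseconds to wait between attempts
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call `connect` until it succeeds or the retry
// budget is exhausted, then rethrow the last error.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  { retryAttempts, retryDelay }: RetryOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < retryAttempts) await sleep(retryDelay);
    }
  }
  throw lastError;
}
```

With retryAttempts: 10 and retryDelay: 3000, a loop like this tolerates roughly 30 seconds of outage before giving up, which is worth comparing against how long your broker typically takes to fail over.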

PM2 graceful shutdown

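PM2 sends SIGINT when it restarts or stops a process. NestJS only runs its shutdown lifecycle hooks (onModuleDestroy, beforeApplicationShutdown, onApplicationShutdown) in response to that signal if shutdown hooks are enabled: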

app.enableShutdownHooks();

In your PM2 ecosystem file, set kill_timeout (in milliseconds) high enough for in-flight requests to drain before PM2 falls back to SIGKILL:

module.exports = {
  apps: [{
    name: 'nestjs-app',
    script: 'dist/main.js',
    instances: 'max',
    exec_mode: 'cluster',
    kill_timeout: 10000,
    listen_timeout: 8000,
    max_memory_restart: '1500M',
  }],
};
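
On the application side, draining can be sketched as a provider that counts in-flight work and holds shutdown until the count reaches zero or a deadline passes. Everything here is an assumption for illustration: InFlightTracker, its begin()/end() methods, and the default timings are hypothetical, and the class is presumed to be registered as a provider so that NestJS invokes its beforeApplicationShutdown hook once shutdown hooks are enabled.

```typescript
// Hypothetical provider: tracks in-flight work (requests, queue jobs) and
// delays shutdown until it drains or the deadline passes.
class InFlightTracker {
  private active = 0;

  constructor(
    private readonly drainTimeoutMs = 8000, // keep below PM2's kill_timeout
    private readonly pollMs = 50,
  ) {}

  // Call begin() when a unit of work starts and end() when it finishes,
  // for example from an interceptor or a job handler.
  begin(): void { this.active++; }
  end(): void { this.active = Math.max(0, this.active - 1); }
  get pending(): number { return this.active; }

  // Lifecycle hook that runs before NestJS closes existing connections.
  async beforeApplicationShutdown(signal?: string): Promise<void> {
    const deadline = Date.now() + this.drainTimeoutMs;
    while (this.active > 0 && Date.now() < deadline) {
      await new Promise<void>((resolve) => setTimeout(resolve, this.pollMs));
    }
    if (this.active > 0) {
      console.warn(`${signal ?? 'shutdown'}: ${this.active} unit(s) of work still in flight`);
    }
  }
}
```

Keeping the drain timeout below kill_timeout leaves the process a margin to exit on its own, so PM2 does not have to kill it mid-request.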

Where Reflex helps

Reflex monitors NestJS process health, restart frequency, and crash logs continuously. When a crash is detected, Reflex can restart the process via PM2, verify the health endpoint responds, and correlate the crash with recent deployments or dependency outages — providing a full incident timeline for your team. See How it works.
