Node.js CPU spike — how to diagnose and fix

How to diagnose and resolve sudden CPU spikes in Node.js production applications.

Key facts

Topic: Production error triage
Stack: Node.js / Linux

TL;DR

A sustained CPU spike in Node.js blocks the event loop, so incoming requests queue up and eventually time out. Because Node.js runs JavaScript on a single main thread by default, one CPU-bound operation can stall every request the process is handling.

Common causes

  • Synchronous operations in request handlers — JSON.parse on huge payloads, catastrophic regex backtracking, synchronous file I/O
  • Tight loops processing large datasets on the main thread
  • Garbage collection storms from excessive short-lived object allocations
  • Unoptimised dependencies doing CPU-heavy work (image processing, PDF generation, cryptographic operations)
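
To make the regex item in the list above concrete, here is a minimal sketch of catastrophic backtracking. The patterns, the `timed` helper, and the input are illustrative assumptions, not taken from any particular codebase. Nested quantifiers such as `(a+)+` make a backtracking engine try exponentially many ways to split an input that almost matches, while the equivalent `a+` rejects it in linear time.

```javascript
// Illustrative patterns: both accept exactly the strings made only of "a".
const vulnerable = /^(a+)+$/; // nested quantifier, exponential backtracking on near-misses
const linear = /^a+$/;        // same language, no nested quantifier

// Hypothetical helper: run fn once and report its result and wall-clock time.
function timed(fn) {
  const start = process.hrtime.bigint();
  const result = fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, ms };
}

// A near-miss input: a run of "a" followed by one character that cannot match.
const input = 'a'.repeat(22) + '!';

const slow = timed(() => vulnerable.test(input));
const fast = timed(() => linear.test(input));
console.log(`vulnerable: ${slow.result} in ${slow.ms.toFixed(1)} ms`);
console.log(`linear:     ${fast.result} in ${fast.ms.toFixed(1)} ms`);
```

On a backtracking engine such as V8's default one, each extra "a" roughly doubles the time for the vulnerable pattern, so keep the repeat count small when trying this. While the match runs, the event loop serves nothing else.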

Diagnosis workflow

Identify the CPU-bound process:

# pgrep -d, joins multiple PIDs with commas, the list format top -p expects
top -c -p "$(pgrep -d, -f node)"
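
The same number can also be read from inside the process. The following is a minimal sketch, assuming you want a periodic reading rather than an interactive view. It uses `process.cpuUsage()`, which reports user and system CPU time in microseconds for the whole process. The helper names `cpuPercent` and `sampleCpu` and the 90% threshold in the example are hypothetical.

```javascript
// Hypothetical helper: CPU use as a percentage of one core.
// usageDelta is { user, system } in microseconds, the shape process.cpuUsage() returns.
function cpuPercent(usageDelta, elapsedMs) {
  const busyMs = (usageDelta.user + usageDelta.system) / 1000;
  return (busyMs / elapsedMs) * 100;
}

// Calls onSample(percent) every intervalMs and returns the timer for clearInterval().
function sampleCpu(intervalMs, onSample) {
  let prevUsage = process.cpuUsage();
  let prevTime = process.hrtime.bigint();
  const timer = setInterval(() => {
    const usage = process.cpuUsage();
    const now = process.hrtime.bigint();
    const delta = {
      user: usage.user - prevUsage.user,
      system: usage.system - prevUsage.system,
    };
    onSample(cpuPercent(delta, Number(now - prevTime) / 1e6));
    prevUsage = usage;
    prevTime = now;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return timer;
}

// Example (threshold is an assumption):
// sampleCpu(5000, (pct) => { if (pct > 90) console.warn(`CPU at ${pct.toFixed(0)}%`); });
```

Because the figure covers every thread in the process, it can exceed 100 on a multi-core machine when worker threads or the libuv thread pool are busy.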

Profile the event loop to find blocking operations:

npx clinic flame -- node app.js
# Or generate a V8 CPU profile:
node --prof app.js
node --prof-process isolate-0xNNNNNNNN-v8.log > profile.txt
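
When restarting the process with profiling flags is not an option, a CPU profile can also be captured from inside a running process through the built-in `inspector` module. The sketch below assumes a fixed capture window is good enough. The names `post` and `captureCpuProfile` are hypothetical; `Profiler.enable`, `Profiler.start`, and `Profiler.stop` are inspector protocol methods.

```javascript
const inspector = require('inspector');

// Promise wrapper around session.post(method, callback).
function post(session, method) {
  return new Promise((resolve, reject) => {
    session.post(method, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Record a CPU profile of the current process for durationMs and resolve with it.
async function captureCpuProfile(durationMs) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    await new Promise((resolve) => setTimeout(resolve, durationMs));
    const { profile } = await post(session, 'Profiler.stop');
    return profile;
  } finally {
    session.disconnect();
  }
}

// Example: save a 10 s profile in a form Chrome DevTools can open.
// captureCpuProfile(10000).then((profile) =>
//   require('fs').writeFileSync('spike.cpuprofile', JSON.stringify(profile)));
```

Profiling adds overhead of its own, so a short window triggered only while the spike is happening is usually preferable to leaving the profiler on.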

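Check event loop lag. As a rule of thumb, healthy applications stay under about 20 ms; this sampler warns when a timer fires more than 100 ms late:

const INTERVAL_MS = 1000;
let last = process.hrtime.bigint();
// Sample once per second; lag is how much later than scheduled the timer fired.
setInterval(() => {
  const now = process.hrtime.bigint();
  const lag = Number(now - last) / 1e6 - INTERVAL_MS;
  last = now;
  if (lag > 100) console.warn(`Event loop lag: ${lag.toFixed(1)}ms`);
}, INTERVAL_MS).unref(); // unref() so the sampler never keeps the process alive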
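
Node.js also ships a built-in histogram for this measurement, `monitorEventLoopDelay` from `perf_hooks`, which gives percentiles rather than single readings. The sketch below is one possible wiring, not a definitive setup. The helper names, the reporting interval, and the thresholds are hypothetical. It also assumes that recorded values include the sampling interval itself and therefore subtracts the resolution; verify that on your Node.js version.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// The histogram reports nanoseconds. Convert to milliseconds and subtract the
// sampling interval (assumption: recorded values include it), never going below 0.
function summariseLag(histogram, resolutionMs) {
  const toLagMs = (ns) => Math.max(0, ns / 1e6 - resolutionMs);
  return {
    meanMs: toLagMs(histogram.mean),
    p99Ms: toLagMs(histogram.percentile(99)),
    maxMs: toLagMs(histogram.max),
  };
}

// Hypothetical wiring: check a summary every reportEveryMs, then start a fresh window.
// Returns a function that stops the monitor.
function startLagMonitor(reportEveryMs = 10000, warnAboveMs = 100, resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  const timer = setInterval(() => {
    const lag = summariseLag(histogram, resolutionMs);
    if (lag.p99Ms > warnAboveMs) console.warn('Event loop lag', lag);
    histogram.reset();
  }, reportEveryMs);
  timer.unref(); // reporting alone should not keep the process alive
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

A percentile is usually a better alert signal than the maximum, because one long garbage collection pause can dominate `maxMs` without reflecting what most requests experienced.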

Fixes

Offload CPU-heavy work to worker threads:

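const { Worker } = require('worker_threads');

// Run ./heavy-task.js in a worker thread and resolve with the first message it posts.
function processInWorker(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./heavy-task.js', { workerData: data });
    worker.once('message', resolve);
    worker.once('error', reject);
    // A non-zero exit with no message would otherwise leave the promise pending.
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}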
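
The worker side is not shown above, so here is a minimal sketch of what it could look like. The sum-of-squares task is a hypothetical stand-in for real CPU-heavy work. To keep the example in one file, the worker source is passed as a string with `eval: true`; in practice it would live in `./heavy-task.js`.

```javascript
const { Worker } = require('worker_threads');

// Worker source inlined as a string for the sketch. It reads its input from
// workerData and posts one result back to the main thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData.n; i++) total += i * i; // hypothetical heavy task
  parentPort.postMessage(total);
`;

// Compute 1^2 + 2^2 + ... + n^2 off the main thread.
function sumSquaresInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

sumSquaresInWorker(10).then((total) => console.log('sum of squares:', total));
```

Starting a worker has a cost, so for frequent small tasks a pool of long-lived workers is usually a better fit than one worker per call.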

For image processing or PDF generation, use a dedicated job queue (BullMQ) and process jobs in background workers. For large data sets, break the work into chunks and call setImmediate() between batches to yield to the event loop.
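
The chunking approach can be sketched as follows. `processInChunks`, its parameters, and the default chunk size are hypothetical; tune the chunk size so that one batch takes only a few milliseconds.

```javascript
// Process items in batches of chunkSize, yielding to the event loop between
// batches so pending I/O callbacks and timers can run.
function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) handleItem(items[index], index);
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        setImmediate(runChunk); // schedule the next batch after pending I/O
      } else {
        resolve();
      }
    }

    runChunk();
  });
}

// Example: sum five numbers two at a time.
let sum = 0;
const done = processInChunks([1, 2, 3, 4, 5], (n) => { sum += n; }, 2);
done.then(() => console.log('sum:', sum));
```

Chunking keeps the process responsive but does not make the work faster. If total throughput matters, worker threads or a background queue remain the better option.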

Where Reflex helps

Reflex monitors CPU utilisation and event loop latency on your Node.js servers. When a CPU spike exceeds configurable thresholds, Reflex can capture a diagnostic profile, restart the affected worker, and alert your team with the timeline and profile data attached. See How it works.