Why Your Container Ignores SIGTERM: PID 1, npm start, and the Lost 30 Seconds

The deployment is clean. The code change is small. But every rollout, the metrics dashboard lights up with a brief spike of 5xx errors. Maybe 0.3% of traffic for 30 seconds — then everything settles. The first instinct is to check memory, because exit code 137 is the classic OOMKill signature: kernel decides a process is over its memory limit, sends SIGKILL, process exits with 128 + 9 = 137. Teams will spend a week tuning memory limits, adding heap dumps, watching container metrics. Nothing budges.

The real culprit usually shows up in a much smaller place: one Dockerfile line, CMD ["npm", "start"]. That single line can keep SIGTERM from ever reaching the application. Kubernetes waits 30 seconds, gives up, and sends SIGKILL, and the application never gets a chance to drain connections, flush logs, or deregister from the load balancer. Exit code 137, every time, on every rollout. The shape of the bug is consistent enough that engineers running EKS see it surface across services, languages, and frameworks.

Why PID 1 is special

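Every Linux process has a PID. The first process started by the kernel gets PID 1, traditionally init or systemd. In a container, whatever runs as the entrypoint takes that role inside the container's PID namespace. The kernel has a specific rule about PID 1, spelled out in man 2 kill and, for namespaces, man 7 pid_namespaces: a signal sent to PID 1 is delivered only if the process has explicitly installed a handler for it. The default disposition (terminate, ignore, core dump) does not apply. If PID 1 has no handler for SIGTERM, the kernel silently drops the signal. The exception is SIGKILL (and SIGSTOP) sent from an ancestor namespace, such as the host where the container runtime runs: those are always delivered, which is why the final SIGKILL still lands.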

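This is intentional. If init dies, the whole system goes with it (in a container, the whole PID namespace), so the kernel shields it from stray signals. PID 1 also carries a second duty: orphaned processes are re-parented to it, and it is expected to reap them when they exit. In containers, that protection becomes a trap, because the entrypoint inherits PID 1's special treatment without having been written to act as an init.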

You can verify which signals a running process catches by reading /proc/<PID>/status and looking at the SigCgt field, a hexadecimal bitmask of caught signals in which bit n-1 stands for signal n. The exact set varies by shell, but for a non-interactive /bin/sh without a trap, SIGTERM (signal 15, mask 0x4000) is not in it. So /bin/sh running as PID 1 silently drops every SIGTERM the orchestrator sends.

The npm start trap

A typical Node.js Dockerfile looks like:

FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["npm", "start"]

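That last line looks innocuous. In exec form, Docker starts npm directly, so npm itself is PID 1. But npm start doesn't run Node directly either: it runs scripts.start through /bin/sh -c "node server.js" (or whatever scripts.start is defined as). The process tree inside the container looks something like this (PIDs are illustrative):

PID 1: npm start
PID 6: sh -c "node server.js"
PID 7: node server.js

Kubernetes sends SIGTERM to PID 1, which is npm, not Node. Node hears about the shutdown only if npm catches the signal and relays it, and any /bin/sh sitting between them does not forward signals to its child. That relay has not been reliable across npm versions, so in practice Node (PID 7) often never learns that shutdown was requested. After 30 seconds, the container runtime sends SIGKILL, and when PID 1 of a PID namespace dies, the kernel kills every other process in it. Everything dies at once, and every connection open at that moment is reset. The shell form, CMD npm start, is worse still: Docker wraps it in /bin/sh -c, so PID 1 is the shell itself and the signal is dropped outright.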

The pattern is documented with strace evidence in "Kubernetes: Containers and the Lost SIGTERM Signals" on ITNEXT — you can watch SIGTERM arrive at the shell and do nothing, while the application process runs happily until SIGKILL.

One useful diagnostic: send SIGTERM directly to the Node process inside a running container, for example kubectl exec <pod> -- sh -c 'kill -TERM 7' (substitute the real PID; going through sh uses the shell's built-in kill, since slim images often lack a standalone kill binary), to verify the application itself handles shutdown correctly. If the app drains properly when signaled directly but not when Kubernetes terminates the pod, the PID 1 trap is confirmed. Fix the entrypoint rather than working around it.

Three actual fixes

Option A: exec form directly to Node. The simplest fix bypasses npm and the shell entirely.

# Before
CMD ["npm", "start"]

# After
CMD ["node", "server.js"]

JSON-array exec form makes Docker run the command directly without invoking a shell. Node becomes PID 1 and receives SIGTERM itself. Register a process.on('SIGTERM', ...) handler and it fires; don't rely on default signal behavior for PID 1, because the handler is what drains connections before exit. The catch: if the start script does anything beyond node server.js (env setup, migrations, port checks), those steps get skipped. This works cleanly only when the start script is a thin wrapper.
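A minimal sketch of such a handler, assuming a plain node:http server. installGracefulShutdown is a hypothetical name, and the 20-second default deadline is an assumption chosen to finish before the default 30-second grace period ends.

```javascript
// Minimal SIGTERM handling for an HTTP server running as PID 1 (Option A).
// Stops accepting connections, lets in-flight requests finish, then exits.
// A hard deadline makes the process exit on its own before kubelet's SIGKILL.
function installGracefulShutdown(server, { timeoutMs = 20_000, exit = process.exit } = {}) {
  let shuttingDown = false;

  function onSignal(signal) {
    if (shuttingDown) return; // ignore repeats while already draining
    shuttingDown = true;
    console.log(`received ${signal}, draining`);

    // Give up on stragglers after timeoutMs; unref() so this timer alone
    // does not keep the process alive.
    const deadline = setTimeout(() => exit(1), timeoutMs);
    deadline.unref();

    // close() stops accepting new connections; its callback runs once
    // every existing connection has closed.
    server.close((err) => {
      clearTimeout(deadline);
      exit(err ? 1 : 0);
    });
    // Drop idle keep-alive sockets now instead of waiting for their timeout
    // (available since Node 18.2; newer close() does this as well).
    server.closeIdleConnections();
  }

  process.on("SIGTERM", onSignal);
  process.on("SIGINT", onSignal);
  return onSignal;
}
```

Wire it up right after listening, e.g. installGracefulShutdown(http.createServer(app).listen(3000)), where app stands for your request handler; listen() returns the server, so the call chains.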

Option B: tini or dumb-init as PID 1. When the application needs a shell-based startup sequence, use a real init system as PID 1.

tini: minimal init built for containers. Docker has bundled it as docker run --init since version 1.13; set init: true in Compose for the same effect. Kubernetes has no equivalent pod setting, so for pods tini has to be installed in the image. Reference: Docker docs on --init.
dumb-init: does the same job (zombie reaping plus signal forwarding), by default relays signals to the child's whole process group, and can also rewrite one signal into another. Install it with the package manager and prepend it to the entrypoint.

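# Install tini (node:20-slim is Debian-based, so it comes from apt)
RUN apt-get update && apt-get install -y --no-install-recommends tini && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["tini", "--"]
CMD ["npm", "start"]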

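Both register handlers for the signals they relay, so SIGTERM reaches them even as PID 1. By default tini forwards only to its direct child: SIGTERM arrives at tini → tini forwards to npm → Node runs its shutdown handlers only if npm relays the signal onward. tini -g sends the signal to the child's whole process group instead, which is also dumb-init's default. Zombie reaping matters too: orphaned processes are re-parented to PID 1, so if Node spawns workers (cluster mode, child processes) whose descendants outlive their parents, a PID 1 that never waits on them leaves zombies in the process table. tini and dumb-init reap them; an application process running as PID 1 generally does not.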

Option C: exec inside package.json. Keep the npm interface but replace the shell process:

// package.json
{"scripts": {"start": "exec node server.js"}}

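exec replaces the intermediate shell with Node, so the tree shrinks to npm (PID 1) → node. That removes the shell hop, but npm is still PID 1, so Node hears SIGTERM only if npm relays it. It is not the same result as Option A; pair it with Option B or confirm it with a termination test. The npm interface is preserved for local development.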

The Kubernetes side: grace periods and load balancer races

Even with PID 1 fixed, there's a second problem: the race between endpoint-update propagation and pod termination.

When Kubernetes terminates a pod, several things happen at roughly the same time. The pod enters Terminating. kubelet runs the preStop hook (if configured), then has the container runtime send SIGTERM to PID 1. Meanwhile the endpoint controller removes the pod from the Service's endpoints, and kube-proxy updates the iptables rules on each node. That propagation isn't instantaneous (it can take several seconds), and during that window load balancers can still route new traffic to a pod that's already shutting down. See the Kubernetes pod lifecycle docs for the full flow.

Default terminationGracePeriodSeconds is 30. The countdown starts when termination begins and includes the preStop hook; when it runs out, kubelet sends SIGKILL. That is usually enough, but external load balancers introduce a subtler issue. If an ALB has a 60-second idle timeout but the grace period is 30 seconds, the load balancer can still hold keep-alive connections to a pod that has already exited or is shutting down. Requests sent over those connections get reset mid-flight and surface as HTTP 502s. The fix is a preStop sleep that lets the endpoint removal propagate (to iptables and to the load balancer's targets) before the app starts refusing connections:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]

The 5-second sleep gives iptables time to remove the pod from routing tables before SIGTERM arrives. The grace period has to cover preStop duration plus application drain time: if the app needs 20 seconds to drain and preStop sleeps 5, the budget is 25 seconds, so the default terminationGracePeriodSeconds: 30 leaves only 5 seconds of headroom. Raise it if drain time can grow.
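The arithmetic is simple but easy to get wrong when preStop and drain times are set in different places. A minimal sketch of the check, where graceBudget is a hypothetical helper and drainSeconds is whatever deadline your app's own shutdown handler uses:

```javascript
// The grace period is one countdown that covers both the preStop hook and
// the application's drain after SIGTERM, so the two must fit inside it.
function graceBudget({ gracePeriodSeconds = 30, preStopSeconds, drainSeconds }) {
  const needed = preStopSeconds + drainSeconds;
  return {
    needed,
    headroom: gracePeriodSeconds - needed,
    ok: needed <= gracePeriodSeconds,
  };
}

// The example from the text: 5 s preStop sleep + 20 s drain under the default 30 s.
console.log(graceBudget({ preStopSeconds: 5, drainSeconds: 20 }));
// → { needed: 25, headroom: 5, ok: true }
```

Negative headroom means kubelet's SIGKILL will land before the drain finishes.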

Daemon shutdown ordering

Multi-container pods add another layer. Kubernetes sends SIGTERM to all containers simultaneously and gives no ordering guarantee. If a log-shipping sidecar exits faster than the application container, the application's final log lines (often the "graceful shutdown complete" message that observability relies on) never reach the logging backend. They get buffered in the sidecar's memory and discarded.

In one team's post-mortem, engineers running a log-forwarding daemonset found that final audit log entries for terminated pods were systematically missing — roughly 2–4 seconds of tail logs lost per termination. The fix was a combination of synchronous flush in the application before exit, a short preStop hook on the sidecar, and a longer terminationGracePeriodSeconds covering the full flush-and-forward cycle.
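The "synchronous flush in the application" part of that fix might look like this in Node. It assumes logs are JSON lines written to stdout or to a file descriptor the sidecar tails; logFinalLine, writeAllSync and exitAfterFinalLog are hypothetical names. It only guarantees the line leaves the process; getting it past the sidecar still depends on the preStop hook and grace period.

```javascript
// Write the final shutdown log line with a synchronous write before exiting,
// instead of handing it to an async stream that process.exit() may cut off.
const fs = require("node:fs");

function writeAllSync(fd, text) {
  const buf = Buffer.from(text, "utf8");
  let offset = 0;
  while (offset < buf.length) {
    try {
      offset += fs.writeSync(fd, buf, offset, buf.length - offset);
    } catch (err) {
      // A non-blocking pipe can be momentarily full; retry instead of dropping the line.
      if (err.code !== "EAGAIN") throw err;
    }
  }
}

function logFinalLine(fields, fd = process.stdout.fd) {
  const line = JSON.stringify({ time: new Date().toISOString(), ...fields }) + "\n";
  writeAllSync(fd, line);
}

function exitAfterFinalLog(code, fields = {}) {
  logFinalLine({ msg: "graceful shutdown complete", exitCode: code, ...fields });
  process.exit(code);
}
```

Call exitAfterFinalLog from the end of the SIGTERM handler in place of a bare process.exit.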

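The general rule: in any multi-container pod, work out which shutdown ordering matters and enforce it explicitly. For regular containers, Kubernetes guarantees only "SIGTERM to all, wait grace period, SIGKILL to all." Native sidecars (init containers with restartPolicy: Always, supported in recent Kubernetes versions) are the exception: kubelet stops them only after the main containers have exited.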

What to watch in production

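Exit code 137 in pod status. kubectl describe pod <name> shows the container's Last State. 137 means the process was SIGKILLed, but it does not automatically mean OOMKill: a memory kill is normally reported with Reason: OOMKilled, while a SIGKILL at the end of the grace period is not. Node-level dmesg (or kubectl describe node) helps confirm which it was.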
5xx spikes correlated with rollout events. If error rates spike within 30–60 seconds of a pod entering Terminating, that points to the shutdown race rather than capacity or a bug in the new version.
Staging termination tests. Run kubectl delete pod <name> in staging while tailing logs. The application should log "received SIGTERM, draining" within a second. If logs just stop, the signal never arrived.
Verify PID 1 directly. kubectl exec <pod> -- cat /proc/1/status | grep -E "Name|SigCgt" shows what's running as PID 1 and which signals it catches.
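Before going to staging, a rough local version of the termination test can be scripted in Node. This sketch starts the app as a child process (so it is not PID 1 here, and it tests the app's handler rather than the entrypoint), waits for a readiness marker on stdout, sends SIGTERM, and reports whether the app exited on its own. probeShutdown and the "ready" marker are assumptions; use whatever your app prints once it is listening.

```javascript
// Local version of the staging termination test: start the app as a child
// process, wait until it prints a readiness marker, send SIGTERM, and report
// whether it exited on its own (handler ran) or was killed by the signal.
const { spawn } = require("node:child_process");

function probeShutdown(command, args, { readyText = "ready", timeoutMs = 30_000 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "inherit"] });
    child.stdout.setEncoding("utf8");
    let output = "";
    let sentAt = null;

    child.stdout.on("data", (chunk) => {
      output += chunk;
      if (sentAt === null && output.includes(readyText)) {
        sentAt = Date.now();
        child.kill("SIGTERM");
      }
    });

    // Mirror kubelet: if the app has not exited by the deadline, SIGKILL it.
    const deadline = setTimeout(() => child.kill("SIGKILL"), timeoutMs);

    child.on("error", (err) => {
      clearTimeout(deadline);
      reject(err);
    });
    child.on("exit", (code, signal) => {
      clearTimeout(deadline);
      resolve({
        code,
        signal,
        handled: signal === null, // exited by itself rather than being killed
        elapsedMs: sentAt === null ? null : Date.now() - sentAt,
      });
    });
  });
}
```

For example, probeShutdown("node", ["server.js"], { readyText: "listening" }).then(console.log) reports handled: false with signal "SIGTERM" for an app that never registered a handler.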

Limitation worth saying out loud

Everything in this post is Kubernetes-specific. ECS task lifecycle works differently — essential containers, stopTimeout up to 120 seconds, container draining tied to DRAINING state on the ECS instance. The PID 1 problem exists in ECS too, but the remediation steps for terminationGracePeriodSeconds, preStop hooks, and endpoint propagation don't map directly. Different tooling needed.

For related reading: why Kubernetes came after Docker covers the architectural reasons for the split between container runtimes and orchestration. The kubecontext guide is the practical companion for day-to-day cluster navigation. And timezone bugs in production documents a similar pattern where a system behaves correctly in dev and fails only under real conditions.