There's a gap between "Docker works on my machine" and "Docker is running reliably in production at 2am without anyone watching it." This post is about that gap.
1. Not setting resource limits
Containers without CPU and memory limits compete for the host's resources, and one misbehaving container can starve the others on the same host. This seems obvious in retrospect. It wasn't obvious when we first started running containers in production, when the documentation focused more on getting things running than on operating them safely.
Set limits. Always. Even generous ones are better than none.
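As a rough illustration, here is a small Python sketch that builds a `docker run` command with explicit limits. The `--memory` and `--cpus` flags are real Docker CLI options; the helper name `docker_run_args`, the image name, and the default values are made up for this example and should be tuned to your workload.

```python
def docker_run_args(image, memory="512m", cpus="1.0"):
    """Build a `docker run` argument list with explicit resource limits.

    The defaults are placeholders for illustration, not recommendations.
    """
    return [
        "docker", "run", "--detach",
        "--memory", memory,  # hard memory limit for the container
        "--cpus", cpus,      # how much CPU time the container may use, in cores
        image,
    ]


# Print the command instead of running it, so the sketch works without Docker.
print(" ".join(docker_run_args("example/app:1.0")))
```

Most orchestrators expose equivalent settings in their own configuration, so the same idea applies even if you never call `docker run` directly.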
2. Running as root inside the container
Root is the default user in most base images. A container running as root, combined with a misconfigured volume mount or a container escape vulnerability, can cause disproportionate damage. Use a non-root user in your Dockerfile. It's three lines and it matters.
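The real fix lives in the Dockerfile (the `USER` instruction). As a backstop, an application can also refuse to start when it finds itself running as root. The sketch below assumes a Python entrypoint on a Unix-like system; `ensure_not_root` is a hypothetical helper, not part of any library.

```python
import os


def ensure_not_root(euid=None):
    """Raise if the process is running as root (effective UID 0).

    `euid` can be passed explicitly for testing; by default it is read
    from the current process.
    """
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        raise RuntimeError(
            "refusing to run as root; set a non-root USER in the Dockerfile"
        )
    return euid
```

Calling `ensure_not_root()` at the top of the entrypoint turns a silent misconfiguration into an immediate, visible startup failure.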
3. Ignoring graceful shutdown
SIGTERM handling. This tripped us up on our first production deploy. The orchestrator sends SIGTERM to your container when it wants to stop it. If your application doesn't handle it, the orchestrator waits for the timeout, then sends SIGKILL. Mid-request. This is the most common cause of "mysterious 502s during deployments" we see.
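The shutdown sequence above can be sketched in Python with the standard `signal` module. This is a minimal sketch, not a full server: `serve`, `handle_next_request`, and `poll_seconds` are hypothetical names, and it assumes a single-threaded worker loop where `handle_next_request` returns `True` when it processed something. The idea is that SIGTERM only sets a flag, and the loop exits once the in-flight request has finished.

```python
import signal
import threading

# Set once SIGTERM has been received.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Only record the request here; the main loop decides when it is safe to stop.
    shutdown_requested.set()


def serve(handle_next_request, poll_seconds=0.1):
    """Process requests until SIGTERM arrives, then return.

    The flag is checked between requests, so a request that is already
    being handled runs to completion. Returns the number of requests handled.
    """
    signal.signal(signal.SIGTERM, handle_sigterm)  # must run in the main thread
    handled = 0
    while not shutdown_requested.is_set():
        if handle_next_request():
            handled += 1
        else:
            # Nothing to do: wait briefly, but wake up early on shutdown.
            shutdown_requested.wait(poll_seconds)
    return handled
```

The handler only helps if the signal actually reaches the Python process. With a shell-form `CMD`, the shell usually runs as PID 1 and may not forward SIGTERM, so the exec form is the safer choice.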
"Every container that can't explain its shutdown behaviour to the orchestrator is a 502 waiting to happen."
4. Fat images in production
Multi-stage builds exist for a reason. Your production image should not contain your build toolchain, test dependencies, or development utilities. Smaller images mean faster pulls, a smaller attack surface, and faster cold starts. The effort to implement multi-stage builds is about two hours the first time; after that, it's a template you copy.