Forty companies. Different sizes, different stacks, different clouds. GitHub Actions, CircleCI, GitLab CI, Jenkins (yes, still), Bitbucket Pipelines. After running or auditing CI/CD for all of them, three things stand out as the clearest predictors of whether a team has pipeline-related incidents.
1. They treat their pipeline as production code
The teams with the fewest incidents have pipeline configuration that's reviewed, tested, and versioned the same way their application code is. No "quick YAML tweak" that bypasses review. No pipeline changes deployed on Friday afternoons. The pipeline is part of the system, not a wrapper around it.
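One way to picture "tested the same way as application code" is a unit test over the pipeline definition itself. The sketch below is only an illustration. It assumes the pipeline config has already been parsed into a Python dict (for example from YAML). The job names, the `needs` key, and the `lint_pipeline` helper are hypothetical and not tied to any particular CI system.

```python
from typing import Any

# Hypothetical parsed pipeline config; in practice this would be loaded
# from the pipeline file that lives in the repository.
PIPELINE: dict[str, Any] = {
    "jobs": {
        "test": {"steps": ["run-unit-tests"]},
        "deploy": {"steps": ["ship"], "needs": ["test"]},
    }
}


def lint_pipeline(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the pipeline config."""
    problems: list[str] = []
    jobs = config.get("jobs", {})
    for name, job in jobs.items():
        # A job with no steps is almost certainly a mistake.
        if not job.get("steps"):
            problems.append(f"job '{name}' has no steps")
        # Every dependency must refer to a job that actually exists.
        for dep in job.get("needs", []):
            if dep not in jobs:
                problems.append(f"job '{name}' needs unknown job '{dep}'")
    # Assumed rule for this sketch: deploy must depend on the test job.
    deploy = jobs.get("deploy")
    if deploy is not None and "test" not in deploy.get("needs", []):
        problems.append("job 'deploy' does not depend on 'test'")
    return problems
```

A check like this can run in the pipeline's own test stage, so a "quick YAML tweak" that removes the dependency on tests fails review before it merges.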
2. They have exactly one deployment path to production
Not two. Not a fast-track and a slow path. One. Every deployment to production goes through the same gates: automated tests, environment-based approvals, deployment verification. Teams with multiple paths always end up using the fast path when the pressure is on, which is exactly when the gates matter most.
"A second deployment path is a pressure valve. It'll get used when pressure is highest, which is when you can least afford to skip your checks."
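In code terms, a single path means one entry point with no skip or force parameter. The following is a minimal sketch under stated assumptions: the `Release` fields, the gate names, and `deploy_to_production` are hypothetical. Each gate is reduced to a precomputed boolean, whereas real deployment verification would run after the rollout rather than before it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Release:
    version: str
    tests_passed: bool
    approved: bool
    healthy_after_deploy: bool


# A gate returns True when the release may proceed. These lambdas are
# stand-ins for automated tests, approvals, and deployment verification.
Gate = Callable[[Release], bool]

GATES: list[tuple[str, Gate]] = [
    ("automated tests", lambda r: r.tests_passed),
    ("environment approval", lambda r: r.approved),
    ("deployment verification", lambda r: r.healthy_after_deploy),
]


def deploy_to_production(release: Release) -> str:
    """The only entry point to production; note the absence of a skip flag."""
    for name, gate in GATES:
        if not gate(release):
            return f"blocked at gate: {name}"
    return f"deployed {release.version}"
```

The design choice is the function signature: because there is no argument that bypasses `GATES`, an urgent hotfix and a routine release take the same route.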
3. Rollback is tested and fast
Not theoretically possible. Tested. Monthly. With a timer. The teams with the lowest mean time to recovery practice rollback the same way good teams practice incident response: deliberately, before they need it. If rolling back takes more than 10 minutes, it gets passed over when an incident is live, and the team reaches for a forward-fix instead.
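"With a timer" can be taken literally. This is a rough sketch of a drill harness, assuming the real rollback can be wrapped in a Python callable. `run_rollback_drill` and `fake_rollback` are hypothetical names, and the budget constant simply encodes the 10-minute threshold mentioned above.

```python
import time
from typing import Callable

ROLLBACK_BUDGET_SECONDS = 10 * 60  # the 10-minute threshold from the text


def run_rollback_drill(
    rollback: Callable[[], None],
    budget_seconds: float = ROLLBACK_BUDGET_SECONDS,
) -> dict:
    """Time one rollback and report whether it fit within the budget."""
    start = time.monotonic()  # monotonic clock is unaffected by wall-clock changes
    rollback()
    elapsed = time.monotonic() - start
    return {
        "elapsed_seconds": elapsed,
        "within_budget": elapsed <= budget_seconds,
    }


def fake_rollback() -> None:
    # Placeholder for the real rollback command (assumption for this sketch).
    time.sleep(0.05)
```

Recording `elapsed_seconds` from each monthly drill gives a trend line, so a rollback that is slowly drifting toward the budget shows up before an incident does.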
These three patterns aren't complicated. They're discipline. And discipline compounds.