The safety of speed: How we ship code 180 times per day
Averaging 180 ships per day, Intercom releases 20 deployments every hour. Find out how we ship code constantly, recover quickly, and use velocity to drive stability.
Published
Jan 26, 2026
“Speed is not the enemy of safety; it is the prerequisite for it.”
At Intercom, the average time from merging code to it being used by customers in production is just 12 minutes.
In January 2026, we are averaging 180 ships per workday – roughly 20 deployments every hour. Conventional wisdom suggests that to increase stability, you must slow down. We believe the opposite. At Intercom, speed is not the enemy of safety; it is the prerequisite for it. Accumulating code creates risk; shipping small batches minimizes it. Shipping is our company’s heartbeat.
Maintaining the frequency that fuels our product innovation while targeting 99.8%+ availability is a constant battle, and it has required over a decade of significant investment in systems, principles, and processes. We protect the integrity of our systems through three distinct layers of defense: an automated pipeline that is simple, reliable, and removes the need for manual intervention; a shipping workflow that promotes ownership, with guardrails flexible enough to act as accelerants; and a recovery model optimized for mitigating inevitable failures. Here is how we’ve built each layer to ensure our velocity remains our greatest source of stability.
While Intercom consists of various services and frontend applications, this post focuses on our Ruby on Rails monolith. It is our core application and the one we deploy most frequently; we also deploy it to three different data-hosting regions with independent pipelines. While our other services (such as our Intercom UI) follow similar pipeline principles and safeguards, the Rails monolith is the best example of how we ship code at our scale.
The automated pipeline
Our pipeline is designed to move code from merge to production as fast as possible while enforcing strict safety checks. It is optimized for both speed and safety, and it is entirely automated, with the majority of releases requiring no human intervention.
Build and parallel testing
The process begins when an engineer merges code to GitHub. Two things happen immediately:
- The build: We compile the Rails application and its dependencies into a deployable asset that we call a slug. This takes four minutes.
- Parallel CI: Our test suite runs in parallel with the build. Through extensive optimization, parallelization and test selection, the vast majority of CI builds finish in under five minutes.
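The two steps above can be sketched in Ruby. This is a minimal illustration, not Intercom's actual tooling: the `build_slug` and `run_ci` helpers are hypothetical names for the build and CI systems.

```ruby
# Illustrative sketch: the slug build and the CI suite run concurrently,
# and the pipeline waits for both before moving on. build_slug and run_ci
# are hypothetical stand-ins for the real build and test systems.
def build_and_test(commit_sha)
  build = Thread.new { build_slug(commit_sha) } # ~4 min: compile app + dependencies
  ci    = Thread.new { run_ci(commit_sha) }     # ~5 min: parallelized, selected tests
  { slug: build.value, ci_passed: ci.value }    # Thread#value joins each thread
end
```

Because the build and the tests overlap rather than run back to back, the slower of the two (not their sum) sets the floor on how quickly a merge can reach pre-production.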
Pre-production verification
Once built, the slug is deployed to a pre-production environment; CI does not block the slug’s progression to pre-production. Deploying to pre-production takes around two minutes. This environment serves no customer traffic, but it is connected to our production datastores, mirrors our production infrastructure variants (e.g. web serving, asynchronous workers), and is configured so that requests exercise the pre-release code and workers.
Immediately after deployment we run and await the result of several automated approval gates to verify the release. These answer questions including:
- Boot test: Does the application initialise correctly on the host?
- CI check: Did the parallel test suite pass?
- Functional synthetics: We use Datadog Synthetics to run browser-based tests on critical flows, like loading or editing a Fin workflow.
If any gate fails, the release is halted and does not go to production.
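The gate logic can be sketched as follows. This is an illustrative sketch only: the gate names mirror the list above, but the `GATES` table and `halt_release` helper are hypothetical, not Intercom's real implementation.

```ruby
# Illustrative sketch of the approval gates: each gate is a predicate over
# the candidate release, and any single failure halts the release before
# it can reach production. GATES and halt_release are hypothetical names.
GATES = {
  boot_test:  ->(release) { release[:booted] },        # app initialises on the host?
  ci_check:   ->(release) { release[:ci_passed] },     # parallel test suite green?
  synthetics: ->(release) { release[:synthetics_ok] }  # browser-based critical flows pass?
}

def approve_for_production?(release)
  GATES.each do |name, gate|
    unless gate.call(release)
      halt_release(release, failed_gate: name) # release never touches production
      return false
    end
  end
  true
end
```

The key property is that the gates are conjunctive: a release must pass every check, and the first failure stops it immediately.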
Production rollout and rolling restarts
Once the slug is approved for production, the code is promoted to thousands of large virtual machines. We use a deployment orchestrator to trigger these deployments simultaneously, but the actual rollout is decentralised.
This provides a staggered rollout, ensuring the entire fleet doesn’t change state at the exact same millisecond. Within these large virtual machines, we use a rolling restart mechanism at the process level:
- An individual process with the old code is taken out of the customer-serving path
- It is allowed to finish its current work and terminate gracefully once idle
- It is replaced by a fresh process running the new code and returned to the serving path
This process ensures that from the moment a deployment starts, the first requests are being served by new code within ~2 minutes. Within 6 minutes, the vast majority of our global fleet has been transparently updated without any downtime. Once the restart has been triggered on every machine, the pipeline unblocks so the next deployment can begin.
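The restart loop above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `remove_from_serving_path`, `spawn_process`, and `add_to_serving_path` are hypothetical helpers, and each process is modeled as a hash with a `:drain` callable.

```ruby
# Illustrative sketch of a process-level rolling restart: each worker is
# drained, replaced by a fresh process on the new code, and only then
# returned to the serving path, so the machine never stops serving traffic.
# The helper names are hypothetical, not Intercom's actual orchestration.
def rolling_restart(processes, new_code_version)
  processes.map do |proc_info|
    remove_from_serving_path(proc_info)     # stop routing requests to the old process
    proc_info[:drain].call                  # let it finish in-flight work, then exit
    fresh = spawn_process(new_code_version) # boot a replacement on the new code
    add_to_serving_path(fresh)              # only now does it take customer traffic
    fresh
  end
end
```

Because processes are cycled one at a time rather than all at once, capacity on each machine dips only slightly during the restart instead of dropping to zero.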
Monitoring pipeline health
If a piece of code doesn’t pass every safety check, it is automatically rejected before it ever touches a production server. Additionally, we treat a stalled pipeline as a high-priority incident; if the automated system rejects three consecutive release attempts, it triggers a page to an on-call engineer.
To a customer, waiting for three failures might sound like a lot, but these are pre-production blocks. We page a human at this stage because if the shipping lane stops moving, code changes begin to pile up. Our stability relies on building and shipping in small steps. If the pipeline stays blocked, those tiny steps merge into a large changeset which increases the risk of the next deployment. We page an engineer to fix the pipeline so we can return to the small, safe, and frequent updates that keep our systems stable.
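The paging rule is simple to state: three consecutive rejections page a human, and any success resets the count. A minimal sketch, assuming a hypothetical `PipelineHealth` tracker and an injected pager callback:

```ruby
# Illustrative sketch of the stalled-pipeline alarm: three consecutive
# rejected releases page the on-call engineer; any successful release
# resets the counter. PipelineHealth is a hypothetical name.
class PipelineHealth
  PAGE_THRESHOLD = 3

  def initialize(pager)
    @pager = pager
    @consecutive_failures = 0
  end

  def record(release_succeeded)
    if release_succeeded
      @consecutive_failures = 0
    else
      @consecutive_failures += 1
      @pager.call if @consecutive_failures >= PAGE_THRESHOLD
    end
  end
end
```

Counting consecutive rather than total failures is what distinguishes a genuinely stuck shipping lane from the occasional one-off rejection.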
The shipping workflow
While our pipeline is highly automated, the responsibility for the quality of our code lies with the engineer, not the tools. The decision to merge is a human one. Our workflow is built on the principle of extreme ownership; the engineer who writes the code is accountable for its success in production.
Be present when you ship
A core tenet of our culture is that you must be present when you ship. There is a practical benefit to our 12-minute deployment cycle: it keeps the engineer “in the zone.” When a deployment takes hours, engineers naturally move on to the next task, a meeting, or a lunch break. By the time their code hits production, their context is gone and they aren’t watching anymore.
By keeping deployments fast, we ensure the engineer is still focused on the problem they just solved. To support this, our deployment system provides:
- Notifications: Automatically messages the engineer on Slack the moment their code is submitted and as it moves through the stages.
- Observability links: Includes direct links to relevant dashboards and logs in every PR and Slack message.
- Prompted verification: Encourages the engineer to actively “watch the dials” and test their feature as it goes live. It is not acceptable to rely on “green builds”. You’re expected to watch your change go live, and if you’re not prepared to roll back, you’re not prepared to ship.
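The notification piece of this support system can be sketched as a small helper. This is an illustrative sketch: the `announce_stage` method, the notifier callback, and the dashboard URL are all hypothetical placeholders, not Intercom's real deployment tooling.

```ruby
# Illustrative sketch: message the merging engineer as their change moves
# through each pipeline stage, with a direct link to an observability
# dashboard. The method name and dashboard URL are placeholders.
def announce_stage(stage, engineer:, sha:, notifier:)
  dashboard = "https://dashboards.example.com/deploys/#{sha}" # placeholder link
  notifier.call(engineer, "#{sha} reached #{stage} - #{dashboard}")
end
```

Pushing the message to the engineer, rather than making them poll a dashboard, is what keeps them “in the zone” for the full 12-minute merge-to-production window.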
We foster a no-blame culture focused on engagement. When we see an engineer trigger a rollback or open a revert immediately after a deployment, we don’t see it as a failure; we see it as the hallmark of an engineer who is actively watching their metrics and taking responsibility for the system’s health.
Feature flags
[...]