
Deploying Elixir with Kamal 2

Zero-downtime deployments, secrets management, and self-hosted Elixir apps


Kubernetes is overkill for most applications. I've watched teams spend months building deployment pipelines that could have been replaced by a single SSH command and a process manager; the tooling became the product and the product became an afterthought. The industry's obsession with container orchestration has created a generation of developers who can't deploy a web application without a YAML file longer than their actual code.

Kamal changes this calculus entirely.

DHH and the team at 37signals built Kamal to solve a specific problem: deploying containerized applications to bare metal or simple VPS instances without the operational overhead of Kubernetes. It handles the parts that matter--zero-downtime deployments, health checks, rolling updates--while ignoring the parts that don't. Service meshes. Custom resource definitions. The entire Kubernetes operator ecosystem. Gone.

For Elixir applications, the fit is almost suspicious. The BEAM already handles most of what you'd use Kubernetes for: process supervision, fault tolerance, load distribution across cores. These aren't application features, they're runtime guarantees. You don't need an orchestrator to restart crashed containers when your runtime restarts crashed processes automatically.

What Kamal Actually Does

Kamal is a deployment tool written in Ruby that uses Docker and SSH to get your application onto one or more servers. It runs a reverse proxy called Kamal Proxy on each host, manages container lifecycle, and coordinates zero-downtime deployments. That's the whole thing.

Your application runs in a Docker container; Kamal Proxy sits in front of it, routing traffic and performing health checks. When you deploy, Kamal builds a new image, starts a container from it alongside the old one, waits for health checks to pass, switches traffic, and stops the old container. No etcd. No control plane. No operators. Just containers, a proxy, and SSH.

Version 2 reworked several core pieces. The proxy switched from Traefik to Kamal Proxy--a purpose-built replacement that's simpler and faster. Secrets moved out of .env files into a dedicated .kamal/secrets file; boot strategies, health checks, and secret management all got rebuilt from scratch.

Setting Up Kamal for Phoenix

Kamal is a Ruby gem, which feels ironic for Elixir deployments but works fine in practice:

gem install kamal

Or if you prefer to keep Ruby dependencies isolated, use Docker:

docker run -it --rm -v "$PWD:/workdir" -v "$SSH_AUTH_SOCK:/ssh-agent" -e SSH_AUTH_SOCK=/ssh-agent ghcr.io/basecamp/kamal:latest init

Initialize Kamal in your Phoenix project:

kamal init

This creates config/deploy.yml and .kamal/secrets, plus a set of sample hooks under .kamal/hooks. The deploy configuration is where the real work happens.

A production-ready configuration for a Phoenix application looks like this:

# config/deploy.yml
service: myapp

# Kamal prefixes the registry server below, so leave the hostname out here
image: deploy/myapp

servers:
  web:
    hosts:
      - 192.168.1.10
      - 192.168.1.11
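
proxy:
  ssl: true                # automatic Let's Encrypt certificates need a single web host;
                           # with several hosts, terminate TLS at a load balancer instead
  host: myapp.example.com
  app_port: 4000           # Phoenix listens on PORT below; Kamal Proxy assumes 80 by default
  response_timeout: 60     # seconds
  healthcheck:
    interval: 3
    path: /health
    timeout: 3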

registry:
  server: registry.example.com
  username: deploy
  password:
    - KAMAL_REGISTRY_PASSWORD

builder:
  arch: amd64        # Kamal 2 replaced the multiarch flag with an explicit target architecture
  args:
    MIX_ENV: prod

env:
  clear:
    PHX_HOST: myapp.example.com
    PORT: 4000
  secret:
    - DATABASE_URL
    - SECRET_KEY_BASE
    - PHX_SERVER

ssh:
  user: deploy

accessories:
  db:
    image: postgres:16
    host: 192.168.1.10
    port: 5432
    env:
      secret:
        - POSTGRES_PASSWORD
    directories:
      - data:/var/lib/postgresql/data

The servers section defines where your application runs. You can split roles; for a typical Phoenix app, you might separate web servers from background job workers:

servers:
  web:
    hosts:
      - 192.168.1.10
      - 192.168.1.11
  worker:
    hosts:
      - 192.168.1.12
    # eval halts the VM once the expression returns, so MyApp.Worker.start/0 must block
    cmd: bin/myapp eval "MyApp.Worker.start()"

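Each role gets its own commands, labels, and resource configurations. The builder settings deserve a note: Kamal 1 used multiarch: false to skip cross-platform builds, while Kamal 2 expects an explicit arch instead. Set it to the architecture of your servers. Unless you're building on an M-series Mac and deploying to x86 servers, the image builds natively, and avoiding emulated cross-platform builds cuts build times dramatically.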

Docker Configuration for Elixir Releases

Phoenix has been able to generate a production-ready Dockerfile since version 1.6.3, via mix phx.gen.release --docker. Multi-stage builds, minimal release images, the works. The generated Dockerfile with annotations:

# Build stage: compile the release
ARG ELIXIR_VERSION=1.16.0
ARG OTP_VERSION=26.2.1
ARG DEBIAN_VERSION=bookworm-20240130-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# Install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

WORKDIR /app

# Install hex and rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# Set build environment
ENV MIX_ENV="prod"

# Install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# Copy compile-time config files
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv
COPY lib lib
COPY assets assets

# Compile assets
RUN mix assets.deploy

# Compile the release
RUN mix compile

# Copy runtime config
COPY config/runtime.exs config/

# Build the release
COPY rel rel
RUN mix release

# Runtime stage: minimal image for running the release
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR /app
RUN chown nobody /app

# Copy the release from builder
COPY --from=builder --chown=nobody:root /app/_build/prod/rel/myapp ./

USER nobody

# Set runtime environment
ENV HOME=/app
ENV MIX_ENV="prod"
ENV PHX_SERVER="true"

CMD ["/app/bin/server"]

The multi-stage build keeps the final image small; the release bundles the BEAM, so you don't need Erlang or Elixir installed at runtime. The nobody user runs the application without root privileges--a small thing that matters when your container is internet-facing.

One modification I always make: adding a health check endpoint. Kamal Proxy needs something to hit before routing traffic, and a simple plug handles it:

# lib/myapp_web/plugs/health_check.ex
defmodule MyAppWeb.Plugs.HealthCheck do
  import Plug.Conn

  def init(opts), do: opts

  def call(%Plug.Conn{request_path: "/health"} = conn, _opts) do
    conn
    |> put_resp_content_type("text/plain")
    |> send_resp(200, "ok")
    |> halt()
  end

  def call(conn, _opts), do: conn
end

Add it to your endpoint before the router:

# lib/myapp_web/endpoint.ex
plug MyAppWeb.Plugs.HealthCheck
plug MyAppWeb.Router

Zero-Downtime Deployments

Kamal achieves zero-downtime through a choreographed sequence. When you run kamal deploy, the process looks like this:

  1. Build the Docker image locally or on a remote builder
  2. Push the image to your registry
  3. Pull the image on each server
  4. Start the new container alongside the old one
  5. Wait for health checks to pass
  6. Tell Kamal Proxy to route traffic to the new container
  7. Stop the old container
  8. Remove the old container

The health check configuration in deploy.yml controls step 5:

proxy:
  healthcheck:
    interval: 3       # Check every 3 seconds
    path: /health     # Hit this endpoint
    timeout: 3        # Wait up to 3 seconds for response

Kamal Proxy attempts health checks repeatedly until the container responds with a 2xx status code. If the container never becomes healthy, the deployment fails and the old container keeps running. No half-deployed state. No broken traffic routing.
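
The gate in step 5 is easier to reason about as code. What follows is a toy model, not Kamal's implementation: DeploySketch, check_fun, and the attempt count are names I made up for illustration, and check_fun stands in for an HTTP request to /health.

```elixir
defmodule DeploySketch do
  @moduledoc """
  Toy model of the health-check gate: poll until a 2xx status arrives
  or the attempts run out. Hypothetical names; not Kamal's actual code.
  """

  # check_fun is a zero-arity function returning an HTTP status code.
  def await_healthy(check_fun, attempts, interval_ms) when attempts > 0 do
    case check_fun.() do
      status when status in 200..299 ->
        :healthy

      _other when attempts == 1 ->
        :unhealthy

      _other ->
        Process.sleep(interval_ms)
        await_healthy(check_fun, attempts - 1, interval_ms)
    end
  end

  # Traffic only moves to the new container when the gate passes;
  # otherwise the old container keeps serving.
  def deploy(old, new, check_fun, attempts \\ 10, interval_ms \\ 3_000) do
    case await_healthy(check_fun, attempts, interval_ms) do
      :healthy -> {:ok, new}
      :unhealthy -> {:error, old}
    end
  end
end
```

Calling DeploySketch.deploy("myapp-v1", "myapp-v2", fn -> 503 end, 3, 10) returns {:error, "myapp-v1"}: the new container never passed, so the old one is still the live one.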

For Elixir applications, I configure the health check to verify more than just "the HTTP process started." A database connection that isn't ready, a migration that hasn't run--these are the things that bite you at 2 AM:

defmodule MyAppWeb.Plugs.HealthCheck do
  import Plug.Conn

  def init(opts), do: opts

  def call(%Plug.Conn{request_path: "/health"} = conn, _opts) do
    checks = [
      {:database, check_database()},
      {:migrations, check_migrations()}
    ]

    case Enum.filter(checks, fn {_, status} -> status != :ok end) do
      [] ->
        conn
        |> put_resp_content_type("application/json")
        |> send_resp(200, Jason.encode!(%{status: "healthy", checks: %{}}))
        |> halt()

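      failures ->
        # Unwrap the {:error, reason} tuples; Jason cannot encode tuples
        details = Map.new(failures, fn {name, {:error, reason}} -> {name, reason} end)

        conn
        |> put_resp_content_type("application/json")
        |> send_resp(503, Jason.encode!(%{status: "unhealthy", failures: details}))
        |> halt()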
    end
  end

  def call(conn, _opts), do: conn

  defp check_database do
    case Ecto.Adapters.SQL.query(MyApp.Repo, "SELECT 1", []) do
      {:ok, _} -> :ok
      {:error, reason} -> {:error, inspect(reason)}
    end
  end

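  defp check_migrations do
    # Ecto.Migrator.migrations/1 returns {status, version, name} tuples;
    # :down entries have not been applied yet
    pending = for {:down, version, _name} <- Ecto.Migrator.migrations(MyApp.Repo), do: version

    case pending do
      [] -> :ok
      versions -> {:error, "pending migrations: #{inspect(versions)}"}
    end
  end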
end

This ensures your container doesn't receive traffic until it can actually serve requests.
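
One caveat with deeper checks: the proxy only waits a few seconds for an answer, so a check that hangs is as bad as one that fails. A minimal sketch of bounding each check with a timeout, assuming checks are zero-arity functions returning :ok or {:error, reason}. HealthSketch and run_check are hypothetical names, not part of Phoenix or Kamal.

```elixir
defmodule HealthSketch do
  # Runs a zero-arity check in a separate process and gives up after
  # timeout_ms, so one hung dependency cannot stall the whole endpoint.
  def run_check(check_fun, timeout_ms) do
    task =
      Task.async(fn ->
        # Turn exceptions into error tuples instead of crashing the caller.
        try do
          check_fun.()
        rescue
          error -> {:error, Exception.message(error)}
        end
      end)

    # Wait for a reply; if none arrives in time, kill the task.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, :ok} -> :ok
      {:ok, {:error, reason}} -> {:error, reason}
      {:ok, other} -> {:error, "unexpected result: #{inspect(other)}"}
      {:exit, reason} -> {:error, inspect(reason)}
      nil -> {:error, "timed out after #{timeout_ms}ms"}
    end
  end
end
```

In the plug, each entry in the checks list could then become something like {:database, HealthSketch.run_check(&check_database/0, 1_000)}, with the per-check budget kept below the proxy's own timeout.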

Secrets Management

Kamal 2 overhauled secrets handling. The old approach was a .env file that every command loaded into the environment; the new approach uses .kamal/secrets as a dedicated file:

# .kamal/secrets
KAMAL_REGISTRY_PASSWORD=your-registry-password
DATABASE_URL=postgres://user:pass@db.example.com:5432/myapp_prod
SECRET_KEY_BASE=your-secret-key-base-here
PHX_SERVER=true
POSTGRES_PASSWORD=postgres-password

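Add this file to .gitignore if it holds raw values like the ones above. Seriously. I've seen credentials committed to public repos more times than I'd like to admit. Kamal's own template expects the file to contain only references (environment variable lookups or kamal secrets commands) so that it stays safe to commit; raw credentials change that.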

Reference secrets in deploy.yml under the secret key:

env:
  clear:
    PHX_HOST: myapp.example.com
    PORT: 4000
  secret:
    - DATABASE_URL
    - SECRET_KEY_BASE

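For team environments, Kamal supports pulling secrets from external providers--1Password, AWS Secrets Manager, and others. The lookup lives in .kamal/secrets rather than deploy.yml; the account, vault, and item names below are placeholders:

# .kamal/secrets, using 1Password
SECRETS=$(kamal secrets fetch --adapter 1password --account your-account --from YourVault/YourItem KAMAL_REGISTRY_PASSWORD DATABASE_URL SECRET_KEY_BASE)

KAMAL_REGISTRY_PASSWORD=$(kamal secrets extract KAMAL_REGISTRY_PASSWORD $SECRETS)
DATABASE_URL=$(kamal secrets extract DATABASE_URL $SECRETS)
SECRET_KEY_BASE=$(kamal secrets extract SECRET_KEY_BASE $SECRETS)

The env section of deploy.yml stays the same; it only names the secrets it needs:

env:
  secret:
    - DATABASE_URL
    - SECRET_KEY_BASE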

Secrets get injected as environment variables when your container starts; your config/runtime.exs reads them the same way it always has:

# config/runtime.exs
import Config

if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "DATABASE_URL environment variable is not set"

  config :myapp, MyApp.Repo,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")

  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise "SECRET_KEY_BASE environment variable is not set"

  config :myapp, MyAppWeb.Endpoint,
    http: [port: String.to_integer(System.get_env("PORT") || "4000")],
    secret_key_base: secret_key_base,
    server: System.get_env("PHX_SERVER") == "true"
end
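
The raise-on-missing pattern above reports one variable at a time, which gets tedious on a fresh server. A small sketch of checking everything up front, assuming you want the full list of missing names in one pass. EnvSketch and validate are hypothetical names; the env argument is a plain map so the function is easy to test.

```elixir
defmodule EnvSketch do
  # Returns :ok when every required name has a non-blank value in env,
  # otherwise {:error, missing_names}. Defaults to the real environment.
  def validate(required, env \\ System.get_env()) do
    case Enum.reject(required, &present?(env, &1)) do
      [] -> :ok
      missing -> {:error, missing}
    end
  end

  # A variable set to an empty or whitespace-only string counts as missing.
  defp present?(env, name) do
    case Map.get(env, name) do
      nil -> false
      value -> String.trim(value) != ""
    end
  end
end
```

Something like EnvSketch.validate(["DATABASE_URL", "SECRET_KEY_BASE"]) could run from a release eval command before the first deploy, so a half-filled secrets file shows up as one error instead of three failed boots.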

Rolling Updates and Rollbacks

By default, Kamal deploys to all servers simultaneously. Fine for two boxes. Terrifying for twelve. For larger deployments, configure rolling updates:

servers:
  web:
    hosts:
      - 192.168.1.10
      - 192.168.1.11
      - 192.168.1.12
      - 192.168.1.13

boot:
  limit: 2           # Deploy to 2 servers at a time
  wait: 10           # Wait 10 seconds between batches

This deploys to two servers, waits for health checks, pauses another 10 seconds, then moves to the next pair. If any batch fails, the deployment stops; the remaining servers keep running the old version.
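
The batching logic is simple enough to model in a few lines. This is a sketch of the behavior described above, not Kamal's code: RollingSketch and deploy_fun are hypothetical, and hosts within a batch are handled one after another here to keep the example short, where a real rollout would boot them in parallel.

```elixir
defmodule RollingSketch do
  # Splits hosts into batches of `limit` and runs deploy_fun on each
  # batch in order, halting at the first batch that reports a failure.
  # deploy_fun takes a host and returns :ok or anything else on failure.
  def rollout(hosts, limit, wait_ms, deploy_fun) do
    hosts
    |> Enum.chunk_every(limit)
    |> Enum.reduce_while({:ok, []}, fn batch, {:ok, done} ->
      results = Enum.map(batch, deploy_fun)

      if Enum.all?(results, &(&1 == :ok)) do
        # Pause before the next batch (this sketch also pauses after the last one).
        Process.sleep(wait_ms)
        {:cont, {:ok, done ++ batch}}
      else
        # Hosts in later batches are never touched and keep the old version.
        {:halt, {:error, %{deployed: done, failed_batch: batch}}}
      end
    end)
  end
end
```

With four hosts and a limit of 2, a failure on the third host leaves the first pair on the new version and reports the second pair as the failed batch.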

Rollbacks follow the same zero-downtime dance. Kamal tracks your recent images:

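# See the images available on each server
kamal app images

# Roll back to a specific version
kamal rollback [VERSION]

VERSION is the image tag of the release you want back, which is the git commit SHA by default. Kamal does not pick a version for you, so look one up first.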

New container with the old image, health check, traffic switch, old container gone. Same process in reverse.

Running Migrations

Elixir releases support one-off commands through eval. For migrations, add this to your release overlays as rel/overlays/bin/migrate (the shebang has to stay on the first line):

#!/bin/sh
# rel/overlays/bin/migrate
cd -P -- "$(dirname -- "$0")"
exec ./myapp eval "MyApp.Release.migrate()"

And the corresponding module:

# lib/myapp/release.ex
defmodule MyApp.Release do
  @app :myapp

  def migrate do
    load_app()

    for repo <- repos() do
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end

  def rollback(repo, version) do
    load_app()
    {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :down, to: version))
  end

  defp repos do
    Application.fetch_env!(@app, :ecto_repos)
  end

  defp load_app do
    Application.load(@app)
  end
end

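Run migrations manually against the running container:

kamal app exec --reuse "bin/migrate"

Note that --reuse executes inside the container that is already running, so it only sees the migrations shipped with the currently deployed release.

Or add a deploy hook. Kamal hooks are executable scripts under .kamal/hooks rather than keys in deploy.yml, and they run on the machine you deploy from. A sketch, assuming the new image is already on the servers when the pre-deploy hook fires and relying on the KAMAL_VERSION variable Kamal passes to hooks:

#!/bin/sh
# .kamal/hooks/pre-deploy
kamal app exec --primary --version "$KAMAL_VERSION" "bin/migrate"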

One thing to watch: if your migration is destructive--dropping a column, renaming a table--the old containers still running during the deploy will break. Run destructive migrations as a separate step after the deploy completes, or better yet, split them into add-then-remove across two deploys.

Kamal vs Fly.io vs Gigalixir

Three options worth considering for Elixir deployments. Each makes different tradeoffs.

Fly.io runs your containers on their global edge network. Geographic distribution, automatic TLS, a managed Postgres option; the free tier is generous and the deploy experience is genuinely good. The downsides: platform lock-in, pricing that can surprise you at scale, and an Elixir clustering story that--while improved--still requires their specific networking setup.

Gigalixir is purpose-built for Elixir. It understands releases, handles clustering automatically, and offers something close to Heroku's experience. More managed, less control. Higher costs at scale. Another platform dependency.

Kamal gives you the servers. You own them, you can SSH in, you can see exactly what's running. The tradeoff is real: security patches, disk space, networking--that's on you now.

How I think about the choice:

  • Side project or startup validating an idea: Fly.io. Speed to deploy beats cost optimization every time.
  • Elixir-specific needs like distributed clustering with libcluster: Gigalixir or self-managed.
  • Production application where you need full control: Kamal.
  • Existing infrastructure or compliance requirements: Kamal, and it's not close.

Cost comparison at 2 servers with 4GB RAM each:

Platform                Approximate Monthly Cost
Fly.io                  $60-80
Gigalixir               $75-100
Kamal + Hetzner         $15-25
Kamal + DigitalOcean    $50-70

The Hetzner numbers aren't a typo. Kamal doesn't care where your servers live; a $7/month VPS in Falkenstein runs the same deployment process as a $100/month instance on AWS.

Production Checklist

A few things to verify before your first deploy:

  1. Secrets aren't in git. Check .gitignore includes .kamal/secrets.
  2. Health check endpoint exists. Kamal Proxy needs something to hit.
  3. Runtime configuration reads from environment. config/runtime.exs, not config/prod.exs.
  4. Migrations run cleanly. Test bin/migrate locally in a release build.
  5. SSH access works. Run kamal server bootstrap first--this installs Docker and sets up the basics.
  6. Registry authentication works. Test docker login with your credentials.
  7. Firewall allows ports 80 and 443. Kamal Proxy needs both.
  8. Server has Docker installed. kamal server bootstrap handles this, but verify.

I'd run through a full deployment to a staging environment before touching production. Configuration issues are cheap to fix on a throwaway server; they're expensive at 3 AM with real traffic.

The Operational Reality

I've been running Elixir applications with Kamal for production workloads, and the experience is exactly what I hoped for. Boring. Deployments work. Rollbacks work. Health checks catch bad releases before they take traffic.

The BEAM's operational characteristics complement Kamal well. Hot code upgrades are elegant in theory but terrifying in practice; with Kamal, I get the pragmatic version--new container, health check, traffic switch, old container gone. The result is the same. The process is debuggable.

Kubernetes solves problems you probably don't have. Kamal solves problems you definitely do.


What do you think of what I said?

Share with me your thoughts. You can tweet me at @allanmacgregor.

Further reading