Deploying Elixir with Kamal 2
Zero-downtime deployments, secrets management, and self-hosted Elixir apps
Kubernetes is overkill for most applications. I've watched teams spend months building deployment pipelines that could have been replaced by a single SSH command and a process manager; the tooling became the product and the product became an afterthought. The industry's obsession with container orchestration has created a generation of developers who can't deploy a web application without a YAML file longer than their actual code.
Kamal changes this calculus entirely.
DHH and the team at 37signals built Kamal to solve a specific problem: deploying containerized applications to bare metal or simple VPS instances without the operational overhead of Kubernetes. It handles the parts that matter--zero-downtime deployments, health checks, rolling updates--while ignoring the parts that don't. Service meshes. Custom resource definitions. The entire Kubernetes operator ecosystem. Gone.
For Elixir applications, the fit is almost suspicious. The BEAM already handles most of what you'd use Kubernetes for: process supervision, fault tolerance, load distribution across cores--these aren't application features, they're runtime guarantees. You don't need an orchestrator to restart crashed containers when your runtime restarts crashed processes automatically.
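To make the "runtime restarts crashed processes" point concrete, here is a minimal sketch you can run with `elixir` as a script. `Demo.Worker` is a hypothetical module used only for illustration; the supervisor and process calls are standard library functions.

```elixir
# Sketch: kill a supervised process and watch the supervisor bring it back.
# Demo.Worker is a made-up module; it does nothing except exist.
defmodule Demo.Worker do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

{:ok, _sup} = Supervisor.start_link([Demo.Worker], strategy: :one_for_one)

old_pid = Process.whereis(Demo.Worker)
ref = Process.monitor(old_pid)

# Simulate a crash by killing the worker outright.
Process.exit(old_pid, :kill)

# Wait until the old process is confirmed dead.
receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# Poll briefly until the supervisor has registered a replacement process.
new_pid =
  Enum.find_value(1..100, fn _attempt ->
    case Process.whereis(Demo.Worker) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        nil
    end
  end)

IO.puts("worker restarted under a new pid: #{new_pid != old_pid}")
```

Nothing outside the VM was involved in that restart, which is the part an orchestrator would otherwise be doing at the container level.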
What Kamal Actually Does
Kamal is a deployment tool written in Ruby that uses Docker and SSH to get your application onto one or more servers. It runs a reverse proxy called Kamal Proxy on each host, manages container lifecycle, and coordinates zero-downtime deployments. That's the whole thing.
Your application runs in a Docker container; Kamal Proxy sits in front of it, routing traffic and performing health checks. When you deploy, Kamal builds a new container, starts it alongside the old one, waits for health checks to pass, switches traffic, stops the old container. No etcd. No control plane. No operators. Just containers, a proxy, and SSH.
Version 2 reworked several core pieces. The proxy switched from Traefik to Kamal Proxy--a purpose-built replacement that's simpler to configure. Secrets moved out of a .env file and into .kamal/secrets; boot strategies, health checks, and secret management were all reworked along the way.
Setting Up Kamal for Phoenix
Kamal is a Ruby gem, which feels ironic for Elixir deployments but works fine in practice:
gem install kamal
Or if you prefer to keep Ruby dependencies isolated, use Docker:
docker run -it --rm -v "$PWD:/workdir" -v "$SSH_AUTH_SOCK:/ssh-agent" -e SSH_AUTH_SOCK=/ssh-agent ghcr.io/basecamp/kamal:latest init
Initialize Kamal in your Phoenix project:
kamal init
This creates config/deploy.yml and .kamal/secrets, along with sample hook scripts under .kamal/hooks. The deploy configuration is where the real work happens.
A production-ready configuration for a Phoenix application looks like this:
# config/deploy.yml
service: myapp
image: registry.example.com/myapp
servers:
  web:
    hosts:
      - 192.168.1.10
      - 192.168.1.11
proxy:
  # Automatic Let's Encrypt certificates assume a single web host; with several
  # hosts, terminate TLS at a load balancer in front of them instead.
  ssl: true
  host: myapp.example.com
  app_port: 4000        # Phoenix listens on PORT below; Kamal Proxy targets port 80 by default
  response_timeout: 60  # seconds
  healthcheck:
    interval: 3
    path: /health
    timeout: 3
registry:
  server: registry.example.com
  username: deploy
  password:
    - KAMAL_REGISTRY_PASSWORD
builder:
  arch: amd64
  args:
    MIX_ENV: prod
env:
  clear:
    PHX_HOST: myapp.example.com
    PORT: 4000
  secret:
    - DATABASE_URL
    - SECRET_KEY_BASE
    - PHX_SERVER
ssh:
  user: deploy
accessories:
  db:
    image: postgres:16
    host: 192.168.1.10
    port: 5432
    env:
      secret:
        - POSTGRES_PASSWORD
    directories:
      - data:/var/lib/postgresql/data
The servers section defines where your application runs. You can split roles; for a typical Phoenix app, you might separate web servers from background job workers:
servers:
  web:
    hosts:
      - 192.168.1.10
      - 192.168.1.11
  worker:
    hosts:
      - 192.168.1.12
    cmd: bin/myapp eval "MyApp.Worker.start()"
Each role gets its own commands, labels, and resource configurations. The builder's arch setting deserves a note--Kamal 2 dropped the old multiarch flag in favor of an explicit target architecture. Set it to match your servers (amd64 for most VPS instances). Building for one architecture instead of several keeps build times down, though building amd64 images on an M-series Mac still goes through emulation unless you use a remote builder.
Docker Configuration for Elixir Releases
Phoenix has been able to generate a production-ready Dockerfile since version 1.6.3, via mix phx.gen.release --docker. Multi-stage builds, minimal release images, the works. The generated Dockerfile with annotations:
# Build stage: compile the release
ARG ELIXIR_VERSION=1.16.0
ARG OTP_VERSION=26.2.1
ARG DEBIAN_VERSION=bookworm-20240130-slim
ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"
FROM ${BUILDER_IMAGE} as builder
# Install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*
WORKDIR /app
# Install hex and rebar
RUN mix local.hex --force && \
    mix local.rebar --force
# Set build environment
ENV MIX_ENV="prod"
# Install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config
# Copy compile-time config files
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile
COPY priv priv
COPY lib lib
COPY assets assets
# Compile assets
RUN mix assets.deploy
# Compile the release
RUN mix compile
# Copy runtime config
COPY config/runtime.exs config/
# Build the release
COPY rel rel
RUN mix release
# Runtime stage: minimal image for running the release
FROM ${RUNNER_IMAGE}
RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR /app
RUN chown nobody /app
# Copy the release from builder
COPY --from=builder --chown=nobody:root /app/_build/prod/rel/myapp ./
USER nobody
# Set runtime environment
ENV HOME=/app
ENV MIX_ENV="prod"
ENV PHX_SERVER="true"
CMD ["/app/bin/server"]
The multi-stage build keeps the final image small; the release bundles the BEAM, so you don't need Erlang or Elixir installed at runtime. The nobody user runs the application without root privileges--a small thing that matters when your container is internet-facing.
One modification I always make: adding a health check endpoint. Kamal Proxy needs something to hit before routing traffic, and a simple plug handles it:
# lib/myapp_web/plugs/health_check.ex
defmodule MyAppWeb.Plugs.HealthCheck do
  import Plug.Conn
  def init(opts), do: opts
  # Answer /health directly and stop the pipeline
  def call(%Plug.Conn{request_path: "/health"} = conn, _opts) do
    conn
    |> put_resp_content_type("text/plain")
    |> send_resp(200, "ok")
    |> halt()
  end
  # Every other request passes through untouched
  def call(conn, _opts), do: conn
end
Add it to your endpoint before the router:
# lib/myapp_web/endpoint.ex
plug MyAppWeb.Plugs.HealthCheck
plug MyAppWeb.Router
Zero-Downtime Deployments
Kamal achieves zero-downtime through a choreographed sequence. When you run kamal deploy, the process looks like this:
1. Build the Docker image locally or on a remote builder
2. Push the image to your registry
3. Pull the image on each server
4. Start the new container alongside the old one
5. Wait for health checks to pass
6. Tell Kamal Proxy to route traffic to the new container
7. Stop the old container
8. Remove the old container
The health check configuration in deploy.yml controls step 5:
proxy:
  healthcheck:
    interval: 3    # Check every 3 seconds
    path: /health  # Hit this endpoint
    timeout: 3     # Wait up to 3 seconds for response
Kamal Proxy attempts health checks repeatedly until the container responds with a 2xx status code. If the container never becomes healthy, the deployment fails and the old container keeps running. No half-deployed state. No broken traffic routing.
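The poll-then-switch behavior described above can be modeled in a few lines. This is a sketch of the idea, not Kamal Proxy's implementation: `DeploySketch`, the container names, and the overall deadline are assumptions made for illustration, and the check is an injected function rather than a real HTTP request.

```elixir
# Sketch only: models "poll until healthy, otherwise keep the old container".
defmodule DeploySketch do
  # Calls `check` every `interval_ms` until it returns 200 or the deadline passes.
  def await_healthy(check, interval_ms, deadline_ms) when is_function(check, 0) do
    deadline = System.monotonic_time(:millisecond) + deadline_ms
    poll(check, interval_ms, deadline)
  end

  defp poll(check, interval_ms, deadline) do
    cond do
      check.() == 200 ->
        :healthy

      System.monotonic_time(:millisecond) + interval_ms > deadline ->
        :timeout

      true ->
        Process.sleep(interval_ms)
        poll(check, interval_ms, deadline)
    end
  end

  # Traffic moves to `new` only after a healthy response; otherwise `old` keeps serving.
  def deploy(old, new, check, interval_ms, deadline_ms) do
    case await_healthy(check, interval_ms, deadline_ms) do
      :healthy -> {:ok, new}
      :timeout -> {:error, :unhealthy, old}
    end
  end
end

# A container that needs three attempts before it answers 200.
{:ok, counter} = Agent.start_link(fn -> 0 end)

slow_start = fn ->
  attempt = Agent.get_and_update(counter, fn n -> {n + 1, n + 1} end)
  if attempt >= 3, do: 200, else: 503
end

result_ok = DeploySketch.deploy("myapp-old", "myapp-new", slow_start, 10, 1_000)
result_fail = DeploySketch.deploy("myapp-old", "myapp-new", fn -> 503 end, 10, 50)

IO.inspect(result_ok)
IO.inspect(result_fail)
```

The second call never sees a 200, so the old container is the one left serving traffic, which is the "no half-deployed state" property in miniature.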
For Elixir applications, I configure the health check to verify more than just "the HTTP process started." A database connection that isn't ready, a migration that hasn't run--these are the things that bite you at 2 AM:
defmodule MyAppWeb.Plugs.HealthCheck do
  import Plug.Conn
  def init(opts), do: opts
  def call(%Plug.Conn{request_path: "/health"} = conn, _opts) do
    checks = [
      {:database, check_database()},
      {:migrations, check_migrations()}
    ]
    # Keep only the failed checks, as a name => message map that Jason can encode
    failures = for {name, {:error, message}} <- checks, into: %{}, do: {name, message}
    if failures == %{} do
      conn
      |> put_resp_content_type("application/json")
      |> send_resp(200, Jason.encode!(%{status: "healthy", checks: %{}}))
      |> halt()
    else
      conn
      |> put_resp_content_type("application/json")
      |> send_resp(503, Jason.encode!(%{status: "unhealthy", failures: failures}))
      |> halt()
    end
  end
  def call(conn, _opts), do: conn
  defp check_database do
    case Ecto.Adapters.SQL.query(MyApp.Repo, "SELECT 1", []) do
      {:ok, _} -> :ok
      {:error, reason} -> {:error, inspect(reason)}
    end
  end
  # Ecto.Migrator.migrations/1 returns {status, version, name} tuples for every
  # migration; a :down status means that migration has not been applied yet
  defp check_migrations do
    pending = for {:down, version, _name} <- Ecto.Migrator.migrations(MyApp.Repo), do: version
    case pending do
      [] -> :ok
      versions -> {:error, "pending migrations: #{Enum.join(versions, ", ")}"}
    end
  rescue
    # An unreachable database should mark the check as failed, not crash the plug
    error -> {:error, Exception.message(error)}
  end
end
This ensures your container doesn't receive traffic until it can actually serve requests.
Secrets Management
Kamal 2 overhauled secrets handling. Kamal 1 kept secrets in a .env file that it pushed to each server; Kamal 2 reads them from a dedicated .kamal/secrets file instead:
# .kamal/secrets
KAMAL_REGISTRY_PASSWORD=your-registry-password
DATABASE_URL=postgres://user:pass@db.example.com:5432/myapp_prod
SECRET_KEY_BASE=your-secret-key-base-here
PHX_SERVER=true
POSTGRES_PASSWORD=postgres-password
Add this file to .gitignore. Seriously. I've seen credentials committed to public repos more times than I'd like to admit.
Reference secrets in deploy.yml under the secret key:
env:
  clear:
    PHX_HOST: myapp.example.com
    PORT: 4000
  secret:
    - DATABASE_URL
    - SECRET_KEY_BASE
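For team environments, Kamal supports pulling secrets from external providers--1Password, AWS Secrets Manager, and others. The lookup lives in .kamal/secrets rather than deploy.yml: kamal secrets fetch pulls the values through an adapter, and kamal secrets extract picks out each one. The account and vault/item names below are placeholders:
# .kamal/secrets (using 1Password)
SECRETS=$(kamal secrets fetch --adapter 1password --account your-account --from YourVault/YourItem KAMAL_REGISTRY_PASSWORD DATABASE_URL SECRET_KEY_BASE)
KAMAL_REGISTRY_PASSWORD=$(kamal secrets extract KAMAL_REGISTRY_PASSWORD $SECRETS)
DATABASE_URL=$(kamal secrets extract DATABASE_URL $SECRETS)
SECRET_KEY_BASE=$(kamal secrets extract SECRET_KEY_BASE $SECRETS)
deploy.yml keeps referring to them by name:
env:
  secret:
    - DATABASE_URL
    - SECRET_KEY_BASE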
Secrets get injected as environment variables when your container starts; your config/runtime.exs reads them the same way it always has:
# config/runtime.exs
import Config
if config_env() == :prod do
  # Fail at boot, with a clear message, if a required secret is missing
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "DATABASE_URL environment variable is not set"
  config :myapp, MyApp.Repo,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise "SECRET_KEY_BASE environment variable is not set"
  config :myapp, MyAppWeb.Endpoint,
    http: [port: String.to_integer(System.get_env("PORT") || "4000")],
    secret_key_base: secret_key_base,
    server: System.get_env("PHX_SERVER") == "true"
end
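The fail-fast behavior of that runtime configuration is easy to try in isolation. The sketch below is a standalone script, not something to paste into config/runtime.exs; `EnvSketch` and the `KAMAL_SKETCH_MISSING` variable name are hypothetical and exist only for the demonstration.

```elixir
# Sketch: required variables raise when absent, optional ones fall back to a default.
defmodule EnvSketch do
  # Returns the value or raises with a message naming the missing variable.
  def required!(name) do
    case System.get_env(name) do
      nil -> raise "#{name} environment variable is not set"
      "" -> raise "#{name} environment variable is empty"
      value -> value
    end
  end

  # Returns the variable parsed as an integer, or `default` when it is unset.
  def integer(name, default) when is_integer(default) do
    case System.get_env(name) do
      nil ->
        default

      raw ->
        case Integer.parse(raw) do
          {int, ""} -> int
          _ -> raise "#{name} must be an integer, got: #{inspect(raw)}"
        end
    end
  end
end

System.put_env("PORT", "4000")
System.delete_env("POOL_SIZE")
System.delete_env("KAMAL_SKETCH_MISSING")

port = EnvSketch.integer("PORT", 4000)
pool_size = EnvSketch.integer("POOL_SIZE", 10)

# A missing required variable stops the boot instead of surfacing later as a bad connection.
missing_raised =
  try do
    EnvSketch.required!("KAMAL_SKETCH_MISSING")
    false
  rescue
    RuntimeError -> true
  end

IO.inspect({port, pool_size, missing_raised})
```

Raising at boot pairs well with the health check: a container missing a secret never starts, so it never passes the check and never takes traffic.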
Rolling Updates and Rollbacks
By default, Kamal deploys to all servers simultaneously. Fine for two boxes. Terrifying for twelve. For larger deployments, configure rolling updates:
servers:
  web:
    hosts:
      - 192.168.1.10
      - 192.168.1.11
      - 192.168.1.12
      - 192.168.1.13
boot:
  limit: 2  # Deploy to 2 servers at a time
  wait: 10  # Wait 10 seconds between batches
This deploys to two servers, waits for health checks, pauses another 10 seconds, then moves to the next pair. If any batch fails, the deployment stops; the remaining servers keep running the old version.
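The batching logic amounts to chunking the host list and stopping at the first failed batch. A small sketch of that idea, assuming a hypothetical `RollingSketch` module and a stand-in deploy function that returns true or false per host (the pause between batches is left out):

```elixir
# Sketch: split hosts into batches of `limit` and halt on the first failing batch.
defmodule RollingSketch do
  def batches(hosts, limit) when is_integer(limit) and limit > 0 do
    Enum.chunk_every(hosts, limit)
  end

  def rollout(hosts, limit, deploy_fun) when is_function(deploy_fun, 1) do
    hosts
    |> batches(limit)
    |> Enum.reduce_while({:ok, []}, fn batch, {:ok, updated} ->
      # Every host in the batch is attempted before the batch is judged.
      results = Enum.map(batch, deploy_fun)

      if Enum.all?(results) do
        {:cont, {:ok, updated ++ batch}}
      else
        untouched = hosts -- (updated ++ batch)
        {:halt, {:error, %{updated: updated, failed_batch: batch, untouched: untouched}}}
      end
    end)
  end
end

hosts = ["192.168.1.10", "192.168.1.11", "192.168.1.12", "192.168.1.13"]

plan = RollingSketch.batches(hosts, 2)
all_good = RollingSketch.rollout(hosts, 2, fn _host -> true end)
one_bad = RollingSketch.rollout(hosts, 2, fn host -> host != "192.168.1.11" end)

IO.inspect(plan)
IO.inspect(all_good)
IO.inspect(one_bad)
```

In the failing run the second batch is never attempted, so those hosts stay on the old version.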
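Rollbacks follow the same zero-downtime dance. Kamal keeps recent containers on each server, and any version still listed there can be rolled back to:
# See which versions are still available
kamal app containers -q
# Roll back to one of them
kamal rollback [VERSION]
The version argument is required; it's the version shown in the container name, which by default is the git commit SHA.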
New container with the old image, health check, traffic switch, old container gone. Same process in reverse.
Running Migrations
Elixir releases support one-off commands through eval. For migrations, add this to your release overlays:
# rel/overlays/bin/migrate
#!/bin/sh
cd -P -- "$(dirname -- "$0")"
exec ./myapp eval "MyApp.Release.migrate()"
And the corresponding module:
# lib/myapp/release.ex
defmodule MyApp.Release do
  @app :myapp
  # Run all pending migrations for every configured repo
  def migrate do
    load_app()
    for repo <- repos() do
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end
  # Roll a single repo back down to the given migration version
  def rollback(repo, version) do
    load_app()
    {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :down, to: version))
  end
  defp repos do
    Application.fetch_env!(@app, :ecto_repos)
  end
  defp load_app do
    Application.load(@app)
  end
end
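Run migrations before deployment:
kamal app exec --reuse "bin/migrate"
Keep in mind that --reuse runs the command inside the container that's already running, so it only sees migrations that shipped with that release.
Or add a deploy hook. Kamal hooks are executable scripts under .kamal/hooks rather than deploy.yml entries; the pre-deploy hook runs on the machine you deploy from, after the image is built and before the new containers boot. A sketch, assuming you want the migration to run once, from the freshly built image:
# .kamal/hooks/pre-deploy
#!/bin/sh
# KAMAL_VERSION is set by Kamal when it runs a hook
kamal app exec --primary --version "$KAMAL_VERSION" "bin/migrate"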
One thing to watch: if your migration is destructive--dropping a column, renaming a table--the old containers still running during the deploy will break. Run destructive migrations as a separate step after the deploy completes, or better yet, split them into add-then-remove across two deploys.
Kamal vs Fly.io vs Gigalixir
Three options worth considering for Elixir deployments. Each makes different tradeoffs.
Fly.io runs your containers on their global edge network. Geographic distribution, automatic TLS, a managed Postgres option; the free tier is generous and the deploy experience is genuinely good. The downsides: platform lock-in, pricing that can surprise you at scale, and an Elixir clustering story that--while improved--still requires their specific networking setup.
Gigalixir is purpose-built for Elixir. It understands releases, handles clustering automatically, and offers something close to Heroku's experience. More managed, less control. Higher costs at scale. Another platform dependency.
Kamal gives you the servers. You own them, you can SSH in, you can see exactly what's running. The tradeoff is real: security patches, disk space, networking--that's on you now.
How I think about the choice:
- Side project or startup validating an idea: Fly.io. Speed to deploy beats cost optimization every time.
- Elixir-specific needs like distributed clustering with libcluster: Gigalixir or self-managed.
- Production application where you need full control: Kamal.
- Existing infrastructure or compliance requirements: Kamal, and it's not close.
Cost comparison at 2 servers with 4GB RAM each:
| Platform | Approximate Monthly Cost |
|---|---|
| Fly.io | $60-80 |
| Gigalixir | $75-100 |
| Kamal + Hetzner | $15-25 |
| Kamal + DigitalOcean | $50-70 |
The Hetzner numbers aren't a typo. Kamal doesn't care where your servers live; a $7/month VPS in Falkenstein runs the same deployment process as a $100/month instance on AWS.
Production Checklist
A few things to verify before your first deploy:
- Secrets aren't in git. Check .gitignore includes .kamal/secrets.
- Health check endpoint exists. Kamal Proxy needs something to hit.
- Runtime configuration reads from environment. config/runtime.exs, not config/prod.exs.
- Migrations run cleanly. Test bin/migrate locally in a release build.
- SSH access works. Run kamal server bootstrap first--this installs Docker and sets up the basics.
- Registry authentication works. Test docker login with your credentials.
- Firewall allows ports 80 and 443. Kamal Proxy needs both.
- Server has Docker installed. kamal server bootstrap handles this, but verify.
I'd run through a full deployment to a staging environment before touching production. Configuration issues are cheap to fix on a throwaway server; they're expensive at 3 AM with real traffic.
The Operational Reality
I've been running Elixir applications with Kamal for production workloads, and the experience is exactly what I hoped for. Boring. Deployments work. Rollbacks work. Health checks catch bad releases before they take traffic.
The BEAM's operational characteristics complement Kamal well. Hot code upgrades are elegant in theory but terrifying in practice; with Kamal, I get the pragmatic version--new container, health check, traffic switch, old container gone. The result is the same. The process is debuggable.
Kubernetes solves problems you probably don't have. Kamal solves problems you definitely do.