Tags: elixir, ai, llm, openai, machine-learning

Elixir and AI: Building LLM-Powered Features

The conventional wisdom says Python owns the AI stack. Every tutorial, every example, every getting-started guide assumes you're writing Python. Fair enough for model training and research; most of the tooling lives there, and the ecosystem is mature. But production systems that serve AI features to real users? That's a different problem entirely.

Most AI-powered features aren't training models. They're calling APIs, processing responses, handling failures, and serving results to users who expect sub-second latency. This is I/O-bound, concurrency-heavy, failure-prone work — exactly the domain where Elixir has spent the last decade proving itself.

The BEAM Was Built for This

The BEAM virtual machine was designed for telecom switches — systems that process millions of concurrent operations, never go down, and recover gracefully from failures. Those constraints map onto LLM workloads more directly than you'd expect.

Concurrency without complexity. When you call an LLM API, you wait. OpenAI's GPT-4 might take 2-10 seconds to respond; in a thread-per-request model, that's catastrophic. Your server threads block, throughput craters, requests start dropping. Elixir handles this differently. Spawn 10,000 concurrent LLM requests and your system barely notices; each request runs in its own process, waiting on I/O without consuming meaningful resources.
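A minimal sketch of that fan-out, with a hypothetical `fake_llm_call/1` standing in for a real API request:

```elixir
defmodule ConcurrencyDemo do
  # Stand-in for a slow network call; replace with your real client.
  def fake_llm_call(prompt) do
    Process.sleep(50)
    {:ok, "response to: " <> prompt}
  end

  def fan_out(prompts) do
    prompts
    # Each call runs in its own lightweight process.
    |> Task.async_stream(&fake_llm_call/1, max_concurrency: 1_000, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# 1,000 "requests" finish in roughly the time of one, not the sum:
ConcurrencyDemo.fan_out(Enum.map(1..1_000, &"prompt #{&1}"))
```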

Fault tolerance baked in. LLM APIs fail. They timeout. They return malformed responses. They rate-limit you without warning. In most languages you handle this with try-catch blocks and retry logic scattered through your codebase — hope-based engineering. In Elixir, supervision trees encode your recovery strategy explicitly. A failed API call crashes its process; the supervisor restarts it; the rest of your system continues unaffected.
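A sketch of what that looks like in practice — a one-for-one supervisor over a worker process (module names are illustrative, and the worker is a stub):

```elixir
defmodule MyApp.LLM.Worker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts), do: {:ok, opts}

  # A crash in here (e.g. a failed API call) takes down only this process;
  # the supervisor restarts it with a clean state.
  @impl true
  def handle_call({:complete, _prompt}, _from, state) do
    {:reply, {:error, :not_implemented}, state}
  end
end

defmodule MyApp.LLM.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # :one_for_one — a crashed child is restarted without touching siblings.
    Supervisor.init([MyApp.LLM.Worker], strategy: :one_for_one)
  end
end
```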

Streaming as a first-class citizen. Modern LLM APIs stream tokens as they generate. This isn't a nice-to-have — it's the difference between a UI that feels responsive and one that feels broken. Phoenix LiveView and Elixir's process model make streaming trivial to implement and reason about; each token chunk is just a message in a mailbox.

LLM API Clients in Elixir

The Elixir ecosystem for LLM integration has grown fast in the last eighteen months. Two libraries handle most use cases: openai_ex for OpenAI's APIs and anthropix for Anthropic's Claude models — though Anthropic's REST API is simple enough that a thin Req wrapper works fine too.

OpenAI Integration

# mix.exs
defp deps do
  [
    {:openai_ex, "~> 0.8"},
    {:instructor, "~> 0.1"}
  ]
end

defmodule MyApp.LLM.OpenAI do
  @moduledoc """
  OpenAI API client wrapper with sensible defaults.
  """

  def client do
    OpenaiEx.new(System.get_env("OPENAI_API_KEY"))
    |> OpenaiEx.with_receive_timeout(60_000)
  end

  def chat_completion(messages, opts \\ []) do
    model = Keyword.get(opts, :model, "gpt-4-turbo")
    temperature = Keyword.get(opts, :temperature, 0.7)

    request = %{
      model: model,
      messages: messages,
      temperature: temperature
    }

    case OpenaiEx.Chat.Completions.create(client(), request) do
      {:ok, %{"choices" => [%{"message" => %{"content" => content}} | _]}} ->
        {:ok, content}

      {:error, reason} ->
        {:error, reason}
    end
  end
end

Anthropic Integration

defmodule MyApp.LLM.Anthropic do
  @moduledoc """
  Anthropic Claude API client.
  """

  @base_url "https://api.anthropic.com/v1"

  def chat(messages, opts \\ []) do
    model = Keyword.get(opts, :model, "claude-3-5-sonnet-20241022")
    max_tokens = Keyword.get(opts, :max_tokens, 4096)

    body = %{
      model: model,
      max_tokens: max_tokens,
      messages: format_messages(messages)
    }

    headers = [
      {"x-api-key", System.get_env("ANTHROPIC_API_KEY")},
      {"anthropic-version", "2023-06-01"},
      {"content-type", "application/json"}
    ]

    case Req.post("#{@base_url}/messages", json: body, headers: headers) do
      {:ok, %{status: 200, body: %{"content" => [%{"text" => text} | _]}}} ->
        {:ok, text}

      {:ok, %{status: status, body: body}} ->
        {:error, {:api_error, status, body}}

      {:error, reason} ->
        {:error, reason}
    end
  end

  defp format_messages(messages) do
    Enum.map(messages, fn
      %{role: role, content: content} -> %{"role" => to_string(role), "content" => content}
      map when is_map(map) -> map
    end)
  end
end

Instructor: Structured LLM Output

Raw LLM output is a string. You want structured data. That gap — between what the model returns and what your application can safely consume — is where bugs breed and production incidents start.

The Instructor library closes it by using Ecto schemas to define expected output shapes. The library handles prompt engineering, JSON schema generation, and response validation automatically. You define a schema; the LLM fills it; Ecto validates the result.

defmodule MyApp.AI.Schemas.SentimentAnalysis do
  use Ecto.Schema
  use Instructor.Validator

  @llm_doc """
  Analyze the sentiment of the provided text.
  """

  @primary_key false
  embedded_schema do
    field :sentiment, Ecto.Enum, values: [:positive, :negative, :neutral]
    field :confidence, :float
    field :reasoning, :string
  end

  @impl true
  def validate_changeset(changeset) do
    changeset
    |> Ecto.Changeset.validate_number(:confidence,
      greater_than_or_equal_to: 0.0,
      less_than_or_equal_to: 1.0
    )
    |> Ecto.Changeset.validate_required([:sentiment, :confidence, :reasoning])
  end
end

defmodule MyApp.AI.Analyzer do
  def analyze_sentiment(text) do
    Instructor.chat_completion(
      model: "gpt-4-turbo",
      response_model: MyApp.AI.Schemas.SentimentAnalysis,
      messages: [
        %{role: "user", content: "Analyze the sentiment of this text: #{text}"}
      ]
    )
  end
end

The response isn't a string you parse. It's an Ecto struct with validated fields:

{:ok, %MyApp.AI.Schemas.SentimentAnalysis{
  sentiment: :positive,
  confidence: 0.92,
  reasoning: "The text uses enthusiastic language..."
}}

Complex Structured Extraction

Instructor really shines with nested, multi-field extraction — the kind of thing that would require fragile regex or hand-rolled JSON parsing in other approaches.

defmodule MyApp.AI.Schemas.ExtractedContact do
  use Ecto.Schema
  use Instructor.Validator

  @primary_key false
  embedded_schema do
    field :name, :string
    field :email, :string
    field :phone, :string
    field :company, :string

    embeds_many :addresses, Address, primary_key: false do
      field :street, :string
      field :city, :string
      field :state, :string
      field :zip, :string
      field :type, Ecto.Enum, values: [:home, :work, :other]
    end
  end

  @impl true
  def validate_changeset(changeset) do
    changeset
    |> Ecto.Changeset.validate_required([:name])
    |> Ecto.Changeset.validate_format(:email, ~r/@/)
  end
end

def extract_contact_info(raw_text) do
  Instructor.chat_completion(
    model: "gpt-4-turbo",
    response_model: MyApp.AI.Schemas.ExtractedContact,
    messages: [
      %{
        role: "system",
        content: "Extract contact information from the provided text. Be thorough."
      },
      %{role: "user", content: raw_text}
    ]
  )
end

The schema is the contract. Instructor enforces it. You stop hoping the LLM returns valid JSON and start knowing it does.

Streaming LLM Responses in LiveView

Users don't wait well. A 5-second spinner while GPT-4 thinks? That's a UX failure. Streaming tokens as they generate changes everything; users see progress, start reading immediately, and perceive the system as faster even when total latency is identical.

LiveView makes this almost boring to implement.

defmodule MyAppWeb.ChatLive do
  use MyAppWeb, :live_view

  def mount(_params, _session, socket) do
    {:ok, assign(socket, messages: [], streaming: false, current_response: "")}
  end

  def handle_event("send_message", %{"message" => content}, socket) do
    user_message = %{role: :user, content: content}
    messages = socket.assigns.messages ++ [user_message]

    # Start streaming in a separate process
    pid = self()
    Task.start(fn -> stream_response(messages, pid) end)

    {:noreply,
     socket
     |> assign(messages: messages, streaming: true, current_response: "")}
  end

  def handle_info({:stream_chunk, chunk}, socket) do
    current = socket.assigns.current_response <> chunk
    {:noreply, assign(socket, current_response: current)}
  end

  def handle_info(:stream_complete, socket) do
    assistant_message = %{role: :assistant, content: socket.assigns.current_response}

    {:noreply,
     socket
     |> assign(
       messages: socket.assigns.messages ++ [assistant_message],
       streaming: false,
       current_response: ""
     )}
  end

  defp stream_response(messages, pid) do
    request = %{
      model: "gpt-4-turbo",
      messages: format_messages(messages)
    }

    {:ok, chat_stream} =
      OpenaiEx.Chat.Completions.create(
        MyApp.LLM.OpenAI.client(),
        request,
        stream: true
      )

    # openai_ex exposes the SSE stream as an enumerable of event batches.
    chat_stream.body_stream
    |> Stream.flat_map(& &1)
    |> Enum.each(fn %{data: data} ->
      case data do
        %{"choices" => [%{"delta" => %{"content" => content}} | _]} when is_binary(content) ->
          send(pid, {:stream_chunk, content})

        _other ->
          :ok
      end
    end)

    send(pid, :stream_complete)
  end

  defp format_messages(messages) do
    Enum.map(messages, fn %{role: role, content: content} ->
      %{"role" => to_string(role), "content" => content}
    end)
  end
end

<div class="chat-container">
  <%= for message <- @messages do %>
    <div class={"message #{message.role}"}>
      <%= message.content %>
    </div>
  <% end %>

  <%= if @streaming do %>
    <div class="message assistant streaming">
      <%= @current_response %><span class="cursor">|</span>
    </div>
  <% end %>
</div>

<form phx-submit="send_message">
  <input type="text" name="message" disabled={@streaming} />
  <button type="submit" disabled={@streaming}>Send</button>
</form>

Streaming is just message passing. The LLM client sends chunks to the LiveView process; the process updates socket assigns; Phoenix pushes the diff over the WebSocket. No polling. No callback hell. Just processes talking to each other — the thing Elixir was literally designed to do.

Building an AI Agent Pattern

An AI agent is a loop: observe state, decide action, execute, observe new state. If that sounds like a GenServer, it should. The pattern maps directly.

defmodule MyApp.Agent do
  use GenServer
  require Logger

  defmodule State do
    defstruct [:goal, :context, :history, :max_iterations, :current_iteration]
  end

  # Client API
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  def run(pid, goal) do
    GenServer.call(pid, {:run, goal}, :infinity)
  end

  # Server callbacks
  @impl true
  def init(opts) do
    {:ok,
     %State{
       context: Keyword.get(opts, :context, %{}),
       history: [],
       max_iterations: Keyword.get(opts, :max_iterations, 10),
       current_iteration: 0
     }}
  end

  @impl true
  def handle_call({:run, goal}, _from, state) do
    state = %{state | goal: goal, current_iteration: 0, history: []}
    result = run_loop(state)
    {:reply, result, state}
  end

  defp run_loop(%{current_iteration: i, max_iterations: max}) when i >= max do
    {:error, :max_iterations_exceeded}
  end

  defp run_loop(state) do
    case decide_action(state) do
      {:complete, result} ->
        {:ok, result}

      {:action, action, params} ->
        case execute_action(action, params, state) do
          {:ok, observation} ->
            new_history = state.history ++ [{action, params, observation}]
            new_state = %{state | history: new_history, current_iteration: state.current_iteration + 1}
            run_loop(new_state)

          {:error, reason} ->
            {:error, {:action_failed, action, reason}}
        end

      {:error, reason} ->
        {:error, {:decision_failed, reason}}
    end
  end

  defp decide_action(state) do
    prompt = build_decision_prompt(state)

    case MyApp.AI.decide(prompt) do
      {:ok, %{action: "complete", result: result}} ->
        {:complete, result}

      {:ok, %{action: action, params: params}} ->
        {:action, String.to_existing_atom(action), params}

      {:error, reason} ->
        {:error, reason}
    end
  end

  defp execute_action(:search, %{query: query}, _state) do
    MyApp.Tools.search(query)
  end

  defp execute_action(:calculate, %{expression: expr}, _state) do
    MyApp.Tools.calculate(expr)
  end

  defp execute_action(:fetch_url, %{url: url}, _state) do
    MyApp.Tools.fetch_url(url)
  end

  defp build_decision_prompt(state) do
    history_text =
      state.history
      |> Enum.map(fn {action, params, observation} ->
        "Action: #{action}(#{inspect(params)})\nObservation: #{observation}"
      end)
      |> Enum.join("\n\n")

    """
    Goal: #{state.goal}

    Available actions: search, calculate, fetch_url, complete

    History:
    #{history_text}

    What action should be taken next? If the goal is achieved, use action "complete" with the result.
    """
  end
end

The GenServer maintains agent state across iterations; the supervision tree handles crashes. If an agent dies mid-execution, it restarts cleanly. The loop terminates when the LLM decides the goal is complete or when the iteration limit kicks in — a safeguard you'll be grateful for the first time a model gets stuck in a reasoning loop.

Caching and Rate Limiting

LLM calls are expensive, both in latency and in money. Caching identical requests is the obvious first move; rate limiting protects you from runaway costs and API bans.

Response Caching with Cachex

defmodule MyApp.LLM.CachedClient do
  @cache_name :llm_cache
  @default_ttl :timer.hours(24)

  def chat_completion(messages, opts \\ []) do
    cache_key = build_cache_key(messages, opts)

    case Cachex.get(@cache_name, cache_key) do
      {:ok, nil} ->
        result = MyApp.LLM.OpenAI.chat_completion(messages, opts)

        case result do
          {:ok, response} ->
            ttl = Keyword.get(opts, :cache_ttl, @default_ttl)
            Cachex.put(@cache_name, cache_key, response, ttl: ttl)

          _error ->
            :ok
        end

        result

      {:ok, cached} ->
        {:ok, cached}
    end
  end

  defp build_cache_key(messages, opts) do
    data = {messages, Keyword.take(opts, [:model, :temperature])}
    :crypto.hash(:sha256, :erlang.term_to_binary(data))
  end
end

Rate Limiting with Hammer

defmodule MyApp.LLM.RateLimitedClient do
  @bucket "openai_api"
  @scale_ms :timer.minutes(1)
  @limit 60

  def chat_completion(messages, opts \\ []) do
    case Hammer.check_rate(@bucket, @scale_ms, @limit) do
      {:allow, _count} ->
        MyApp.LLM.CachedClient.chat_completion(messages, opts)

      {:deny, _limit} ->
        # Hammer returns the bucket limit on denial, not a retry interval.
        {:error, :rate_limited}
    end
  end

  def chat_completion_with_retry(messages, opts \\ [], retries \\ 3) do
    case chat_completion(messages, opts) do
      {:error, :rate_limited} when retries > 0 ->
        # Simple fixed backoff; the bucket refills as the window slides.
        Process.sleep(1_000)
        chat_completion_with_retry(messages, opts, retries - 1)

      result ->
        result
    end
  end
end

Stack these into a single client module that your application calls everywhere. The caching and rate limiting become invisible to consuming code — which is how infrastructure should work.
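A sketch of that facade, assuming the module names from the examples above:

```elixir
defmodule MyApp.LLM do
  @moduledoc """
  Single entry point for LLM calls. Rate limiting wraps caching, which
  wraps the raw client; callers see none of the layers.
  """

  # RateLimitedClient -> CachedClient -> OpenAI client (defined above).
  defdelegate chat_completion(messages, opts \\ []),
    to: MyApp.LLM.RateLimitedClient
end
```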

Testing AI Features

Testing LLM integrations takes thought. You can't rely on the actual API; it's slow, expensive, and non-deterministic. But you can't ignore the integration either. I've found three approaches that cover most of the surface area.

Mocking with Mox

# test/support/mocks.ex
Mox.defmock(MyApp.LLM.MockClient, for: MyApp.LLM.ClientBehaviour)

# lib/my_app/llm/client_behaviour.ex
defmodule MyApp.LLM.ClientBehaviour do
  @callback chat_completion(list(), keyword()) :: {:ok, String.t()} | {:error, term()}
end

# In your actual client
defmodule MyApp.LLM.OpenAI do
  @behaviour MyApp.LLM.ClientBehaviour
  # ... implementation
end

# test/my_app/ai/analyzer_test.exs
defmodule MyApp.AI.AnalyzerTest do
  use ExUnit.Case, async: true
  import Mox

  setup :verify_on_exit!

  describe "analyze_sentiment/1" do
    test "returns structured sentiment for positive text" do
      expect(MyApp.LLM.MockClient, :chat_completion, fn _messages, _opts ->
        {:ok, ~s({"sentiment": "positive", "confidence": 0.95, "reasoning": "Test"})}
      end)

      assert {:ok, %{sentiment: :positive}} =
               MyApp.AI.Analyzer.analyze_sentiment("I love this product!")
    end

    test "handles API errors gracefully" do
      expect(MyApp.LLM.MockClient, :chat_completion, fn _messages, _opts ->
        {:error, :timeout}
      end)

      assert {:error, :timeout} = MyApp.AI.Analyzer.analyze_sentiment("test")
    end
  end
end

Mox works here because it forces you to define explicit contracts via behaviours. No monkey-patching, no global state pollution. Your tests verify that your code handles the client's return values correctly; they don't test OpenAI's API.

Integration Tests with Recorded Responses

For critical paths, record real API responses and replay them.

defmodule MyApp.LLM.RecordedResponses do
  @responses_dir "test/fixtures/llm_responses"

  def record(name, messages, opts) do
    result = MyApp.LLM.OpenAI.chat_completion(messages, opts)

    path = Path.join(@responses_dir, "#{name}.json")
    # Tuples and keyword lists aren't JSON-encodable, so normalize them
    # into maps before writing.
    data = %{messages: messages, opts: Map.new(opts), result: encode_result(result)}
    File.write!(path, Jason.encode!(data, pretty: true))

    result
  end

  def replay(name) do
    path = Path.join(@responses_dir, "#{name}.json")

    path
    |> File.read!()
    |> Jason.decode!()
    |> Map.get("result")
    |> atomize_result()
  end

  defp encode_result({:ok, value}), do: %{"ok" => value}
  defp encode_result({:error, reason}), do: %{"error" => inspect(reason)}

  defp atomize_result(%{"ok" => value}), do: {:ok, value}
  defp atomize_result(%{"error" => reason}), do: {:error, reason}
end

Property-Based Testing for Structured Output

When using Instructor, property-based tests catch the edge cases your examples miss.

defmodule MyApp.AI.Schemas.SentimentAnalysisTest do
  use ExUnit.Case, async: true
  use ExUnitProperties

  property "changeset rejects invalid confidence values" do
    check all confidence <- float(min: -10.0, max: 10.0),
              confidence < 0.0 or confidence > 1.0 do
      # The schema defines validate_changeset/1 (not changeset/2),
      # so build the changeset with an explicit cast first.
      changeset =
        %MyApp.AI.Schemas.SentimentAnalysis{}
        |> Ecto.Changeset.cast(
          %{sentiment: :positive, confidence: confidence, reasoning: "test"},
          [:sentiment, :confidence, :reasoning]
        )
        |> MyApp.AI.Schemas.SentimentAnalysis.validate_changeset()

      refute changeset.valid?
    end
  end
end

Running This in Production

I've shipped LLM features in three Elixir applications over the past year. Some lessons cost money to learn.

Set aggressive timeouts. LLM APIs can hang indefinitely. Thirty seconds is generous; I've seen calls stall for two minutes before the provider's load balancer killed them. Use Task.async_stream with :timeout for batch operations.
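A sketch with a simulated call, where `:timeout` and `on_timeout: :kill_task` turn a hung request into a tagged result instead of a stuck pipeline:

```elixir
defmodule BatchDemo do
  # Simulated API call: the :slow prompt hangs past the timeout.
  defp call(:slow), do: Process.sleep(:infinity)
  defp call(prompt), do: {:ok, "reply to #{prompt}"}

  def run(prompts) do
    prompts
    |> Task.async_stream(&call/1, timeout: 100, on_timeout: :kill_task)
    |> Enum.map(fn
      {:ok, result} -> result
      # The hung task is killed and surfaces here as a timeout.
      {:exit, :timeout} -> {:error, :timeout}
    end)
  end
end
```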

Log everything except secrets. When a feature misbehaves at 2 AM, you need the exact prompt sent and exact response received. Redact PII and API keys; log the structure. I've debugged production issues in minutes because the prompt log showed the model was receiving truncated context — something that would have taken hours to reproduce otherwise.
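A hedged sketch of scrubbing secrets before logging — the key names are illustrative, and a real implementation would also cover PII fields and header tuples:

```elixir
defmodule LogRedactor do
  @redacted_keys ~w(api_key authorization x-api-key)

  # Recursively replace secret values so the log keeps its structure
  # but never the secrets themselves.
  def redact(map) when is_map(map) do
    Map.new(map, fn {k, v} ->
      if to_string(k) in @redacted_keys do
        {k, "[REDACTED]"}
      else
        {k, redact(v)}
      end
    end)
  end

  def redact(list) when is_list(list), do: Enum.map(list, &redact/1)
  def redact(other), do: other
end
```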

Track token usage obsessively. Tokens per request, per user, per feature. GPT-4 Turbo charges $10 per million input tokens and $30 per million output tokens; a chatty agent loop can burn through dollars in seconds. Build dashboards. Set alerts. You'll thank yourself.
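The arithmetic is simple enough to keep next to your telemetry. A pure sketch using the rates quoted above:

```elixir
defmodule TokenCost do
  # USD per million tokens (GPT-4 Turbo rates from the text).
  @input_per_million 10.0
  @output_per_million 30.0

  def cost_usd(%{input_tokens: input, output_tokens: output}) do
    input / 1_000_000 * @input_per_million +
      output / 1_000_000 * @output_per_million
  end
end

# 50k input + 10k output tokens costs roughly $0.80:
TokenCost.cost_usd(%{input_tokens: 50_000, output_tokens: 10_000})
```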

Degrade gracefully. When the LLM API is down — and it will go down — your feature shouldn't explode. Cache aggressively. Provide fallback behavior. Show users a helpful message, not a stack trace.
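A sketch of the fallback chain. The `live_fn` and `cache_fn` callbacks are injected here to keep the example self-contained; in a real app they'd be your LLM client and cache lookup:

```elixir
defmodule Degrade do
  @fallback "This feature is temporarily unavailable. Try again in a minute."

  def summarize(text, live_fn, cache_fn) do
    with {:error, _} <- live_fn.(text),
         {:error, _} <- cache_fn.(text) do
      # Both layers failed: show something helpful, never a stack trace.
      {:ok, @fallback}
    else
      {:ok, summary} -> {:ok, summary}
    end
  end
end
```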

Version your prompts. Treat them like code, because they are code. Store them in version control. When you change a prompt, you're changing behavior; test accordingly.
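A minimal sketch of what that can look like — a prompt as a versioned module (names illustrative), rendered through EEx so the template lives in version control with everything else:

```elixir
defmodule MyApp.Prompts.Sentiment do
  # Bump this whenever the wording changes, and log it with every request
  # so you can correlate behavior shifts with prompt edits.
  @version "3"

  @template """
  Analyze the sentiment of the following text.

  Text: <%= text %>
  """

  def version, do: @version

  def render(text), do: EEx.eval_string(@template, text: text)
end
```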

Where This Leaves Us

Elixir isn't the obvious choice for AI features. I think that's actually an advantage; you end up reaching for it because the problem genuinely fits, not because it's the default everyone uses without thinking.

The BEAM's properties — concurrency that scales, fault tolerance that's structural rather than bolted on, real-time capabilities that come free with Phoenix — these aren't marketing bullet points when you're running LLM workloads. They're the difference between a production system that handles failure gracefully and one that pages you at 3 AM because a single API timeout cascaded into a full outage.

Python will remain the language of AI research. For shipping AI-powered features to users who expect them to work reliably? Elixir is a serious option. Not the only one. But a good one, and one I keep reaching for.


What do you think?

Share your thoughts — you can tweet me at @allanmacgregor.
