Designing Infra for Agents
The surprising thing about handling infrastructure for a non-deterministic agent is how deterministically it breaks. Notes on the primitives that don’t make the agent guess.
It’s always some version of: the agent skims the docs, guesses the endpoints, misses a required field, recovers from the error, then spirals out of control — or gives up and rebuilds the thing from scratch.
The question is no longer “does this take more than 3 clicks?”. It’s whether the agent can infer the primitive from almost no context, default to the safe path, and recover when it’s wrong. There’s a lot of existing literature on this.
Agent experience optimization
It’s already visible in what agents pick as their preferred tools. A non-obvious take is that almost every coding agent is some version of the same ~8 frontier models. That is, functionally, the entire customer base. Eight ICPs that account for nearly all agent behavior in production.
That means reproducibility comes built into the problem. We can essentially hack AX by overfitting to those eight models — getting feedback from our 8 ICPs before anything reaches prod.
As agents get more and more autonomous, when the right primitive isn’t obvious and available, the agent doesn’t stop to ask you. It builds the smallest thing that satisfies the prompt, and that leads to unintended effects downstream. Half-cooked auth with no audit trail, no revocation policy, no rollbacks — because the task was to “integrate auth,” not to build auth you can observe and control.
Agents are ruthlessly goal-oriented and lack long-horizon context.
That raises a couple of unexpected problems.
Rollbacks
The first thing a human does with a new database is set up rollbacks and an ORM. The first thing an agent does is write to it. SQL over HTTPS. Since the agent focuses on the task at hand and lacks long-horizon context, it won’t set up checkpointing on its own — even if it’s one call away.
Everything works until that one agent decides to ALTER a prod table because a column seemed redundant. That breaks a contract two layers down. You confront the agent, only to hear back the classic: “You’re absolutely right! I accidentally ALTERed the table, let me roll it back.” But there is no rollback.
Purpose-built primitives for agents should have rollbacks and checkpoints baked into them, by default.
Observability & traceability
Imagine a long-running agent that spins up a couple of VMs, finishes its tasks, and gets auto-compacted. The VMs quietly keep burning your budget, and the agent no longer retains the decision thread.
Every primitive should emit traces on every state change: every action, actor, resource, cost, and recovery path.
Pricing
Most of us have been burned by a cloud bill at least once. Agents are no different — except now there are millions of them, so the surface area for a catastrophic error rises exponentially. And with loops and long-running agents slowly becoming the norm, this gets exponentially worse.
The interesting thing to note: when you have 8 ICPs, the failures repeat. And a failure that repeats is one you can design against.
It’s a primitive problem, not a runtime problem
You don’t bolt observability onto a system after you’ve built it. You bake it into the primitive, so the agent can’t not have it. Same with traceability and rollbacks.
The general design principle: agent-native primitives should minimize interpretive surface area — everything an agent has to guess, everything that isn’t a default. Which endpoints to call. What permissions apply. Which region to run in. How billing works, and how to recover.
Human-first infra maximizes optionality: too many ways to do one thing, where every knob is a feature. For an agent, every extra knob is one more surface to hallucinate on.
Sensible defaults
Agent-native primitives should ship defaults that quietly cover the median task, with progressive controls — defaults the agent can infer and reason against. Most tasks should be on the happy path, with bounded deviation.
In our own tests we found agents tend to read only part of the docs (roughly the first 2000 tokens for opus-4.7-high) and try to infer the rest. So the choices that matter should arrive just in time — right where the agent is looking, right when it needs it.
Composability
The single most exciting unlock about purpose-building for agents is composability: primitives that expose capabilities other primitives can consume.
Auth can expose user tokens that scope files (in a storage bucket) to specific users. The agent never writes its own implementation of the users→files map, because the primitive already exposes an identity binding. One less thing to invent. One less thing to get subtly, dangerously wrong.
Domains can expose DNS record handlers, so an agent configuring a signup confirmation email doesn’t have to reason through SPF, DKIM, DMARC, and verification. It simply writes:
await email.send({
from: "agent@example.com",
to: "human@example.com",
body,
});
The backend infers the intent and configures the rest. The agent reasons about DNS only when DNS is the job.
On-the-fly infra
Take something awkward enough that most teams don’t bother. An agent ships an API in us-east-1 and wants to test latency for users around the world. With an agent-native primitive, that could take maybe 3 calls.
One call to provision a database with the schema { location: text, p90: float, p99: float, error: text }. Then a schema to provision 300 sandboxes across the globe:
plan:
service: compute-service:sandboxes
startup-script: "<fetch endpoint.com, calculate metrics, upload to db with location=$loc>"
location: "$loc for each $loc in {locations}"
Get back a cost estimate. The agent self-authorizes payments with x402. Intent to teardown in under ten seconds.
On-the-fly infra is the use case I’m most excited about. Today there’s no clean way to do it — that’s why almost nobody bothers. But it becomes trivial the moment the primitives make sense. That flow is hard to reach when AX is an afterthought, and almost trivial when AX is first-principles.
The primitive that is reachable, and gets the task done in the fewest steps, is the one an agent never rebuilds.
We’re still early into building that. Very early. But this is the architecture we’re building toward. That’s the north star: backend infra, purpose-built for AI agents.
Arag · June 2026