Headless Agents in CI/CD

The first generations of AI tooling in the developer ecosystem were designed around a human in the loop: an IDE extension that autocompletes as you type, a chat interface that waits for your next message. Even the earliest agentic tools assumed someone was watching, ready to approve the next step. The industry is cautiously stepping beyond that paradigm with “yolo” modes and “run without asking” settings, but the assumption of a human intermediary is still baked into the design of most tools.

That assumption breaks down inside a pipeline.

The three generations of AI tooling

It’s worth tracing how we got here, because the limitations of each wave explain why the current one matters.

  1. IDE extensions were the first wave, led by GitHub Copilot. They were genuinely impressive — inline suggestions, in-context completions, reasonable awareness of the file you had open. But context was the constraint. You had to have the right files open, and the model’s awareness didn’t extend much beyond your current buffer. For a developer working interactively, this was fine. For a pipeline with no open files and no human at the keyboard, it was useless.

  2. Chat interfaces came next, and with them agents that could loop and reason. ChatGPT, Claude’s web interface, and their contemporaries were good at reasoning over problems you described in prose. But they required you to bring the context to them: copy-paste the log, describe the environment, transcribe the error. MCP (Model Context Protocol) then gave agents a standard way to fetch missing context themselves. Together, chat, agents, and MCP showed the first signs of an unattended agentic workflow.

  3. The CLI tools that emerged in 2025 completed the shift to what most of my enterprise customers had been asking for since day one. Claude Code, GitHub Copilot CLI, and OpenAI Codex CLI could all accept context as arguments or files, execute with no interactive input, and return structured output that downstream steps could consume. For the first time, you had AI tooling that was designed to run unattended and was easy to script, without having to work through a model provider’s API response schemas.

What headless means in practice

A headless agent is an AI CLI running non-interactively inside a pipeline step. It receives its context upfront — a prompt, environment variables, credentials, tool configuration — and it runs to completion without waiting for human input. The output goes to a file or stdout. The next step in the workflow consumes it.
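As a rough illustration of that shape, here is a minimal Python sketch of a pipeline step that wraps an agent CLI. Everything in it is an assumption made for illustration: `run_headless`, its parameters, and the output file name are hypothetical, and the "agent" in the demo is a stand-in Python one-liner rather than any real CLI, since each tool has its own flags.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_headless(command, prompt, extra_env=None,
                 output_path="agent_output.txt", timeout_s=600):
    """Run an agent CLI non-interactively and persist stdout for the next step.

    `command` is the CLI invocation as a list; the prompt is appended as the
    final argument (an assumption, real CLIs differ in how they take prompts).
    """
    env = {**os.environ, **(extra_env or {})}  # context and credentials via env vars
    result = subprocess.run(
        [*command, prompt],
        env=env,
        stdin=subprocess.DEVNULL,  # nothing to read from: the agent cannot wait on a human
        capture_output=True,
        text=True,
        timeout=timeout_s,         # a hung agent fails the step instead of blocking it
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"agent exited with {result.returncode}: {result.stderr.strip()}"
        )
    # The output goes to a file so the next workflow step can consume it.
    Path(output_path).write_text(result.stdout, encoding="utf-8")
    return result.stdout


if __name__ == "__main__":
    # Stand-in "agent": echoes its prompt argument. Swap in a real CLI here.
    fake_agent = [sys.executable, "-c",
                  "import sys; print('REPORT: ' + sys.argv[1])"]
    print(run_headless(fake_agent, "Summarise the failed deploy"))
```

Closing stdin and setting a timeout are the two choices doing the work here: if the agent ever tries to ask a question, the step fails loudly rather than sitting in the queue.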

The key difference from interactive use isn’t just the absence of a human — it’s that the absence is by design. When you run Claude Code on your laptop and it asks a clarifying question, you answer it. When it runs as a pipeline step, there’s no one to answer. You have to design the prompt so the agent never needs to ask. Complete context in, complete output out.

That constraint has three practical implications:

  • Accuracy: The agent has to get things right consistently. The knobs we have for improving accuracy are better prompts, better context, better models, and hooks that add a measure of deterministic behavior.
  • Guardrails: The agent can be deployed to ephemeral environments or to production, so what it can and can’t do needs to be carefully defined and enforced. A tool allowlist, a permissions boundary, and a clear scope of work are all part of that.
  • Auditability: When the agent is running without a human watching, you need to be able to reconstruct what it did after the fact. This means durable records of prompts, outputs, and actions taken — ideally with a clear link between them.
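The guardrail and auditability points can be sketched together. This is one possible shape under stated assumptions, not a reference design: the tool names, the `AuditTrail` class, and the record layout are all hypothetical, and in a real setup the allowlist would also be enforced by the agent CLI or the platform itself, not only by wrapper code like this.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field, asdict

# Hypothetical read-only tools the agent may call in this pipeline.
ALLOWED_TOOLS = frozenset({"read_logs", "get_resource_state"})


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool outside the allowlist."""


@dataclass
class AuditTrail:
    run_id: str
    prompt: str
    actions: list = field(default_factory=list)
    output: str = ""

    def record_action(self, tool, args):
        """Log every requested action, then refuse the ones not allowlisted."""
        allowed = tool in ALLOWED_TOOLS
        self.actions.append(
            {"tool": tool, "args": args, "allowed": allowed, "ts": time.time()}
        )
        if not allowed:
            raise ToolNotAllowed(tool)

    def to_json(self):
        """Serialise prompt, actions, and output as one linked record."""
        record = asdict(self)
        record["prompt_sha256"] = hashlib.sha256(
            self.prompt.encode("utf-8")
        ).hexdigest()
        return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    trail = AuditTrail(run_id="run-001", prompt="Triage the checkout alert")
    trail.record_action("read_logs", {"service": "checkout"})
    try:
        trail.record_action("restart_service", {"service": "checkout"})
    except ToolNotAllowed as blocked:
        print(f"blocked: {blocked}")
    trail.output = "Suspected cause: connection pool exhaustion"
    print(trail.to_json())
```

Denied requests are recorded before the exception is raised, so the trail shows what the agent attempted as well as what it actually did.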

Why this matters for platform teams

Platform engineering teams have spent years building pipelines that are deterministic and observable. But achieving that predictability is a lot of work. You build something “good enough,” iterate on it through real-world edge cases, add guardrails, improve it through incidents, and deal with the glue code that holds outputs between steps together. That cycle never really ends.

Headless agents can either take over that process entirely or dramatically accelerate it.

Fully Agentic: The pipeline is the prompt. A Skill file, a custom prompt, or a set of subagents defines what needs to happen, and the agent executes end to end. Whether a human gate belongs in that flow depends on the stakes — an agent surfacing a diagnosis can run unattended, while one taking action on a production system probably shouldn’t. The main benefit is that the “what needs to be done” is written once in natural language — useful to both humans and agents. The trade-off is non-determinism: two executions can take different paths and produce different outputs. Token costs compound with complexity. This pattern works best for workflows that are well-scoped, where the cost of a wrong answer is low, or where the situation requires multiple situational probes that would be hard to script deterministically.

For example, consider an incident first-responder pipeline that runs when an alert fires. It pulls recent logs, checks resource state, and produces a triage report for the on-call engineer before they’ve even opened their laptop. The agent is doing the work of gathering and correlating information that would be time-consuming for a human, and the output is a structured report that helps the engineer make a decision faster. The non-determinism is acceptable because the agent is surfacing information rather than taking an action, and the cost of a wrong answer is low because the engineer reviews the report before acting.
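To make the "structured report" part of that example concrete, here is a small Python sketch of the two deterministic ends of such a pipeline: assembling the prompt from gathered context, and validating the agent's reply before it reaches the engineer. The report schema, the field names, and both function names are assumptions for illustration; the agent call itself is left out.

```python
import json
from dataclasses import dataclass

# Hypothetical report schema; a real pipeline would define its own.
REQUIRED_FIELDS = ("alert", "summary", "suspected_cause", "evidence")


@dataclass(frozen=True)
class TriageReport:
    alert: str
    summary: str
    suspected_cause: str
    evidence: list


def build_prompt(alert, log_lines, resource_state):
    """Assemble complete context up front so the agent never needs to ask."""
    return "\n".join([
        f"Alert: {alert}",
        "Recent logs:",
        *log_lines,
        "Resource state:",
        json.dumps(resource_state, sort_keys=True),
        "Respond with a single JSON object with the keys: "
        + ", ".join(REQUIRED_FIELDS) + ".",
        "Do not ask questions; state any uncertainty in the summary.",
    ])


def parse_report(raw):
    """Parse the agent's stdout; raise ValueError if it is not a usable report."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"report is missing fields: {missing}")
    return TriageReport(**{name: data[name] for name in REQUIRED_FIELDS})


if __name__ == "__main__":
    prompt = build_prompt(
        "HighErrorRate on checkout",
        ["12:01 ERROR pool exhausted", "12:02 ERROR pool exhausted"],
        {"replicas": 3, "ready": 1},
    )
    print(prompt)
    # A hand-written reply standing in for real agent output.
    reply = json.dumps({
        "alert": "HighErrorRate on checkout",
        "summary": "Two of three replicas are not ready.",
        "suspected_cause": "connection pool exhaustion",
        "evidence": ["12:01 ERROR pool exhausted"],
    })
    print(parse_report(reply))
```

Validating the reply keeps the non-determinism contained: the agent can take whatever path it likes, but a malformed report fails the step instead of landing in front of the on-call engineer.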

Agent-Accelerated: The agent is an optimizer, not a pipeline replacement. It starts from a natural language description of the workflow and, over time, helps replace that description with structured steps, scripts, and reliable automation. The end goal is to narrow the agent’s role — not necessarily eliminate it. This approach lets platform teams build trust incrementally — you understand what the agent is doing at each stage before you hand off more to it. It mirrors the way a good engineer would build automation: iteratively, with accountability at each step.

To extend the example above, the same incident first-responder pipeline illustrates the Agent-Accelerated pattern at a different point in its maturity. The agent starts broad: pull whatever looks relevant, summarise what you find. Over time, the checks that always appear in the report get hardcoded as structured pre-flight steps. The agent’s role narrows to the gaps: novel context, cross-service correlation, the things that don’t fit a script. And even that boundary shifts: what starts as cross-service correlation that only an agent can reason over might, with enough data and pattern recognition, become a bash script.
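One way to decide which checks are ready to be hardcoded is to count how often each one shows up across past reports. The sketch below assumes each past report has already been reduced to a list of check names; the function name, the check names, and the 90% threshold are all hypothetical.

```python
from collections import Counter


def promotion_candidates(past_reports, min_share=0.9):
    """Return checks that appear in at least `min_share` of past reports.

    `past_reports` is a list of reports, each a list of check names the
    agent chose to run. Duplicates within one report count once.
    """
    if not past_reports:
        return []
    counts = Counter(
        check for report in past_reports for check in set(report)
    )
    threshold = min_share * len(past_reports)
    return sorted(check for check, seen in counts.items() if seen >= threshold)


if __name__ == "__main__":
    reports = [
        ["disk_usage", "pod_restarts", "recent_deploys"],
        ["disk_usage", "pod_restarts"],
        ["disk_usage", "pod_restarts", "dns_latency"],
        ["disk_usage", "pod_restarts"],
    ]
    # disk_usage and pod_restarts appear in every report, so they are
    # candidates for structured pre-flight steps; the rest stay with the agent.
    print(promotion_candidates(reports))
```

Checks that clear the threshold become scripted pre-flight steps, and their results can then be handed to the agent as context instead of being rediscovered on every run.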

The goal isn’t to keep the agent doing everything; it’s to understand the problem well enough to know exactly where you do need it.

I’ve been exploring both patterns in the context of Octopus Deploy, where the platform’s connectivity to deployment targets and its first-class concept of runbooks make it a natural fit for headless agent steps. More on that in future posts.

This post is licensed under CC BY 4.0 by the author.