I wanted coding agents to be the story. The honest story is me: where I stayed vague, where I stayed out of the loop, and where I merged anyway. Agents did not create most of my failures. They surfaced how much I had been relying on intuition instead of contracts. This post is self-accountability first, Twelve-Factor second, and a few constraints I now treat as my job to uphold before anything ships.
Table of Contents
- What Twelve-Factor Is (Briefly)
- The Surprise
- What I Let Agents Amplify
- Enterprise Collaboration vs Small-Team Intuition
- The Human in the Loop Is Still the Developer
- Rediscovering Twelve-Factor Through Agents
- Agents Inside Sandboxes
- Constraints I Now Own
- What I Am Taking Forward
What Twelve-Factor Is (Briefly)
The Twelve-Factor App methodology is a set of practices for building modern hosted applications so they stay portable, operable, and easy to run in any environment. Adam Wiggins wrote it in 2011 around takeaways from Heroku, but the ideas outlived that platform.
In short, it asks you to: track one codebase per app; declare dependencies explicitly; store config in the environment; treat databases and queues as attached backing services; separate build, release, and run; run work units that hold no local state; bind services to ports; scale out via the execution model; design for fast startup and graceful shutdown; keep dev and prod similar enough to trust what you ship; treat logs as event streams; and run admin tasks (migrations, one-offs) as explicit jobs, not surprises in prod.
None of that is magic. It is discipline so a team (or a future you) can understand how the system is meant to behave. I came back to it because agents made every place I had skipped that discipline painfully visible.
The Surprise
Over the last few months, I have spent most of my development time working alongside coding agents.
I started like most people: prompt, diff, review, merge. It felt magical. Boilerplate vanished. Refactors were faster. Plans appeared while I was still reading the ticket.
So I believed the next step was obvious:
Better prompts → Better code → Better software.
I was half right. Better prompts did produce better diffs. They did not produce better judgment on my part.
I upgraded the workflow: planning, sub-agents for research and implementation, reviewer agents, more CI. Generation improved. My habits did not keep up. I merged changes I understood at the file level but not at the system level. I let requirements stay fuzzy because the code looked credible. I treated green checks as approval instead of one input to a decision I still owned.
Bugs followed. Not because the model was broken. Because I had accelerated the easy half and under-invested in the hard half: clear intent, production-shaped beliefs, and the boring operational contracts that were always my responsibility.
That is the accountability line I keep returning to: agents are in the loop, but the developer is still on the hook.
What I Let Agents Amplify
Looking back at incidents, agents rarely invented new failure modes. They amplified what I had already tolerated.
- I left requirements ambiguous, so agents produced ambiguous diffs I did not tighten.
- I let architecture live in my head, so one agent run could drift over many services without me drawing boundaries.
- I under-invested in observability, so “works in the PR” had no story when prod disagreed.
- I blurred build, release, and run because it was faster in the moment.
- I did not write down who owned what, so agent runs became harder to reason about than human commits.
The helper did not skip those gaps. It walked through them at higher speed.
Enterprise Collaboration vs Small-Team Intuition
Context matters for how those gaps show up.
In large enterprises, no single developer is meant to carry the whole system in one head. Platform teams, security review, architecture forums, SRE ties, and written standards exist so collaboration distributes the load. Agents fit that world when they plug into existing guardrails: approved templates, paved roads, enforced policies, shared catalogs. The human in the loop is often a network of humans. My mistake in enterprise-shaped work was still personal: thinking someone else had made intent explicit when I had not read it, or not contributed the absent paragraph the next person needed.
In small teams, the same factors collapse onto one or two developers. There is no platform team to catch vague config. No architecture review board to challenge a multi-service change. Intuition, experience, and late-night context carry what enterprises spread over roles. Agents feel transformative there because they multiply the one person who was already doing the heavy lifting. They also multiply that person’s blind spots if they skip the work of making systems legible to anyone else, including future them.
I have worked in both modes. In both, the takeaway was the same: collaboration or intuition can get you through a sprint; only explicit contracts get you through production. Agents do not remove that. They punish the absence of it faster.
The Human in the Loop Is Still the Developer
“Human in the loop” is not a badge you earn by glancing at a diff.
It means the developer takes ownership to clear things up before and after automation runs:
- Before: crisp agreed criteria, known blast radius, explicit non-goals, environment boundaries, and a spec that outlives the chat run.
- During: review that asks what prod will do, not only whether TypeScript compiles.
- After: telemetry that proves behavior, postmortems that name my workflow gaps, not “the bot.”
When I treated the agent as the author, I outsourced accountability. When I treated it as a fast junior with unlimited stamina, I kept accountability and got leverage.
That shift is not organizational theory. It is how I stopped lying to myself that merge equaled done.
Rediscovering Twelve-Factor Through Agents
Twelve-Factor did not save me. It gave me vocabulary for work I had been deferring. A few factors mapped directly to mistakes I could own.
Dependencies (II)
I let agents bump packages without me insisting on lockfiles and reproducible CI. “Compiles on my machine” was never enough; I knew that. I acted like it was.
Config (III)
I tolerated config in READMEs, and chat-generated .env examples. I did not force a single environmental source of truth. Agents did not invent the fourth source. I allowed a fuzzy boundary and paid for it.
Build, Release, Run (V)
I let one changeset touch app code, workflows, and deploy manifests because it was convenient. I am the one who needs to enforce separation when the tool does not.
Dev/Prod Parity (X)
My biggest personal gap. I optimized local agent loops and under-invested in production-shaped constraints: traffic, flags, limits, real data forms. Production kept teaching what I had postponed.
Logs (XI)
I reached for better prompts when I should have reached for structured logs and traces. Observability was not a nice-to-have. It was how I prove whether my change behaved as intended.
Admin and blast radius (XII and friends)
I treated migrations and one-off jobs as things the agent could “just run” because tests expected it. That was me skipping the same caution I would expect from another engineer.
Agents Inside Sandboxes
I also build systems where agents run inside prod-shaped boundaries: tool servers, eval workers. There the accountability story flips slightly but does not disappear.
Someone still decided what the agent may read, invoke, spend, and deploy. When those decisions are implicit, “human in the loop” is a slide deck, not a control.
Sandboxing is how blast radius becomes physical: ephemeral environments, scoped credentials, default-deny network, least-privilege tools. I treat that as engineering work I own, not a feature the model requests politely.
Constraints I Now Own
These are not replacements for Twelve-Factor. They are promises I make to myself before an agent touches the repo.
Agent legibility
I write repos so a competent stranger (human or model) can navigate them: layout, docs beside code, errors that say what to fix. If only I understand the system, agents become an extension of my confusion.
Human checkpoints
I gate what must not be automated on a whim: auth, money, retention, public APIs, sensitive dependencies. In enterprise settings that means using the review paths that already exist. In small teams it means I do not merge what I would not defend in a design review with two people.
Cost guardrails
I watch spend like I watch latency. Retries, tools, embeddings, forgotten eval jobs: my budget, my alert.
Provable intent
The spec, ADR, or agreed criteria outlive the chat run. If the diff diverges, I update the artifact or reject the diff. I do not hide behind “the PR looked fine.”
Observable agency
I log who invoked automation, what it touched, what deployed. “The bot did it” is where my accountability stops being useful.
What I Am Taking Forward
Coding agents are real leverage. They are not a substitute for owning clarity, constraints, and outcomes.
Large organizations will keep relying on collaboration to spread that ownership. Small teams will keep relying on developer intelligence to carry what there is no committee for. Both fail the same way when intent stays implicit and prod stays distant.
The future of DX, for me, is not faster merges. It is me doing the unglamorous work first: make the system understandable, bound the blast radius, align environments, instrument behavior, and stay in the loop as the person who clears things up.
Twelve-Factor still promises that disciplined constraints free you to focus on product. I believe that. I also believe agents make it obvious when I have been freelancing on intuition instead of doing my part. That is the work I am not delegating away.