How We Built a CI/CD Pipeline Where AI Reviews AI's Code
BDC Dual-Gate Build Pipeline
War Council validates design. Captain CI validates code. No AI marks its own work complete.
[Pipeline diagram: swimlanes for XO, War Council, Major Build, Captain CI, and Deploy, plus a fail path.]

- Spec (XO): read source code, write the behavior spec.
- Review Design (War Council): architect + QA check schema, dependencies, scope.
- Write Code (Major Build): build to completion, tests pass, push.
- Review (Captain CI): files, routes, migrations, test counts.
- Validate (Captain CI): files exist, tests pass, no stubs.
- Ship It (Deploy): LSPRO auto-deploys; everything else John approves.
- Done: only Captain CI marks Done; the next work order starts immediately.
We run a comic book shop. We also run a full engineering pipeline where AI agents write code, other AI agents review it, and nothing touches production without passing automated checks.
Here’s how it works, and why we built it this way.
The Problem
We’re one person running a SaaS product (LiveSeller Pro), a retail operation, and a growing list of integrations — WhatNot, eBay, Shopify, Stripe, Twilio. There is no engineering team. There’s John, and there’s a roster of AI agents.
AI is fast. AI is also confidently wrong. Without guardrails, you get code that looks right, passes a vibe check, and breaks in production at 2am when a customer is trying to check out.
We needed a system where speed doesn’t sacrifice reliability.
The Dual-Gate Pipeline
The core idea is simple: two gates, two different questions.
Gate 1 — War Council (Pre-Build): “Should we build this?”
Before any code is written, architect and QA agents review the design. They check the database schema against what actually exists. They verify that the API endpoints referenced in the spec are real, not hallucinated. They flag scope creep, missing dependencies, and security gaps.
This catches the most expensive bugs — the ones where you build the wrong thing entirely.
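A minimal sketch of the kind of mechanical check Gate 1 implies, assuming the spec can be reduced to the tables and columns it references (all names below are hypothetical, not from the actual system):

```python
# Hypothetical sketch of a War Council pre-build check: verify that every
# table/column a spec references actually exists in the live schema.
# In practice the schema dict would come from introspecting the database.

def find_hallucinated_refs(spec_refs, actual_schema):
    """Return spec references that don't exist in the real schema."""
    missing = []
    for table, columns in spec_refs.items():
        if table not in actual_schema:
            missing.append((table, None))      # whole table is invented
            continue
        for col in columns:
            if col not in actual_schema[table]:
                missing.append((table, col))   # column is invented
    return missing

# Example: the spec references a column and a table that don't exist.
actual = {"customers": {"id", "email", "name"}}
spec = {"customers": ["email", "loyalty_points"], "wishlists": ["id"]}
print(find_hallucinated_refs(spec, actual))
# -> [('customers', 'loyalty_points'), ('wishlists', None)]
```

The same pattern extends to API endpoints: diff what the spec references against what the codebase actually exposes, before a line of code is written.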
Gate 2 — Captain CI (Post-Build): “Did we actually build it?”
After code is written, Captain CI checks the receipts. Do the files exist on disk? Did the migration get created? Do the tests pass? Can we curl the endpoint and get the expected response?
No completion manifest = automatic rejection. The manifest lists every file created, every route added, every test result. If Major Build (our code execution agent) claims “tests pass” without evidence, Captain CI sends it back.
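A sketch of what that manifest gate could look like. The field names are assumptions; the post doesn't show the real manifest format:

```python
import os

# Hypothetical sketch of Captain CI's manifest gate. Field names are
# assumptions based on the rules described in this post.

REQUIRED_FIELDS = {"files_created", "files_modified", "routes_added",
                   "migrations_added", "test_counts", "commit_hash"}

def review(manifest):
    """Return (passed, reasons). No manifest = automatic rejection."""
    if manifest is None:
        return False, ["no completion manifest attached"]
    reasons = [f"missing field: {f}"
               for f in sorted(REQUIRED_FIELDS - manifest.keys())]
    # Check the receipts: every claimed file must exist on disk.
    for path in manifest.get("files_created", []):
        if not os.path.exists(path):
            reasons.append(f"claimed file not on disk: {path}")
    return (not reasons), reasons

print(review(None))  # -> (False, ['no completion manifest attached'])
```

The point is that "tests pass" is a claim until a separate agent checks the evidence behind it.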
The Agents
We don’t use one AI for everything. We use specialists:
- XO reads source code and writes the technical spec. Every work order enters the queue with exact file paths, command-based stop conditions, and explicit scope boundaries.
- War Council (architect, QA, legal, commercial) reviews the spec before build. Advisory only — they flag problems but never block.
- Major Build writes the code. Picks up the work order, builds to completion in a continuous session, pushes to GitHub.
- Captain CI validates the build. Checks files, runs tests, verifies stop conditions. Only Captain CI can mark a work order as Done.
The key insight: the agent that writes the code is never the agent that validates it.
The Rules
Three rules govern everything:
Rule 1: No Git, No Work. Every change is a commit. Every commit is on GitHub. If it’s not in git, it doesn’t exist.
Rule 2: Completion Manifest Required. When submitting for review, the builder attaches a manifest listing files created, files modified, routes added, migrations added, test counts, and the git commit hash. No manifest = auto-reject.
Rule 3: Only Captain CI Closes Work Orders. The builder marks “Review.” Captain CI checks the files. CI passes = Done. CI fails = back to the builder. Two failures on the same work order = Blocked, and the builder submits a corrective plan.
No AI marks its own homework as complete.
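Rule 3 is effectively a small state machine. A sketch of it, assuming states named as in this post:

```python
# Hypothetical sketch of Rule 3 as a tiny state machine: the builder can
# only reach "Review"; Captain CI decides "Done", sends it back, or marks
# it "Blocked" after two failures on the same work order.

class WorkOrder:
    def __init__(self, wo_id):
        self.id = wo_id
        self.state = "Queued"
        self.failures = 0

    def submit_for_review(self):
        self.state = "Review"          # the most a builder can ever do

    def ci_result(self, passed):
        if self.state != "Review":
            raise ValueError("only work orders in Review can be judged")
        if passed:
            self.state = "Done"        # only Captain CI reaches this line
        else:
            self.failures += 1
            # two failures = Blocked; builder must submit a corrective plan
            self.state = "Blocked" if self.failures >= 2 else "InProgress"

wo = WorkOrder("WO-042")
wo.submit_for_review(); wo.ci_result(False)   # first failure -> InProgress
wo.submit_for_review(); wo.ci_result(False)   # second failure -> Blocked
print(wo.state)  # -> Blocked
```

Separating `submit_for_review` from `ci_result` is the whole design: the builder physically cannot call the transition that closes its own work.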
What This Looks Like in Practice
Last week, we specced and queued 17 work orders in a single session. Four architect agents ran in parallel, each reading actual source code from a different repository and producing specs with exact file paths and line numbers.
Every spec includes stop conditions like:
```shell
curl -s -X POST http://localhost:3402/customers/dedup-check \
  -H "Content-Type: application/json" \
  -d '{"email":"[email protected]"}'
# -> HTTP 200, {"matches": [...]}
```
Not “customer dedup works.” Not “endpoint is accessible.” A command you can paste into a terminal and verify.
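Because the stop condition is a literal command, a validator can run it mechanically. A sketch of that idea, with `echo` standing in for the spec's actual curl command so the example is self-contained:

```python
import json
import subprocess

# Hypothetical sketch of checking a command-based stop condition: run the
# exact command from the spec and verify its exit code and JSON shape.
# 'echo' is a stand-in here for the real curl call against the service.

def stop_condition_met(command, expect):
    """True iff the command exits 0 and its JSON output has the expected keys."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        return False
    try:
        payload = json.loads(result.stdout)
    except json.JSONDecodeError:
        return False
    return all(key in payload for key in expect)

cmd = "echo '{\"matches\": []}'"   # stand-in for the spec's curl command
print(stop_condition_met(cmd, expect=["matches"]))  # -> True
```

That is the difference between a stop condition an agent can verify and one it can merely assert.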
The Deploy Split
Not everything auto-deploys. Our React dashboard (LiveSeller Pro BBDB) has a fully automated GitHub Actions pipeline: push to main triggers build, staging deploy, staging validation, production promote, production validation. No human in the loop.
Everything else — Google Apps Script, WordPress, ShopOps API — requires John to say “PROCEED DEPLOY” before anything touches production. AI handles staging. Humans handle the last mile.
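The split itself is a one-line policy. A sketch of it as code, with the repo identifiers invented for illustration:

```python
# Hypothetical sketch of the deploy split: one repo auto-promotes through
# GitHub Actions; everything else waits for an explicit "PROCEED DEPLOY".
# Repo names here are made up for the example.

AUTO_DEPLOY = {"liveseller-pro-bbdb"}   # fully automated pipeline to prod

def may_deploy_to_production(repo, human_approval=None):
    """AI handles staging everywhere; production needs auto status or John."""
    if repo in AUTO_DEPLOY:
        return True                     # push to main -> staging -> prod
    return human_approval == "PROCEED DEPLOY"

print(may_deploy_to_production("liveseller-pro-bbdb"))            # -> True
print(may_deploy_to_production("shopops-api"))                    # -> False
print(may_deploy_to_production("shopops-api", "PROCEED DEPLOY"))  # -> True
```

Encoding the approval phrase as an exact string match keeps the human gate unambiguous: a vague "looks fine" from chat history never deploys anything.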
Why This Matters for Small Businesses
This isn’t a big-company process scaled down. It’s a small-business process that happens to use AI instead of a team of engineers.
The pipeline exists because:
- We can’t afford production outages (customers are trying to buy comics)
- We can’t hire 5 engineers (we’re a comic shop)
- AI makes mistakes that look exactly like correct code
- Speed without guardrails is just faster failure
The result: we ship features daily, every change is tested, and nothing reaches production without verification. A one-person shop running an engineering pipeline that most startups don’t have.
LiveSeller Pro is an AI-powered inventory and operations platform for comic book and collectibles shops. We’re building it in public because the process is as interesting as the product.
If you run a shop and want to see how this works, reach out at livesellerpro.app.