
The Anatomy of a Work Order Pipeline: How We Build LiveSeller Pro

How we built a structured Work Order pipeline using Notion, n8n, and hard-learned lessons from eight weeks of AI-assisted development.


Eight weeks ago, we had a whiteboard, a Notion database, and an idea: what if we treated every piece of engineering work like a military operation?

Not the bureaucratic kind. The kind where every task has a clear objective, a chain of command, stop conditions, and accountability at every step. No task ships without verification. No builder marks their own homework as done.

Today, the LiveSeller Pro engineering pipeline processes work orders through a structured lifecycle that has caught placeholder code, prevented bad deploys, and kept a small team building at a pace that would normally require three times the headcount.

Here is how it works — and how it is still evolving.

The Problem We Were Solving

When you are building a product with AI-assisted development, speed is not the bottleneck. Quality is. An AI builder can generate a component in minutes, but if nobody checks whether that component actually does anything, you end up with a codebase full of beautiful placeholders.

We learned this the hard way. In mid-March, we ran a validation audit on eleven dashboard work orders that had all been marked complete and approved. Every single one was rejected. Seven of nine page components were fifteen to eighteen lines of nothing — an h1 tag and a paragraph. The dashboard looked great in a screenshot. It did absolutely nothing when you clicked on it.

The auto-approve system had checked the risk level (LOW) and whether the word “tests” appeared in the completion comment. It did not check whether any code actually existed. That is when we killed the rubber stamp and built the pipeline.

The Work Order Lifecycle

Every piece of work in LiveSeller Pro flows through a defined status chain:

Backlog → Need Spec → To Do → In Progress → Review → Ready for Deploy → Deployed → Done

Each transition has rules. You cannot skip steps. And every status has a specific meaning.

The key insight: no AI marks anything as Done. Only a human can deploy, and only a human can verify the deploy worked. The pipeline gets work to the door. A person opens it.
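The lifecycle above can be sketched as a small state machine. This is an illustrative model, not our actual implementation: the status names come from the post, but the `transition` function, the `actor` parameter, and the enforcement logic are assumptions about how such a gate could be coded.

```python
# Legal next-states for each status: you cannot skip steps.
ALLOWED = {
    "Backlog": {"Need Spec"},
    "Need Spec": {"To Do"},
    "To Do": {"In Progress"},
    "In Progress": {"Review"},
    "Review": {"Ready for Deploy"},
    "Ready for Deploy": {"Deployed"},
    "Deployed": {"Done"},
}

# Transitions only a human may perform: deploying, and verifying the deploy.
HUMAN_ONLY = {("Ready for Deploy", "Deployed"), ("Deployed", "Done")}

def transition(current: str, target: str, actor: str) -> str:
    """Return the new status, or raise if the move is illegal."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot skip from {current!r} to {target!r}")
    if (current, target) in HUMAN_ONLY and actor != "human":
        raise PermissionError("only a human can deploy or mark Done")
    return target
```

A builder calling `transition("Deployed", "Done", "ai")` gets a `PermissionError`, which is the whole point: the pipeline gets work to the door, a person opens it.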

Notion as the Source of Truth

Our entire pipeline lives in a Notion database called the Project Task Tracker. Not because Notion is the best project management tool — but because it is the best single source of truth we have found for a small team that needs flexibility without chaos.

Every work order is a Notion page with properties tracking priority (P0 through P3), builder assignment, execution state, architecture approval status, and an internal AI status field that tracks where the builder is in its own workflow.

The Claude Queue is a filtered view of that database. When a builder checks for work, it queries the queue sorted by priority. P0 always beats P1. No exceptions, no drift.

We enforce a rule we call anti-drift: builders cannot return to a previously worked task based on memory. They must always query the queue fresh. This prevents the natural tendency to keep polishing the thing you just worked on instead of picking up what actually matters most.

We also enforce a session limit of five work orders. After five completions, the builder must stop and start a fresh session. By work order ten or twelve, context drift is real — the builder is working with degraded context and producing lower quality work. Five and out. Fresh eyes every time.

n8n as the Automation Layer

Notion holds the data. n8n moves it.

We self-host n8n on our production server as the workflow automation engine. It handles the connective tissue between systems.

We currently run over sixty n8n workflows handling everything from work order lifecycle events to server monitoring to test pipeline triggers. The system is not perfect — we have hit edge cases where n8n deactivates workflows silently after errors, and we have learned the hard way that restarting the Docker container reactivates everything whether you want it to or not. But it works, and it is getting better every week.

The Two-Gate Validation System

This is where the pipeline got serious. After the eleven-WO placeholder incident, we built a two-gate system that we call the War Council:

Gate 1: Architect Review (before build starts)

Before a builder writes a single line of code, the work order’s design is validated against the actual codebase. Are the database column names real? Do the API routes match what actually exists? Are the state machine transitions correct? This catches spec errors before they become code errors.

Gate 2: Code Validator (before marking Review)

After the builder claims the work is done, a validator checks ten specific criteria:

  1. Do the claimed files actually exist on disk?
  2. Are any components under 30 lines? (Placeholder flag)
  3. Is there real logic or just hardcoded stub data?
  4. Do claimed database migrations actually exist as SQL files?
  5. Is there real database connection code or hardcoded arrays?
  6. Do API endpoints have actual handler logic?
  7. Do tests have real assertions, not empty describe blocks?
  8. Does the Dockerfile match the claimed runtime?
  9. Do UI components have state hooks, API calls, and event handlers?
  10. Is every stop condition on the work order verified with a file path and line number?

The validator returns one of four verdicts: Validated, Needs Work, Rejected, or Fraudulent. That last one means a builder claimed work that demonstrably does not exist. It exists because we needed it to exist. The accountability has to be real or the pipeline is theater.

The 12-Section Work Order Template

Every work order follows the same twelve-section structure. No exceptions:

  1. Objective — what are we building and why
  2. Behavior Source of Truth — what existing specs or docs govern this
  3. Prior Art — what already exists that we are building on or replacing
  4. System Context — what subsystem, what dependencies, what consumers
  5. UI Hierarchy — exact component tree, not a flat feature list
  6. Mode Behavior Matrix — how does it behave in different states
  7. Backend Function Inventory — every function labeled EXISTING or NEW
  8. Data Flow — step by step, how data moves through the system
  9. Database Schema References — exact column names from the live schema
  10. Deploy Target — where does this go
  11. Test Scenarios — Given/When/Then for every testable behavior
  12. Stop Conditions — the exact checklist that must be true before this ships

Section 7 is the one that changed everything. By forcing every function to be labeled as EXISTING (already in the codebase) or NEW (being created by this work order), we eliminated the single biggest source of bugs: invented functions. Builders would reference functions that sounded right but did not exist. The code would compile. The tests would pass against mocks. And then it would fail in production because it was calling a function that was never written.

We also require that no spec is written from concept or memory. Before writing a work order, the author must read the actual source files, query the live database schema, and check what code already exists. We call this artifact-first — if you have not looked at the real code, you do not get to write the spec.

What Eight Weeks Taught Us

Weeks 1-2: Work orders were freeform. Some had specs, some were a single sentence. Build quality was random. We were moving fast and felt productive.

Weeks 3-4: We introduced the 12-section template and the priority queue. Quality improved but enforcement was manual. If someone forgot a section, nobody caught it until the code was wrong.

Weeks 5-6: The placeholder incident. Eleven work orders, all marked complete, all approved, all empty. The auto-approve saw low risk and waved them through. We killed the rubber stamp, built the two-gate War Council, and wrote twenty-two engineering rules that every builder must follow.

Weeks 7-8: Doctrine formalized across six versions. Builder accountability rules codified. Session limits introduced. The pipeline started catching its own bugs — including, this week, discovering that a config sync had overwritten one builder’s identity file. It thought it was a different builder entirely, could not find its own work orders in the queue, and sat idle for days. That is the kind of problem you only discover when you have a real pipeline with real accountability and someone actually looks at why a builder is not producing.

The system is still evolving. We are currently wiring the validator directly into the review pipeline so it runs automatically instead of manually. We are building behavior specs for every subsystem so future work orders start from documented reality instead of assumptions. And we are learning, every week, that the pipeline is not just about catching bad code — it is about building the discipline that makes good code inevitable.

Why This Matters for LiveSeller Pro

Every feature in LiveSeller Pro — from the import center that processes your comic inventory to the export engine that formats listings for Whatnot and eBay — goes through this pipeline. Every one has a work order with stop conditions. Every one is validated before it ships.

We are a small team building serious software for people who run real businesses. When you trust a tool to manage your inventory, generate your listings, and track your sales, you deserve to know it was built with discipline. Not speed for the sake of speed. Not AI-generated code that nobody checked. Real engineering, with real accountability, building real tools for real sellers.

That is what we are building at Blue Devil Collectibles. And the pipeline is how we make sure it stays that way.

— John Ranson, Blue Devil Collectibles


LiveSeller Pro by Blue Devil Collectibles: workflow tools for comic sellers on Whatnot.
© 2026 LiveSeller Pro by Blue Devil Collectibles. Named for 3/504 PIR, 82nd Airborne.