What is flowgate?

The problem you already have

You told the agent to run the test suite first. It pushed anyway. You told it CI had to be green before deploy — it deployed on a red build. You opened a PR for review and it merged its own change. You run Claude Code or Cursor against real commands — git, npm test, deploys, migrations — and you’ve watched it pick the wrong one.

The system prompt is a suggestion. The agent reads it, agrees, and does the other thing two turns later when the context fills up. You can’t make a rule stick by asking nicely.

flowgate fixes this differently: it makes the wrong move not exist as an action the agent can call. Not blocked, not discouraged — absent from the menu. The agent can only ever do what the current state allows, and the right move appears only once it’s been earned.

How airtight that is depends on how you run flowgate. Wire it into Claude Code or Cursor and it governs everything you route through it: your MCP servers, CLIs, and APIs all sit behind its two tools. Its one blind spot is your host’s own built-in shell, which you restrict separately. Run the loop in flowgate’s own chat TUI and the guarantee is absolute: the model’s only tools are the legal moves, so there’s nothing else to call. “How to run it” lays out the three modes and exactly what each enforces.

The fix: moves your agent can’t make

You describe your process as states and the moves allowed in each. A move that isn’t declared in the current state simply isn’t offered. The agent reaches for it, gets refused, and is handed back the moves that are legal — so it recovers on its own.

Here’s the whole idea in twelve lines of YAML, a ship guard that won’t let the agent ship before it runs your check:

version: "1.0.0"
workflows:
  ship_guard:
    initialState: unchecked
    states:
      unchecked:
        transitions:
          run_check: { target: checked }   # the only move offered here
      checked:
        transitions:
          ship: { target: shipped }        # `ship` exists ONLY in this state
      shipped: {}                           # terminal — done

ship is declared only inside checked, and the only way into checked is run_check. So from unchecked, ship is not a move that exists. The agent can’t push past the gate because the action to do so isn’t on the wire until the gate is passed. Point run_check at your real test command and the gate has teeth — a red suite means ship never becomes reachable. (Quick start walks the full loop, including making the check run npm test.)

That’s TDD — red has no write-implementation move until a test fails. That’s deploy-gating — deploy isn’t a move until tests pass. Same machine, your labels.
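
As a rough sketch of the TDD case, here is the same three-state shape with different labels. The workflow, state, and move names (tdd_loop, awaiting_failing_test, record_failing_test, and so on) are hypothetical, picked for illustration; the only keys assumed are the ones the ship guard already uses.

```yaml
version: "1.0.0"
workflows:
  tdd_loop:                       # hypothetical workflow name
    initialState: awaiting_failing_test
    states:
      awaiting_failing_test:
        transitions:
          # the only move offered before a test has failed
          record_failing_test: { target: test_failing }
      test_failing:
        transitions:
          # declared ONLY here, so it does not exist any earlier
          write_implementation: { target: implemented }
      implemented: {}             # terminal
```

From awaiting_failing_test there is no write_implementation move to call, just as ship does not exist in unchecked. As with run_check, this only has teeth once record_failing_test is backed by your real test command.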

Two tools, any number of moves

No matter how many workflows you wire in, your agent always sees exactly two MCP tools:

Tool	What it does
`flowgate.query`	All reads: browse what’s available, search, check workflow state
`flowgate.command`	All writes: start workflows, fire transitions

Reads on one tool, writes on the other. That split lets your host gate permissions on exactly the right axis — auto-approve the reads, require confirmation for anything that changes state. It’s also why the tool surface stays flat: whether you have one workflow or five hundred, the model still sees two tools.

Every response carries the moves that are legal from where you are now — the server tells the agent what it can do next, so the agent just follows the links instead of memorizing a catalog.

How a refusal looks on the wire

When the agent tries a move that isn’t declared in the current state, the call is rejected and the legal moves come back with it:

→ flowgate.command { "workflowId": "wf_01H…", "transition": "ship" }
← { "result": { "status": "rejected" },
    "error": { "code": "INVALID_TRANSITION",
               "message": "Transition 'ship' is not valid from state 'unchecked'." },
    "links": [ { "rel": "run_check", … } ] }   # the refusal hands back the legal move

The agent didn’t get a vague “no.” It got told what it can do, and it follows that — runs the check, and only then does ship appear in links. The guardrail holds without a single line in the system prompt.

Make some moves wait for you

Some moves shouldn’t fire autonomously at all. Mark a move actor: "human" and the agent is never even offered the link — if it reaches for it, it gets ACTOR_MISMATCH. A pr.merge move gated this way means the agent can open the PR but a person merges it. The agent literally cannot submit the move; it waits for you.
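
A minimal sketch of that PR gate, assuming actor sits alongside target on the transition. The pr.merge move and the actor: "human" value come from the description above; the workflow name, the state names, and the pr.open move are hypothetical.

```yaml
version: "1.0.0"
workflows:
  pr_review:                      # hypothetical workflow name
    initialState: draft
    states:
      draft:
        transitions:
          pr.open: { target: in_review }    # hypothetical move the agent may fire
      in_review:
        transitions:
          # human-only: the agent is never offered this link
          pr.merge: { target: merged, actor: "human" }
      merged: {}                  # terminal
```

With a shape like this, the agent can take the workflow as far as in_review and then has nothing left to fire; the only move out of that state belongs to a person.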

You declare all of this in YAML — no glue code, no per-tool wrappers. The same declarative surface also gives you:

Guards — permission checks, role requirements, expression-based rules, evidence requirements, evaluated before a declared move runs (a failed guard returns GUARD_REJECTED, distinct from an undeclared move’s INVALID_TRANSITION).
Reliability — timeouts, retries with backoff, fallback executors. Declared, not coded.
Audit — every step emits a structured JSON event automatically. You get a complete, replayable trace of what the agent did, including the moves it tried and was refused.
Schema validation — bad input is caught before it reaches your backend.

The set of config keys is small: version, workflows.<id>.initialState, and per-state states.<s>.transitions.<t>.target, with optional actor (who’s allowed to fire it) and executor (what actually runs).

Who this is for

You let a coding agent run real commands — tests, git, deploys, migrations — and you’ve watched it do the wrong one. You want some moves to be impossible (ship before the suite passes) and others to wait for a human (merge to main). You’d rather encode that in the structure of what’s callable than keep re-explaining it in a prompt the agent ignores.

If that’s you, this earns its keep on the first refused git push. If you only ever let the agent read files and suggest diffs — no commands with consequences — you don’t need it yet. Bookmark it for when you do.

And it’s cheaper at scale

There’s a cost side-effect worth knowing. Every MCP tool you register normally lands in the model’s system prompt and costs roughly 50–150 tokens just to describe, so fifty tools come to about 2,500–7,500 tokens of descriptions before the agent has done anything. The model also burns output tokens reasoning about which of fifty tools to pick, often wrongly. With flowgate the model sees two tools regardless of how much you wire in, so that per-tool cost stays fixed as you grow. The full breakdown is in the cost analysis.

Going further

When you have ten workflows and half of them re-implement “run the tests, open a PR, review the diff,” you’ll want to write that sequence once and compose it rather than copy-paste it. v0.2 adds a composition model for exactly that, plus two ready-made libraries. That’s a deeper topic — see Capabilities and orchestrators, Multi-repo loading, and Self-authoring.

To go hands-on now: the Quick start wires the ship guard into your agent and lets you watch it try the wrong move, all in about a minute. From there, add an approval gate, see how discovery keeps the surface flat, or compose a small architecture from skills and workflows.