Workflows

Your agent ran deploy before the tests finished. Not because it’s careless, but because deploy was a tool it could call and nothing in the moment stopped it. If you run Claude Code or Cursor against real commands all day, you’ve watched this happen: it skips the suite, self-approves the PR, deploys on red. A workflow fixes that by making deploy not exist as a move until the agent is in a state that offers it. In the ship guard below, the agent starts in unchecked, and from there the only thing on the wire is run_check. The move it shouldn’t make isn’t blocked; it’s absent.

Under the hood, every tool call — every action the agent can take — passes through a state machine. That might sound heavy, but the simplest case is a single state that loops back to itself, which is just a normal tool call. The power comes when you add more states, because that’s where you start carving moves out of the menu.

The simplest workflow

When you declare your tools in proxy.expose, the gateway compiles them into a workflow called proxy_default with one state (ready) and one transition per tool. Call any tool, end up back at ready.

ready --hello.echo--> ready
ready --github.list_issues--> ready
ready --dotnet.test--> ready

Same engine. Same wire format. You don’t think about state machines until you need them.
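
As a rough mental model, the compile step amounts to turning a list of tool names into self-loops on ready. The sketch below is hypothetical Python, not the gateway’s actual compiler; compile_default_workflow and the dict layout are assumptions made for illustration.

```python
# Hypothetical sketch: not the gateway's actual compiler.
def compile_default_workflow(tools):
    """Build a one-state workflow where every tool is a self-loop on `ready`."""
    return {
        "id": "proxy_default",
        "initialState": "ready",
        "states": {
            "ready": {
                "transitions": {tool: {"target": "ready"} for tool in tools}
            }
        },
    }


workflow = compile_default_workflow(["hello.echo", "github.list_issues", "dotnet.test"])
for name, transition in workflow["states"]["ready"]["transitions"].items():
    print(f"ready --{name}--> {transition['target']}")
```

Every transition targets the state it started from, which is why a single-state workflow behaves like an ordinary tool call.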

Adding states

A multi-state workflow has several states with transitions between them. Each state represents a phase, and each transition represents a move that takes you to the next phase. This is where you get guardrails — moves your agent can’t make — because a move only exists in the states where you declared it.

Here’s the canonical shape: a ship guard. The agent has to run your check before it can ship, and “ship without checking” is simply not a move it can call.

workflows:
  ship_guard:
    description: Run the check, then ship.
    initialState: unchecked

    states:
      unchecked:
        transitions:
          run_check:
            title: Run the check
            target: checked
            executor:
              kind: cli
              connection: shell
              args: ["test"]

      checked:
        transitions:
          ship:
            title: Ship the change
            target: shipped
            actor: human

      shipped:
        terminal: true

Two real states, two transitions. The agent starts in unchecked, where the only move on the wire is run_check. It can’t ship from here — ship is declared only inside checked, so from unchecked it isn’t a move that exists. Once the check passes, the agent lands in checked and ship appears. And because ship is marked actor: human, even then the agent can’t fire it — a person does. shipped is terminal: the workflow is done.
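
To make “absent, not blocked” concrete, here is a small Python model of the same guard. It is a sketch under assumptions, not flowgate’s engine: the SHIP_GUARD table, moves, and fire are hypothetical names, and it assumes a transition with no declared actor belongs to the agent.

```python
# Illustrative model of the ship guard; state and move names mirror the YAML above.
SHIP_GUARD = {
    "unchecked": {"run_check": {"target": "checked", "actor": "agent"}},  # actor assumed
    "checked": {"ship": {"target": "shipped", "actor": "human"}},
    "shipped": {},  # terminal: no outgoing transitions
}


def moves(state):
    """List the moves that exist in this state."""
    return list(SHIP_GUARD[state])


def fire(state, move, actor):
    """Advance the workflow, or fail if the move is absent or belongs to another actor."""
    transitions = SHIP_GUARD[state]
    if move not in transitions:
        raise LookupError(f"{move!r} is not a move in state {state!r}")
    if transitions[move]["actor"] != actor:
        raise PermissionError(f"{move!r} must be fired by {transitions[move]['actor']}")
    return transitions[move]["target"]


print(moves("unchecked"))                      # ['run_check']
state = fire("unchecked", "run_check", "agent")
print(state, moves(state))                     # checked ['ship']
```

Calling fire("unchecked", "ship", "agent") raises LookupError because ship is simply not in the table for that state, while fire("checked", "ship", "agent") raises PermissionError because the move exists but is reserved for a human.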

This is the spine everything else is built on. TDD is an instance of it: write-implementation isn’t a move until a test has failed and the workflow is in red. Deploy-gating is an instance of it: deploy isn’t a move until tests pass. Same machine, your labels.

How airtight that “not a move that exists” guarantee is, and whether the agent can sidestep it through its host’s own shell, depends on how you run flowgate. The “How to run it” page lays out the three modes and exactly what each enforces.

How state machines run

initialState is where flowgate.command({definitionId}) lands you. The response includes links to every transition available from that state.

A state is terminal if it has no outgoing transitions or sets terminal: true explicitly. When the workflow reaches a terminal state, the response has result.status: "completed" and an empty links array.

Transitions are the edges between states. Each transition has:

A target state
Optional guards that must pass before execution
Optional executor — the thing that does the actual work when the move fires (a CLI command, an MCP tool call, an HTTP request)
Optional inputSchema for the arguments the caller must provide
Optional output mapping to thread results into the workflow’s shared state (its context)

The model never sees the full state machine. It sees the current state and the links to legal next moves — the server hands the agent its legal options instead of trusting it to remember them. There’s nothing to memorize and nothing to drift from.
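
One way to picture how the server derives a response from the current state: look up that state’s transitions and emit one link per transition, each stamped with the workflow id and version. This is a simplified sketch; build_response is a hypothetical helper, and it only distinguishes waiting_for_action from completed.

```python
# Hypothetical sketch of link generation; field names follow the responses shown on this page.
def build_response(definition, workflow):
    """Describe the current state and the moves it offers."""
    state = definition["states"][workflow["state"]]
    transitions = {} if state.get("terminal") else state.get("transitions", {})
    links = [
        {
            "rel": name,
            "method": "flowgate.command",
            "args": {
                "workflowId": workflow["id"],
                "expectedVersion": workflow["version"],
                "transition": name,
            },
        }
        for name in transitions
    ]
    status = "waiting_for_action" if links else "completed"
    return {"workflow": workflow, "result": {"status": status}, "links": links}


definition = {
    "initialState": "unchecked",
    "states": {
        "unchecked": {"transitions": {"run_check": {"target": "checked"}}},
        "checked": {"transitions": {"ship": {"target": "shipped"}}},
        "shipped": {"terminal": True},
    },
}

started = build_response(
    definition, {"id": "wf_demo", "state": definition["initialState"], "version": 1}
)
finished = build_response(definition, {"id": "wf_demo", "state": "shipped", "version": 3})
```

The caller only ever receives the links for one state, so the rest of the definition never has to leave the server.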

Optimistic locking

Every workflow response includes a version number. Every flowgate.command({workflowId, transition, expectedVersion}) requires that expectedVersion. If someone else advanced the workflow between your read and your write, your version is stale and the submit is rejected with STALE_WORKFLOW_VERSION.

This stops the agent clobbering your decision with a stale view. You approved the PR while the agent was mid-turn on an older snapshot; its expectedVersion no longer matches, so its submit bounces instead of overwriting what you just did. More generally, it prevents race conditions when multiple actors (model, human, system) operate on the same workflow. The rejection response includes the current state and links, so recovery is straightforward — re-read, re-decide, re-submit.

{
  "workflow": {
    "id": "wf_3f8b...",
    "definitionId": "pr_review",
    "state": "in_review",
    "version": 3
  },
  "result": { "status": "rejected" },
  "error": {
    "code": "STALE_WORKFLOW_VERSION",
    "message": "Expected version 2 but current is 3."
  },
  "links": [
    { "rel": "approve", "method": "flowgate.command",
      "args": { "workflowId": "wf_3f8b...", "expectedVersion": 3, "transition": "approve" } }
  ]
}

The model doesn’t need to reason about version numbers. It just passes back the expectedVersion from the links in the most recent response, which always carry the current version.
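
The check itself is small. The sketch below models it in memory; it is not flowgate’s implementation, the submit helper and TRANSITIONS table are hypothetical, and it assumes every accepted submit bumps the version by exactly one.

```python
# Hypothetical in-memory model of the version check.
TRANSITIONS = {
    "in_review": {"approve": "approved", "request_changes": "changes_requested"},
}


def submit(workflow, transition, expected_version):
    """Apply a transition only if the caller saw the current version."""
    if expected_version != workflow["version"]:
        return {
            "workflow": dict(workflow),
            "result": {"status": "rejected"},
            "error": {
                "code": "STALE_WORKFLOW_VERSION",
                "message": f"Expected version {expected_version} "
                           f"but current is {workflow['version']}.",
            },
        }
    workflow["state"] = TRANSITIONS[workflow["state"]][transition]
    workflow["version"] += 1  # assumption: one bump per accepted submit
    return {"workflow": dict(workflow), "result": {"status": "executed"}}


workflow = {"id": "wf_demo", "state": "in_review", "version": 2}
agent_snapshot = workflow["version"]                 # the agent reads version 2
human = submit(workflow, "approve", 2)               # the human approves first
agent = submit(workflow, "request_changes", agent_snapshot)  # stale, so rejected
print(agent["error"]["message"])                     # Expected version 2 but current is 3.
```

The agent’s late submit never touches the state: the human’s approval stands, and the rejection tells the agent what the current version is.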

The response wire format

Every call returns the same shape, whether it is flowgate.command({definitionId}) (start), flowgate.command({workflowId, transition, expectedVersion}) (submit), or flowgate.query({workflowId}) (get):

{
  "workflow": {
    "id": "wf_3f8b...",
    "definitionId": "deploy_pipeline",
    "state": "ready_to_deploy",
    "version": 6
  },
  "result": {
    "status": "waiting_for_action",
    "message": "All automated checks passed."
  },
  "context": {
    "lintPassed": true,
    "testsPassed": true,
    "testCount": 47,
    "coverage": 92.3,
    "artifactId": "img-a1b2c3"
  },
  "guidance": {
    "goal": "Confirm deployment",
    "instructions": "Review lint, test, and build results before deploying."
  },
  "links": [
    {
      "rel": "deploy",
      "title": "Deploy to environment",
      "method": "flowgate.command",
      "actor": "agent",
      "args": {
        "workflowId": "wf_3f8b...",
        "expectedVersion": 6,
        "transition": "deploy"
      }
    },
    {
      "rel": "abort",
      "title": "Abort deployment",
      "method": "flowgate.command",
      "actor": "agent",
      "args": {
        "workflowId": "wf_3f8b...",
        "expectedVersion": 6,
        "transition": "abort"
      }
    }
  ]
}

Key things to notice:

workflow tells you where you are: which workflow, which state, which version.
result.status tells you what happened: started, waiting_for_action, executed, completed, rejected, failed, timed_out.
context is the workflow’s shared state — the values accumulated from every previous step’s output mapping (test counts, the artifact ID, a coverage number). Guards and prefill read from it; it travels with the workflow.
guidance gives the model phase-specific instructions (if declared on the state).
links are the legal next moves. Each link carries everything the model needs to make the call — workflowId, expectedVersion, transition. The model picks one, fills in any required arguments, and submits.
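
On the client side, this means a caller never assembles workflowId or expectedVersion by hand; it copies them from the chosen link. A minimal sketch, assuming the response has already been parsed into a dict and trimmed to the fields used here. choose_link and build_call are hypothetical helpers, and because this page doesn’t show how extra arguments are attached, the sketch leaves them out.

```python
response = {
    "workflow": {"id": "wf_3f8b...", "state": "ready_to_deploy", "version": 6},
    "result": {"status": "waiting_for_action"},
    "links": [
        {"rel": "deploy", "method": "flowgate.command", "actor": "agent",
         "args": {"workflowId": "wf_3f8b...", "expectedVersion": 6, "transition": "deploy"}},
        {"rel": "abort", "method": "flowgate.command", "actor": "agent",
         "args": {"workflowId": "wf_3f8b...", "expectedVersion": 6, "transition": "abort"}},
    ],
}


def choose_link(response, rel, actor="agent"):
    """Return the link with this rel that the given actor may follow, or None."""
    for link in response["links"]:
        if link["rel"] == rel and link.get("actor") == actor:
            return link
    return None


def build_call(link):
    """Copy the method and args straight from the link."""
    return link["method"], dict(link["args"])


method, args = build_call(choose_link(response, "deploy"))
```

A move that isn’t in links, such as a made-up rollback, comes back as None from choose_link, so there is nothing to submit.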

Output mapping

By default, an executor’s result is discarded after the transition completes. To pass data between steps, use output mappings:

transitions:
  run_tests:
    target: build
    executor:
      kind: cli
      connection: test_runner
    output:
      testsPassed: "$.output.json.passed"
      testCount: "$.output.json.count"
      coverage: "$.output.json.coverage"

The left side is the key in context. The right side is a path expression; here each one reads from the executor’s result. After this transition runs, context.testsPassed, context.testCount, and context.coverage are set and available to guards and prefill in subsequent states.

Expression scopes for output mapping:

Scope	Reads from
`$.output.*`	The executor’s result (available only in output mappings)
`$.arguments.*`	The caller’s transition arguments
`$.context.*`	The workflow’s accumulated context
`$.workflow.input.*`	The input passed to `flowgate.command({definitionId})`
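
To make the scopes concrete, here is a minimal resolver, assuming the four scopes are presented as one nested dict and that a missing path resolves to None. The resolve and apply_output helpers are hypothetical; flowgate’s real expression engine may behave differently at the edges.

```python
def resolve(expr, scopes):
    """Resolve a `$.a.b.c` path against the scopes; other values are literals."""
    if not (isinstance(expr, str) and expr.startswith("$.")):
        return expr
    value = scopes
    for part in expr[2:].split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # assumption: missing paths resolve to None
        value = value[part]
    return value


def apply_output(mapping, scopes):
    """Evaluate every mapping entry and return the new context values."""
    return {key: resolve(expr, scopes) for key, expr in mapping.items()}


scopes = {
    "output": {"json": {"passed": True, "count": 47, "coverage": 92.3}},
    "arguments": {},
    "context": {"lintPassed": True},
    "workflow": {"input": {"service": "api"}},
}
mapping = {
    "testsPassed": "$.output.json.passed",
    "testCount": "$.output.json.count",
    "coverage": "$.output.json.coverage",
}

context = {**scopes["context"], **apply_output(mapping, scopes)}
print(context)
# {'lintPassed': True, 'testsPassed': True, 'testCount': 47, 'coverage': 92.3}
```

The mapped values are merged on top of the existing context, which is how results from earlier steps stay available to later ones.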

You can also use operators for computed values:

output:
  attempts: { add: ["$.context.attempts", 1] }
  status: "reviewed"
  message: { concat: ["PR #", "$.context.prNumber", " is ready"] }

Operators: add, subtract, multiply, divide, set, concat. Arithmetic operands can be paths or literals. Missing/null values default to 0 for arithmetic, so a counter increment works on the first call.
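
The operator rules can be sketched in a few lines. This is an illustration under assumptions, not the real evaluator: evaluate and lookup are hypothetical names, arithmetic is folded left to right over the operands, and set is left out because its operand shape isn’t shown here.

```python
import operator
from functools import reduce

ARITHMETIC = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def lookup(expr, scopes):
    """Resolve `$.a.b` paths; return any other value unchanged."""
    if not (isinstance(expr, str) and expr.startswith("$.")):
        return expr
    value = scopes
    for part in expr[2:].split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value


def evaluate(spec, scopes):
    """Evaluate a literal, a path, or a single-operator object."""
    if not isinstance(spec, dict):
        return lookup(spec, scopes)
    (name, operands), = spec.items()
    values = [lookup(operand, scopes) for operand in operands]
    if name in ARITHMETIC:
        numbers = [0 if value is None else value for value in values]  # missing -> 0
        return reduce(ARITHMETIC[name], numbers)
    if name == "concat":
        return "".join(str(value) for value in values)
    raise ValueError(f"operator not covered by this sketch: {name}")


output = {
    "attempts": {"add": ["$.context.attempts", 1]},
    "status": "reviewed",
    "message": {"concat": ["PR #", "$.context.prNumber", " is ready"]},
}
scopes = {"context": {"prNumber": 42}}  # no `attempts` yet: this is the first call
result = {key: evaluate(spec, scopes) for key, spec in output.items()}
print(result)
# {'attempts': 1, 'status': 'reviewed', 'message': 'PR #42 is ready'}
```

Because the missing attempts value is treated as 0, the first increment yields 1 without any seeding.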

Seeding context

If you need values in context before any executor runs — counters, flags, defaults — declare them with initialContext:

workflows:
  deploy:
    initialState: planning
    initialContext:
      attempts: 0
      status: pending
      approved: false
    states: { ... }

initialContext is set once when the workflow starts. Self-loops don’t reset it.

Prefill: pre-populating arguments

Transitions can pre-populate argument values in the links they generate. This means the model doesn’t have to reason about values the workflow already knows.

transitions:
  create_pr:
    target: review
    inputSchema:
      type: object
      required: [repo, base, head, title, body]
      properties: { ... }
    prefill:
      repo: "$.workflow.input.repo"
      base: "main"
      head: "$.context.branch_name"
    executor: { kind: mcp, connection: github, tool: create_pull_request }

The link that appears in the response will already have repo, base, and head filled in. The model only needs to generate title and body — the genuinely creative fields.

Prefill resolves at link-generation time using $.context.* and $.workflow.input.*. It’s guidance, not enforcement — the model can override prefilled values if it has reason to, and the final submission is still validated against inputSchema.
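
The sketch below walks through that sequence: resolve prefill into link args, let the model add or override fields, then check the submission. It is hypothetical Python with made-up sample values (acme/api, feature/login), and missing_required only checks the required list; real validation against inputSchema would cover much more.

```python
def lookup(expr, scopes):
    """Resolve `$.a.b` paths; return any other value unchanged."""
    if not (isinstance(expr, str) and expr.startswith("$.")):
        return expr
    value = scopes
    for part in expr[2:].split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value


def build_link_args(prefill, scopes):
    """Resolve prefill expressions when the link is generated."""
    return {name: lookup(expr, scopes) for name, expr in prefill.items()}


def missing_required(schema, args):
    """Return the required fields that the submission still lacks."""
    return [name for name in schema.get("required", []) if name not in args]


schema = {"type": "object", "required": ["repo", "base", "head", "title", "body"]}
prefill = {
    "repo": "$.workflow.input.repo",
    "base": "main",
    "head": "$.context.branch_name",
}
scopes = {
    "context": {"branch_name": "feature/login"},
    "workflow": {"input": {"repo": "acme/api"}},
}

link_args = build_link_args(prefill, scopes)
# The model fills in the creative fields and, here, overrides one prefilled value.
submission = {**link_args, "title": "Add login", "body": "Adds the login form.", "base": "develop"}
```

Before the model adds anything, missing_required(schema, link_args) reports title and body; after, the submission passes the check even though base was overridden.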

Branching: dynamic targets

Sometimes where you go next depends on what the executor returned. Instead of a single target, you can declare branches:

transitions:
  run_tests:
    target: red
    executor:
      kind: cli
      connection: shell
      args: ["-c", "cargo test"]
      treatNonZeroAsFailure: false
    output:
      passed: "$.output.success"
    branches:
      - when: { kind: expr, expr: "$.context.passed == true" }
        target: green
      - when: { kind: expr, expr: "$.context.passed == false" }
        target: red

Branches evaluate after the executor succeeds and after output mappings apply (so branches can reference values just produced). First match wins. If no branch matches, the transition’s declared target is the fallback.

The treatNonZeroAsFailure: false flag on the CLI executor is important here — it turns a non-zero exit code into output.success: false instead of erroring the transition. This lets you use exit codes as data for branching.
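
Put together, the evaluation order looks roughly like this. The sketch is hypothetical: cli_output and pick_target are illustrative names, exit codes are passed in directly instead of running a command, and Python callables stand in for the expr strings.

```python
def cli_output(exit_code, treat_non_zero_as_failure=True):
    """Turn an exit code into executor output, or raise if it counts as an error."""
    if exit_code != 0 and treat_non_zero_as_failure:
        raise RuntimeError(f"executor failed with exit code {exit_code}")
    return {"success": exit_code == 0}


def pick_target(transition, context):
    """Return the first matching branch target, else the declared target."""
    for branch in transition.get("branches", []):
        if branch["when"](context):
            return branch["target"]
    return transition["target"]


run_tests = {
    "target": "red",  # fallback when no branch matches
    "branches": [
        {"when": lambda ctx: ctx.get("passed") is True, "target": "green"},
        {"when": lambda ctx: ctx.get("passed") is False, "target": "red"},
    ],
}


def run(exit_code):
    """Execute, apply the output mapping, then branch on the fresh context."""
    output = cli_output(exit_code, treat_non_zero_as_failure=False)
    context = {"passed": output["success"]}
    return pick_target(run_tests, context)
```

With the flag off, run(0) lands in green and run(1) lands in red. With the flag left at its default in this sketch, a non-zero exit raises before any branch is considered.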

A complete multi-state workflow

Here’s a deploy-gating pipeline that puts it all together: the earned-move flip from the ship guard, scaled up. Lint, test, and build all have to pass before deploy is even a move the agent sees. It uses deterministic chaining (steps marked actor: deterministic run automatically, with no model turn; chaining is covered in more detail elsewhere), output mapping, prefill, and phase guidance:

workflows:
  deploy_pipeline:
    description: Lint, test, build, and deploy a service.
    tags: [deploy, ci, pipeline]
    initialState: lint
    maxChainDepth: 10

    inputSchema:
      type: object
      required: [service]
      properties:
        service: { type: string }
        environment:
          type: string
          enum: [staging, production]
          default: staging

    states:
      lint:
        goal: Validate code quality
        transitions:
          run_lint:
            target: test
            actor: deterministic
            executor:
              kind: cli
              command: lint-check
              args: ["$.workflow.input.service"]
            output:
              lintPassed: "$.output.json.passed"
              lintReport: "$.output.json.report"

      test:
        goal: Run the test suite
        transitions:
          run_tests:
            target: build
            actor: deterministic
            executor:
              kind: cli
              command: test-runner
              args: ["$.workflow.input.service"]
            output:
              testsPassed: "$.output.json.passed"
              testCount: "$.output.json.count"
              coverage: "$.output.json.coverage"

      build:
        goal: Build the deployment artifact
        transitions:
          build_artifact:
            target: ready_to_deploy
            actor: deterministic
            executor:
              kind: cli
              command: build-artifact
              args: ["$.workflow.input.service"]
            output:
              artifactId: "$.output.json.artifactId"

      ready_to_deploy:
        goal: Confirm deployment
        guidance: >
          All automated checks passed. Review the lint report,
          test results, and build artifact before deciding to deploy.
        transitions:
          deploy:
            title: Deploy to environment
            target: deployed
            actor: agent
            prefill:
              artifact: "$.context.artifactId"
              env: "$.workflow.input.environment"
            executor:
              kind: cli
              command: deploy
              args: ["$.context.artifactId", "$.workflow.input.environment"]

          abort:
            title: Abort deployment
            target: aborted
            actor: agent

      deployed:
        terminal: true

      aborted:
        terminal: true

When you call flowgate.command({definitionId: "deploy_pipeline", input: {...}}), the runtime chains through lint, test, and build automatically (all three are actor: deterministic). The first response the model sees is at ready_to_deploy, with the full context from all three steps. It offers two links: deploy and abort. The deploy link has artifact and env prefilled.

The model calls flowgate.command once and gets back the entire pipeline’s results in a single round trip. It only has to make one decision: deploy or abort.
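
The chaining loop can be pictured as follows. This is a sketch under assumptions, not the runtime: chain is a hypothetical helper, the lambdas stand in for executors plus their output mappings, failures are ignored, and raising when maxChainDepth is exceeded is a guess at the behavior at the cap.

```python
def chain(states, state, context, executors, max_chain_depth):
    """Fire deterministic transitions until a state needs a decision."""
    depth = 0
    while True:
        automatic = [
            (name, transition)
            for name, transition in states[state].get("transitions", {}).items()
            if transition.get("actor") == "deterministic"
        ]
        if not automatic:
            return state, context
        if depth >= max_chain_depth:
            raise RuntimeError("maxChainDepth exceeded")  # assumed behavior
        name, transition = automatic[0]
        context.update(executors[name]())
        state = transition["target"]
        depth += 1


states = {
    "lint": {"transitions": {"run_lint": {"target": "test", "actor": "deterministic"}}},
    "test": {"transitions": {"run_tests": {"target": "build", "actor": "deterministic"}}},
    "build": {"transitions": {
        "build_artifact": {"target": "ready_to_deploy", "actor": "deterministic"},
    }},
    "ready_to_deploy": {"transitions": {
        "deploy": {"target": "deployed", "actor": "agent"},
        "abort": {"target": "aborted", "actor": "agent"},
    }},
}
executors = {
    "run_lint": lambda: {"lintPassed": True},
    "run_tests": lambda: {"testsPassed": True, "testCount": 47},
    "build_artifact": lambda: {"artifactId": "img-a1b2c3"},
}

state, context = chain(states, "lint", {}, executors, max_chain_depth=10)
print(state, sorted(context))
# ready_to_deploy ['artifactId', 'lintPassed', 'testCount', 'testsPassed']
```

The loop stops at ready_to_deploy because neither of its transitions is deterministic, which is the point where the model is handed its two links.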

Workflow-level timeout

You can set a deadline for the entire workflow:

workflows:
  approval:
    timeoutMs: 86400000    # 24 hours
    onTimeout:
      target: timed_out
    initialState: pending
    states:
      pending: { ... }
      timed_out: { terminal: true }

The timeout is lazy — it’s checked on the next submit or get. If the workflow has been alive longer than timeoutMs, the runtime auto-transitions to onTimeout.target and short-circuits whatever the caller submitted. No background scheduler needed.
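
A lazy check of this kind might look like the sketch below. It is illustrative only: apply_lazy_timeout, submit, and the startedAtMs field are hypothetical, and the clock is passed in as a number so the behavior is easy to follow.

```python
DAY_MS = 24 * 60 * 60 * 1000  # 86400000

definition = {"timeoutMs": DAY_MS, "onTimeout": {"target": "timed_out"}}


def apply_lazy_timeout(workflow, definition, now_ms):
    """Move an overdue workflow to its timeout target; return True if it moved."""
    timeout_ms = definition.get("timeoutMs")
    if timeout_ms is None:
        return False
    if now_ms - workflow["startedAtMs"] <= timeout_ms:
        return False
    workflow["state"] = definition["onTimeout"]["target"]
    return True


def submit(workflow, definition, transition, now_ms):
    """Check the deadline first, so an overdue submit never runs its transition."""
    if apply_lazy_timeout(workflow, definition, now_ms):
        return {"state": workflow["state"], "status": "timed_out"}
    workflow["state"] = transition["target"]
    return {"state": workflow["state"], "status": "executed"}


on_time = submit({"state": "pending", "startedAtMs": 0}, definition,
                 {"target": "approved"}, now_ms=1_000)
too_late = submit({"state": "pending", "startedAtMs": 0}, definition,
                  {"target": "approved"}, now_ms=DAY_MS + 1)
```

Nothing happens at the moment the deadline passes; the workflow only moves to timed_out when the next call arrives and finds it overdue.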