
Documentation Index

Fetch the complete documentation index at: https://docs.kaireonai.com/llms.txt

Use this file to discover all available pages before exploring further.

AI Pipeline Mode

AI Pipeline Mode lives in the right-side AI panel. Open it on any /data/flow-pipelines/* page, describe what you want (“add a daily CSV ingestion of /tmp/orders.csv into starbucks.orders”), and the assistant produces a typed IR patch. The panel renders the patch as a colored diff with three actions: Apply, Show full IR, Reject (reason).

How the AI stays honest

Three validation layers, all enforced server-side:
  1. Constrained JSON generation — the provider’s output is bound by a Zod schema via Vercel AI SDK generateObject. Invalid shapes are rejected at the wire.
  2. Patch application — the AI emits an RFC 6902 JSON Patch, not a full IR. Our applier rejects paths that do not exist in the current IR.
  3. Validate-then-regenerate — the patched IR is fed through the Phase 1 parsePipelineIR validator. If structural checks fail, the server re-prompts the AI with the exact error and retries (up to 3 times). You only ever see IR that the runtime can execute.
Every proposal — success, failure, error, or rejection — is written to the audit log with entityType = 'pipeline_ai_proposal': full patch, rationale, retry count, token count. Rejections capture your reason text.
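Layer 2 above hinges on rejecting patch paths that don’t exist in the current IR. A minimal sketch of that idea — not the production applier, and only the two op shapes used in this doc (add to an array tail, replace an existing key) — might look like:

```typescript
// Hypothetical, minimal JSON Patch applier illustrating the path-rejection
// rule from layer 2. The real applier supports the full RFC 6902 op set.
type PatchOp = { op: "add" | "replace"; path: string; value: unknown };

function applyPatch(doc: any, patch: PatchOp[]): any {
  const out = structuredClone(doc); // never mutate the current IR
  for (const { op, path, value } of patch) {
    // Decode RFC 6901 pointer segments (~1 => "/", ~0 => "~").
    const parts = path.split("/").slice(1)
      .map(p => p.replace(/~1/g, "/").replace(/~0/g, "~"));
    const last = parts.pop()!;
    let parent: any = out;
    for (const p of parts) {
      if (parent == null || !(p in parent)) {
        throw new Error(`path does not exist in current IR: ${path}`);
      }
      parent = parent[p];
    }
    if (op === "add" && last === "-" && Array.isArray(parent)) {
      parent.push(value);               // append, e.g. "/nodes/-"
    } else if (op === "replace" && last in parent) {
      parent[last] = value;             // replace an existing member only
    } else {
      throw new Error(`rejected ${op} at ${path}`);
    }
  }
  return out;
}
```

Note the clone-then-apply shape: a rejected op leaves the original IR untouched, which is what lets the server safely re-prompt and retry.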

Prerequisites

  • Tenant setting flowIrEnabled must be on (defaults to true; the flag exists as a kill-switch only).
  • AI provider configured in platform settings. Any provider that supports structured output works — Anthropic, OpenAI, Google, AWS Bedrock. Providers that lack constrained-JSON support will return a 400 with provider_unsupported_structured.

Rate limits

  • 100 proposals per tenant per hour. Each proposal may use up to 3 internal retries against your provider. Your provider’s per-request rate limits still apply.
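On a 429 the Retry-After header tells you when to try again; it may arrive as delta-seconds or as an HTTP-date. A small client-side helper (hypothetical — not part of any SDK) to turn either form into a wait in milliseconds:

```typescript
// Convert a Retry-After header value into a wait in milliseconds.
// Accepts both forms allowed by HTTP: delta-seconds ("120") and an
// HTTP-date ("Wed, 21 Oct 2026 07:28:00 GMT"). Illustrative helper only.
function retryAfterMs(header: string, now: Date = new Date()): number {
  const secs = Number(header);
  if (header.trim() !== "" && Number.isFinite(secs)) {
    return Math.max(0, secs * 1000);      // delta-seconds form
  }
  const at = Date.parse(header);          // HTTP-date form
  if (Number.isNaN(at)) return 0;         // unparsable: caller decides policy
  return Math.max(0, at - now.getTime());
}
```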

Phase 1 scope

The AI is explicitly warned to stay inside the runtime’s current capabilities:
  • Sources — runtime executors are live for local_fs, s3, gcs, azure_blob, sftp, and http_pull. ftp is documented as deprecated (plaintext) and unreachable from runtime. Streaming consumer kinds (kafka, kinesis, pulsar) are gated by FLOW_STREAMING_ENABLED — disabled by default in playground.
  • Target load modes — append, truncate, upsert, blue_green, and incremental_watermark are all live. cdc_mirror is gated by the same streaming env flag and throws “streaming disabled” or “Debezium connector not configured” depending on env state.
  • Validate node — five dataset-level checks are live (rowCount, freshness, fkIntegrity, cardinality, duplicateKey).
If your prompt asks for something outside the current scope, the AI responds with an empty patch and a rationale explaining the limit.
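The source gating above can be pictured as a pre-flight check. The kind lists come from this page; the function and flag wiring are illustrative, not the runtime’s actual code (the real enforcement is the server-side validator):

```typescript
// Sketch of the Phase 1 source-scope gate. Kind lists are from the docs;
// the function itself is a hypothetical illustration.
const LIVE_SOURCES = ["local_fs", "s3", "gcs", "azure_blob", "sftp", "http_pull"];
const STREAMING_SOURCES = ["kafka", "kinesis", "pulsar"]; // FLOW_STREAMING_ENABLED

function sourceAllowed(kind: string, streamingEnabled: boolean): boolean {
  if (LIVE_SOURCES.includes(kind)) return true;
  if (STREAMING_SOURCES.includes(kind)) return streamingEnabled;
  return false; // e.g. deprecated ftp, or unknown kinds
}
```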

API

POST /api/v1/ai/pipeline-chat

Request body:
{
  "pipelineId": "optional-uuid-or-name",
  "message": "describe the change you want"
}
Response (200, valid):
{
  "ok": true,
  "rationale": "Adds a validate node that aborts when fewer than 100 rows arrive.",
  "patch": [
    {
      "op": "add",
      "path": "/nodes/-",
      "value": {
        "id": "v",
        "kind": "validate",
        "input": "src",
        "datasetLevel": { "rowCount": { "min": 100, "onFail": "abort" } }
      }
    }
  ],
  "resultingIr": { /* full pipeline IR after the patch */ },
  "warnings": [],
  "retries": 0,
  "tokensUsed": 1842
}
Response (200, retries exhausted):
{
  "ok": false,
  "rationale": "...",
  "errors": ["...", "..."],
  "retries": 2,
  "tokensUsed": 4012
}
Other statuses:

  Status  Body error                       Meaning
  400     provider_unsupported_structured  Tenant’s AI provider lacks constrained JSON
  403     flow_ir_disabled                 Enable flowIrEnabled in tenant settings
  429     rate_limited                     100/h budget exhausted; check Retry-After header
  503     provider_unavailable             AI provider down
POST /api/v1/ai/pipeline-chat/feedback

Captures rejection reasons:
{
  "pipelineId": "p1",
  "rationale": "<the rationale that was rejected>",
  "patch": [ /* the patch that was rejected */ ],
  "reason": "wrong target table"
}
See also

  • Pipeline IR — the underlying typed document the AI edits.
  • Pipelines API — how to apply an IR via POST /api/v1/pipelines with irVersion: "1.0".