File Ingestion (Flow)

Connector scope
Pattern types
Date-template tokens
Ordering
Wait policy
Atomic staging
Crash recovery
Formats
Error model
Related

The Flow source executor reads files into a pipeline run with five production-grade behaviors:

Pattern matching — glob, regex, or date-templated globs.
Ordering — latest-by-mtime, FIFO, lexicographic, or process-all.
Wait policy — retry up to N times with configurable interval; act on deadline miss with alert / skip / fail.
Atomic staging — every matched file is renamed into a .processing/ folder before any byte is read; on success it moves to a date-templated .archive/ folder, on failure to .failed/.
Format parsers — CSV, JSON, and JSONL ship in Phase 3; Parquet, Avro, ORC, TSV, and XML throw an explicit scope-defer error rather than silently mishandle binary blobs.

Connector scope

Phase 3 implements the file-handling LOGIC for the local_fs connector. The IR schema validates seven connector kinds (s3, gcs, azure_blob, sftp, ftp, local_fs, http_pull) but only local_fs has a runtime executor today; the others throw a clear scope-defer error. Cloud-connector executors land in a dedicated follow-up phase that focuses on cloud-SDK integration + credentials + retry semantics — orthogonal to the file-handling logic shipped here.

Pattern types

{ "pattern": { "type": "glob",          "value": "orders_*.csv" } }
{ "pattern": { "type": "regex",         "value": "^events_[0-9]{4}\\.json$" } }
{ "pattern": { "type": "date_template", "value": "log_{YYYY-MM-DD}.jsonl",
                "timezone": "America/Los_Angeles" } }

Date-template tokens

Token	Meaning
`{YYYY}`	4-digit year
`{MM}`	2-digit month
`{DD}`	2-digit day
`{HH}`	2-digit hour (24h)
`{mm}`	2-digit minute
`{ss}`	2-digit second
`{YYYY-MM-DD}`	shorthand combo
`{YYYYMMDD}`	compact combo

Tokens are evaluated against the source’s configured timezone (IANA), defaulting to UTC. After expansion, the value is compiled as a glob (literal text + * and ? wildcards).

Ordering

Mode	Behavior
`latest_by_mtime`	mtime descending; newest first
`fifo_by_name`	lexicographic ascending
`lexicographic`	alias of `fifo_by_name`
`process_all`	preserve readdir order

The source executor processes ALL matched files in the chosen order. Single-file mode is a future option.

Wait policy

"waitPolicy": {
  "maxRetries": 6,
  "intervalMinutes": 10,
  "onMissAction": "alert"
}

The discover loop runs up to maxRetries times, sleeping intervalMinutes between attempts. On exhaustion:

`onMissAction`	Behavior
`alert`	Log a warn; return `rowsOut: 0`; downstream sees the empty result
`skip`	Log info only; return `rowsOut: 0`
`fail`	Throw `file arrival deadline missed`; the run fails

Atomic staging

"atomicity": {
  "stagingFolder": ".processing/",
  "successFolder": ".archive/{YYYY-MM-DD}/",
  "failureFolder": ".failed/"
}

Every matched file is renamed into stagingFolder BEFORE any byte is read. On clean parse, files move to successFolder (date templates are expanded at archive time, in UTC by default). On parse failure, they move to failureFolder and the upstream error rethrows. stagingFolder, successFolder, and failureFolder may be relative to path or absolute. If a file with the same basename already exists at the destination, a .dup-<unix-ms> suffix is appended.

Crash recovery

If a run crashes mid-parse, files remain in stagingFolder. A follow-up improvement will rescan that folder first on the next run; today the operator should manually move files back if needed.

Formats

Format	Phase 3 status
`csv`	implemented (header + comma-split; blank trailing lines ignored)
`json`	implemented (array → rows; object → single row)
`jsonl`	implemented (line-delimited; parse error reports the line number)
`parquet` / `avro` / `orc` / `tsv` / `xml`	scope-defer error at runtime

Error model

Where	Trigger	Effect
`resolvePattern`	malformed regex	run aborts with the original pattern
`discoverFiles`	dir not found	ENOENT propagates (operator-visible)
`waitForFiles`	exhausted + onMissAction=fail	run fails with `file arrival deadline missed`
`stageFiles`	rename error	rethrows; downstream archive routes to failureFolder
`parseByFormat`	malformed input	run fails; files moved to failureFolder
Unsupported format	`parquet` etc.	scope-defer error

Pipeline IR — the typed source-node schema.
AI Pipeline Authoring — chat surface that produces source-node IR.
MCP Flow Server — external-agent surface.

MCP Flow Server Loading Modes, Hooks & Validation

Get Started

Deploy & Operate

Runbooks

Data Platform

Decisioning Studio

Execute & Optimize

Intelligence

Platform & Security

Integrations

Reports

Release Notes

File Ingestion (Flow)

Connector scope

Pattern types

Date-template tokens

Ordering

Wait policy

Atomic staging

Crash recovery

Formats

Error model

Get Started

Deploy & Operate

Runbooks

Data Platform

Decisioning Studio

Execute & Optimize

Intelligence

Platform & Security

Integrations

Reports

Release Notes

​Connector scope

​Pattern types

​Date-template tokens

​Ordering

​Wait policy

​Atomic staging

​Crash recovery

​Formats

​Error model

​Related

Connector scope

Pattern types

Date-template tokens

Ordering

Wait policy

Atomic staging

Crash recovery

Formats

Error model

Related