Drain Queues (Cron)

KaireonAI’s BullMQ workers have two execution modes:

Always-on (WORKER_INPROCESS=1, the legacy default): five workers run continuously inside the API container, long-polling Redis with BLPOP / BRPOPLPUSH. Job latency ≈ 0. Idle Redis cost ≈ 30 ops/min permanently.
Cron-driven drain (WORKER_INPROCESS=0, recommended for free-tier Redis): no always-on workers. A scheduled cron hits POST /api/v1/cron/drain-queues every few minutes. Each invocation connects, processes available jobs, and disconnects. Job latency ≤ cron interval. Idle Redis cost ≈ 20 ops per invocation × invocations/day.

For a 5-minute cron with 5 idle queues, that is about 173K ops/month (20 ops × 8,640 invocations), comfortably under the Upstash free tier of 500K, versus about 1.3M/month (30 ops/min × 43,200 min) for always-on workers with no actual jobs.

When to use

Workload	Recommended mode
Free-tier Redis or low-traffic playground	Cron-driven, every 5 min
Production with paid Redis + active batch/journey traffic	Always-on
Mixed (paid Redis, but workers run elsewhere as a separate service)	Cron-driven on the API; standalone worker container for the heavy queues

Set the toggle

Set the env var on your API container and redeploy:

# Disables in-process workers (instrumentation.ts skips importing src/worker/index.ts)
WORKER_INPROCESS=0
# Required — used for cron auth
CRON_SECRET=<long-random-hex>

When WORKER_INPROCESS is unset or set to 1, the API container runs the legacy always-on workers. When it is set to 0, jobs are consumed only when /api/v1/cron/drain-queues is called.
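The toggle rule above can be sketched as a small predicate. This is an illustration only: `shouldStartInProcessWorkers` and `missingCronConfig` are hypothetical names, the real check in instrumentation.ts is not shown in this page, and treating any value other than the literal `0` as always-on is an assumption.

```typescript
// Hypothetical sketch of the WORKER_INPROCESS toggle; not the actual instrumentation.ts code.
type EnvMap = Record<string, string | undefined>;

// Assumption: only the literal "0" selects cron-driven mode.
// Unset or "1" keeps the legacy always-on workers, as described above.
function shouldStartInProcessWorkers(env: EnvMap): boolean {
  return env["WORKER_INPROCESS"] !== "0";
}

// In cron-driven mode CRON_SECRET is required for endpoint auth,
// so report it when it is missing or empty.
function missingCronConfig(env: EnvMap): string[] {
  if (shouldStartInProcessWorkers(env)) {
    return [];
  }
  return env["CRON_SECRET"] ? [] : ["CRON_SECRET"];
}
```

A startup check along these lines could fail fast when `WORKER_INPROCESS=0` is deployed without a `CRON_SECRET`, rather than surfacing the problem as 401 responses on the first cron tick.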

POST /api/v1/cron/drain-queues

Drain queued jobs across the 5 BullMQ queues (batch-jobs, dsar-jobs, journey-jobs, retrain-jobs, seed-jobs).

Auth

Header: X-Cron-Secret: <CRON_SECRET> (or Authorization: Bearer <CRON_SECRET>). The endpoint also accepts CRON_TOKEN for backwards compatibility on environments that haven’t migrated.
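As a rough illustration of the two accepted header forms, the sketch below reads the secret from either header and compares it to the expected value. `readCronSecret` and `secretsMatch` are hypothetical helpers, not the endpoint's actual implementation; the precedence of `X-Cron-Secret` over `Authorization` and the constant-time comparison are assumptions.

```typescript
// Hypothetical sketch of reading the cron secret from request headers.
type HeaderMap = Record<string, string | undefined>;

function readCronSecret(headers: HeaderMap): string | null {
  // HTTP header names are case-insensitive, so normalise the keys first.
  const lower: HeaderMap = {};
  for (const key of Object.keys(headers)) {
    lower[key.toLowerCase()] = headers[key];
  }
  const direct = lower["x-cron-secret"];
  if (direct) {
    return direct;
  }
  const auth = lower["authorization"];
  const prefix = "Bearer ";
  if (auth && auth.indexOf(prefix) === 0) {
    const token = auth.slice(prefix.length).trim();
    return token.length > 0 ? token : null;
  }
  return null;
}

// Compares every character so the loop does not exit early on the first mismatch.
function secretsMatch(presented: string | null, expected: string): boolean {
  if (presented === null || presented.length !== expected.length) {
    return false;
  }
  let diff = 0;
  for (let i = 0; i < expected.length; i++) {
    diff |= presented.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}
```

A request that fails this kind of check corresponds to the `401` case listed under Error codes.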

Query parameters

Parameter	Type	Default	Description
`queue`	string	(all 5)	Drain only this queue. Allowed: `batch-jobs`, `dsar-jobs`, `journey-jobs`, `retrain-jobs`, `seed-jobs`.
`maxDurationMs`	number	`60000` (1 min)	Hard wall-clock cap. Max `600000` (10 min). The endpoint exits as soon as queues report idle, but never runs longer than this.
`maxConcurrentQueues`	number	`2`	How many queues run BullMQ workers in parallel inside one invocation. Caps concurrent Redis connections so idle ops stay low on free-tier Redis. Bump to 5 when self-hosted.
`maxJobsPerQueue`	number	unbounded	Safety stop per queue. Useful when chaining short cron ticks.
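A caller might assemble the query string from the parameters in the table like this. It is a minimal sketch: `buildDrainPath` and `DrainOptions` are hypothetical names, and rejecting `maxDurationMs` above the documented maximum on the client side is an assumption (the page does not say whether the server rejects or clamps such values).

```typescript
// Hypothetical client-side helper for building the drain request path.
const DRAIN_QUEUES = ["batch-jobs", "dsar-jobs", "journey-jobs", "retrain-jobs", "seed-jobs"] as const;
type DrainQueue = (typeof DRAIN_QUEUES)[number];

interface DrainOptions {
  queue?: DrainQueue;           // omit to drain all 5 queues
  maxDurationMs?: number;       // default 60000, documented maximum 600000
  maxConcurrentQueues?: number; // default 2
  maxJobsPerQueue?: number;     // default unbounded
}

const MAX_DURATION_CAP_MS = 600000;

function buildDrainPath(options: DrainOptions = {}): string {
  // Assumption: fail early on the client instead of relying on server behaviour.
  if (options.maxDurationMs !== undefined && options.maxDurationMs > MAX_DURATION_CAP_MS) {
    throw new RangeError(`maxDurationMs must be <= ${MAX_DURATION_CAP_MS}`);
  }
  const keys: (keyof DrainOptions)[] = ["queue", "maxDurationMs", "maxConcurrentQueues", "maxJobsPerQueue"];
  const pairs: string[] = [];
  for (const key of keys) {
    const value = options[key];
    if (value !== undefined) {
      pairs.push(`${key}=${encodeURIComponent(String(value))}`);
    }
  }
  const query = pairs.length > 0 ? `?${pairs.join("&")}` : "";
  return `/api/v1/cron/drain-queues${query}`;
}
```

For example, `buildDrainPath({ queue: "journey-jobs", maxDurationMs: 120000 })` yields `/api/v1/cron/drain-queues?queue=journey-jobs&maxDurationMs=120000`, which can be appended to the base URL in any of the scheduling options.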

Response `200`

{
  "ok": true,
  "durationMs": 1234,
  "totalProcessed": 3,
  "totalFailed": 0,
  "queues": {
    "batch-jobs":   { "jobsProcessed": 1, "jobsFailed": 0, "idleAt": 412 },
    "dsar-jobs":    { "jobsProcessed": 0, "jobsFailed": 0, "idleAt": 0 },
    "journey-jobs": { "jobsProcessed": 2, "jobsFailed": 0, "idleAt": 891 },
    "retrain-jobs": { "jobsProcessed": 0, "jobsFailed": 0, "idleAt": 0 },
    "seed-jobs":    { "jobsProcessed": 0, "jobsFailed": 0, "idleAt": 0 }
  }
}

idleAt is the elapsed wall-clock time, in milliseconds since the invocation started, at which the queue first reported idle. A value of 0 means the queue was already empty when probed: no worker was started, only the cheap getJobCounts probe ran, and the endpoint moved on.
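For callers that parse the response, its shape can be written down as types. The interfaces below are inferred from the sample payload only, and the two helpers (`queuesWithActivity`, `totalsAreConsistent`) are hypothetical; that the totals equal the per-queue sums is an assumption the sample happens to satisfy.

```typescript
// Types inferred from the sample 200 response; the real schema may carry more fields.
interface QueueDrainResult {
  jobsProcessed: number;
  jobsFailed: number;
  idleAt: number; // ms since invocation start; 0 = queue was already empty when probed
}

interface DrainResponse {
  ok: boolean;
  durationMs: number;
  totalProcessed: number;
  totalFailed: number;
  queues: Record<string, QueueDrainResult>;
}

// Names of queues that processed or failed at least one job in this invocation.
function queuesWithActivity(response: DrainResponse): string[] {
  return Object.keys(response.queues).filter((queueName) => {
    const result = response.queues[queueName];
    return result !== undefined && result.jobsProcessed + result.jobsFailed > 0;
  });
}

// Assumption: the top-level totals are the sums of the per-queue counters.
function totalsAreConsistent(response: DrainResponse): boolean {
  let processed = 0;
  let failed = 0;
  for (const queueName of Object.keys(response.queues)) {
    const result = response.queues[queueName];
    if (result !== undefined) {
      processed += result.jobsProcessed;
      failed += result.jobsFailed;
    }
  }
  return processed === response.totalProcessed && failed === response.totalFailed;
}
```

Applied to the sample above, `queuesWithActivity` returns `batch-jobs` and `journey-jobs`, the two queues with non-zero `jobsProcessed`.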

Error codes

Code	Reason
`400`	Unknown `queue` parameter.
`401`	Missing or invalid `CRON_SECRET`.
`500`	`REDIS_URL` not configured.

Scheduling — pick one

Option A — GitHub Actions cron (simplest, free)

.github/workflows/drain-queues.yml:

name: Drain Queues
on:
  schedule:
    - cron: "*/5 * * * *"   # every 5 minutes
  workflow_dispatch:
jobs:
  drain:
    runs-on: ubuntu-latest
    steps:
      - run: |
          curl -fsS -X POST "${{ secrets.PLAYGROUND_URL }}/api/v1/cron/drain-queues" \
            -H "X-Cron-Secret: ${{ secrets.CRON_SECRET }}" \
            -H "Content-Type: application/json"

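GitHub-hosted runners are free for public repositories. In a private repository, GitHub bills each job rounded up to a whole minute, so a 5-minute schedule consumes roughly 8,640 minutes/month, well above the 2,000 minutes/month included in the free plan; use a longer interval or one of the other options in that case.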

Option B — AWS EventBridge schedule (preferred when already on AWS)

aws events put-rule --name kaireon-drain-queues \
  --schedule-expression "rate(5 minutes)" \
  --region us-east-1

aws events put-targets --rule kaireon-drain-queues \
  --targets '[{
    "Id": "1",
    "Arn": "arn:aws:apigateway:us-east-1:apigateway/.../api/v1/cron/drain-queues",
    "HttpParameters": {
      "HeaderParameters": { "X-Cron-Secret": "<CRON_SECRET>" }
    }
  }]'

(For App Runner, you may need an API Gateway connector or a Lambda intermediary since EventBridge can’t directly POST to App Runner URLs.)

Option C — External uptime monitor (cheap, hands-off)

Services like Cron-Job.org, EasyCron, or UptimeRobot can hit any HTTPS URL on a schedule. Configure:

URL: https://playground.kaireonai.com/api/v1/cron/drain-queues
Method: POST
Headers: X-Cron-Secret: <CRON_SECRET>
Schedule: */5 * * * *

Option D — Self-managed (k8s CronJob, supervised cron, etc.)

Use whatever scheduler your platform provides. Each tick should run:

curl -fsS -X POST "$KAIREON_URL/api/v1/cron/drain-queues" \
  -H "X-Cron-Secret: $CRON_SECRET"

Cost math

For a tenant with zero queued jobs (the common idle case on playground):

ops_per_invocation = 5 queues × 4 ops (getJobCounts) = 20
invocations_per_month = 12 × 24 × 30 = 8,640  (every 5 min)
ops_per_month_idle = 20 × 8,640 = 172,800 (~173,000)

Free tier (Upstash) = 500,000 ops/month
Headroom = ~327,000 ops for actual jobs + /recommend rate-limit + flow cache

For an every-30-minute cron (for workloads that tolerate higher job latency):

ops_per_month_idle = 20 × 48 × 30 = ~29,000
Headroom = ~471,000 ops for everything else

In practice you can run a 5-min cron on a free-tier Redis with no concerns until you reach hundreds of /recommend calls per minute, at which point the rate-limiter cost dominates and you should upgrade Redis anyway.
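The arithmetic above generalises to any cron interval. The sketch below uses only the figures from this section (5 queues, 4 ops per getJobCounts probe, a 30-day month, a 500,000-op free tier); the function names are illustrative.

```typescript
// Idle Redis cost of the cron-driven drain, using the figures from this section.
const QUEUE_COUNT = 5;
const OPS_PER_PROBE = 4;           // getJobCounts per queue
const FREE_TIER_OPS_PER_MONTH = 500000;

// Idle ops per month for a given cron interval, assuming zero queued jobs.
function idleOpsPerMonth(intervalMinutes: number, daysPerMonth: number = 30): number {
  const invocationsPerDay = (24 * 60) / intervalMinutes;
  return QUEUE_COUNT * OPS_PER_PROBE * invocationsPerDay * daysPerMonth;
}

// Ops left under the free tier for actual jobs, rate limiting, and caching.
function freeTierHeadroom(intervalMinutes: number): number {
  return FREE_TIER_OPS_PER_MONTH - idleOpsPerMonth(intervalMinutes);
}
```

With these definitions, `idleOpsPerMonth(5)` is 172,800 and `idleOpsPerMonth(30)` is 28,800, matching the rounded figures above; `freeTierHeadroom(5)` is 327,200.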

Caveats

Job-failure semantics differ from always-on: in always-on mode, a failed job is retried as soon as its BullMQ exponential-backoff delay elapses. In cron-driven mode, a retry that becomes due after an invocation has ended waits for the next tick, which can add up to one cron interval of delay. For low-frequency workloads this is fine. For SLA-sensitive workloads, run always-on workers on paid Redis.
Long-running jobs (>maxDurationMs) will be aborted mid-flight when the worker closes. They’ll be re-enqueued by BullMQ’s stalled-job detector on the next tick. Set maxDurationMs higher than your longest expected job, or split jobs into smaller chunks.
The drain endpoint is idempotent — re-hitting it during an in-flight invocation just no-ops on jobs already in-flight (BullMQ’s lock semantics).
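One way to act on the long-running-job caveat is to derive `maxDurationMs` from the longest job you expect. The sketch below is an assumption-laden illustration: `planMaxDuration` is a hypothetical helper, and the 1.5× safety factor is an arbitrary choice, not a documented recommendation. The 60,000 ms default and 600,000 ms maximum come from the query-parameter table.

```typescript
// Hypothetical helper for choosing maxDurationMs from the longest expected job.
const DEFAULT_DURATION_MS = 60000;  // documented default
const DURATION_LIMIT_MS = 600000;   // documented maximum

interface DurationPlan {
  maxDurationMs: number;
  splitRecommended: boolean; // true when the job plus margin does not fit under the maximum
}

function planMaxDuration(longestJobMs: number, safetyFactor: number = 1.5): DurationPlan {
  // Leave a margin above the longest job so it is not aborted mid-flight.
  const wanted = Math.ceil(longestJobMs * safetyFactor);
  return {
    maxDurationMs: Math.min(DURATION_LIMIT_MS, Math.max(DEFAULT_DURATION_MS, wanted)),
    splitRecommended: wanted > DURATION_LIMIT_MS
  };
}
```

For a longest job of 4 minutes (240,000 ms) this suggests `maxDurationMs=360000`; for a job of 500,000 ms the margin no longer fits under the maximum, so splitting the job into smaller chunks is the safer route.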
