Audience: SREs, DBAs, platform operators Last updated: 2026-02-23 Database: PostgreSQL 15+ via Amazon RDS (or self-managed) Connection pooler: PgBouncer
Table of Contents
- Backup
- Restore
- Manual VACUUM
- Index Maintenance
- Schema Migrations
- Monitoring Queries
- Connection Pool Tuning (PgBouncer)
1. Backup
Automated Backups (RDS)
RDS automated backups are configured with a 7-day retention period. Snapshots are taken daily during the maintenance window (02:00-03:00 UTC).Manual Backup
Use themake backup target or run the equivalent commands directly.
Makefile target:
RDS Snapshot
Backup Verification
Run monthly to ensure backups are valid:2. Restore
Restore from pg_dump Backup
Use themake restore target or run the equivalent commands directly.
Makefile target:
Restore from RDS Snapshot
Point-in-Time Recovery (RDS)
Post-Restore Checklist
- Verify row counts match expectations.
- Refresh planner statistics on all tables (PostgreSQL ANALYZE).
- Verify application connectivity.
- Run a smoke test against the decision endpoint.
- Update DNS or connection strings if restoring to a new instance.
3. Manual VACUUM
When to VACUUM
- Dead tuple ratio exceeds 20% on any table.
- After large bulk deletes or updates.
- Before
VACUUM FULLif table bloat exceeds 50%. - Autovacuum is lagging (check
pg_stat_user_tables.last_autovacuum).
Standard VACUUM
Does not lock the table. Safe to run during production hours.VACUUM FULL
Reclaims disk space by rewriting the table. Acquires an exclusive lock. Schedule during maintenance windows only.Autovacuum Tuning
Monitor Autovacuum Progress
4. Index Maintenance
Check Index Health
Rebuild Indexes
UseREINDEX CONCURRENTLY to avoid locking the table.
Create Missing Indexes
Common queries that benefit from indexes:Drop Unused Indexes
5. Schema Migrations
Using Prisma db push
KaireonAI uses Prisma 7 withdb push for schema synchronization.
Standard migration workflow:
- Back up the database (see Backup).
- Review the schema diff carefully.
- Check for data loss warnings.
- Run in staging first.
- Coordinate with the team (announce in
#kaireon-deployments).
- Refresh planner statistics on affected tables (PostgreSQL ANALYZE).
- Verify application starts without errors.
- Check that all API endpoints return valid responses.
- Monitor error rates for 15 minutes.
Manual SQL Migrations
For changes that Prisma cannot handle (data migrations, custom constraints):Rollback Procedure
6. Monitoring Queries
Dashboard Queries
Run these queries periodically or integrate them into Grafana via the PostgreSQL data source. Database size and growth:Alerting Thresholds
| Metric | Warning | Critical |
|---|---|---|
| Cache hit ratio | <99% | <95% |
| Dead tuple ratio | >10% | >30% |
| Active connections | >70% of max | >90% of max |
| Replication lag | >1MB | >10MB |
| Long-running queries | >30s | >120s |
| Idle in transaction | >60s | >300s |
| Database size growth | >5%/day | >10%/day |
7. Connection Pool Tuning (PgBouncer)
Current Configuration
PgBouncer sits between the KaireonAI application and PostgreSQL, multiplexing connections. Architecture:Configuration File
Located in the PgBouncer ConfigMap:kaireon-pgbouncer-config
Pool Sizing Formula
Monitoring PgBouncer
Tuning for Common Scenarios
High API traffic (many short queries):Applying Configuration Changes
Troubleshooting PgBouncer
Clients waiting for connections:Maintenance Schedule
| Task | Frequency | Window | Impact |
|---|---|---|---|
| Automated backup (RDS) | Daily | 02:00-03:00 UTC | None |
| Manual backup verification | Monthly | Business hours | None |
| VACUUM ANALYZE (large tables) | Nightly | 03:00-04:00 UTC | None |
| VACUUM FULL (if needed) | Quarterly | Maintenance window | Table locked |
| REINDEX CONCURRENTLY | Monthly | Off-peak hours | Minimal |
| Index usage review | Quarterly | Business hours | None |
| Connection pool review | Quarterly | Business hours | None |
| Disk usage review | Weekly | Business hours | None |
| Statistics update (ANALYZE) | Nightly | 03:00-04:00 UTC | None |