# AWS Deployment Guide

> Deploy Agencio Predict to AWS for beta or production. Uses IAM roles (no env-var credential plumbing), AWS Secrets Manager for secrets, and Cognito for optional SSO.

This guide assumes a small-to-medium deploy (1-3 containers, < 100 concurrent users). For anything larger, see "Scaling notes" near the end.

## Beta readiness status (2026-04-18)

All three ship-blockers from `BETA_READINESS.md` are resolved:

- **Redis-backed rate limiter + OAuth state** — both modules use Redis-first with in-memory fallback. Multi-task deploys are now safe as long as `REDIS_URL` points at ElastiCache.
- **Sentry wiring** — `@sentry/nextjs` is installed and `next.config.js` is wrapped by `withSentryConfig`. Runtime configs (`client` / `server` / `edge`) and a `global-error.tsx` are committed. They no-op until the DSN env vars are set — see Step 4 and Step 8 below.
- **Vitest smoke suite (27 tests across 6 files)** — `npm test -w @agencio-predict/be`. Wire this into CI before enabling auto-deploy.

**AI-layer additions (2026-04-18):** the deployment now includes four groups of new subsystems. The first two need no new env; the last two do:

- **Regime-stress backtest** (`/api/predict/v1/algorithms/:id/stress-test`) — no new env; uses existing data providers.
- **AI-Engine signal-weight learner** + **Marketing learning auto-apply** + **Learning rollback monitor** — no new env; daily cron jobs tracked by migration 049 + 052.
- **RAG over news corpus** — requires `VOYAGE_API_KEY` (preferred) or `OPENAI_API_KEY` for embeddings. For corpora > ~50k articles, enable pgvector on RDS (PG 15.2+) and apply migration 053.
- **LLM sentiment scoring** — uses the existing `CLAUDE_API_KEY`; rate-limited by `LLM_SENTIMENT_DAILY_BUDGET` (default 5000 calls/UTC day).

Full-repo typecheck is clean (`npx tsc --build packages/shared packages/be packages/fe --force && npx tsc --noEmit -p apps/web/tsconfig.json`). **Next action to go live:** set `SENTRY_DSN` and `NEXT_PUBLIC_SENTRY_DSN` in the prod env to activate error reporting (Step 4 below).

## Target architecture

```
            ┌──────────────────┐
            │   Route 53 + ACM │  (TLS termination)
            └─────────┬────────┘
                      │
            ┌─────────▼────────┐
            │ ALB (HTTPS:443)  │
            │ HTTP→HTTPS redir │
            └─────────┬────────┘
                      │
        ┌─────────────┴─────────────┐
        │                           │
   ┌────▼────┐                 ┌────▼────┐
   │ ECS Task│                 │ ECS Task│
   │ predict-│                 │ predict-│
   │ web     │                 │ web     │
   │ (Next)  │                 │ (Next)  │
   └────┬────┘                 └────┬────┘
        │                           │
        └─────────┬─────────────────┘
                  │
   ┌──────────────┼──────────────────────┐
   ▼              ▼                      ▼
┌──────┐     ┌─────────┐          ┌──────────────┐
│ RDS  │     │ ElastiC │          │ S3 buckets   │
│Postgr│     │ Redis   │          │ agencio-…    │
│ 16 + │     │ (cache) │          └──────────────┘
│vector│     └─────────┘
└──────┘
                  │
                  ▼
            ┌─────────────┐     ┌─────────────┐
            │ Secrets Mgr │     │ Cognito     │
            │ (API keys)  │     │ User Pool   │
            └─────────────┘     │ (optional)  │
                                └─────────────┘
```

## Pre-flight (one-time setup)

### 1. Fork / clone the repo to somewhere deployable

```bash
git clone git@github.com:Agencio-Bertha/agencio-predict.git
cd agencio-predict
```

### 2. Build the prod image locally to verify it still compiles

```bash
docker build -f docker/web/Dockerfile -t agencio-predict-web:local .
docker run --rm -it agencio-predict-web:local node --version
```

If the build fails, stop and fix — deploying a half-compiled image is painful.

### 3. Push to a registry

Choose ECR (recommended — stays inside AWS):

```bash
aws ecr create-repository --repository-name agencio-predict-web --region us-east-1
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
docker tag agencio-predict-web:local <account-id>.dkr.ecr.us-east-1.amazonaws.com/agencio-predict-web:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/agencio-predict-web:latest
```
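
The `:latest` tag above is mutable, so two task-definition revisions that both reference it resolve to whatever image was pushed last. A minimal sketch of tagging by commit instead, assuming the commands run inside the git checkout after the `docker login` step; `image_uri` and `push_by_sha` are hypothetical helpers, not part of the repo.

```bash
# Hypothetical helper: compose an ECR image URI from its parts.
image_uri() {
  local account="$1" region="$2" repo="$3" tag="$4"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$account" "$region" "$repo" "$tag"
}

# Tag the locally built image with the current short commit SHA and push it.
push_by_sha() {
  local account="$1" region="${2:-us-east-1}" repo="${3:-agencio-predict-web}"
  local sha uri
  sha="$(git rev-parse --short HEAD)" || return 1
  uri="$(image_uri "$account" "$region" "$repo" "$sha")"
  docker tag "${repo}:local" "$uri" && docker push "$uri"
}

# Usage:
#   push_by_sha <account-id>
```

If you adopt this, reference the SHA-tagged URI in the task definition's `image` field so each revision pins one specific build.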

## Step 1: Provision the data stores

### RDS Postgres

Minimum: `db.t4g.micro` (beta) or `db.t4g.small` (real users). **Postgres 15.2+ required** (pgvector extension support for the RAG layer). Postgres 16.x is the recommended track.

```bash
aws rds create-db-instance \
  --db-instance-identifier agencio-predict-prod \
  --db-instance-class db.t4g.small \
  --engine postgres \
  --engine-version 16.4 \
  --master-username postgres \
  --master-user-password <strong-password> \
  --allocated-storage 20 \
  --storage-type gp3 \
  --backup-retention-period 7 \
  --publicly-accessible false \
  --vpc-security-group-ids <sg-id> \
  --db-subnet-group-name <subnet-group>
```

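**Critical:** put RDS in a private subnet and lock its security group to allow inbound 5432 only from the security group attached to the ECS tasks. Security groups attach to the task's network interface, not to the IAM task role.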
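
The ingress rule described above can be written as a security-group-to-security-group reference. This is a sketch that assumes RDS and the ECS tasks use two separate security groups; `allow_postgres_from_tasks` is a hypothetical wrapper and both IDs are placeholders.

```bash
# Allow inbound Postgres (5432) on the RDS security group only from the
# security group attached to the ECS tasks.
allow_postgres_from_tasks() {
  local rds_sg="$1" task_sg="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$rds_sg" \
    --protocol tcp --port 5432 \
    --source-group "$task_sg"
}

# Usage:
#   allow_postgres_from_tasks <rds-sg-id> <task-sg-id>
```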

Record the endpoint: `agencio-predict-prod.xxx.us-east-1.rds.amazonaws.com`.

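**Enable pgvector (recommended for RAG, required only for migration 053; skip it if you're not running `/api/predict/v1/rag/*`).** Extensions are per-database, so run this against `predict_db` once the instance is available and the database exists. Step 2 creates the database and exports `DATABASE_URL`: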

```bash
psql "$DATABASE_URL" -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

RDS supports pgvector natively on every 15.2+ / 16.x / 17.x version — no parameter group changes required. The RAG module falls back to JSONB + Node-side cosine similarity if the extension isn't present (degrades gracefully up to ~50k articles; above that, apply migration 053 + the `/rag/migrate-pgvector` endpoint for HNSW-backed KNN).

### ElastiCache Redis

One node, `cache.t4g.micro` is fine for beta.

```bash
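# Without a cache subnet group the cluster lands in the default VPC;
# <cache-subnet-group> must cover the private subnets the ECS tasks use.
aws elasticache create-cache-cluster \
  --cache-cluster-id agencio-predict-redis \
  --engine redis \
  --cache-node-type cache.t4g.micro \
  --num-cache-nodes 1 \
  --cache-subnet-group-name <cache-subnet-group> \
  --security-group-ids <sg-id> \
  --engine-version 7.0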
```
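
Step 4 needs the node's endpoint for `REDIS_URL`. A small sketch of looking it up once the cluster is available; `redis_endpoint` and `redis_url` are hypothetical helper names, and the default cluster ID matches the command above.

```bash
# Look up the single node's endpoint address.
redis_endpoint() {
  aws elasticache describe-cache-clusters \
    --cache-cluster-id "${1:-agencio-predict-redis}" \
    --show-cache-node-info \
    --query 'CacheClusters[0].CacheNodes[0].Endpoint.Address' \
    --output text
}

# Compose the URL format that Step 4 stores as REDIS_URL.
redis_url() {
  printf 'redis://%s:%s\n' "$1" "${2:-6379}"
}

# Usage:
#   aws elasticache wait cache-cluster-available --cache-cluster-id agencio-predict-redis
#   redis_url "$(redis_endpoint)"
```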

### S3 buckets

The app uses three buckets by default (prefix `agencio-` is configurable via `S3_BUCKET_PREFIX`):

```bash
for bucket in reports exports uploads; do
  aws s3 mb s3://agencio-${bucket} --region us-east-1
  aws s3api put-public-access-block --bucket agencio-${bucket} \
    --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
done
```

## Step 2: Apply migrations to RDS

From a machine that can reach RDS (a bastion, CloudShell, or your laptop with an SSH tunnel):

```bash
export DATABASE_URL="postgresql://postgres:<password>@<rds-endpoint>:5432/predict_db"

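# Create the DB (this errors harmlessly if it already exists)
psql "postgres://postgres:<password>@<rds-endpoint>:5432/postgres" -c "CREATE DATABASE predict_db;"

# Apply migrations in order, stopping at the first failure.
# ON_ERROR_STOP makes psql exit non-zero when a statement fails.
for migration in db/migrations/*.sql; do
  echo "Applying $migration"
  psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f "$migration" || break
done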
```

The migrations are idempotent, so re-running the loop is safe. A fresh clone has ~53 files, including:

- `048_news_sentiment_llm.sql`: LLM sentiment metadata on `news_archive`
- `049_ai_signal_weights.sql`: ensemble weight persistence for the AI-Engine learning loop
- `050_rag_embeddings.sql`: embedding storage + query audit for RAG
- `051_social_posts_sentiment_llm.sql`: LLM sentiment metadata on `social_posts`
- `052_learning_rollback_schema.sql`: pre/post-adjustment metrics on the two learning-history tables for auto-rollback
- `053_pgvector_upgrade.sql`: **optional**, apply only after `CREATE EXTENSION vector` has succeeded. Adds a `vector(1536)` column + HNSW index on `news_archive`. The loop above applies every file in `db/migrations/`, so enable the extension first or move this file aside until you need it.
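
Because `053_pgvector_upgrade.sql` is optional, one way to keep it out of the default run is to filter it in the loop. This sketch assumes the zero-padded `NNN_name.sql` naming shown in the list; `list_migrations` and `apply_migrations` are hypothetical helpers, not scripts from the repo.

```bash
# List migration files in lexical order (numeric order, given zero-padded
# prefixes). Files starting with 053_ are skipped unless the second
# argument is "yes".
list_migrations() {
  local dir="$1" with_pgvector="${2:-no}" f
  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue
    case "$(basename "$f")" in
      053_*) [ "$with_pgvector" = "yes" ] || continue ;;
    esac
    printf '%s\n' "$f"
  done
}

# Apply the listed files, stopping at the first failure.
apply_migrations() {
  local f
  list_migrations "$1" "$2" | while IFS= read -r f; do
    echo "Applying $f"
    psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f "$f" || exit 1
  done
}

# Usage:
#   apply_migrations db/migrations        # skips 053
#   apply_migrations db/migrations yes    # includes 053 (extension enabled)
```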

### Warm up the flow-engine history table

Z-scores on the flow engine depend on 30d of trailing data. Before launch:

```bash
DATABASE_URL=<url> npx tsx scripts/backfill-flow-history.ts --days 90
```

This runs the overlay-derived flow adapter for 90 days of history and seeds `flow_edges_history`.

## Step 3: Provision Cognito (optional but recommended)

If you want SSO / MFA for admins, create a Cognito user pool:

```bash
aws cognito-idp create-user-pool \
  --pool-name agencio-predict-prod \
  --policies '{"PasswordPolicy":{"MinimumLength":12,"RequireUppercase":true,"RequireLowercase":true,"RequireNumbers":true,"RequireSymbols":true}}' \
  --mfa-configuration OPTIONAL \
  --auto-verified-attributes email
```

Record the `UserPoolId`. Create an app client:

```bash
aws cognito-idp create-user-pool-client \
  --user-pool-id <pool-id> \
  --client-name agencio-predict-web \
  --explicit-auth-flows ALLOW_USER_PASSWORD_AUTH ALLOW_REFRESH_TOKEN_AUTH \
  --generate-secret
```

Record the `ClientId` and client secret.
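
An app client created with `--generate-secret` requires a `SECRET_HASH` on calls such as `InitiateAuth`. The guide does not say whether the app computes this itself, so the sketch below is only for testing the client by hand. It assumes `openssl` is installed, and `cognito_secret_hash` is a hypothetical helper.

```bash
# SECRET_HASH = Base64( HMAC-SHA256( key = client secret, msg = username + client id ) )
cognito_secret_hash() {
  local username="$1" client_id="$2" client_secret="$3"
  printf '%s%s' "$username" "$client_id" \
    | openssl dgst -sha256 -hmac "$client_secret" -binary \
    | openssl base64
}

# Usage:
#   HASH="$(cognito_secret_hash <username> <client-id> <client-secret>)"
#   aws cognito-idp initiate-auth --auth-flow USER_PASSWORD_AUTH \
#     --client-id <client-id> \
#     --auth-parameters "USERNAME=<username>,PASSWORD=<password>,SECRET_HASH=${HASH}"
```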

## Step 4: Store secrets in Secrets Manager

Every secret the app expects from env becomes its own secret in Secrets Manager, named under a prefix you choose (e.g. `agencio-predict-prod/`):

```bash
PREFIX="agencio-predict-prod"

# JWT signing
aws secretsmanager create-secret --name "${PREFIX}/JWT_SECRET" \
  --secret-string "$(openssl rand -base64 48)"

# Broker credential encryption (REQUIRED in prod — the app hard-fails without it)
aws secretsmanager create-secret --name "${PREFIX}/CREDENTIALS_ENCRYPTION_KEY" \
  --secret-string "$(openssl rand -hex 32)"

# DB URL — Secrets Manager can rotate this with RDS integration
aws secretsmanager create-secret --name "${PREFIX}/DATABASE_URL" \
  --secret-string "postgresql://postgres:<password>@<rds-endpoint>:5432/predict_db"

# Redis
aws secretsmanager create-secret --name "${PREFIX}/REDIS_URL" \
  --secret-string "redis://<elasticache-endpoint>:6379"

# Cognito (if using SSO)
aws secretsmanager create-secret --name "${PREFIX}/COGNITO_USER_POOL_ID" --secret-string "<pool-id>"
aws secretsmanager create-secret --name "${PREFIX}/COGNITO_CLIENT_ID"    --secret-string "<client-id>"
aws secretsmanager create-secret --name "${PREFIX}/COGNITO_CLIENT_SECRET" --secret-string "<secret>"

# LLM (at least one is needed for the entity extractor + AI features)
aws secretsmanager create-secret --name "${PREFIX}/CLAUDE_API_KEY" \
  --secret-string "<your-claude-key>"

# Sentry (activates error reporting — both vars required; client DSN is public by design)
aws secretsmanager create-secret --name "${PREFIX}/SENTRY_DSN" --secret-string "<server-dsn>"
aws secretsmanager create-secret --name "${PREFIX}/NEXT_PUBLIC_SENTRY_DSN" --secret-string "<client-dsn>"
# Optional — only set if you want CI to upload sourcemaps on every build
aws secretsmanager create-secret --name "${PREFIX}/SENTRY_ORG" --secret-string "<org-slug>"
aws secretsmanager create-secret --name "${PREFIX}/SENTRY_PROJECT" --secret-string "<project-slug>"
aws secretsmanager create-secret --name "${PREFIX}/SENTRY_AUTH_TOKEN" --secret-string "<token>"

# Optional — only the ones you're using
aws secretsmanager create-secret --name "${PREFIX}/FRED_API_KEY" --secret-string "<key>"
aws secretsmanager create-secret --name "${PREFIX}/FINNHUB_API_KEY" --secret-string "<key>"
aws secretsmanager create-secret --name "${PREFIX}/NEWSAPI_KEY" --secret-string "<key>"
aws secretsmanager create-secret --name "${PREFIX}/GUARDIAN_API_KEY" --secret-string "<key>"
aws secretsmanager create-secret --name "${PREFIX}/NYT_API_KEY" --secret-string "<key>"
aws secretsmanager create-secret --name "${PREFIX}/NEWSDATA_API_KEY" --secret-string "<key>"
aws secretsmanager create-secret --name "${PREFIX}/TWITTER_BEARER_TOKEN" --secret-string "<key>"

# Embedding provider (required for RAG — set ONE, not both; Voyage preferred)
aws secretsmanager create-secret --name "${PREFIX}/VOYAGE_API_KEY"  --secret-string "<voyage-key>"
# aws secretsmanager create-secret --name "${PREFIX}/OPENAI_API_KEY" --secret-string "<openai-key>"
```
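
`create-secret` fails with `ResourceExistsException` when the name is already taken, so re-running the block above after a partial failure stops at the first existing secret. A sketch of a create-or-update wrapper; `upsert_secret` is a hypothetical helper name.

```bash
# Create the secret, or write a new version if it already exists.
upsert_secret() {
  local name="$1" value="$2"
  if aws secretsmanager describe-secret --secret-id "$name" >/dev/null 2>&1; then
    aws secretsmanager put-secret-value --secret-id "$name" --secret-string "$value"
  else
    aws secretsmanager create-secret --name "$name" --secret-string "$value"
  fi
}

# Usage:
#   upsert_secret "agencio-predict-prod/FRED_API_KEY" "<key>"
```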

**AI-layer tuning env vars** (set as plain env in the task definition, not secrets):

| Var | Default | Purpose |
|---|---|---|
| `LLM_SENTIMENT_DAILY_BUDGET` | `5000` | Max Haiku calls per UTC day for news + social sentiment scoring. Prevents runaway cost. |
| `SCHEDULER_ENABLED` | `true` | Master switch for all cron jobs (learning, RAG backfill, social scoring, rollback monitor). Set to `false` on non-leader tasks when running multi-replica ECS. |

> `NEXT_PUBLIC_SENTRY_DSN` must be present at **build time** for the client bundle (Next.js inlines `NEXT_PUBLIC_*` at build). Pass it via `docker build --build-arg` or set it in the CI pipeline that builds the image; reading it only at runtime leaves the browser bundle silent. `SENTRY_DSN` is server-only and can be pulled from Secrets Manager at container start.

The app reads these automatically when `USE_SECRETS_MANAGER=true` + `SECRETS_PREFIX=agencio-predict-prod/` are set.
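
For the build-time DSN mentioned in the note above, the image build could look like the sketch below. It assumes `docker/web/Dockerfile` declares `ARG NEXT_PUBLIC_SENTRY_DSN` and exposes it to the Next.js build stage, which this guide does not confirm; `build_with_client_dsn` is a hypothetical helper.

```bash
# Build the web image with the public client DSN inlined at build time.
build_with_client_dsn() {
  local dsn="$1" tag="${2:-agencio-predict-web:local}"
  docker build -f docker/web/Dockerfile \
    --build-arg "NEXT_PUBLIC_SENTRY_DSN=${dsn}" \
    -t "$tag" .
}

# Usage:
#   build_with_client_dsn "<client-dsn>"
```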

## Step 5: Create the ECS task IAM role

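This is the role the container runs as. It needs:

- Read from Secrets Manager (the prefix above)
- Read/write to the three S3 buckets
- Cognito admin operations (if using SSO)

CloudWatch Logs permissions are not part of this role: the `awslogs` driver writes through the task execution role created at the end of this step.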

Create a trust policy file `ecs-trust.json`:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ecs-tasks.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
```

```bash
aws iam create-role \
  --role-name AgencioPredictTaskRole \
  --assume-role-policy-document file://ecs-trust.json
```

Attach inline policy `task-permissions.json`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SecretsManagerRead",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
      "Resource": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/*"
    },
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::agencio-reports", "arn:aws:s3:::agencio-reports/*",
        "arn:aws:s3:::agencio-exports", "arn:aws:s3:::agencio-exports/*",
        "arn:aws:s3:::agencio-uploads", "arn:aws:s3:::agencio-uploads/*"
      ]
    },
    {
      "Sid": "CognitoAdmin",
      "Effect": "Allow",
      "Action": [
        "cognito-idp:AdminCreateUser", "cognito-idp:AdminDeleteUser",
        "cognito-idp:AdminDisableUser", "cognito-idp:AdminEnableUser",
        "cognito-idp:AdminResetUserPassword", "cognito-idp:AdminSetUserPassword",
        "cognito-idp:AdminGetUser", "cognito-idp:ListUsers",
        "cognito-idp:AdminSetUserMFAPreference", "cognito-idp:DescribeUserPool",
        "cognito-idp:DescribeUserPoolClient", "cognito-idp:InitiateAuth",
        "cognito-idp:SignUp", "cognito-idp:GlobalSignOut"
      ],
      "Resource": "arn:aws:cognito-idp:us-east-1:<account-id>:userpool/<pool-id>"
    }
  ]
}
```

```bash
aws iam put-role-policy \
  --role-name AgencioPredictTaskRole \
  --policy-name AgencioPredictTaskPolicy \
  --policy-document file://task-permissions.json
```

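Also create a task execution role. ECS uses it to pull the ECR image, write logs, and resolve the `secrets` entries in the task definition (Step 6). The managed policy does not include Secrets Manager access, so grant read access to the secret prefix as well:

```bash
aws iam create-role --role-name AgencioPredictExecutionRole \
  --assume-role-policy-document file://ecs-trust.json
aws iam attach-role-policy --role-name AgencioPredictExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
# Let ECS resolve the task definition's "secrets" entries at task start
aws iam put-role-policy --role-name AgencioPredictExecutionRole \
  --policy-name AgencioPredictExecutionSecrets \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"secretsmanager:GetSecretValue","Resource":"arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/*"}]}'
```

The managed policy also lacks `logs:CreateLogGroup`, which `awslogs-create-group: true` in Step 6 relies on. Either pre-create the `/ecs/agencio-predict-web` log group or add that action to the inline policy.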
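
Before registering the task definition, you can check that a role's policies grant what this step intends. The sketch below wraps the IAM policy simulator; `can_role` is a hypothetical helper and the ARNs in the usage lines are placeholders.

```bash
# Ask IAM whether a role would be allowed to perform an action on a resource.
# Prints the simulator's decision: allowed, implicitDeny, or explicitDeny.
can_role() {
  local role_arn="$1" action="$2" resource="$3"
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names "$action" \
    --resource-arns "$resource" \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text
}

# Usage:
#   can_role arn:aws:iam::<account-id>:role/AgencioPredictTaskRole \
#     s3:PutObject "arn:aws:s3:::agencio-reports/test.txt"
#   can_role arn:aws:iam::<account-id>:role/AgencioPredictTaskRole \
#     secretsmanager:GetSecretValue \
#     "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/JWT_SECRET"
```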

## Step 6: Register the ECS task definition

Create `task-def.json`:

```json
{
  "family": "agencio-predict-web",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::<account-id>:role/AgencioPredictExecutionRole",
  "taskRoleArn": "arn:aws:iam::<account-id>:role/AgencioPredictTaskRole",
  "containerDefinitions": [
    {
      "name": "predict-web",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/agencio-predict-web:latest",
      "portMappings": [{ "containerPort": 3000, "protocol": "tcp" }],
      "environment": [
        { "name": "NODE_ENV", "value": "production" },
        { "name": "PORT", "value": "3000" },
        { "name": "AWS_REGION", "value": "us-east-1" },
        { "name": "USE_SECRETS_MANAGER", "value": "true" },
        { "name": "SECRETS_PREFIX", "value": "agencio-predict-prod/" },
        { "name": "STORAGE_PROVIDER", "value": "s3" },
        { "name": "S3_BUCKET_PREFIX", "value": "agencio-" },
        { "name": "AUTH_PROVIDER", "value": "cognito" }
      ],
      "secrets": [
        { "name": "JWT_SECRET",                 "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/JWT_SECRET" },
        { "name": "DATABASE_URL",               "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/DATABASE_URL" },
        { "name": "REDIS_URL",                  "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/REDIS_URL" },
        { "name": "CREDENTIALS_ENCRYPTION_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/CREDENTIALS_ENCRYPTION_KEY" },
        { "name": "CLAUDE_API_KEY",             "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/CLAUDE_API_KEY" },
        { "name": "COGNITO_USER_POOL_ID",       "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/COGNITO_USER_POOL_ID" },
        { "name": "COGNITO_CLIENT_ID",          "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/COGNITO_CLIENT_ID" },
        { "name": "COGNITO_CLIENT_SECRET",      "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:agencio-predict-prod/COGNITO_CLIENT_SECRET" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agencio-predict-web",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"],
        "interval": 30, "timeout": 5, "retries": 3, "startPeriod": 60
      }
    }
  ]
}
```

```bash
aws ecs register-task-definition --cli-input-json file://task-def.json
```
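
The eight `secrets` entries in `task-def.json` differ only in the secret name, which makes them easy to mistype. A small sketch that generates them from a list of names; `secret_entries` is a hypothetical helper, and its output still has to be pasted into the `secrets` array by hand.

```bash
# Emit one task-definition "secrets" entry per name, separated by ",\n".
secret_entries() {
  local account="$1" region="$2" prefix="$3" first=1 name
  shift 3
  for name in "$@"; do
    [ "$first" -eq 1 ] || printf ',\n'
    first=0
    printf '{ "name": "%s", "valueFrom": "arn:aws:secretsmanager:%s:%s:secret:%s/%s" }' \
      "$name" "$region" "$account" "$prefix" "$name"
  done
  printf '\n'
}

# Usage:
#   secret_entries <account-id> us-east-1 agencio-predict-prod \
#     JWT_SECRET DATABASE_URL REDIS_URL CREDENTIALS_ENCRYPTION_KEY CLAUDE_API_KEY
```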

## Step 7: Create the cluster + service

```bash
aws ecs create-cluster --cluster-name agencio-predict-prod

aws ecs create-service \
  --cluster agencio-predict-prod \
  --service-name predict-web \
  --task-definition agencio-predict-web:1 \
  --launch-type FARGATE \
  --desired-count 2 \
  --network-configuration "awsvpcConfiguration={subnets=[<private-subnet-1>,<private-subnet-2>],securityGroups=[<task-sg>],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=<tg-arn>,containerName=predict-web,containerPort=3000" \
  --health-check-grace-period-seconds 120
```
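
`create-service` returns before any task is running. A sketch of waiting for the deployment to settle before moving on to Step 8; `wait_for_service` is a hypothetical helper whose defaults match the names used above.

```bash
# Block until the service reaches a steady state, then print the
# running and desired task counts.
wait_for_service() {
  local cluster="${1:-agencio-predict-prod}" service="${2:-predict-web}"
  aws ecs wait services-stable --cluster "$cluster" --services "$service" || return 1
  aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query 'services[0].[runningCount,desiredCount]' --output text
}

# Usage:
#   wait_for_service    # with --desired-count 2, both numbers should be 2
```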

(ALB + target group + ACM cert setup omitted for brevity — standard AWS plumbing.)

## Step 8: First-boot validation

Once the service is running, hit these in order:

```bash
# Health
curl https://predict.yourcompany.com/api/health

# Create an admin user (replace <JWT> with an admin JWT if you have one,
# or use the initial super-admin flow from /admin/users page)

# Log in and get an admin JWT, then:
TOKEN="<jwt>"

# Confirm AWS wiring via the admin page
curl -H "Authorization: Bearer $TOKEN" https://predict.yourcompany.com/api/predict/v1/admin/aws
# Expected: credentialSource="ecs-task-role", region=us-east-1, services.cognito.configured=true

# Run all AWS probes
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"service":"all"}' \
  https://predict.yourcompany.com/api/predict/v1/admin/aws
# Expected: 3 results, all ok=true
```

If `credentialSource` shows `explicit-env` or `none`, the IAM task role isn't wired correctly — do not launch. ECS tasks should resolve credentials from the container credentials endpoint, never from env vars in production.
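
New tasks can take a minute or two to answer on `/api/health` (the task definition allows a 60s start period), so a single `curl` right after deploy may fail spuriously. A sketch of a retrying check; `wait_healthy` is a hypothetical helper.

```bash
# Poll a URL until curl gets a response with status < 400, or give up
# after the given number of attempts.
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_healthy https://predict.yourcompany.com/api/health 12 10
```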

### Verify Sentry is actually capturing

Both the server and client DSN need to be present for coverage to be complete. Quickest smoke test:

```bash
# Request a throwaway path to provoke a server-side error event
curl https://predict.yourcompany.com/api/predict/v1/__nonexistent__
```

Open the Sentry UI within a minute — you should see the 404/500 event with `environment=production` and the git SHA under `release`. If nothing shows up:

- Check the container logs for `[Sentry]` init lines on boot — absent lines mean `SENTRY_DSN` wasn't injected.
- For the client bundle, view-source on any page and search for your project ID — if the DSN isn't baked in, rebuild with `NEXT_PUBLIC_SENTRY_DSN` as a `--build-arg` and redeploy.

### Verify Redis is serving rate-limit + OAuth state

```bash
# Hit any rate-limited endpoint twice, then inspect
redis-cli -u "$REDIS_URL" KEYS 'ratelimit:*'
# Expect: ratelimit:hour:<keyId>, ratelimit:day:<keyId>

# Start any ad-platform OAuth flow from /settings, then
redis-cli -u "$REDIS_URL" KEYS 'oauth:state:*'
# Expect: at least one 10-minute TTL entry
```

If these show empty, the app is falling back to in-memory state — check that `REDIS_URL` is reachable from the task's security group.

## Step 9: Configure news + social + flow sources

Once the admin user is in, go to `/admin/news`, `/admin/social-follows`, `/admin/flow`, and enable the sources you have keys for. The scheduler will start polling within 5-15 minutes.

## Scaling notes

The architecture above handles ~100 concurrent users. Before scaling further:

1. **Rate limiter + OAuth state are Redis-backed.** Multi-task deploys are safe as long as `REDIS_URL` points at ElastiCache. The in-memory path is only a fallback for single-instance dev — confirm it's not active in prod using the `redis-cli KEYS 'ratelimit:*'` check in Step 8.
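2. **Scheduled jobs run in every task.** This is idempotent for most jobs but inefficient and, for the *learning* jobs, actively wrong (the same adjustment is applied N times). It already applies to the two-task service from Step 7. Pin the scheduler to a single task via `SCHEDULER_ENABLED=false` on non-leader replicas, or introduce leader election. Because every task in an ECS service shares one task definition, the env-var approach in practice means a separate one-task scheduler service with `SCHEDULER_ENABLED=true` while the web service runs with `false`. The jobs most sensitive to this are: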
   - `marketing-learning-cycle` (24h) — writes multiplier adjustments; duplicate runs compound the Δ cap
   - `ai-signal-weight-cycle` (24h) — writes ensemble weights; same issue
   - `learning-rollback-monitor` (24h) — double-rolls if run twice
   - `rag-embedding-backfill` (1h) — safe-ish but burns embedding API calls
   - `social-llm-sentiment` (15m) — same, burns Haiku calls
3. **Database connection pooling.** The default pool is per-task; for >3 tasks against a small RDS, introduce RDS Proxy.
4. **SSE / WebSocket handling.** The `/api/predict/v1/sse/*` routes use long-lived connections. The ALB idle timeout defaults to 60s and must be raised to ≥300s.
5. **RAG retrieval latency.** Below ~50k articles the JSONB + Node-side cosine path is fine (~50-150ms). Above that, apply migration 053 and call `POST /api/predict/v1/rag/migrate-pgvector` to copy vectors into the pgvector column; HNSW-backed queries are ~5-20ms even at millions of rows.
6. **LLM cost ceiling.** Watch `LLM_SENTIMENT_DAILY_BUDGET` (Haiku) + embedding call volume (backfill job emits counts to CloudWatch logs). Rough rule: 1000 news articles × daily cycle ≈ $0.30/day on Voyage, $0.10/day on Haiku scoring.
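
The idle-timeout change from item 4 is a single load balancer attribute. A minimal sketch; `set_alb_idle_timeout` is a hypothetical wrapper and the ARN is a placeholder.

```bash
# Raise the ALB idle timeout so long-lived SSE connections are not cut off.
set_alb_idle_timeout() {
  local alb_arn="$1" seconds="${2:-300}"
  aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn "$alb_arn" \
    --attributes "Key=idle_timeout.timeout_seconds,Value=${seconds}"
}

# Usage:
#   set_alb_idle_timeout <alb-arn>
```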

## Rollback procedure

```bash
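# Revert task def to previous revision. This only reverts the image if the
# revisions reference different image tags; two revisions pinned to :latest
# resolve to the same image.
aws ecs update-service --cluster agencio-predict-prod --service predict-web \
  --task-definition agencio-predict-web:<prev-revision>

# If a migration broke the DB, restore from the latest automated RDS snapshot.
# The restore creates a NEW instance: pass the subnet group and security group
# explicitly, then point the DATABASE_URL secret at the new endpoint.
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier agencio-predict-rollback \
  --db-snapshot-identifier <snapshot-id> \
  --db-subnet-group-name <subnet-group> \
  --vpc-security-group-ids <sg-id>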
```
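
The `<prev-revision>` and `<snapshot-id>` placeholders above can be looked up rather than remembered. This is a sketch with hypothetical helper names; note that `--family-prefix` is a prefix match, so a sibling family with a longer name would also be listed.

```bash
# ARN of the second-newest ACTIVE revision in a task-definition family.
previous_task_def() {
  aws ecs list-task-definitions \
    --family-prefix "${1:-agencio-predict-web}" \
    --status ACTIVE --sort DESC \
    --query 'taskDefinitionArns[1]' --output text
}

# Identifier of the most recent automated snapshot for an instance.
latest_snapshot() {
  aws rds describe-db-snapshots \
    --db-instance-identifier "${1:-agencio-predict-prod}" \
    --snapshot-type automated \
    --query 'sort_by(DBSnapshots, &SnapshotCreateTime)[-1].DBSnapshotIdentifier' \
    --output text
}

# Usage:
#   aws ecs update-service --cluster agencio-predict-prod --service predict-web \
#     --task-definition "$(previous_task_def)"
```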

## Cost estimate (beta, us-east-1)

| Component | Size | Monthly |
|---|---|---|
| ECS Fargate | 2 × 1 vCPU / 2 GB | ~$60 |
| RDS Postgres | db.t4g.small, 20 GB gp3 (PG 16.4 + pgvector) | ~$30 |
| ElastiCache Redis | cache.t4g.micro | ~$12 |
| S3 | 10 GB + requests | ~$2 |
| ALB | 1 | ~$20 |
| Secrets Manager | ~18 secrets | ~$7 |
| CloudWatch logs | ~5 GB/mo | ~$3 |
| Cognito | <50 MAU | Free |
| Data transfer | 50 GB out | ~$5 |
| Claude Haiku (LLM sentiment + RAG synthesis) | budget 5k calls/day | ~$6 |
| Voyage AI embeddings (RAG backfill + queries) | ~1k articles/day | ~$5 |
| Anthropic Claude Sonnet (algo critique / jury) | on-demand, light | ~$10 |
| **Total** | | **~$160/mo** |

Scaling to 500 MAU + 4 ECS tasks + `db.t4g.medium` bumps this to around $400/mo (LLM costs scale faster than infra once the corpus grows — watch `LLM_SENTIMENT_DAILY_BUDGET`).
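
As a quick arithmetic check, the line items in the table can be summed and split into infrastructure and LLM spend. The figures are copied from the table, and Cognito's free tier is entered as 0.

```bash
# Monthly line items from the table above, in whole dollars.
infra_items="60 30 12 2 20 7 3 0 5"   # Fargate, RDS, Redis, S3, ALB, Secrets, Logs, Cognito, transfer
llm_items="6 5 10"                    # Haiku, Voyage, Sonnet

infra=0
for c in $infra_items; do infra=$((infra + c)); done

llm=0
for c in $llm_items; do llm=$((llm + c)); done

total=$((infra + llm))
echo "infra: \$${infra}/mo, LLM: \$${llm}/mo, total: \$${total}/mo"
# prints "infra: $139/mo, LLM: $21/mo, total: $160/mo"
```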

## Related docs

- `docs/BETA_READINESS.md` — scorecard of what's ready vs sharp-edged before launch
- `docs/11-security-overview.md` — auth / RBAC / credential flows
- `docs/10-infrastructure-deployment.md` — original (more generic) deployment guide
- `/admin/aws` — the in-product diagnostic panel that verifies a live deploy
