# EC2 Scheduler Deployment System

> **Document:** 62-ec2-scheduler-deployment.md
> **Created:** 2026-05-10
> **Status:** Production

This document describes the deployment system for the Agencio Predict scheduler service running on EC2.

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        GitHub                                    │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │ Push to     │───▶│ CI Workflow │───▶│ Deploy to EC2       │  │
│  │ main        │    │ (tests)     │    │ Workflow            │  │
│  └─────────────┘    └─────────────┘    └──────────┬──────────┘  │
└──────────────────────────────────────────────────┼──────────────┘
                                                   │
                                                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                    AWS (ap-southeast-1)                          │
│                                                                  │
│  ┌─────────────────────┐      ┌─────────────────────────────┐   │
│  │ Security Group      │      │ EC2 (i-030715bbe93611746)   │   │
│  │ sg-00c0ecc701b7533dc│      │ 54.255.100.122              │   │
│  │                     │      │                             │   │
│  │ • Dynamic SSH rule  │─────▶│ ┌─────────────────────┐     │   │
│  │   added/removed     │      │ │ PM2                 │     │   │
│  │   per deployment    │      │ │ ├── agencio-scheduler│    │   │
│  └─────────────────────┘      │ │ └── silo-heartbeat  │     │   │
│                               │ └─────────────────────┘     │   │
│                               │                             │   │
│                               │ ┌─────────────────────┐     │   │
│                               │ │ PostgreSQL :5432    │     │   │
│                               │ └─────────────────────┘     │   │
│                               │                             │   │
│                               │ ┌─────────────────────┐     │   │
│                               │ │ Redis :6379         │     │   │
│                               └─┴─────────────────────┴─────┘   │
└─────────────────────────────────────────────────────────────────┘
```

---

## Deployment Methods

### 1. Automatic Deployment (GitHub Actions)

Deployments are triggered when:
- A push to `main` passes CI (automatic)
- The workflow is dispatched manually (`workflow_dispatch`)

**Workflow:** `.github/workflows/deploy-ec2.yml`

**Process:**
1. CI workflow runs tests and type checking
2. On CI success, deploy workflow triggers
3. Workflow gets runner's public IP
4. Adds temporary SSH rule to security group
5. Connects via EC2 Instance Connect
6. Uploads tarball with code + migrations
7. Runs npm install
8. Applies pending migrations
9. Restarts PM2 scheduler
10. Removes temporary SSH rule

**Security:**
- SSH access is granted only for the duration of deployment
- Uses EC2 Instance Connect (no permanent SSH keys)
- Security group rule is removed even on failure
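
The "removed even on failure" guarantee can be sketched in shell for a script that runs outside GitHub Actions. This is a minimal sketch, not the contents of `deploy-ec2.yml` or `deploy-scheduler.sh`: the function names (`to_cidr`, `open_ssh`, `close_ssh`, `with_temporary_ssh`) are hypothetical, and it assumes `aws` and `curl` are on the `PATH`. The region and security group defaults are copied from this document.

```bash
# Assumed defaults, taken from the Environment Variables table in this document.
AWS_REGION="${AWS_REGION:-ap-southeast-1}"
SECURITY_GROUP_ID="${SECURITY_GROUP_ID:-sg-00c0ecc701b7533dc}"

# Turn a bare IPv4 address into a single-host CIDR block (hypothetical helper).
# Rejects empty input and anything containing characters other than digits and dots.
to_cidr() {
  case "$1" in
    ''|*[!0-9.]*) echo "not an IPv4 address: $1" >&2; return 1 ;;
  esac
  printf '%s/32\n' "$1"
}

# Allow SSH (tcp/22) from one CIDR block.
open_ssh() {
  aws ec2 authorize-security-group-ingress \
    --region "$AWS_REGION" --group-id "$SECURITY_GROUP_ID" \
    --protocol tcp --port 22 --cidr "$1"
}

# Remove the same rule again.
close_ssh() {
  aws ec2 revoke-security-group-ingress \
    --region "$AWS_REGION" --group-id "$SECURITY_GROUP_ID" \
    --protocol tcp --port 22 --cidr "$1"
}

# Run a command with SSH opened for this machine only. The rule is revoked
# whether the command succeeds or fails, and the command's status is returned.
with_temporary_ssh() {
  local cidr status=0
  cidr=$(to_cidr "$(curl -fsS https://checkip.amazonaws.com)") || return 1
  open_ssh "$cidr" || return 1
  "$@" || status=$?
  close_ssh "$cidr"
  return "$status"
}
```

A call such as `with_temporary_ssh ./scripts/deploy-scheduler.sh` would then wrap a whole deployment. The sketch does not cover interruption by a signal (for example Ctrl-C); a `trap` would be needed for that case.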

### 2. Manual Deployment (Local)

Use `scripts/deploy-scheduler.sh` from your local machine:

```bash
./scripts/deploy-scheduler.sh
```

**Requirements:**
- AWS CLI configured with credentials
- SSH key pair, with the public key at `~/.ssh/id_rsa.pub`
- Access to EC2 Instance Connect
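
The requirements above can be checked before starting a manual deployment. The following is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper, not part of `deploy-scheduler.sh`, and it uses `aws sts get-caller-identity` only as a way to confirm that the configured credentials are accepted. It does not verify Instance Connect access itself, which depends on the IAM permissions attached to those credentials.

```bash
# Check the manual-deployment requirements. Prints one line per problem and
# returns non-zero if anything is missing. (Hypothetical helper.)
preflight() {
  local pubkey="${1:-$HOME/.ssh/id_rsa.pub}"
  local problems=0

  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH"
    problems=$((problems + 1))
  elif ! aws sts get-caller-identity >/dev/null 2>&1; then
    echo "aws CLI found, but no valid credentials"
    problems=$((problems + 1))
  fi

  if [ ! -r "$pubkey" ]; then
    echo "SSH public key not readable: $pubkey"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```

Running `preflight && ./scripts/deploy-scheduler.sh` would stop early with a readable message instead of failing partway through the deployment.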

---

## File Structure

```
agencio-predict/
├── .github/workflows/
│   ├── ci.yml                    # CI: tests, typecheck, build
│   └── deploy-ec2.yml            # Deploy to EC2 after CI passes
├── scripts/
│   ├── deploy-scheduler.sh       # Manual deployment script
│   └── run-migrations.sh         # Migration runner
├── packages/be/src/scheduler/
│   └── index.ts                  # Scheduler entry point
└── db/migrations/
    └── *.sql                     # Database migrations
```

---

## GitHub Actions Workflow

### Triggers

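```yaml
on:
  workflow_run:
    workflows: ["CI"]
    branches: [main]
    types: [completed]  # Fires on any CI result, not only success
  workflow_dispatch:  # Manual trigger
```

`types: [completed]` fires when CI finishes with any result, so the deploy job must check `github.event.workflow_run.conclusion == 'success'` (while still allowing `workflow_dispatch` runs) to get the "on CI success" behaviour described above.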

### Environment Variables

| Variable | Value | Description |
|----------|-------|-------------|
| `AWS_REGION` | ap-southeast-1 | AWS region |
| `EC2_INSTANCE_ID` | i-030715bbe93611746 | EC2 instance ID |
| `EC2_PUBLIC_IP` | 54.255.100.122 | Elastic IP address of the instance |
| `EC2_USER` | ec2-user | SSH user |
| `SECURITY_GROUP_ID` | sg-00c0ecc701b7533dc | Security group ID |

### Required GitHub Secrets

| Secret | Description |
|--------|-------------|
| `AWS_ACCESS_KEY_ID` | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key |

### Workflow Steps

1. **Get runner IP and add to security group**
   - Fetches runner's public IP via `checkip.amazonaws.com`
   - Adds temporary SSH rule to security group

2. **Check EC2 instance state**
   - Verifies instance is running
   - Starts instance if stopped

3. **Setup SSH via EC2 Instance Connect**
   - Generates temporary SSH key
   - Pushes public key via Instance Connect API
   - Pushed key is usable for 60 seconds, so the SSH connection must be opened within that window

4. **Create deployment tarball**
   - Packages: `packages/`, `db/migrations/`, `package.json`, etc.
   - Excludes: `node_modules/`, `.git/`, `dist/`, `.next/`

5. **Upload to EC2**
   - SCP tarball to `~/scheduler-deploy.tar.gz`

6. **Deploy scheduler on EC2**
   - Stops PM2 scheduler
   - Backs up current packages
   - Extracts new code
   - Runs `npm install`
   - Applies pending migrations
   - Restarts PM2 scheduler

7. **Verify deployment**
   - Checks PM2 status
   - Verifies scheduler is online

8. **Remove runner IP from security group**
   - Always runs (even on failure)
   - Removes temporary SSH rule
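
Step 4 above (the deployment tarball) can be sketched as follows. This is an illustration under stated assumptions rather than the workflow's actual command: `make_tarball` is a hypothetical helper, and it archives only the paths named explicitly in step 4, since the rest of the "etc." list is not given in this document.

```bash
# Build the deployment tarball.
#   $1 = repository root
#   $2 = output file (use an absolute path)
make_tarball() {
  local repo_root="$1" out="$2"
  # Exclude options come before the paths so they apply to all of them.
  tar -czf "$out" -C "$repo_root" \
    --exclude='node_modules' --exclude='.git' \
    --exclude='dist' --exclude='.next' \
    packages db/migrations package.json
}
```

For example, `make_tarball "$PWD" /tmp/scheduler-deploy.tar.gz` followed by `tar -tzf /tmp/scheduler-deploy.tar.gz | head` shows what would be uploaded in step 5.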

---

## Database Migrations

### Migration Format

Migrations are SQL files in `db/migrations/` with the naming convention:

```
NNN_description.sql
```

Example: `192_stock_hunter_tables.sql`
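
A file name can be checked against this convention before it is committed. The sketch below assumes that `NNN` means exactly three digits, which matches the example but is not stated explicitly; `migration_number` is a hypothetical helper.

```bash
# Print the numeric prefix of a migration file, or fail if the name does not
# follow NNN_description.sql (three digits, underscore, non-empty description).
migration_number() {
  local name
  name=$(basename "$1")
  case "$name" in
    [0-9][0-9][0-9]_?*.sql) printf '%s\n' "${name%%_*}" ;;
    *) echo "bad migration name: $name" >&2; return 1 ;;
  esac
}
```

With fixed-width, zero-padded prefixes, plain lexicographic order (as produced by the shell glob `db/migrations/*.sql`) is also numeric order, so migrations are visited in the order they were numbered.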

### Migration Tracking

Applied migrations are tracked in `public.schema_migrations`:

```sql
CREATE TABLE public.schema_migrations (
  version TEXT PRIMARY KEY,
  checksum TEXT,
  applied_at TIMESTAMPTZ DEFAULT NOW()
);
```
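
The `checksum` column makes it possible to detect a migration file that was edited after it was applied. This document does not say which hash the column stores, so the sketch below assumes a hex-encoded SHA-256 of the file contents; `migration_checksum` and `verify_checksum` are hypothetical helpers.

```bash
# Hex-encoded SHA-256 of a migration file (assumed checksum format).
migration_checksum() {
  sha256sum "$1" | cut -d' ' -f1
}

# Compare a file against the checksum recorded for it.
# Prints "ok" or "changed"; returns non-zero on "changed".
verify_checksum() {
  local file="$1" recorded="$2"
  if [ "$(migration_checksum "$file")" = "$recorded" ]; then
    echo "ok"
  else
    echo "changed"
    return 1
  fi
}
```

The recorded value would come from a query such as `SELECT checksum FROM public.schema_migrations WHERE version = '...'`, using whatever version format the migration runner writes.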

### Running Migrations

**Automatic (during deployment):**
Migrations are applied automatically during both GitHub Actions and manual deployments.

**Manual:**
```bash
./scripts/run-migrations.sh          # Dry run - show pending
./scripts/run-migrations.sh --apply  # Apply pending migrations
```

### Migration Script Logic

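```bash
# Get applied migration versions, one per line (-A unaligned, -t rows only)
APPLIED=$(psql -At -c "SELECT version FROM public.schema_migrations")

# For each local migration, in filename order
for migration in db/migrations/*.sql; do
  # Assumption: version stores the file name without ".sql"
  MIGRATION_NAME=$(basename "$migration" .sql)
  # -x -F: match the whole line literally, so "19" cannot match "192_..."
  if ! echo "$APPLIED" | grep -qxF "$MIGRATION_NAME"; then
    # Apply migration; stop at the first SQL error
    psql -v ON_ERROR_STOP=1 -f "$migration" || exit 1
    # Record as applied (SHA-256 as the checksum is an assumption)
    CHECKSUM=$(sha256sum "$migration" | cut -d' ' -f1)
    psql -c "INSERT INTO public.schema_migrations (version, checksum)
             VALUES ('$MIGRATION_NAME', '$CHECKSUM')"
  fi
done
```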

---

## PM2 Process Management

### Scheduler Configuration

**Ecosystem file:** `/opt/agencio-predict/ecosystem.config.js`

```javascript
module.exports = {
  apps: [{
    name: 'agencio-scheduler',
    script: 'node_modules/.bin/tsx',
    args: '/tmp/scheduler-runner.ts',
    cwd: '/opt/agencio-predict',
    env: {
      NODE_ENV: 'production'
    }
  }]
};
```

### PM2 Commands

```bash
# View processes
pm2 list

# Restart scheduler
pm2 restart agencio-scheduler

# View logs
pm2 logs agencio-scheduler --lines 50

# Monitor
pm2 monit
```
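
The "Verify deployment" step can be scripted as a short polling loop around these commands. This is a minimal sketch, not the workflow's actual check: `pm2_is_online` and `wait_for_scheduler` are hypothetical helpers, and the sketch assumes that `pm2 pid <name>` prints a non-zero PID only while the process is running.

```bash
# Succeeds if PM2 reports a live PID for the given process name.
pm2_is_online() {
  local pid
  pid=$(pm2 pid "$1" 2>/dev/null | head -n 1)
  case "$pid" in
    ''|0|*[!0-9]*) return 1 ;;
  esac
  return 0
}

# Poll until agencio-scheduler is online or the attempts run out.
#   $1 = number of attempts (default 10)
#   $2 = seconds between attempts (default 3)
wait_for_scheduler() {
  local attempts="${1:-10}" delay="${2:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    if pm2_is_online agencio-scheduler; then
      echo "agencio-scheduler online after $i check(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "agencio-scheduler did not come online" >&2
  return 1
}
```

A live PID shows only that the process exists at that moment. A scheduler that crashes and restarts in a loop can still pass this check, so `pm2 logs agencio-scheduler --err` remains worth a look after a deployment.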

### Scheduler Entry Point

**File:** `/tmp/scheduler-runner.ts`

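```typescript
import { config } from 'dotenv';
// Load environment variables before the scheduler module is evaluated.
config({ path: '/opt/agencio-predict/.env.production' });

// Dynamic import so the scheduler loads only after config() has run.
// Absolute path: a relative specifier resolves against this file's
// directory (/tmp), not against the PM2 cwd.
import('/opt/agencio-predict/packages/be/src/scheduler/index.ts')
  .then(() => console.log('[scheduler-runner] Scheduler loaded'))
  .catch((err) => {
    console.error('[scheduler-runner] Load failed:', err);
    process.exit(1);
  });

setInterval(() => {}, 1 << 30); // Keep alive (one no-op tick every ~12 days)
```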

---

## Security

### EC2 Instance Connect

We use EC2 Instance Connect instead of permanent SSH keys:
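- Temporary SSH public keys are pushed to the instance via the AWS API
- Each pushed key expires after 60 seconds; a session opened within that window stays connected
- No long-lived SSH keys to store or rotate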
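
The key push and connection can be sketched as below. This is an illustration under stated assumptions, not the workflow's actual step: the function names are hypothetical, the defaults are copied from the Environment Variables table, and an Ed25519 key is chosen here only because it is quick to generate.

```bash
# Assumed defaults, taken from the Environment Variables table in this document.
AWS_REGION="${AWS_REGION:-ap-southeast-1}"
EC2_INSTANCE_ID="${EC2_INSTANCE_ID:-i-030715bbe93611746}"
EC2_PUBLIC_IP="${EC2_PUBLIC_IP:-54.255.100.122}"
EC2_USER="${EC2_USER:-ec2-user}"

# Generate a throwaway key pair in a fresh temp directory and print the
# path of the private key. The public key is written next to it as <path>.pub.
new_temp_key() {
  local dir
  dir=$(mktemp -d) || return 1
  ssh-keygen -t ed25519 -N '' -q -f "$dir/deploy_key" || return 1
  printf '%s\n' "$dir/deploy_key"
}

# Push the public half of the key at $1 to the instance.
push_key() {
  aws ec2-instance-connect send-ssh-public-key \
    --region "$AWS_REGION" --instance-id "$EC2_INSTANCE_ID" \
    --instance-os-user "$EC2_USER" \
    --ssh-public-key "file://$1.pub"
}

# Push a fresh key and connect straight away, inside the 60-second window.
# Any extra arguments are passed to ssh as the remote command.
connect() {
  local key
  key=$(new_temp_key) || return 1
  push_key "$key" >/dev/null || return 1
  ssh -i "$key" -o IdentitiesOnly=yes "$EC2_USER@$EC2_PUBLIC_IP" "$@"
}
```

For example, `connect pm2 list` would run a single remote command. Because the key pair is created per call and never reused, nothing sensitive remains to be rotated afterwards.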

### Dynamic Security Group Rules

The deployment workflow:
1. Gets runner's public IP
2. Adds SSH rule for that IP only
3. Removes the rule after deployment, whether it succeeded or failed

This is more secure than:
- Opening SSH to all IPs (0.0.0.0/0)
- Maintaining a list of GitHub Actions IP ranges

### Database Credentials

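The database password is stored in:
- `/opt/agencio-predict/.env.production` on EC2
- `scripts/run-migrations.sh` (for migration runner)
- GitHub workflow (for automated migrations)

**Note:** Two of these locations are files in the repository, so the password is visible to anyone with read access to it. Consider moving it to AWS Secrets Manager or a GitHub Actions secret.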

---

## Troubleshooting

### Scheduler Not Running

```bash
# Check status
pm2 list

# View error logs
pm2 logs agencio-scheduler --err --lines 50

# Restart
pm2 restart agencio-scheduler
```

### Common Errors

**"Cannot find module tsx"**
```bash
cd /opt/agencio-predict
npm install tsx dotenv
pm2 restart agencio-scheduler
```

**"system-discovery is not a valid UUID"**
- Fixed in `packages/be/src/scheduler/index.ts`, which now uses the default `SYSTEM_USER_ID` constant

**SSH Connection Timeout**
- Check security group rules
- Verify EC2 Instance Connect is working
- Re-push SSH key if expired
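
The first two checks can be run from any machine with AWS credentials. The following is a minimal sketch with hypothetical helper names; the defaults are copied from the Environment Variables table, and the security group query assumes the temporary rule is a plain tcp/22 rule with an IPv4 source.

```bash
# Assumed defaults, taken from the Environment Variables table in this document.
AWS_REGION="${AWS_REGION:-ap-southeast-1}"
EC2_INSTANCE_ID="${EC2_INSTANCE_ID:-i-030715bbe93611746}"
SECURITY_GROUP_ID="${SECURITY_GROUP_ID:-sg-00c0ecc701b7533dc}"

# Print the instance state, for example "running" or "stopped".
instance_state() {
  aws ec2 describe-instances --region "$AWS_REGION" \
    --instance-ids "$EC2_INSTANCE_ID" \
    --query 'Reservations[0].Instances[0].State.Name' --output text
}

# Print the CIDR blocks currently allowed to reach port 22 (tab-separated).
ssh_sources() {
  aws ec2 describe-security-groups --region "$AWS_REGION" \
    --group-ids "$SECURITY_GROUP_ID" \
    --query 'SecurityGroups[0].IpPermissions[?FromPort==`22`].IpRanges[].CidrIp' \
    --output text
}

# Succeeds if the given IPv4 address has its own /32 rule for SSH.
ip_is_allowed() {
  ssh_sources | tr '\t' '\n' | grep -qxF "$1/32"
}
```

If `instance_state` prints `running` and `ip_is_allowed "$(curl -fsS https://checkip.amazonaws.com)"` succeeds, the timeout is more likely caused by an expired Instance Connect key than by networking.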

### Checking Deployment Status

```bash
# GitHub Actions
gh run list --workflow=deploy-ec2.yml --limit=5

# View specific run
gh run view <run-id> --log
```

---

## Monitoring

### Health Checks

The scheduler includes a `platform-health-checks` job that monitors:
- Database connectivity
- Redis connectivity
- External API availability
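
The first two checks can also be reproduced by hand from a shell on the instance. This sketch is not the `platform-health-checks` job itself, which lives in the scheduler code and is not shown in this document. It assumes PostgreSQL and Redis listen on localhost at the ports shown in the architecture diagram, and that `pg_isready` and `redis-cli` are installed; the function names are hypothetical.

```bash
# Succeeds if PostgreSQL accepts connections on localhost:5432.
check_postgres() {
  pg_isready -q -h localhost -p 5432
}

# Succeeds if Redis answers PING with PONG on localhost:6379.
check_redis() {
  [ "$(redis-cli -h localhost -p 6379 ping 2>/dev/null)" = "PONG" ]
}

# Run both checks, print one line per service, and return non-zero if
# either check failed.
health_report() {
  local failed=0
  if check_postgres; then echo "postgres: ok"; else echo "postgres: FAILED"; failed=1; fi
  if check_redis; then echo "redis: ok"; else echo "redis: FAILED"; failed=1; fi
  return "$failed"
}
```

A Redis instance that requires authentication will answer `PING` with an error rather than `PONG`, so it would be reported as `FAILED` here even though it is reachable.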

### Logs

```bash
# PM2 logs
pm2 logs agencio-scheduler

# Log files
tail -n 50 /var/log/agencio/scheduler-out-3.log
tail -n 50 /var/log/agencio/scheduler-error-3.log
```

---

## Related Documentation

- `docs/29-cicd.md` - Full CI/CD pipeline documentation
- `docs/10-infrastructure-deployment.md` - Infrastructure overview
- `infra/README.md` - Infrastructure configuration
- `infra/aws/scheduler-deployment.md` - AWS-specific details
