# Mailbox Monitoring & Processing

> **Email Intelligence Feature** — Office 365 mailbox monitoring with rule-based processing and LLM entity extraction.

## Overview

The Mailbox Monitoring system allows users to connect their Office 365 mailboxes to automatically extract trading signals from newsletters, analyst reports, market alerts, and other financial communications. It supports both platform-wide admin mailboxes (app-only authentication) and user-connected personal mailboxes (delegated OAuth 2.0).

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    MICROSOFT GRAPH API                          │
│                  (Office 365 Mail Service)                      │
└─────────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┴───────────────────┐
          │                                       │
          ▼                                       ▼
┌─────────────────────┐               ┌─────────────────────┐
│   APP-ONLY AUTH     │               │   DELEGATED OAUTH   │
│   (Admin Mailboxes) │               │   (User Mailboxes)  │
│                     │               │                     │
│ • Client credentials│               │ • PKCE flow         │
│ • No user login     │               │ • User consent      │
│ • Platform shared   │               │ • Per-user tokens   │
│ • support@, alerts@ │               │ • personal inbox    │
└─────────────────────┘               └─────────────────────┘
          │                                       │
          └───────────────────┬───────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    MAILBOX SERVICE                               │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │   Sync Service   │  │   Rule Engine    │  │  Extractors  │  │
│  │                  │  │                  │  │              │  │
│  │ • Delta queries  │  │ • Condition      │  │ • Ticker     │  │
│  │ • Webhooks       │  │   matching       │  │ • Price      │  │
│  │ • Batch sync     │  │ • Action exec    │  │ • Sentiment  │  │
│  └──────────────────┘  └──────────────────┘  │ • LLM (Claude)│
│                                               └──────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    POSTGRESQL DATABASE                           │
│  • mailbox_connections (OAuth tokens encrypted)                  │
│  • mailbox_rules (conditions + actions as JSONB)                 │
│  • mailbox_processed_emails (processing status)                  │
│  • mailbox_extracted_entities (normalized entities)              │
│  • user_mailbox_settings (per-user preferences)                  │
│  • mailbox_webhook_log (debugging)                               │
└─────────────────────────────────────────────────────────────────┘
```

## Two Authentication Modes

### 1. Admin/Platform Mailboxes (App-Only)

Used for platform-wide shared mailboxes that don't require user login:

- **Authentication**: Client credentials flow (Azure AD app permissions)
- **Permissions**: `Mail.Read` application permission
- **Use cases**: support@, subscriptions@, alerts@, research@
- **Configuration**: Admin configures in `/admin/mailbox`

```typescript
// Example: Admin creates an app-only connection
POST /api/predict/v1/admin/mailbox/connections
{
  "name": "Research Newsletters",
  "emailAddress": "research@company.com",
  "authType": "app_only"
}
```

### 2. User Mailboxes (Delegated OAuth 2.0)

Users connect their own mailboxes via OAuth:

- **Authentication**: OAuth 2.0 with PKCE flow
- **Permissions**: `Mail.Read` delegated permission
- **Token refresh**: Automatic (30-day tokens, refreshed hourly)
- **Configuration**: User configures in `/settings/mailbox`

```typescript
// User initiates OAuth flow
POST /api/predict/v1/mailbox/oauth/start
{
  "returnUrl": "/settings/mailbox"
}
// Returns: { authorizationUrl: "https://login.microsoftonline.com/..." }
```

## Database Schema (Migration 267)

```sql
-- Platform-configured mailboxes (admin + user)
CREATE TABLE predict.mailbox_connections (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  email_address TEXT NOT NULL UNIQUE,
  auth_type TEXT NOT NULL CHECK (auth_type IN ('app_only', 'delegated')),
  user_id UUID REFERENCES auth.users(id),      -- NULL for app_only
  -- OAuth tokens (AES-256-GCM encrypted)
  access_token_enc TEXT,
  refresh_token_enc TEXT,
  token_expires_at TIMESTAMPTZ,
  -- Graph delta sync
  delta_link TEXT,
  last_sync_at TIMESTAMPTZ,
  -- Webhook subscription
  webhook_subscription_id TEXT,
  webhook_subscription_secret TEXT,
  webhook_expires_at TIMESTAMPTZ,
  -- Status
  status TEXT NOT NULL DEFAULT 'active',
  error_message TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Processing rules
CREATE TABLE predict.mailbox_rules (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  connection_id UUID NOT NULL REFERENCES predict.mailbox_connections(id),
  name TEXT NOT NULL,
  priority INT NOT NULL DEFAULT 100,
  enabled BOOLEAN NOT NULL DEFAULT TRUE,
  conditions JSONB NOT NULL DEFAULT '[]',
  actions JSONB NOT NULL DEFAULT '[]',
  matched_count INT NOT NULL DEFAULT 0,
  last_matched_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Processed emails log
CREATE TABLE predict.mailbox_processed_emails (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  connection_id UUID NOT NULL REFERENCES predict.mailbox_connections(id),
  message_id TEXT NOT NULL,
  internet_message_id TEXT,
  from_address TEXT NOT NULL,
  from_name TEXT,
  subject TEXT,
  body_preview TEXT,
  received_at TIMESTAMPTZ NOT NULL,
  has_attachments BOOLEAN DEFAULT FALSE,
  rule_id UUID REFERENCES predict.mailbox_rules(id),
  processing_status TEXT NOT NULL DEFAULT 'pending',
  extracted_data JSONB,
  applied_tags TEXT[] DEFAULT '{}',
  error_message TEXT,
  processed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(connection_id, message_id)
);

-- Extracted entities (normalized for querying)
CREATE TABLE predict.mailbox_extracted_entities (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email_id UUID NOT NULL REFERENCES predict.mailbox_processed_emails(id),
  entity_type TEXT NOT NULL,
  entity_value TEXT NOT NULL,
  confidence DECIMAL(3,2),
  metadata JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- User mailbox settings
CREATE TABLE predict.user_mailbox_settings (
  user_id UUID PRIMARY KEY REFERENCES auth.users(id),
  auto_process BOOLEAN NOT NULL DEFAULT TRUE,
  extract_tickers BOOLEAN NOT NULL DEFAULT TRUE,
  extract_prices BOOLEAN NOT NULL DEFAULT TRUE,
  extract_sentiment BOOLEAN NOT NULL DEFAULT TRUE,
  extract_dates BOOLEAN NOT NULL DEFAULT TRUE,
  notify_on_extraction BOOLEAN NOT NULL DEFAULT FALSE,
  filter_senders TEXT[] DEFAULT '{}',
  exclude_senders TEXT[] DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Webhook notification log (debugging)
CREATE TABLE predict.mailbox_webhook_log (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  connection_id UUID REFERENCES predict.mailbox_connections(id),
  subscription_id TEXT,
  change_type TEXT,
  resource TEXT,
  payload JSONB NOT NULL,
  processed BOOLEAN NOT NULL DEFAULT FALSE,
  error_message TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

## Rule Engine

### Condition Types

| Type | Description | Example |
|------|-------------|---------|
| `from_contains` | Sender address contains | `newsletter` |
| `from_domain` | Sender domain matches | `@seekingalpha.com` |
| `subject_contains` | Subject line contains | `price target` |
| `subject_regex` | Subject matches regex | `AAPL\|MSFT\|GOOGL` |
| `body_contains` | Body text contains | `buy rating` |
| `has_attachment` | Has attachments | `true` |
| `attachment_type` | Attachment MIME type | `application/pdf` |

### Action Types

| Action | Description | Config |
|--------|-------------|--------|
| `extract_entities` | Run LLM extraction | `{ extractTickers, extractPrices, extractSentiment, extractDates }` |
| `forward_to` | Forward email | `{ address, includeOriginal }` |
| `create_alert` | Create platform alert | `{ alertType, priority }` |
| `add_to_watchlist` | Add extracted tickers to watchlist | `{ watchlistId }` |
| `tag` | Apply tags | `{ tags: ["research", "bullish"] }` |
| `skip` | Skip processing | — |

### Rule Evaluation

Rules are evaluated in priority order (lower number = higher priority). First matching rule's actions are executed:

```typescript
// Example rule configuration
{
  "name": "Seeking Alpha Research",
  "priority": 10,
  "conditions": [
    { "type": "from_domain", "value": "@seekingalpha.com" },
    { "type": "subject_contains", "value": "price target" }
  ],
  "actions": [
    {
      "type": "extract_entities",
      "config": {
        "extractTickers": true,
        "extractPrices": true,
        "extractSentiment": true
      }
    },
    { "type": "tag", "config": { "tags": ["research", "seeking-alpha"] } }
  ]
}
```

## Entity Extraction

### Pattern-Based Extractors

**Ticker Extractor** (`extractors/ticker-extractor.ts`):
- Matches common patterns: `$AAPL`, `NASDAQ: AAPL`, `(AAPL)`
- Validates against known symbol lists
- Confidence scoring based on context

**Price Extractor** (`extractors/price-extractor.ts`):
- Detects price targets: "price target of $150"
- Identifies analyst actions: "upgraded to buy"
- Extracts current vs target price comparisons

### LLM Extractor (`extractors/llm-extractor.ts`)

Uses Claude for comprehensive entity extraction with structured output:

```typescript
interface ExtractedData {
  tickers?: Array<{
    symbol: string;
    confidence: number;
    context?: string;
  }>;
  priceTargets?: Array<{
    symbol: string;
    current?: number;
    target: number;
    action?: 'buy' | 'sell' | 'hold';
    analyst?: string;
    confidence: number;
  }>;
  sentiment?: {
    overall: 'bullish' | 'bearish' | 'neutral';
    score: number; // -1 to 1
    confidence: number;
  };
  dates?: Array<{
    date: string;
    eventType?: string;
    description?: string;
    confidence: number;
  }>;
}
```

**Billing**: Uses `llmCallWithBilling(userId, ...)` — charged against user's AI credits or BYOK API key.

## Content Analysis

The system provides aggregated analysis across processed emails:

### Theme Detection
- Identifies common themes across multiple emails
- Groups similar topics and tracks frequency
- Detects emerging trends

### Consensus Detection
- When 60%+ of sources agree on a topic, marks as "consensus"
- Shows agreement level and contributing sources

### Divergence Alerts
- Identifies when sources disagree on the same topic
- Highlights contrarian views
- Useful for detecting uncertainty or controversy

## API Endpoints

### Admin Endpoints (require admin role)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/admin/mailbox/connections` | List all connections |
| POST | `/admin/mailbox/connections` | Create app-only connection |
| GET | `/admin/mailbox/connections/:id` | Get connection details |
| PUT | `/admin/mailbox/connections/:id` | Update connection |
| DELETE | `/admin/mailbox/connections/:id` | Delete connection |
| POST | `/admin/mailbox/connections/:id/test` | Test connection |
| POST | `/admin/mailbox/connections/:id/sync` | Trigger manual sync |
| POST | `/admin/mailbox/connections/:id/pause` | Pause connection |
| POST | `/admin/mailbox/connections/:id/resume` | Resume connection |
| GET | `/admin/mailbox/rules` | List all rules |
| POST | `/admin/mailbox/rules` | Create rule |
| GET | `/admin/mailbox/rules/:id` | Get rule |
| PUT | `/admin/mailbox/rules/:id` | Update rule |
| DELETE | `/admin/mailbox/rules/:id` | Delete rule |
| GET | `/admin/mailbox/emails` | List processed emails |
| POST | `/admin/mailbox/emails/:id/reprocess` | Reprocess email |
| GET | `/admin/mailbox/stats` | Processing statistics |
| GET | `/admin/mailbox/webhook-log` | Webhook debug log |
| POST | `/admin/mailbox/analyze` | Analyze content themes |

### User Endpoints (require auth)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/mailbox/connection` | Get user's connection |
| POST | `/mailbox/oauth/start` | Start OAuth flow |
| GET | `/mailbox/oauth/callback` | OAuth callback |
| DELETE | `/mailbox/connection` | Disconnect mailbox |
| GET | `/mailbox/settings` | Get processing settings |
| PUT | `/mailbox/settings` | Update settings |
| GET | `/mailbox/emails` | View processed emails |
| GET | `/mailbox/entities` | View extracted entities |
| POST | `/mailbox/sync` | Trigger manual sync |

### Webhook Endpoint (public, validated)

| Method | Path | Description |
|--------|------|-------------|
| POST | `/mailbox/webhook` | Graph webhook notifications |

## Scheduler Jobs

```typescript
// packages/be/src/scheduler/index.ts

// Poll mailboxes without webhooks (fallback, every 5 min)
registerJob('mailbox-poll-sync', 5 * 60 * 1000, mailboxPollSync);

// Process pending emails (every 5 min)
registerJob('mailbox-process-pending', 5 * 60 * 1000, mailboxProcessPending);

// Retry failed processing (every 15 min)
registerJob('mailbox-retry-failed', 15 * 60 * 1000, mailboxRetryFailed);

// Refresh webhook subscriptions (every 6 hours)
registerJob('mailbox-webhook-refresh', 6 * 60 * 60 * 1000, mailboxWebhookRefresh);

// Refresh OAuth tokens (every hour)
registerJob('mailbox-token-refresh', 60 * 60 * 1000, mailboxTokenRefresh);
```

## UI Components

### Admin UI (`/admin/mailbox`)

5-tab interface:

1. **Connections Tab**
   - Table: name, email, auth type, status, last sync, email count
   - Add Connection modal
   - Test/Sync/Pause/Resume/Delete actions
   - Status badges (active/paused/error)

2. **Rules Tab**
   - Table: name, connection, conditions summary, actions, match count
   - Rule builder modal with condition/action editors
   - Enable/disable toggle
   - Priority reorder

3. **Processing Tab**
   - Recent processed emails table
   - Filter by connection, status, date
   - View extracted entities
   - Reprocess button

4. **Analysis Tab**
   - Theme analysis with consensus detection
   - Top themes grouped by category
   - Divergence alerts
   - Ticker insights aggregation

5. **Stats Tab**
   - Emails processed today/week/month
   - Extraction success rate
   - Top senders
   - Entity type breakdown

### User UI (`/settings/mailbox`)

4-tab interface:

1. **Connection Tab**
   - Connect/Disconnect button
   - OAuth status indicator
   - Last sync time
   - Token expiry warning

2. **Settings Tab**
   - Toggle: Auto-process new emails
   - Toggle: Extract stock tickers
   - Toggle: Extract price targets
   - Toggle: Extract sentiment
   - Toggle: Extract dates
   - Toggle: Notify on extraction
   - Sender filters (include/exclude)

3. **Emails Tab**
   - Processed emails list
   - Expandable details with extracted data
   - Status indicators

4. **Entities Tab**
   - Filter by entity type
   - Grouped display
   - Confidence scores

## Environment Variables

```bash
# Azure AD App Registration
AZURE_CLIENT_ID=                    # Application (client) ID
AZURE_CLIENT_SECRET=                # Client secret (for app-only)
AZURE_TENANT_ID=                    # Directory (tenant) ID

# OAuth redirect (delegated auth)
MAILBOX_OAUTH_REDIRECT_URI=https://predict.agencio.cloud/api/predict/v1/mailbox/oauth/callback

# Webhook endpoint (must be publicly accessible)
MAILBOX_WEBHOOK_URL=https://predict.agencio.cloud/api/predict/v1/mailbox/webhook
```

## Security Considerations

1. **Token Encryption**: All OAuth tokens encrypted with AES-256-GCM (existing `brokers/crypto.ts` pattern)
2. **Webhook Validation**: Graph webhooks validated using subscription secret
3. **Scope Minimization**: Only `Mail.Read` permission requested
4. **Rate Limiting**: Respects Graph API limits (10,000 requests/10 minutes)
5. **PII Handling**: Email content not logged, only metadata for debugging
6. **Credential Isolation**: User tokens only accessible to that user

## Graph API Details

### Delta Queries

Used for efficient incremental sync:

```typescript
// Initial sync - get all messages
GET /users/{email}/mailFolders/Inbox/messages/delta

// Subsequent syncs - only changes since last deltaLink
GET {deltaLink}
```

### Webhook Subscriptions

Real-time notifications for new mail:

```typescript
POST /subscriptions
{
  "changeType": "created",
  "resource": "/users/{email}/mailFolders/Inbox/messages",
  "notificationUrl": "https://predict.agencio.cloud/api/predict/v1/mailbox/webhook",
  "expirationDateTime": "2026-05-24T00:00:00Z",  // Max 3 days for mail
  "clientState": "{secret}"
}
```

**Important**: Mail subscriptions expire after 3 days maximum. The `mailbox-webhook-refresh` scheduler job renews subscriptions 24 hours before expiry.

## Files

| Path | Description |
|------|-------------|
| `db/migrations/267_mailbox_monitoring.sql` | Database schema |
| `packages/be/src/mailbox/types.ts` | TypeScript interfaces |
| `packages/be/src/mailbox/repository.ts` | Database CRUD |
| `packages/be/src/mailbox/graph-client.ts` | Microsoft Graph API client |
| `packages/be/src/mailbox/oauth.ts` | OAuth 2.0 PKCE flow |
| `packages/be/src/mailbox/sync-service.ts` | Delta sync logic |
| `packages/be/src/mailbox/webhook-handler.ts` | Webhook processing |
| `packages/be/src/mailbox/rule-engine.ts` | Condition matching + actions |
| `packages/be/src/mailbox/extractors/*.ts` | Entity extraction modules |
| `packages/be/src/mailbox/service.ts` | Main orchestration |
| `packages/be/src/api/predict/v1/mailbox/handlers.ts` | User API handlers |
| `packages/be/src/api/predict/v1/admin/mailbox/handlers.ts` | Admin API handlers |
| `apps/web/src/app/admin/mailbox/page.tsx` | Admin UI |
| `apps/web/src/app/settings/mailbox/page.tsx` | User settings UI |

## Troubleshooting

### Common Issues

**"Invalid client" error during OAuth**
- Verify `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET` are correct
- Ensure redirect URI matches exactly in Azure AD app registration

**Webhooks not receiving notifications**
- Verify `MAILBOX_WEBHOOK_URL` is publicly accessible (not localhost)
- Check webhook subscription status in Graph Explorer
- Review `mailbox_webhook_log` table for debugging

**Token refresh failing**
- Tokens may have been revoked by user
- Check `token_expires_at` and `error_message` in connection record
- User may need to re-authenticate

**No entities extracted**
- Check if extraction is enabled in user settings
- Verify LLM credits are available (BYOK or subscription)
- Review rule conditions - may be too restrictive

### Debugging

```sql
-- Check connection status
SELECT id, name, email_address, status, error_message, last_sync_at
FROM predict.mailbox_connections
WHERE status = 'error';

-- View recent webhook activity
SELECT * FROM predict.mailbox_webhook_log
ORDER BY created_at DESC
LIMIT 20;

-- Check processing failures
SELECT id, from_address, subject, processing_status, error_message
FROM predict.mailbox_processed_emails
WHERE processing_status = 'error'
ORDER BY created_at DESC
LIMIT 20;

-- Entity extraction stats
SELECT entity_type, COUNT(*) as count
FROM predict.mailbox_extracted_entities
GROUP BY entity_type;
```

---

## Phase 4-5: Subscriber Insights & DSL Trading Signals

> **Shipped 2026-05-21** — Aggregates extracted entities into visual heat maps, sentiment gauges, consensus tracking, and 8 DSL primitives for algorithmic trading.

### Architecture

```
mailbox_extracted_entities (raw)
        ↓
mailbox_topic_hourly (15-min aggregation)
        ↓
mailbox_consensus_snapshots (hourly)
        ↓
mailbox_insights_cache (6-hour refresh)
        ↓
DSL Primitives → Algorithm Builder / Paper / Live Trading
        ↓
Heat Maps + Sentiment Gauges (UI)
```

### Database Schema (Migration 268)

```sql
-- Hourly topic aggregation from extracted entities
CREATE TABLE predict.mailbox_topic_hourly (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  topic TEXT NOT NULL,                           -- ticker symbol or keyword
  topic_type TEXT NOT NULL CHECK (topic_type IN ('ticker', 'keyword', 'theme')),
  hour_bucket TIMESTAMPTZ NOT NULL,
  avg_sentiment DECIMAL(4,3),                    -- -1.0 to +1.0
  bullish_count INT NOT NULL DEFAULT 0,
  bearish_count INT NOT NULL DEFAULT 0,
  neutral_count INT NOT NULL DEFAULT 0,
  mention_count INT NOT NULL DEFAULT 0,
  email_count INT NOT NULL DEFAULT 0,
  sender_count INT NOT NULL DEFAULT 0,
  top_senders TEXT[],
  sample_subjects TEXT[],
  computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(topic, topic_type, hour_bucket)
);

-- Per-sender statistics (for credibility weighting)
CREATE TABLE predict.mailbox_sender_metrics (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  sender_domain TEXT NOT NULL UNIQUE,
  sender_name TEXT,
  total_mentions INT NOT NULL DEFAULT 0,
  bullish_mentions INT NOT NULL DEFAULT 0,
  bearish_mentions INT NOT NULL DEFAULT 0,
  avg_sentiment DECIMAL(4,3),
  sentiment_volatility DECIMAL(4,3),
  email_count INT NOT NULL DEFAULT 0,
  last_email_at TIMESTAMPTZ,
  first_email_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Consensus/divergence tracking per ticker
CREATE TABLE predict.mailbox_consensus_snapshots (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  symbol TEXT NOT NULL,
  snapshot_date DATE NOT NULL,
  bullish_senders INT NOT NULL DEFAULT 0,
  bearish_senders INT NOT NULL DEFAULT 0,
  neutral_senders INT NOT NULL DEFAULT 0,
  total_senders INT NOT NULL DEFAULT 0,
  consensus_score DECIMAL(4,3),                  -- 0=divergent, 1=unanimous
  directional_bias DECIMAL(4,3),                 -- -1=bearish, +1=bullish
  confidence DECIMAL(3,2),
  bullish_sources TEXT[],
  bearish_sources TEXT[],
  computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(symbol, snapshot_date)
);

-- Daily composite insights (for DSL caching)
CREATE TABLE predict.mailbox_insights_cache (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  symbol TEXT NOT NULL,
  cache_date DATE NOT NULL,
  sentiment_7d DECIMAL(4,3),
  sentiment_trend DECIMAL(4,3),
  mention_frequency INT,
  bullish_ratio DECIMAL(4,3),
  sender_diversity INT,
  consensus_score DECIMAL(4,3),
  inputs_hash TEXT,
  valid_until TIMESTAMPTZ,
  computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(symbol, cache_date)
);
```

### 8 DSL Primitives

Wired to Algorithm Builder (backtest, paper, live). Backtest uses stubbed neutral values; paper/live uses real-time data via tick-context prefetch.

| Primitive | Category | Returns | Description |
|-----------|----------|---------|-------------|
| `mailbox_sentiment(symbol)` | sentiment | number\|null | 7-day weighted avg sentiment from emails (-1 to +1) |
| `mailbox_mention_frequency(symbol)` | sentiment | number | Average daily mentions over 7 days |
| `mailbox_bullish_ratio(symbol)` | sentiment | number\|null | Ratio of bullish to total mentions (0-1) |
| `mailbox_consensus_score(symbol)` | sentiment | number\|null | Sender agreement score (0=divergent, 1=unanimous) |
| `mailbox_sender_diversity(symbol)` | sentiment | number | Unique sender domains mentioning symbol |
| `mailbox_sentiment_trend(symbol)` | sentiment | number\|null | Sentiment change vs prior 7d period |
| `mailbox_has_bullish_consensus(symbol)` | sentiment | boolean | True if >70% senders bullish with consensus >0.6 |
| `mailbox_has_bearish_consensus(symbol)` | sentiment | boolean | True if >70% senders bearish with consensus >0.6 |

**Example DSL Usage:**

```yaml
entry:
  when: and(
    gt(mailbox_sentiment('NVDA'), 0.5),
    gt(mailbox_consensus_score('NVDA'), 0.7),
    mailbox_has_bullish_consensus('NVDA')
  )
```

### Scheduler Jobs

```typescript
// Topic aggregation — every 15 minutes
registerJob('mailbox-topic-aggregation', 15 * 60 * 1000, runMailboxTopicAggregation);

// Consensus snapshots — every hour
registerJob('mailbox-consensus-compute', 60 * 60 * 1000, runMailboxConsensusComputation);

// DSL cache refresh — every 6 hours
registerJob('mailbox-insights-cache', 6 * 60 * 60 * 1000, runMailboxInsightsCacheRefresh);
```

### API Endpoints (Insights)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/admin/mailbox/insights` | Full insights dashboard data |
| GET | `/admin/mailbox/insights/heatmap` | Topic frequency heat map |
| GET | `/admin/mailbox/insights/sentiment` | Sentiment gauges by topic |
| GET | `/admin/mailbox/insights/consensus` | Consensus/divergence table |
| GET | `/admin/mailbox/insights/senders` | Sender credibility metrics |
| POST | `/admin/mailbox/insights/refresh` | Trigger manual aggregation |

### UI Components

**Admin UI (`/admin/mailbox`) — Insights Tab (6th tab):**

1. **Topic Heat Map** — Grid of topics color-coded by sentiment (green=bullish, red=bearish)
2. **Sentiment Gauges** — Visual gauges for top topics showing -1 to +1 scale with trend arrows
3. **Consensus/Divergence Table** — Per-symbol consensus scores with expandable sender breakdown
4. **Sender Credibility Table** — Sender domains with email count, avg sentiment, volatility

All components support dark mode (`dark:` Tailwind prefixes) and mobile responsiveness.

### Files

| Path | Description |
|------|-------------|
| `db/migrations/268_subscriber_insights.sql` | Schema for insights tables |
| `packages/be/src/mailbox/insights/types.ts` | TypeScript interfaces |
| `packages/be/src/mailbox/insights/repository.ts` | Database CRUD |
| `packages/be/src/mailbox/insights/aggregator.ts` | Topic hourly aggregation |
| `packages/be/src/mailbox/insights/consensus-engine.ts` | Consensus computation |
| `packages/be/src/mailbox/insights/dsl-wrappers.ts` | DSL primitive implementations |
| `packages/be/src/mailbox/insights/service.ts` | Main insights service |
| `packages/be/src/api/predict/v1/admin/mailbox/insights-handlers.ts` | API handlers |
| `apps/web/src/app/admin/mailbox/page.tsx` | Insights tab UI |

### Integration Points

- **Backtest Engine** (`algorithms/backtest/engine.ts`):
  - Pre-fetches historical mailbox data for the entire backtest date range via `prefetchMailboxInsightsForBacktest()`
  - Stores in `MailboxBacktestCache` (keyed by "YYYY-MM-DD:SYMBOL") for O(1) per-bar lookup
  - Uses `getMailboxFromBacktestCache()` for sync access during bar iteration
  - Returns NaN-safe values (null/0/false) when no historical data exists
- **Tick Context** (`algorithms/paper/tick-context.ts`): Prefetches mailbox insights for paper/live trading
- **Executor** (`algorithms/paper/executor.ts`): Binds mailbox primitives to EvalContext for algorithm execution
- **Multi-Symbol Support**: For strategies with multiple symbols in `universe`, mailbox data is prefetched for all symbols using `prefetchMailboxInsightsMulti()`

### Multi-Symbol Strategy Example

```yaml
universe: ['AAPL', 'MSFT', 'NVDA', 'GOOGL']
entry:
  when: and(
    gt(mailbox_sender_diversity(symbol), 0),  # Ensure data exists for this symbol
    gt(mailbox_sentiment(symbol), 0.3),       # Works for any symbol in universe
    mailbox_has_bullish_consensus(symbol)
  )
```

### Safety: NaN for Missing Data

When no mailbox data exists for a symbol, primitives return `NaN` instead of misleading defaults (like 0 for sentiment). This prevents false signals:

- `gt(NaN, 0.5)` → `false` (no false positive)
- `lt(NaN, -0.5)` → `false` (no false positive)

**Best Practice:** Always check `mailbox_sender_diversity > 0` before using other mailbox primitives to confirm data exists.

### Backtest Historical Data Support

The backtest engine now supports actual historical mailbox data lookups instead of stubs:

**How it works:**

1. Before the bar loop, `prefetchMailboxInsightsForBacktest()` is called with the backtest date range
2. For each date and symbol, historical data is fetched from:
   - `mailbox_insights_cache` table (cached daily metrics)
   - `mailbox_consensus_snapshots` table (consensus data)
   - Fallback: computed from `mailbox_topic_hourly` if no cache exists
3. Data is stored in a `MailboxBacktestCache` Map (keyed by "YYYY-MM-DD:SYMBOL")
4. During bar iteration, `getMailboxFromBacktestCache()` provides O(1) sync lookups

**Data sources for backtest:**

| Data Source | Query Function | Fallback |
|-------------|----------------|----------|
| `mailbox_insights_cache` | `repo.getInsightsCache(symbol, date)` | Compute from hourly |
| `mailbox_consensus_snapshots` | `repo.getConsensusSnapshot(symbol, date)` | Compute from hourly |
| `mailbox_topic_hourly` | `repo.getTopicHourlyByRange()` | Returns null (no data) |

**Limitations:**

- Historical data only exists from when the mailbox monitoring feature was deployed
- For dates before feature deployment, primitives return NaN-safe defaults (null/0/false)
- Backtest with mailbox signals is only meaningful if email data was collected during that period

**Example backtest with mailbox signals:**

```typescript
// Backtest a strategy that uses email sentiment
const result = await runAlgorithmBacktest(ast, {
  symbol: 'NVDA',
  startDate: '2026-03-01',
  endDate: '2026-05-01',
  userId: 'user-123',
});

// Mailbox data is automatically pre-fetched for the date range
// DSL primitives like mailbox_sentiment('NVDA') return historical values
```

---

## Phase 6: Timeline, Trending & Auto-Algorithm

> **Shipped 2026-05-21** — Timeline visualization showing topics "bubbling up" over time, trending topic detection, convert-to-algorithm, and auto-backtest functionality.

### Architecture

```
mailbox_topic_hourly (existing)
        ↓
Timeline Service (period-based clustering)
        ↓
┌───────────────────────────────────────────────┐
│  Timeline Heatmap    │   Trending Detection  │
│  (day/week/month)    │   (activity change)   │
└───────────────────────────────────────────────┘
        ↓                       ↓
        └───────────┬───────────┘
                    ↓
┌───────────────────────────────────────────────┐
│          Algorithm Generator                   │
│  Topic → DSL with mailbox primitives          │
│  30+ topic mappings (tech, banking, crypto)   │
└───────────────────────────────────────────────┘
                    ↓
┌───────────────────────────────────────────────┐
│          Auto-Backtest Service                │
│  • Configurable period (90 days - 5 years)   │
│  • Asset class detection                      │
│  • Sharpe/drawdown evaluation                 │
│  • Pass/flag/fail classification             │
└───────────────────────────────────────────────┘
                    ↓
            Save to Algorithm Library
```

### Key Features

**Timeline Heatmap:**
- Period-based visualization (day/week/month selectors)
- Color-coded cells by sentiment (green=bullish, yellow=neutral, red=bearish)
- Cell intensity based on mention frequency
- Trend indicators per topic (rising/falling/stable/volatile)
- Click to drill down to underlying emails

**Trending Topics Detection:**
- Compares recent vs prior period activity
- Identifies significant changes (rising/falling)
- Calculates confidence based on volume and sender diversity
- Shows sentiment, mentions, and sender count per topic

**Convert to Algorithm:**
- 30+ topic-to-symbol mappings (tech, AI, banking, crypto, energy, healthcare, etc.)
- Generates valid DSL with proper structure:
  - `entry: { when: Expression, direction: 'long'|'short' }`
  - `exit: { when: Expression }`
  - `signals: Record<string, Expression[]>`
- Uses mailbox primitives (sentiment, consensus, bullish_ratio, etc.)
- Dynamic asset class detection (stock, ETF, crypto)

**Auto-Backtest:**
- Configurable lookback period: 90 days to 5+ years
- Preset buttons: 3M, 6M, 1Y, 2Y, 3Y, 5Y
- Default: 1 year for robust validation
- Evaluation thresholds:
  - Minimum Sharpe ratio (default: 1.0)
  - Maximum drawdown (default: 15%)
  - Minimum trades (3+)
  - Positive return required
  - Win rate > 35%
- Classification: Passed / Flagged for Review / Failed
- One-click save to algorithm library

### API Endpoints (Phase 6)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/admin/mailbox/insights/timeline` | Timeline data for heatmap with period-based clusters |
| GET | `/admin/mailbox/insights/trending` | Get trending topics (most activity change) |
| POST | `/admin/mailbox/insights/convert-to-algorithm` | Convert trending topic to DSL algorithm |
| POST | `/admin/mailbox/insights/auto-backtest` | Run auto-backtest on generated algorithm |
| POST | `/admin/mailbox/insights/process-trending` | Batch process: detect trending → generate algos → backtest |

### Request/Response Examples

**GET /admin/mailbox/insights/timeline?periodType=week&lookbackDays=30**

```json
{
  "data": {
    "clusters": [
      {
        "topic": { "id": "NVDA", "name": "NVDA", "type": "ticker" },
        "dataPoints": [
          { "period": "2026-05-14", "intensity": 0.8, "mentions": 45, "sentiment": 0.65, "trending": "up" }
        ],
        "overallTrend": "rising",
        "peakPeriod": "2026-05-19",
        "totalMentions": 210,
        "avgSentiment": 0.58
      }
    ],
    "periods": ["2026-04-21", "2026-04-28", "2026-05-05", "2026-05-12", "2026-05-19"],
    "periodType": "week",
    "maxIntensity": 1.0,
    "generatedAt": "2026-05-21T10:30:00Z"
  }
}
```

**POST /admin/mailbox/insights/convert-to-algorithm**

Request:
```json
{
  "topicId": "NVDA",
  "topicName": "NVDA",
  "topicType": "ticker",
  "direction": "bullish",
  "confidence": 0.75,
  "sentiment": 0.65,
  "recentMentions": 45,
  "senderCount": 12
}
```

Response:
```json
{
  "data": {
    "topicId": "NVDA",
    "direction": "bullish",
    "dsl": {
      "strategy": "mailbox_signal_nvda_bullish",
      "mode": "paper",
      "horizon": "swing",
      "universe": ["NVDA"],
      "asset_class": "stock",
      "risk": { "max_position_usd": 10000, "stop_loss_pct": 0.05 },
      "signals": {
        "mailbox_sentiment": [{ "type": "call", "name": "mailbox_sentiment", "args": [...] }]
      },
      "entry": {
        "when": { "type": "binop", "op": "&&", ... },
        "direction": "long"
      },
      "exit": {
        "when": { "type": "binop", "op": "||", ... }
      }
    },
    "suggestedName": "Mailbox Signal: NVDA Long",
    "description": "Auto-generated strategy based on mailbox sentiment signals for NVDA...",
    "confidence": 0.75,
    "symbols": ["NVDA"]
  }
}
```

**POST /admin/mailbox/insights/auto-backtest**

Request:
```json
{
  "dsl": { /* StrategyAST */ },
  "config": {
    "lookbackDays": 365,
    "minSharpe": 1.0,
    "maxDrawdown": 0.15,
    "autoSave": false
  }
}
```

Response:
```json
{
  "data": {
    "topicId": "nvda",
    "direction": "bullish",
    "dsl": { /* original DSL */ },
    "backtestMetrics": {
      "sharpe": 1.45,
      "maxDrawdown": 0.12,
      "totalReturn": 0.28,
      "winRate": 0.58,
      "tradeCount": 24
    },
    "passed": true,
    "flaggedForReview": false,
    "suggestedName": "mailbox_signal_nvda_bullish"
  }
}
```

### Topic Mappings

The algorithm generator includes 30+ topic-to-symbol mappings:

| Category | Topics | Example Symbols |
|----------|--------|-----------------|
| Technology | tech, ai, semiconductor, software, cloud | QQQ, NVDA, MSFT, AMD |
| Financials | banking, fintech | XLF, JPM, PYPL, V |
| Energy | energy, oil, cleanenergy | XLE, XOM, ICLN |
| Healthcare | healthcare, biotech, pharma | XLV, XBI, PFE |
| Consumer | retail, consumer, ev | XRT, TSLA, AMZN |
| Crypto | bitcoin, ethereum, crypto | BTC, ETH, COIN |
| Macro | inflation, rates, gold | TIP, TLT, GLD |
| Indices | sp500, nasdaq, smallcap | SPY, QQQ, IWM |

### UI Components (Phase 6)

**MailboxTimelineHeatmap** (`packages/fe/src/components/mailbox/MailboxTimelineHeatmap.tsx`):
- Period selector (day/week/month buttons)
- Grid table with topics as rows, periods as columns
- Color-coded cells based on sentiment
- Mention counts displayed in cells
- Trend indicators per topic
- Legend showing color meanings
- Generated timestamp

**TrendingMailboxTopics** (`packages/fe/src/components/mailbox/TrendingMailboxTopics.tsx`):
- Card per trending topic
- Rising/falling trend indicators with magnitude
- Sentiment value with color coding
- Confidence badge (High/Medium/Low)
- Stats: mentions, senders
- Action buttons: "Convert to Algorithm", "Auto-Backtest"
- Refresh button

**AutoAlgorithmModal** (`packages/fe/src/components/mailbox/AutoAlgorithmModal.tsx`):
- Algorithm info card (name, direction badge, description, symbols)
- Collapsible DSL preview (JSON formatted)
- Backtest configuration:
  - Lookback period presets (3M, 6M, 1Y, 2Y, 3Y, 5Y)
  - Custom days input (30-2555)
  - Min Sharpe ratio
  - Max drawdown percentage
  - Auto-save toggle
- Backtest results display:
  - Pass/Flagged/Fail status with color coding
  - Metrics: Sharpe, Drawdown, Return, Win Rate, Trades
  - Color-coded pass/fail indicators per metric
- Action buttons: Cancel, Run Backtest, Save Algorithm

**Updated InsightsTab** (`apps/web/src/app/admin/mailbox/page.tsx`):
- "Process All Trending" bulk action button
- Timeline Heatmap section at top
- Trending Topics panel below
- Existing heat map, gauges, consensus, sender tables
- AutoAlgorithmModal integration

### Files (Phase 6)

| Path | Description |
|------|-------------|
| `packages/be/src/mailbox/insights/timeline-service.ts` | Timeline data aggregation & trending detection |
| `packages/be/src/mailbox/insights/algorithm-generator.ts` | DSL generation from topics (30+ mappings) |
| `packages/be/src/mailbox/insights/auto-backtest.ts` | Auto-backtest service using backtest engine |
| `packages/be/src/api/predict/v1/admin/mailbox/insights-handlers.ts` | Updated with 5 new handlers |
| `apps/web/src/app/api/predict/v1/admin/mailbox/insights/timeline/route.ts` | Timeline endpoint |
| `apps/web/src/app/api/predict/v1/admin/mailbox/insights/trending/route.ts` | Trending endpoint |
| `apps/web/src/app/api/predict/v1/admin/mailbox/insights/convert-to-algorithm/route.ts` | Conversion endpoint |
| `apps/web/src/app/api/predict/v1/admin/mailbox/insights/auto-backtest/route.ts` | Backtest endpoint |
| `apps/web/src/app/api/predict/v1/admin/mailbox/insights/process-trending/route.ts` | Batch process endpoint |
| `packages/fe/src/components/mailbox/MailboxTimelineHeatmap.tsx` | Timeline heatmap component |
| `packages/fe/src/components/mailbox/TrendingMailboxTopics.tsx` | Trending list component |
| `packages/fe/src/components/mailbox/AutoAlgorithmModal.tsx` | Algorithm preview & backtest modal |
| `packages/fe/src/components/mailbox/index.ts` | Component exports |

### DSL Structure Requirements

Generated algorithms follow the `StrategyAST` specification:

```typescript
{
  strategy: string;           // Unique name
  mode: 'paper';              // Always starts as paper
  horizon: 'swing';           // Swing trading horizon
  universe: string[];         // Target symbols
  asset_class: 'stock' | 'etf' | 'crypto';  // Auto-detected
  risk: {
    max_position_usd: number;
    stop_loss_pct: number;
    trailing_stop_pct: number;
    max_daily_loss_pct: number;
  };
  signals: Record<string, Expression[]>;  // MUST be arrays
  entry: {
    when: Expression;         // MUST have 'when' wrapper
    direction: 'long' | 'short';
  };
  exit: {
    when: Expression;         // MUST have 'when' wrapper
  };
}
```

### Validation Best Practices

1. **Use longer backtest periods** (1-5 years) for robust validation
2. **Check trade count** - strategies with < 5 trades may not be statistically significant
3. **Review flagged algorithms** - may have potential despite not passing thresholds
4. **Verify asset class** - crypto/ETF have different characteristics than stocks
5. **Consider market conditions** - backtest period should include different regimes
