Integration Flow Diagrams
Agencio Predict Data Pipeline & Integration Architecture
1. System Overview
Agencio Predict operates a three-stage data pipeline: Ingestion (fetch raw data), Enrichment (sentiment, entities), and Synthesis (divergence, composite scores).
High-Level Architecture
flowchart TB
subgraph Sources["External Data Sources (40+ APIs)"]
PM["Prediction Markets\n(Polymarket, Kalshi, Metaculus)"]
PRICE["Price Data\n(Yahoo, Finnhub, Polygon, CoinGecko)"]
NEWS["News Sources\n(Finnhub, NewsAPI, Guardian, NYT, GDELT)"]
SOCIAL["Social Platforms\n(Reddit, Twitter, Discord, Telegram, Bluesky)"]
DERIV["Derivatives & Macro\n(VIX, FRED, Binance Funding, Coinglass)"]
end
subgraph Scheduler["Scheduler (PM2)"]
S15["15-min Jobs"]
S60["60-min Jobs"]
SDAY["Daily Jobs"]
end
subgraph Ingestion["Stage 1: Ingestion"]
FETCH["Fetch Raw Data"]
DEDUP["Deduplicate"]
STORE1["Store Raw"]
end
subgraph Enrichment["Stage 2: Enrichment"]
LEX["Lexicon Sentiment\n(regex, fast, free)"]
LLM["LLM Sentiment\n(Claude Haiku, selective)"]
ENT["Entity Extraction\n(tickers, companies, people)"]
GEO["Geopolitical Tagging"]
end
subgraph Synthesis["Stage 3: Synthesis"]
ROLL["Sentiment Rollup\n(48h window)"]
DIV["Divergence Engine\n(human vs bot)"]
COMP["Composite Score\n(weighted average)"]
PAT["Pattern Detection"]
end
subgraph Storage["Storage Layer"]
NA["news_archive"]
SP["social_posts"]
SH["sentiment_hourly"]
CI["computed_insights"]
PE["prediction_events"]
MPH["market_price_history"]
end
subgraph Consumers["Consumers"]
DSL["DSL Evaluator\n(Algorithm Builder)"]
UI["Dashboard / Console"]
API["API Endpoints"]
ALERT["Alerts & Notifications"]
end
Sources --> Scheduler
Scheduler --> Ingestion
Ingestion --> Enrichment
Enrichment --> Storage
Storage --> Synthesis
Synthesis --> CI
CI --> Consumers
Storage --> Consumers
2. Data Source Categories
Integration Source Map
flowchart LR
subgraph PredictionMarkets["Prediction Markets"]
POLY["Polymarket\nGamma API (public)"]
KALSHI["Kalshi\nOptional API key"]
META["Metaculus\nMETACULUS_TOKEN"]
PREDIT["PredictIt\n(BROKEN - Cloudflare)"]
end
subgraph PriceData["Price & Market Data"]
YAHOO["Yahoo Finance\nVIX, equities, crypto"]
FINN["Finnhub\n60 calls/min free"]
POLY_IO["Polygon.io\nEquity ticks, FX OHLC"]
CG["CoinGecko\nCrypto OHLC"]
FRANK["Frankfurter\nFX spot rates"]
ALPACA["Alpaca\nStocks, crypto"]
BINANCE["Binance\nSpot, futures, L2"]
DERIBIT["Deribit\nCrypto derivatives"]
end
subgraph NewsSources["News Sources"]
FINN_N["Finnhub News\n4 categories"]
NEWSAPI["NewsAPI\n80K+ sources"]
GUARD["Guardian\n5K req/day"]
NYT["New York Times\n500 req/day"]
GDELT["GDELT\nNo key required"]
GNEWS["Google News RSS\nNo key required"]
NEWSDATA["NewsData.io\n200 req/day"]
FINN_CO["Finnhub Company\nPer-symbol news"]
end
subgraph SocialPlatforms["Social Platforms"]
REDDIT["Reddit\nOAuth2"]
TWITTER["Twitter/X\nBearer token"]
DISCORD["Discord\nBot token"]
TELEGRAM["Telegram\nBot token optional"]
BLUESKY["Bluesky\nAT Protocol"]
TRUTH["Truth Social\nPublic posts"]
RSS["RSS Feeds\nNo auth"]
end
subgraph DerivMacro["Derivatives & Macro"]
VIX["VIX\nvia Yahoo"]
FG["Fear & Greed\nalternative.me"]
FRED["FRED\nTreasury, SOFR, TIPS"]
CFTC["CFTC CoT\nQuandl"]
FXST["FXStreet\nOAuth2 calendar"]
BFUND["Binance Funding\nBot detection"]
CGLASS["Coinglass\nLiquidations"]
end
PredictionMarkets --> PE[(prediction_events)]
PriceData --> MPH[(market_price_history)]
NewsSources --> NA[(news_archive)]
SocialPlatforms --> SP[(social_posts)]
DerivMacro --> DT[(derivatives_data)]
Integration Status
| Category | Source | API Key | Status | Rate Limit |
|---|---|---|---|---|
| Prediction Markets | Polymarket | None (public) | Active | Generous |
| Prediction Markets | Kalshi | Optional | Active | Rate-limited without key |
| Prediction Markets | Metaculus | METACULUS_TOKEN | Key Required | Enforced 2026-05 |
| Prediction Markets | PredictIt | N/A | Broken | Cloudflare blocks |
| Price Data | Yahoo Finance | None | Active | Generous |
| Price Data | Finnhub | FINNHUB_API_KEY | Active | 60 calls/min |
| Price Data | Polygon.io | POLYGON_API_KEY | Optional | Premium features |
| Price Data | CoinGecko | None | Active | Generous |
| News | Finnhub News | FINNHUB_API_KEY | Active | Shared with price quota |
| News | NewsAPI | NEWSAPI_KEY | Active | 100-500/day |
| News | Guardian | GUARDIAN_API_KEY | Optional | 5K/day |
| News | GDELT | None | Active | Generous |
| Social | Reddit | OAuth2 | Active | 60 req/min |
| Social | Twitter/X | Bearer token | Paid ($100/mo) | Basic tier |
| Social | Discord/Telegram | Bot token | User-provided | Per-user |
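Finnhub's 60 calls/min quota is shared between the price and news integrations, so both callers need to draw from one budget. A minimal sketch of a sliding-window limiter they could share, assuming a single scheduler process with in-memory state; `SlidingWindowLimiter` and its methods are hypothetical names, not taken from the codebase.

```typescript
// Hypothetical sliding-window limiter: at most `limit` calls per `windowMs`.
// One instance would be shared by every caller of the same upstream quota
// (e.g. Finnhub price + news, which share 60 calls/min).
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  // Records the call and returns true if it fits in the current window;
  // returns false if the caller should back off. `now` is injectable for tests.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop calls that have aged out of the window.
    while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }

  // Milliseconds until the oldest recorded call leaves the window (0 if a slot is free).
  msUntilFree(now: number = Date.now()): number {
    if (this.timestamps.length < this.limit) return 0;
    return Math.max(0, this.timestamps[0] + this.windowMs - now);
  }
}

// One shared instance for the Finnhub quota (60 calls per 60 s).
const finnhubLimiter = new SlidingWindowLimiter(60, 60_000);
```

With several processes, the window would have to live in shared storage rather than in memory.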
3. News Pipeline
News Data Flow
sequenceDiagram
participant SCH as Scheduler
participant FINN as Finnhub API
participant NA as NewsAPI
participant GDELT as GDELT
participant LEX as Lexicon Scorer
participant LLM as Claude Haiku
participant DB as news_archive
participant WH as Webhooks
Note over SCH: Every 15 minutes
SCH->>FINN: GET /news?category=... (one call each: general, forex, crypto, merger)
FINN-->>SCH: Headlines (4 categories)
Note over SCH: Every 60 minutes
SCH->>NA: GET /everything?q=keywords
NA-->>SCH: Headlines (80K+ sources)
SCH->>GDELT: GET /events
GDELT-->>SCH: Global events
SCH->>SCH: Deduplicate by URL
SCH->>SCH: Extract entities (tickers, companies)
SCH->>SCH: Geopolitical event tagging
SCH->>LEX: Score sentiment (regex)
LEX-->>SCH: sentiment_score (-1 to +1)
SCH->>DB: INSERT (sentiment_source='lexicon')
Note over SCH: LLM upgrade pass (signal-worthy only)
SCH->>LLM: Batch score (up to 50/run)
LLM-->>SCH: sentiment, confidence, reasoning
SCH->>DB: UPDATE (sentiment_source='llm')
SCH->>WH: publishEvent('news.headline.archived')
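The lexicon pass runs on every headline before the selective LLM upgrade. A minimal sketch of deduplication by URL followed by lexicon scoring, assuming a simple (pos - neg) / (pos + neg) formula; the word lists reuse the examples given for `lexicon-scorer.ts` in section 8, while the formula and the `Headline` shape are assumptions rather than the actual implementation.

```typescript
// Hypothetical headline shape; field names mirror news_archive columns.
interface Headline {
  url: string;
  title: string;
  sentiment_score?: number;
  sentiment_source?: "lexicon" | "llm";
}

// Illustrative patterns built from the lexicon examples (surge, rally, beat / crash, plunge, miss).
const POSITIVE = /\b(surge[sd]?|rall(y|ies|ied)|beats?)\b/gi;
const NEGATIVE = /\b(crash(es|ed)?|plunge[sd]?|miss(es|ed)?)\b/gi;

// Score in [-1, +1]: (pos - neg) / (pos + neg), or 0 when no term matches.
function lexiconScore(text: string): number {
  const pos = (text.match(POSITIVE) ?? []).length;
  const neg = (text.match(NEGATIVE) ?? []).length;
  return pos + neg === 0 ? 0 : (pos - neg) / (pos + neg);
}

// Keep the first headline per URL (news_archive uses url as its primary key),
// then attach a lexicon score tagged with sentiment_source = 'lexicon'.
function dedupeAndScore(items: Headline[]): Headline[] {
  const seen = new Set<string>();
  const out: Headline[] = [];
  for (const h of items) {
    if (seen.has(h.url)) continue;
    seen.add(h.url);
    out.push({ ...h, sentiment_score: lexiconScore(h.title), sentiment_source: "lexicon" });
  }
  return out;
}
```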
News Adapter Architecture
flowchart TB
subgraph Adapters["News Adapters (packages/be/src/news/adapters/)"]
GA["gdelt.ts\nNo key, geopolitical"]
GN["google-news-rss.ts\nRSS, no key"]
ND["newsdata.ts\nNEWSDATA_API_KEY"]
GU["guardian.ts\nGUARDIAN_API_KEY"]
NY["nyt.ts\nNYT_API_KEY"]
FC["finnhub-company.ts\nFINNHUB_API_KEY"]
NAP["newsapi.ts\nNEWSAPI_KEY"]
end
subgraph Service["News Service"]
LS["listNewsSources()"]
RS["runSourceIngestion()"]
IAE["ingestAllEnabled()"]
end
subgraph Enrichment["Enrichment"]
EE["entity-extractor.ts\n(shared with social)"]
LS2["llm-scorer.ts\nClaude Haiku"]
GEO["geopolitical-events.ts"]
end
subgraph Storage["Storage"]
NST["news_sources\n(config table)"]
NAT["news_archive\n(deduplicated headlines)"]
PST["prediction_signals\n(NEWS type)"]
end
GA & GN & ND & GU & NY & FC & NAP --> Service
Service --> |"getSecret()"| PS[(platform_secrets)]
Service --> Enrichment
Enrichment --> Storage
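Each adapter resolves its key through `getSecret()` against `platform_secrets`, while keyless sources (GDELT, Google News RSS) run unconditionally. A sketch of how `ingestAllEnabled()` could iterate adapters under those rules; the `NewsAdapter` interface, the `SecretLookup` type and the skip/error reporting are hypothetical, since the real signatures are not shown in this document.

```typescript
// Hypothetical adapter contract; the real interface in packages/be/src/news/adapters/ may differ.
interface RawArticle {
  url: string;
  title: string;
  publishedAt: string; // ISO-8601
}

interface NewsAdapter {
  name: string;
  // Secret name the adapter needs, or null for keyless sources.
  requiredKey: string | null;
  fetch(apiKey: string | null): Promise<RawArticle[]>;
}

// Stand-in for getSecret(): resolves a key name to a value, or undefined if unset.
type SecretLookup = (name: string) => Promise<string | undefined>;

interface IngestResult {
  source: string;
  status: "ok" | "skipped" | "error";
  count: number;
}

// Adapters whose key is missing are skipped instead of failing the batch,
// and one adapter's error does not abort the others.
async function ingestAllEnabled(adapters: NewsAdapter[], getSecret: SecretLookup): Promise<IngestResult[]> {
  const results: IngestResult[] = [];
  for (const adapter of adapters) {
    let key: string | null = null;
    if (adapter.requiredKey !== null) {
      const value = await getSecret(adapter.requiredKey);
      if (!value) {
        results.push({ source: adapter.name, status: "skipped", count: 0 });
        continue;
      }
      key = value;
    }
    try {
      const articles = await adapter.fetch(key);
      results.push({ source: adapter.name, status: "ok", count: articles.length });
    } catch {
      results.push({ source: adapter.name, status: "error", count: 0 });
    }
  }
  return results;
}
```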
News Adapter Files
| File | Source | API Key | Rate Limit | Key Features |
|---|---|---|---|---|
| gdelt.ts | GDELT Project | None | Generous | Global events, conflict tracking |
| google-news-rss.ts | Google News | None | Generous | RSS fallback, broad coverage |
| newsdata.ts | NewsData.io | NEWSDATA_API_KEY | 200/day | Real-time, 50K+ sources |
| guardian.ts | The Guardian | GUARDIAN_API_KEY | 5K/day | UK/international quality |
| nyt.ts | New York Times | NYT_API_KEY | 500/day | US news, article search |
| finnhub-company.ts | Finnhub | FINNHUB_API_KEY | 60/min | Per-symbol company news |
| newsapi.ts | NewsAPI | NEWSAPI_KEY | 100-500/day | 80K+ sources aggregator |
4. Social Pipeline
Social Data Flow
sequenceDiagram
participant SF as social_follows
participant SCH as Scheduler
participant RED as Reddit API
participant TW as Twitter API
participant DC as Discord
participant TG as Telegram
participant BS as Bluesky
participant EE as Entity Extractor
participant LEX as Lexicon
participant LLM as Claude Haiku
participant SP as social_posts
participant SH as sentiment_hourly
Note over SCH: pollDueFollows (every 15-60 min)
SCH->>SF: Get due follows
SF-->>SCH: Follows to poll
par Platform Polling
SCH->>RED: GET /r/{subreddit}/new
RED-->>SCH: Posts
and
SCH->>TW: GET /2/tweets/search/recent
TW-->>SCH: Tweets
and
SCH->>DC: GET /channels/{id}/messages
DC-->>SCH: Messages
and
SCH->>TG: GET /getUpdates
TG-->>SCH: Messages
and
SCH->>BS: GET /xrpc/app.bsky.feed.getTimeline
BS-->>SCH: Posts
end
SCH->>SCH: Deduplicate (platform + external_id)
SCH->>EE: Extract entities
EE-->>SCH: {tickers, companies, themes}
SCH->>LEX: Lexicon sentiment
LEX-->>SCH: score (-1 to +1)
SCH->>SP: INSERT (sentiment_source='lexicon')
Note over SCH: LLM upgrade (signal-worthy: upvotes>=10 OR comments>=5)
SCH->>SP: Query signal-worthy posts
SP-->>SCH: Posts to upgrade
SCH->>LLM: Batch score
LLM-->>SCH: sentiment, confidence, reasoning
SCH->>SP: UPDATE (sentiment_source='llm')
Note over SCH: rollupSentimentHourly (every 15 min)
SCH->>SP: Aggregate last 48h
SP-->>SCH: Grouped by topic/platform/hour
SCH->>SH: UPSERT hourly buckets
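`rollupSentimentHourly` collapses the last 48 hours of scored posts into one row per topic, platform and hour. A sketch of that grouping, assuming each post already carries a `topic` and that a score above 0 counts as positive and below 0 as negative (neither rule is specified in this document):

```typescript
// Minimal post shape for the rollup; `topic` is an assumption (the real job may derive it from keywords).
interface ScoredPost {
  topic: string;
  platform: string;
  postedAt: Date;
  sentimentScore: number; // -1..+1
}

interface HourlyBucket {
  topic: string;
  platform: string;
  hourBucket: string; // ISO timestamp truncated to the UTC hour
  avgSentiment: number;
  postCount: number;
  positiveCount: number;
  negativeCount: number;
}

const WINDOW_MS = 48 * 60 * 60 * 1000;

function truncateToHourUtc(d: Date): string {
  const t = new Date(d.getTime());
  t.setUTCMinutes(0, 0, 0);
  return t.toISOString();
}

// Group posts from the last 48h by (topic, platform, hour). The resulting rows
// would then be UPSERTed into sentiment_hourly keyed on the same triple.
function rollupSentimentHourly(posts: ScoredPost[], now: Date): HourlyBucket[] {
  const cutoff = now.getTime() - WINDOW_MS;
  const buckets = new Map<string, HourlyBucket & { sum: number }>();
  for (const p of posts) {
    if (p.postedAt.getTime() < cutoff) continue;
    const hourBucket = truncateToHourUtc(p.postedAt);
    const key = `${p.topic}|${p.platform}|${hourBucket}`;
    let b = buckets.get(key);
    if (!b) {
      b = { topic: p.topic, platform: p.platform, hourBucket, avgSentiment: 0, postCount: 0, positiveCount: 0, negativeCount: 0, sum: 0 };
      buckets.set(key, b);
    }
    b.sum += p.sentimentScore;
    b.postCount += 1;
    if (p.sentimentScore > 0) b.positiveCount += 1; // assumed threshold
    else if (p.sentimentScore < 0) b.negativeCount += 1;
  }
  // Replace the running sum with the bucket average.
  return Array.from(buckets.values()).map(({ sum, ...b }) => ({ ...b, avgSentiment: sum / b.postCount }));
}
```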
Social Platform Adapters
flowchart TB
subgraph Follows["social_follows Table"]
F1["platform: reddit\ntarget: r/wallstreetbets"]
F2["platform: twitter\ntarget: @elonmusk"]
F3["platform: discord\ntarget: channel_id"]
F4["platform: telegram\ntarget: @cryptosignal"]
F5["platform: bluesky\ntarget: user.bsky.social"]
F6["platform: rss\ntarget: https://feed.url"]
end
subgraph Adapters["Social Adapters (packages/be/src/social/adapters/)"]
RA["reddit.ts\nOAuth2 / public fallback"]
TA["twitter.ts\nBearer token required"]
DA["discord.ts\nBot token + MESSAGE_CONTENT"]
TGA["telegram.ts\nBot token / HTML scrape"]
BA["bluesky.ts\nAT Protocol"]
RSSA["rss.ts\nNo auth"]
end
subgraph Processing["Processing"]
EE["entity-extractor.ts\nTickers, companies, themes"]
LS["lexicon scoring"]
LLM["llm-scorer.ts\nHaiku batch"]
end
subgraph Storage["Storage"]
SP["social_posts\n(raw posts)"]
SH["sentiment_hourly\n(aggregated)"]
end
F1 --> RA
F2 --> TA
F3 --> DA
F4 --> TGA
F5 --> BA
F6 --> RSSA
RA & TA & DA & TGA & BA & RSSA --> Processing
Processing --> SP
SP --> |"rollupSentimentHourly"| SH
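Follows are dispatched to the adapter for their platform, and new posts are deduplicated on (platform, external_id). A sketch assuming each adapter can be reduced to a hypothetical `Fetcher` function and that previously stored keys are available as a set; the real adapter signatures are not shown in this document.

```typescript
type Platform = "reddit" | "twitter" | "discord" | "telegram" | "bluesky" | "rss";

interface Follow { platform: Platform; target: string; }
interface FetchedPost { platform: Platform; externalId: string; content: string; }

// Hypothetical per-platform fetcher; stands in for reddit.ts, twitter.ts, etc.
type Fetcher = (target: string) => Promise<FetchedPost[]>;

// Composite dedup key: external IDs are only unique within one platform.
const dedupKey = (p: { platform: string; externalId: string }) => `${p.platform}:${p.externalId}`;

async function pollFollows(
  follows: Follow[],
  fetchers: Record<Platform, Fetcher>,
  alreadyStored: Set<string>,
): Promise<FetchedPost[]> {
  // Poll all due follows concurrently, mirroring the `par` block in the diagram.
  // Note: one rejected fetch rejects the whole poll; Promise.allSettled would isolate failures.
  const batches = await Promise.all(follows.map((f) => fetchers[f.platform](f.target)));
  const fresh: FetchedPost[] = [];
  for (const post of ([] as FetchedPost[]).concat(...batches)) {
    const key = dedupKey(post);
    if (alreadyStored.has(key)) continue;
    alreadyStored.add(key);
    fresh.push(post);
  }
  return fresh;
}
```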
Social Platform Configuration
| Platform | Auth Type | Required Credentials | Features |
|---|---|---|---|
| Reddit | OAuth2 | CLIENT_ID, SECRET, USERNAME, PASSWORD | Subreddits, users, comments |
| Twitter/X | Bearer Token | TWITTER_BEARER_TOKEN ($100/mo) | Tweets, search, mentions |
| Discord | Bot Token | DISCORD_BOT_TOKEN (user-provided) | Channel messages, reactions |
| Telegram | Bot Token (optional) | TELEGRAM_BOT_TOKEN | Public channels (no auth), private groups (bot) |
| Bluesky | AT Protocol | BLUESKY_IDENTIFIER, APP_PASSWORD | Posts, follows, feeds |
| Truth Social | None (scraping) | None | Public posts only |
| RSS | None | None | Blog feeds, newsletters |
5. Market Data Pipeline
Price Data Flow
flowchart TB
subgraph PriceSources["Price Sources"]
YAHOO["Yahoo Finance\n(stocks, crypto, VIX)"]
FINN["Finnhub\n(real-time quotes)"]
POLY["Polygon.io\n(equity ticks, FX OHLC)"]
CG["CoinGecko\n(crypto OHLC)"]
FRANK["Frankfurter\n(ECB forex rates)"]
ALP["Alpaca\n(per-user, stocks+crypto)"]
BIN["Binance\n(crypto spot+futures)"]
end
subgraph PriceService["price-service.ts"]
GP["getPrice(symbol)"]
GHP["getHistoricalPrices(symbol, days)"]
GMP["getMultiplePrices(symbols)"]
end
subgraph Fallback["Fallback Chain"]
D1["1. Imported datasets\n(user/platform)"]
D2["2. Yahoo Finance\n(primary)"]
D3["3. CoinGecko\n(crypto)"]
D4["4. Frankfurter\n(forex)"]
D5["5. Polygon\n(if key set)"]
end
subgraph Storage["Storage"]
PS["price_snapshots\n(15-min intervals)"]
MPH["market_price_history\n(daily OHLC)"]
IMP["imported_datasets\n(user uploads)"]
end
subgraph Consumers["Consumers"]
BST["Backtest Engine"]
EXEC["Paper/Live Executor"]
PINN["PINN Predictor"]
FVG["FVG Detection"]
end
PriceSources --> PriceService
PriceService --> Fallback
Fallback --> Storage
Storage --> Consumers
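The fallback chain returns the first source that has data for a symbol. A sketch of that walk, assuming each provider resolves to a price or `null` and that a thrown error should fall through like a miss; `PriceProvider` and `getPriceWithFallback` are hypothetical names, not the `price-service.ts` API.

```typescript
// Each provider returns a price or null when it has no data for the symbol.
type PriceProvider = {
  name: string;
  // Optional guard, e.g. CoinGecko only for crypto, Polygon only if a key is set.
  applies?: (symbol: string) => boolean;
  getPrice: (symbol: string) => Promise<number | null>;
};

interface PriceResult { symbol: string; price: number; source: string; }

// Walk the chain in order and return the first usable price. Provider errors
// are treated like "no data" so a single outage falls through to the next source.
async function getPriceWithFallback(symbol: string, chain: PriceProvider[]): Promise<PriceResult | null> {
  for (const provider of chain) {
    if (provider.applies && !provider.applies(symbol)) continue;
    try {
      const price = await provider.getPrice(symbol);
      if (price !== null && Number.isFinite(price) && price > 0) {
        return { symbol, price, source: provider.name };
      }
    } catch {
      // fall through to the next provider
    }
  }
  return null;
}
```

Recording `source` on the result keeps it traceable which provider supplied each stored price.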
Price Collection Scheduler
sequenceDiagram
participant SCH as Scheduler
participant PS as price-service
participant YAHOO as Yahoo Finance
participant CG as CoinGecko
participant DB as price_snapshots
Note over SCH: collectPrices (every 30 seconds)
SCH->>PS: getMultiplePrices(collection_universe)
PS->>YAHOO: batch request (stocks, ETFs)
YAHOO-->>PS: prices
PS->>CG: batch request (crypto)
CG-->>PS: prices
PS-->>SCH: {symbol: price, ...}
SCH->>DB: UPSERT price_snapshots
Note right of DB: Indexed for fast\nhistorical lookups
6. Prediction Markets
Prediction Market Integration
flowchart TB
subgraph Markets["Prediction Market APIs"]
POLY["Polymarket\nGamma API\ngamma-api.polymarket.com"]
KALSHI["Kalshi\napi.elections.kalshi.com"]
META["Metaculus\nmetaculus.com/api"]
PRED["PredictIt\n(BROKEN)"]
end
subgraph Adapters["Feed Adapters"]
PA["polymarket.ts\nNo auth required"]
KA["kalshi.ts\nOptional API key"]
MA["metaculus.ts\nMETACULUS_TOKEN required"]
end
subgraph Transform["Transform & Normalize"]
NORM["Normalize to common schema:\n- event_id\n- title\n- probability\n- volume\n- end_date\n- category"]
end
subgraph Storage["Storage"]
PE["prediction_events\n(market metadata)"]
PS["prediction_signals\n(probability changes)"]
end
subgraph DSL["DSL Primitives"]
P1["polymarket_probability(slug)"]
P2["kalshi_probability(ticker)"]
P3["metaculus_probability(id)"]
P4["prediction_market_consensus(topic)"]
end
Markets --> Adapters
Adapters --> Transform
Transform --> Storage
Storage --> DSL
Prediction Market Data Model
| Field | Description | Example |
|---|---|---|
| market_id | Unique ID from source | polymarket:0x123abc |
| source | Market provider | polymarket, kalshi, metaculus |
| title | Event question | "Will Fed cut rates in June 2026?" |
| probability | Current YES probability | 0.65 (65%) |
| volume | Trading volume | $1,234,567 |
| category | Topic category | economics, politics, crypto |
| end_date | Resolution date | 2026-06-15 |
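Normalization maps each provider's payload onto the common schema above. A sketch for Polymarket and Kalshi, assuming simplified raw shapes (`RawPolymarket` and `RawKalshi` are illustrative, not the full upstream payloads) and assuming Kalshi prices arrive in cents (contracts trade between 1 and 99 cents); Metaculus would follow the same pattern.

```typescript
// Common schema from the table above.
interface PredictionEvent {
  market_id: string; // "<source>:<native id>"
  source: "polymarket" | "kalshi" | "metaculus";
  title: string;
  probability: number; // YES probability in [0, 1]
  volume: number | null; // units may differ by source (e.g. contracts vs. USD)
  category: string;
  end_date: string; // YYYY-MM-DD
}

// Simplified raw shapes, assumed for illustration only.
interface RawPolymarket { id: string; question: string; yesPrice: number; volume: number; endDate: string; category: string; }
interface RawKalshi { ticker: string; title: string; lastPriceCents: number; volume: number; closeTime: string; category: string; }

const clamp01 = (x: number) => Math.min(1, Math.max(0, x));
const toDate = (iso: string) => iso.slice(0, 10);

function fromPolymarket(m: RawPolymarket): PredictionEvent {
  return {
    market_id: `polymarket:${m.id}`, source: "polymarket", title: m.question,
    probability: clamp01(m.yesPrice), // quoted as a 0..1 price
    volume: m.volume, category: m.category, end_date: toDate(m.endDate),
  };
}

function fromKalshi(m: RawKalshi): PredictionEvent {
  return {
    market_id: `kalshi:${m.ticker}`, source: "kalshi", title: m.title,
    probability: clamp01(m.lastPriceCents / 100), // cents -> probability
    volume: m.volume, category: m.category, end_date: toDate(m.closeTime),
  };
}
```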
7. Derivatives & Macro Data
Derivatives Data Flow
flowchart TB
subgraph Sources["Data Sources"]
VIX["VIX\nvia Yahoo Finance"]
FG["Fear & Greed\nalternative.me"]
FRED["FRED\nTreasury, SOFR, TIPS"]
BF["Binance Funding\nfapi/v1/fundingRate"]
CGL["Coinglass\nLiquidations, OI"]
DER["Deribit\nIV, options"]
end
subgraph Integration["integrations/derivatives.ts"]
GV["getVixData()"]
GFG["getFearGreedIndex()"]
GYC["getYieldCurve()"]
GFR["getFundingRates()"]
GLQ["getLiquidations()"]
end
subgraph Storage["Storage"]
DD["derivatives_data\n(VIX, funding, OI)"]
OV["overlay_series\n(yield curves)"]
CI["computed_insights\n(bot detection)"]
end
subgraph Analysis["Analysis"]
REGIME["Regime Detection\n(VIX > 75th pct)"]
BOT["Bot Detection\n(funding extremity)"]
YCS["Yield Curve Signals\n(inversion, spreads)"]
end
subgraph DSL["DSL Primitives"]
D1["vix_level()"]
D2["fear_greed_index()"]
D3["funding_rate(symbol)"]
D4["yield_spread(tenor1, tenor2)"]
D5["is_yield_curve_inverted()"]
end
Sources --> Integration
Integration --> Storage
Storage --> Analysis
Analysis --> DSL
FRED Data Series
| Series | Description | Update Frequency |
|---|---|---|
| DGS2, DGS5, DGS10, DGS30 | Treasury yields (2y, 5y, 10y, 30y) | Daily |
| T10Y2Y | 10Y-2Y spread (inversion indicator) | Daily |
| DFII10 | 10Y TIPS (real yield) | Daily |
| T10YIE | 10Y breakeven inflation | Daily |
| SOFR | Secured Overnight Financing Rate | Daily |
| MOVE | Bond volatility index | Daily |
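The `yield_spread(tenor1, tenor2)` and `is_yield_curve_inverted()` primitives can be derived from the daily series above (T10Y2Y already publishes the 10Y-2Y spread directly). A sketch assuming FRED-style observations, where each value is a string and missing days are reported as "."; the spread is aligned on a common date and returned in basis points, which is an assumption about the DSL's units.

```typescript
// One observation: FRED reports values as strings and uses "." for missing days.
interface FredObservation { date: string; value: string; }

// Spread (long - short) in basis points on the latest date where both series
// have a value. Observations are assumed sorted ascending by date.
function yieldSpreadBp(long: FredObservation[], short: FredObservation[]): number | null {
  const shortByDate = new Map<string, number>(
    short.map((o): [string, number] => [o.date, Number.parseFloat(o.value)]),
  );
  for (let i = long.length - 1; i >= 0; i--) {
    const l = Number.parseFloat(long[i].value); // "." parses to NaN
    const s = shortByDate.get(long[i].date);
    if (Number.isFinite(l) && s !== undefined && Number.isFinite(s)) {
      return Math.round((l - s) * 100); // percent -> basis points
    }
  }
  return null;
}

// Inverted when the short tenor yields more than the long tenor.
const isInverted = (spreadBp: number | null) => spreadBp !== null && spreadBp < 0;
```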
8. Enrichment Pipeline
Two-Tier Sentiment Scoring
flowchart TB
subgraph Input["Raw Content"]
NEWS["News Headlines"]
SOCIAL["Social Posts"]
end
subgraph Tier1["Tier 1: Lexicon (All Items)"]
LEX["lexicon-scorer.ts\nRegex patterns:\nPOS: surge, rally, beat...\nNEG: crash, plunge, miss..."]
L_OUT["Output:\nsentiment: -1 to +1\nsentiment_source: 'lexicon'"]
end
subgraph Filter["Signal-Worthy Filter"]
F1["News: has ticker entity\nOR >=2 topics\nOR tracked sector"]
F2["Social: (upvotes>=10\nOR comments>=5)\nAND has keywords\nAND <7 days old"]
end
subgraph Tier2["Tier 2: LLM (Selective)"]
LLM["llm-scorer.ts\nClaude Haiku\ntemperature: 0\nbatch: 8 items/call"]
L2_OUT["Output:\nsentiment: -1 to +1\nconfidence: 0 to 1\nreasoning: 'Fed rate cut...'"]
end
subgraph Budget["Budget Control"]
BUD["Daily cap: 5000 calls\nResets at UTC midnight"]
end
subgraph Storage["Storage"]
NA["news_archive"]
SP["social_posts"]
end
Input --> Tier1
Tier1 --> Storage
Storage --> Filter
Filter --> |"~5-10% of items"| Tier2
Tier2 --> Budget
Budget --> Storage
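The filter and budget rules above, written as predicates. A sketch that reads the social rule as (upvotes >= 10 OR comments >= 5) AND has keywords AND under 7 days old, and treats the 5000-call cap as an in-memory counter keyed by UTC date; the type shapes, `DailyBudget` and `callsNeeded` are hypothetical.

```typescript
interface NewsItem { tickers: string[]; topics: string[]; sector?: string; }
interface SocialItem { upvotes: number; comments: number; keywords: string[]; postedAt: Date; }

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// News: has a ticker entity OR >= 2 topics OR belongs to a tracked sector.
function isSignalWorthyNews(n: NewsItem, trackedSectors: Set<string>): boolean {
  return n.tickers.length > 0 || n.topics.length >= 2 || (n.sector !== undefined && trackedSectors.has(n.sector));
}

// Social: engagement is an OR; keyword and age checks must both also hold.
function isSignalWorthySocial(p: SocialItem, now: Date): boolean {
  const engaged = p.upvotes >= 10 || p.comments >= 5;
  const fresh = now.getTime() - p.postedAt.getTime() < SEVEN_DAYS_MS;
  return engaged && p.keywords.length > 0 && fresh;
}

// Daily LLM call budget that resets at UTC midnight.
class DailyBudget {
  private day = "";
  private used = 0;
  constructor(private readonly cap: number) {}

  // Reserve up to `wanted` calls; returns how many were granted.
  reserve(wanted: number, now: Date): number {
    const today = now.toISOString().slice(0, 10); // UTC calendar day
    if (today !== this.day) { this.day = today; this.used = 0; }
    const granted = Math.max(0, Math.min(wanted, this.cap - this.used));
    this.used += granted;
    return granted;
  }
}

// Items go to the LLM in batches of 8, so N items need ceil(N / 8) calls.
const callsNeeded = (items: number, batchSize = 8) => Math.ceil(items / batchSize);
```

For example, a full run of 50 items costs `callsNeeded(50)` = 7 calls against the daily cap.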
Entity Extraction Flow
flowchart LR
subgraph Input["Input Text"]
TXT["'NVDA surges 5% after\nJensen Huang announces\nBlackwell AI chip'"]
end
subgraph Extractor["entity-extractor.ts"]
TICK["Ticker Detection\nRegex + symbol lookup"]
COMP["Company Names\nFuzzy matching"]
PERSON["Person Names\nNER patterns"]
THEME["Theme Classification\nKeyword matching"]
end
subgraph Output["Extracted Entities"]
OUT["{\n tickers: ['NVDA'],\n companies: ['NVIDIA'],\n people: ['Jensen Huang'],\n themes: ['ai', 'semiconductors'],\n sectors: ['technology']\n}"]
end
Input --> Extractor
Extractor --> Output
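Ticker detection pairs a regex with a symbol lookup so that capitalized words like "AI" are not mistaken for tickers. A sketch covering tickers and companies only, assuming a small hypothetical `KNOWN_SYMBOLS` table; the fuzzy company matching, person NER and theme classification steps are omitted.

```typescript
// Hypothetical symbol table (ticker -> company name); the real lookup data is not shown here.
const KNOWN_SYMBOLS = new Map<string, string>([
  ["NVDA", "NVIDIA"],
  ["AAPL", "Apple"],
  ["TSLA", "Tesla"],
]);

interface Entities { tickers: string[]; companies: string[]; }

function extractTickers(text: string): Entities {
  const tickers = new Set<string>();
  // Candidates: 1-5 capital letters as a whole word (also matches the part after a "$" cashtag).
  const re = /\b([A-Z]{1,5})\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    // Symbol lookup rejects ordinary capitalized words such as "AI" or "CEO".
    if (KNOWN_SYMBOLS.has(m[1])) tickers.add(m[1]);
  }
  const list = Array.from(tickers);
  return { tickers: list, companies: list.map((t) => KNOWN_SYMBOLS.get(t)!) };
}
```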
9. Signal Synthesis
Divergence Engine (Human vs Bot)
flowchart TB
subgraph Inputs["Inputs (7 days)"]
SH["sentiment_hourly\n(per-symbol sentiment)"]
PRICES["Historical Prices\n(daily returns)"]
end
subgraph Compute["divergence-engine.ts"]
CORR["Compute Pearson correlation\n(price_return vs sentiment_delta)"]
CLASS["Classify by correlation threshold"]
end
subgraph States["5 Classification States"]
S1["HUMAN-DRIVEN\ncorr > 0.30"]
S2["BOT-LIKELY\nprice moved, sentiment flat"]
S3["SOCIAL-NOISE\nsentiment moved, price flat"]
S4["QUIET\nneither moved"]
S5["INSUFFICIENT\nsparse data"]
end
subgraph Storage["computed_insights"]
CI["kind: 'divergence_humanbot'\nscope_type: 'symbol'\nscope_id: 'AAPL'\nclassification: 'HUMAN-DRIVEN'\nconfidence: 0.75"]
end
Inputs --> Compute
Compute --> States
States --> Storage
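The divergence engine correlates daily price returns with sentiment changes and then classifies the pair. A sketch in which only the 0.30 correlation cut-off comes from the diagram; the minimum sample size, the "moved" thresholds and the handling of the case where both series move without correlating are assumptions.

```typescript
type DivergenceState = "HUMAN-DRIVEN" | "BOT-LIKELY" | "SOCIAL-NOISE" | "QUIET" | "INSUFFICIENT";

// Pearson correlation; returns 0 when either series has zero variance.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx, dy = y[i] - my;
    sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
  }
  return sxx === 0 || syy === 0 ? 0 : sxy / Math.sqrt(sxx * syy);
}

const meanAbs = (v: number[]) => v.reduce((a, b) => a + Math.abs(b), 0) / v.length;

const MIN_POINTS = 5;        // assumed
const PRICE_MOVE = 0.01;     // assumed: mean |daily return| of 1%
const SENTIMENT_MOVE = 0.05; // assumed: mean |daily sentiment change|
const HUMAN_CORR = 0.3;      // from the diagram

function classify(priceReturns: number[], sentimentDeltas: number[]): { state: DivergenceState; corr: number } {
  if (priceReturns.length !== sentimentDeltas.length || priceReturns.length < MIN_POINTS) {
    return { state: "INSUFFICIENT", corr: 0 };
  }
  const corr = pearson(priceReturns, sentimentDeltas);
  const priceMoved = meanAbs(priceReturns) >= PRICE_MOVE;
  const sentimentMoved = meanAbs(sentimentDeltas) >= SENTIMENT_MOVE;
  if (!priceMoved && !sentimentMoved) return { state: "QUIET", corr };
  if (!priceMoved) return { state: "SOCIAL-NOISE", corr };
  if (corr > HUMAN_CORR) return { state: "HUMAN-DRIVEN", corr };
  // Price moved without sentiment tracking it. The five states do not spell out
  // the "both moved but uncorrelated" case; this sketch folds it into BOT-LIKELY.
  return { state: "BOT-LIKELY", corr };
}
```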
Composite Score Calculation
flowchart TB
subgraph SubInputs["Sub-Inputs"]
DIV["Divergence Score\n(from divergence engine)\nweight: 1.0"]
FUND["Crypto Funding Extremity\n|funding_rate| > 0.05%\nweight: 0.8"]
VIX["VIX Regime\n> 75th percentile\nweight: 0.6"]
end
subgraph Calculate["composite-score.ts"]
NORM["Normalize each input\nto [-1, +1]"]
WAVG["Weighted average\nΣ(input × weight) / Σ(weight)"]
end
subgraph Output["Output"]
SCORE["Composite Score\n-1 = Bot dominated\n 0 = Ambiguous\n+1 = Human driven"]
end
subgraph Storage["computed_insights"]
CI["kind: 'human_automation'\nscope_type: 'symbol'\nscope_id: 'AAPL'\nvalue: 0.65"]
end
SubInputs --> Calculate
Calculate --> Output
Output --> Storage
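The composite is the stated weighted average Σ(input × weight) / Σ(weight). A sketch assuming each sub-input has already been mapped to [-1, +1] on the bot-to-human scale (how funding extremity and the VIX regime are mapped is not specified here) and that a missing input is dropped so the remaining weights renormalize:

```typescript
interface SubInput { name: string; value: number; weight: number; }

const clampUnit = (x: number) => Math.max(-1, Math.min(1, x));

// Weighted average over inputs in [-1, +1]. Non-finite values (missing data)
// are skipped so the remaining weights renormalize; null if nothing is usable.
function compositeScore(inputs: SubInput[]): number | null {
  let num = 0, den = 0;
  for (const { value, weight } of inputs) {
    if (!Number.isFinite(value) || weight <= 0) continue;
    num += clampUnit(value) * weight;
    den += weight;
  }
  return den === 0 ? null : num / den;
}
```

For example, divergence 0.5, funding -1.0 and VIX 0 with weights 1.0, 0.8 and 0.6 give (0.5 - 0.8 + 0) / 2.4 = -0.125, leaning slightly toward bot-dominated.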
10. Scheduler Orchestration
Scheduler Job Map
flowchart TB
subgraph Every15Min["Every 15 Minutes"]
J1["collectPrices\n→ price_snapshots"]
J2["collectNewsHeadlines\n→ news_archive"]
J3["rollupSentimentHourly\n→ sentiment_hourly"]
J4["runSocialPostSentimentPass\n→ social_posts (LLM)"]
J5["computeDivergence\n→ computed_insights"]
J6["computeHumanAutomation\n→ computed_insights"]
J7["scanForWarmingPatterns\n→ pattern_warming_state"]
end
subgraph Every60Min["Every 60 Minutes"]
J8["pollDueFollows\n→ social_posts"]
J9["ingestAllEnabledNewsSources\n→ prediction_signals"]
J10["syncAllPredictionMarkets\n→ prediction_events"]
J11["tickAllActiveRuns\n→ algorithm_trades"]
end
subgraph Daily["Daily / Periodic"]
J12["syncEconomicCalendar (6h)\n→ economic_calendar_events"]
J13["sync13FFilings (weekly)\n→ institutional_holdings"]
J14["syncActivistFilings (daily)\n→ activist_filings"]
J15["refreshCompanyRelationships (weekly)\n→ company_relationships"]
J16["eodCriticSweep (daily)\n→ algorithm_eod_critic"]
end
subgraph PM2["PM2 Process"]
SCH["agencio-scheduler\nscheduler-runner.ts"]
end
PM2 --> Every15Min
PM2 --> Every60Min
PM2 --> Daily
Job Frequency Table
| Job Name | Interval | Target Table(s) | Purpose |
|---|---|---|---|
| price-collection | 30s | price_snapshots | Real-time price collection |
| news-archive | 15m | news_archive | Finnhub headlines + LLM scoring |
| news-ingestion | 60m | prediction_signals | All news adapters |
| sentiment-rollup | 15m | sentiment_hourly | Aggregate social sentiment |
| social-poll | 15-60m | social_posts | Platform polling |
| llm-sentiment-pass | 15m | social_posts | LLM upgrade for signal-worthy posts |
| divergence-compute | 15m | computed_insights | Human vs bot classification |
| human-automation | 15m | computed_insights | Composite score |
| prediction-sync | 60m | prediction_events | Market probability updates |
| algorithm-tick | 60m | algorithm_trades | Paper/live trading execution |
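The intervals in the table can drive a tick loop that starts each job once its interval has elapsed and it is not still running. A sketch assuming in-memory job state inside the scheduler process; `JobSpec`, `JobState` and `dueJobs` are hypothetical, and only a subset of the table's jobs is listed.

```typescript
interface JobSpec { name: string; intervalMs: number; }
interface JobState { lastRunAt: number | null; running: boolean; }

const MIN = 60_000;

// Intervals taken from the job frequency table (subset).
const JOBS: JobSpec[] = [
  { name: "price-collection", intervalMs: 30_000 },
  { name: "news-archive", intervalMs: 15 * MIN },
  { name: "sentiment-rollup", intervalMs: 15 * MIN },
  { name: "prediction-sync", intervalMs: 60 * MIN },
];

// Jobs that should start at `now`: never run, or interval elapsed, and not
// still running from a previous tick (prevents overlapping runs).
function dueJobs(jobs: JobSpec[], state: Map<string, JobState>, now: number): string[] {
  return jobs
    .filter((j) => {
      const s = state.get(j.name);
      if (!s) return true;
      if (s.running) return false;
      return s.lastRunAt === null || now - s.lastRunAt >= j.intervalMs;
    })
    .map((j) => j.name);
}
```

The overlap guard matters for the 15-minute jobs: a slow run would otherwise be started again on the next tick.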
11. Storage Schema
Database Schema Relationships
erDiagram
news_archive {
text url PK
text title
text summary
text source_name
text source_category
timestamptz published_at
jsonb topics
jsonb entities
float sentiment_score
text sentiment_source
float sentiment_confidence
text sentiment_reasoning
text[] geopolitical_event_ids
}
social_posts {
uuid id PK
uuid follow_id FK
text platform
text external_id
text content
float sentiment_score
text sentiment_source
float sentiment_confidence
jsonb keywords
int upvotes
int comments
timestamptz posted_at
}
social_follows {
uuid id PK
text platform
text target_type
text target_id
boolean enabled
interval poll_interval
timestamptz next_poll_at
}
sentiment_hourly {
uuid id PK
text topic
text platform
timestamptz hour_bucket
float avg_sentiment
int post_count
int positive_count
int negative_count
}
computed_insights {
uuid id PK
text kind
text scope_type
text scope_id
float value
jsonb components
text inputs_hash
timestamptz computed_at
timestamptz validity_until
}
prediction_events {
text id PK
text market_id
text source
text title
float probability
float volume
text category
timestamptz end_date
}
market_price_history {
uuid id PK
text symbol
date bar_date
float open
float high
float low
float close
bigint volume
text source
}
prediction_signals {
uuid id PK
text source_type
text source_name
text signal_type
text direction
float magnitude
float confidence
text title
text source_url
jsonb metadata
}
social_follows ||--o{ social_posts : "generates"
social_posts ||--o{ sentiment_hourly : "aggregates_to"
news_archive ||--o{ prediction_signals : "feeds"
sentiment_hourly ||--o{ computed_insights : "inputs_to"
market_price_history ||--o{ computed_insights : "inputs_to"
Table Growth Estimates
| Table | Daily Growth | Annual Growth | Retention |
|---|---|---|---|
| news_archive | ~140 rows | ~50K rows | Indefinite |
| social_posts | ~3K rows | ~1.1M rows | Indefinite |
| sentiment_hourly | ~200 rows | ~73K rows | 2+ years |
| computed_insights | ~500 rows | ~180K rows | Per validity_until |
| price_snapshots | ~2.8K rows | ~1M rows | 30 days |
| prediction_events | ~50 rows | ~18K rows | Indefinite |
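For tables with a finite retention window, the "Annual Growth" column counts rows inserted per year, not rows kept. Assuming the ~2.8K/day insert rate for price_snapshots holds steady, its on-disk size levels off at daily rate times retention:

```latex
\begin{aligned}
N_{\text{inserted/yr}} &= r_{\text{daily}} \times 365 = 2{,}800 \times 365 \approx 1.0 \times 10^{6}\ \text{rows} \\
N_{\text{steady}} &= r_{\text{daily}} \times T_{\text{retention}} = 2{,}800 \times 30 = 84{,}000\ \text{rows}
\end{aligned}
```

For the tables with indefinite retention, the two figures coincide and the table grows by the annual figure each year.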
12. File Reference
Integration Files
| File Path | Purpose |
|---|---|
| packages/be/src/integrations/finnhub.ts | Finnhub API client (quotes, sentiment, news, calendar) |
| packages/be/src/integrations/derivatives.ts | VIX, funding rates, OI, liquidations |
| packages/be/src/integrations/polygon.ts | Polygon.io FX OHLC + equity ticks |
| packages/be/src/integrations/binance-trades.ts | Binance aggTrades (crypto per-trade) |
| packages/be/src/integrations/fred.ts | FRED economic data (yields, SOFR, TIPS) |
| packages/be/src/integrations/platform-secrets.ts | DB-first API key management |
News Adapter Files
| File Path | Source |
|---|---|
| packages/be/src/news/adapters/gdelt.ts | GDELT Project |
| packages/be/src/news/adapters/google-news-rss.ts | Google News RSS |
| packages/be/src/news/adapters/newsdata.ts | NewsData.io |
| packages/be/src/news/adapters/guardian.ts | The Guardian |
| packages/be/src/news/adapters/nyt.ts | New York Times |
| packages/be/src/news/adapters/finnhub-company.ts | Finnhub Company News |
| packages/be/src/news/adapters/newsapi.ts | NewsAPI |
Social Adapter Files
| File Path | Platform |
|---|---|
| packages/be/src/social/adapters/reddit.ts | Reddit |
| packages/be/src/social/adapters/twitter.ts | Twitter/X |
| packages/be/src/social/adapters/discord.ts | Discord |
| packages/be/src/social/adapters/telegram.ts | Telegram |
| packages/be/src/social/adapters/bluesky.ts | Bluesky |
| packages/be/src/social/adapters/rss.ts | RSS Feeds |
Scheduler Files
| File Path | Purpose |
|---|---|
| packages/be/src/scheduler/index.ts | Main scheduler, job registration |
| packages/be/src/scheduler/news-collector.ts | Finnhub news collection + LLM scoring |
| packages/be/src/scheduler/price-collector.ts | Price snapshot collection |
| packages/be/src/scheduler/sentiment-rollup.ts | Hourly sentiment aggregation |
Enrichment Files
| File Path | Purpose |
|---|---|
| packages/be/src/social/entity-extractor.ts | Ticker/company/person extraction |
| packages/be/src/sentiment/llm-scorer.ts | Claude Haiku sentiment scoring |
| packages/be/src/insights/divergence-engine.ts | Human vs bot classification |
| packages/be/src/insights/composite-score.ts | Weighted composite score |
Integration Flow Diagrams - Agencio Predict
Generated: 2026-05-19 | 40+ integrations mapped