Integration Flow Diagrams

Agencio Predict Data Pipeline & Integration Architecture

40+ External APIs
8 News Adapters
7 Social Platforms
4 Prediction Markets
19 Scheduler Jobs/Hour


1. System Overview

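Agencio Predict operates a three-stage data pipeline: Ingestion (fetch, deduplicate, and store raw data), Enrichment (lexicon and LLM sentiment scoring, entity extraction, geopolitical tagging), and Synthesis (sentiment rollups, human-vs-bot divergence, composite scores, pattern detection).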

High-Level Architecture
flowchart TB
    subgraph Sources["External Data Sources (40+ APIs)"]
        PM["Prediction Markets\n(Polymarket, Kalshi, Metaculus)"]
        PRICE["Price Data\n(Yahoo, Finnhub, Polygon, CoinGecko)"]
        NEWS["News Sources\n(Finnhub, NewsAPI, Guardian, NYT, GDELT)"]
        SOCIAL["Social Platforms\n(Reddit, Twitter, Discord, Telegram, Bluesky)"]
        DERIV["Derivatives & Macro\n(VIX, FRED, Binance Funding, Coinglass)"]
    end
    subgraph Scheduler["Scheduler (PM2)"]
        S15["15-min Jobs"]
        S60["60-min Jobs"]
        SDAY["Daily Jobs"]
    end
    subgraph Ingestion["Stage 1: Ingestion"]
        FETCH["Fetch Raw Data"]
        DEDUP["Deduplicate"]
        STORE1["Store Raw"]
    end
    subgraph Enrichment["Stage 2: Enrichment"]
        LEX["Lexicon Sentiment\n(regex, fast, free)"]
        LLM["LLM Sentiment\n(Claude Haiku, selective)"]
        ENT["Entity Extraction\n(tickers, companies, people)"]
        GEO["Geopolitical Tagging"]
    end
    subgraph Synthesis["Stage 3: Synthesis"]
        ROLL["Sentiment Rollup\n(48h window)"]
        DIV["Divergence Engine\n(human vs bot)"]
        COMP["Composite Score\n(weighted average)"]
        PAT["Pattern Detection"]
    end
    subgraph Storage["Storage Layer"]
        NA["news_archive"]
        SP["social_posts"]
        SH["sentiment_hourly"]
        CI["computed_insights"]
        PE["prediction_events"]
        MPH["market_price_history"]
    end
    subgraph Consumers["Consumers"]
        DSL["DSL Evaluator\n(Algorithm Builder)"]
        UI["Dashboard / Console"]
        API["API Endpoints"]
        ALERT["Alerts & Notifications"]
    end
    Sources --> Scheduler
    Scheduler --> Ingestion
    Ingestion --> Enrichment
    Enrichment --> Storage
    Storage --> Synthesis
    Synthesis --> CI
    CI --> Consumers
    Storage --> Consumers

2. Data Source Categories

Integration Source Map
flowchart LR
    subgraph PredictionMarkets["Prediction Markets"]
        POLY["Polymarket\nGamma API (public)"]
        KALSHI["Kalshi\nOptional API key"]
        META["Metaculus\nMETACULUS_TOKEN"]
        PREDIT["PredictIt\n(BROKEN - Cloudflare)"]
    end
    subgraph PriceData["Price & Market Data"]
        YAHOO["Yahoo Finance\nVIX, equities, crypto"]
        FINN["Finnhub\n60 calls/min free"]
        POLY_IO["Polygon.io\nEquity ticks, FX OHLC"]
        CG["CoinGecko\nCrypto OHLC"]
        FRANK["Frankfurter\nFX spot rates"]
        ALPACA["Alpaca\nStocks, crypto"]
        BINANCE["Binance\nSpot, futures, L2"]
        DERIBIT["Deribit\nCrypto derivatives"]
    end
    subgraph NewsSources["News Sources"]
        FINN_N["Finnhub News\n4 categories"]
        NEWSAPI["NewsAPI\n80K+ sources"]
        GUARD["Guardian\n5K req/day"]
        NYT["New York Times\n500 req/day"]
        GDELT["GDELT\nNo key required"]
        GNEWS["Google News RSS\nNo key required"]
        NEWSDATA["NewsData.io\n200 req/day"]
        FINN_CO["Finnhub Company\nPer-symbol news"]
    end
    subgraph SocialPlatforms["Social Platforms"]
        REDDIT["Reddit\nOAuth2"]
        TWITTER["Twitter/X\nBearer token"]
        DISCORD["Discord\nBot token"]
        TELEGRAM["Telegram\nBot token optional"]
        BLUESKY["Bluesky\nAT Protocol"]
        TRUTH["Truth Social\nPublic posts"]
        RSS["RSS Feeds\nNo auth"]
    end
    subgraph DerivMacro["Derivatives & Macro"]
        VIX["VIX\nvia Yahoo"]
        FG["Fear & Greed\nalternative.me"]
        FRED["FRED\nTreasury, SOFR, TIPS"]
        CFTC["CFTC CoT\nQuandl"]
        FXST["FXStreet\nOAuth2 calendar"]
        BFUND["Binance Funding\nBot detection"]
        CGLASS["Coinglass\nLiquidations"]
    end
    PredictionMarkets --> PE[(prediction_events)]
    PriceData --> MPH[(market_price_history)]
    NewsSources --> NA[(news_archive)]
    SocialPlatforms --> SP[(social_posts)]
    DerivMacro --> DT[(derivatives_data)]

Integration Status

Category | Source | API Key | Status | Rate Limit
Prediction Markets | Polymarket | None (public) | Active | Generous
Prediction Markets | Kalshi | Optional | Active | Rate-limited w/o key
Prediction Markets | Metaculus | METACULUS_TOKEN | Key Required | Enforced 2026-05
Prediction Markets | PredictIt | N/A | Broken | Cloudflare blocks
Price Data | Yahoo Finance | None | Active | Generous
Price Data | Finnhub | FINNHUB_API_KEY | Active | 60 calls/min
Price Data | Polygon.io | POLYGON_API_KEY | Optional | Premium features
Price Data | CoinGecko | None | Active | Generous
News | Finnhub News | FINNHUB_API_KEY | Active | Shared w/ price
News | NewsAPI | NEWSAPI_KEY | Active | 100-500/day
News | Guardian | GUARDIAN_API_KEY | Optional | 5K/day
News | GDELT | None | Active | Generous
Social | Reddit | OAuth2 | Active | 60 req/min
Social | Twitter/X | Bearer token | Paid ($100/mo) | Basic tier
Social | Discord/Telegram | Bot token | User-provided | Per-user
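
Several sources above share one per-key quota. For example, Finnhub's 60 calls/min covers both price and news calls, so every caller needs to go through a common client-side limiter. The following is a minimal sketch that assumes a single scheduler process. SlidingWindowLimiter is a hypothetical name, not a class from the codebase. A multi-process deployment would need the call log in shared storage rather than in memory.

```typescript
// Hypothetical in-memory sliding-window limiter; not a class from the codebase.
// Admits at most `maxCalls` calls inside any rolling `windowMs` span.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(private readonly maxCalls: number, private readonly windowMs: number) {}

  // Drops call timestamps that have aged out of the window ending at `nowMs`.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
  }

  // Records the call and returns true if it fits in the window; otherwise returns false.
  tryAcquire(nowMs: number = Date.now()): boolean {
    this.prune(nowMs);
    if (this.timestamps.length >= this.maxCalls) return false;
    this.timestamps.push(nowMs);
    return true;
  }

  // Milliseconds until the next call would be admitted (0 if one is admitted now).
  msUntilNextSlot(nowMs: number = Date.now()): number {
    this.prune(nowMs);
    if (this.timestamps.length < this.maxCalls) return 0;
    const oldestBlocking = this.timestamps[this.timestamps.length - this.maxCalls];
    return oldestBlocking + this.windowMs - nowMs;
  }
}

// One limiter shared by every Finnhub caller (price and news): 60 calls per minute.
const finnhubLimiter = new SlidingWindowLimiter(60, 60000);
```

A fixed per-minute counter can admit up to twice the quota around a minute boundary. A sliding window avoids that burst.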

3. News Pipeline

News Data Flow
sequenceDiagram
    participant SCH as Scheduler
    participant FINN as Finnhub API
    participant NA as NewsAPI
    participant GDELT as GDELT
    participant LEX as Lexicon Scorer
    participant LLM as Claude Haiku
    participant DB as news_archive
    participant WH as Webhooks
    Note over SCH: Every 15 minutes
    SCH->>FINN: GET /news?category={category} for each of general, forex, crypto, merger
    FINN-->>SCH: Headlines (4 categories)
    Note over SCH: Every 60 minutes
    SCH->>NA: GET /everything?q=keywords
    NA-->>SCH: Headlines (80K+ sources)
    SCH->>GDELT: GET /events
    GDELT-->>SCH: Global events
    SCH->>SCH: Deduplicate by URL
    SCH->>SCH: Extract entities (tickers, companies)
    SCH->>SCH: Geopolitical event tagging
    SCH->>LEX: Score sentiment (regex)
    LEX-->>SCH: sentiment_score (-1 to +1)
    SCH->>DB: INSERT (sentiment_source='lexicon')
    Note over SCH: LLM upgrade pass (signal-worthy only)
    SCH->>LLM: Batch score (up to 50/run)
    LLM-->>SCH: sentiment, confidence, reasoning
    SCH->>DB: UPDATE (sentiment_source='llm')
    SCH->>WH: publishEvent('news.headline.archived')

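The "Deduplicate by URL" step only catches duplicates if the same article always maps to the same key. Syndicated headlines often differ only by tracking parameters, a fragment, or a trailing slash. The following is a minimal sketch that uses illustrative normalization rules: drop the fragment, strip utm_* parameters, sort the remaining query, and trim a trailing slash. canonicalUrl and dedupeByUrl are hypothetical helpers, and the real collector may key on the raw URL.

```typescript
// Hypothetical URL canonicalisation for headline dedup (illustrative rules only).
function canonicalUrl(raw: string): string {
  const u = new URL(raw); // the WHATWG parser already lower-cases the host
  u.hash = "";
  // Keep every query parameter except utm_* tracking tags.
  const keep: [string, string][] = [];
  u.searchParams.forEach((value, key) => {
    if (!key.toLowerCase().startsWith("utm_")) keep.push([key, value]);
  });
  // Sort so that parameter order does not create distinct keys.
  keep.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  u.search = "";
  for (const [k, v] of keep) u.searchParams.append(k, v);
  // Treat "/path/" and "/path" as the same article.
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}

interface Headline {
  url: string;
  title: string;
}

// Keeps the first headline seen for each canonical URL.
function dedupeByUrl(items: Headline[]): Headline[] {
  const seen = new Set<string>();
  const out: Headline[] = [];
  for (const item of items) {
    const key = canonicalUrl(item.url);
    if (!seen.has(key)) {
      seen.add(key);
      out.push({ ...item, url: key });
    }
  }
  return out;
}
```
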
News Adapter Architecture
flowchart TB
    subgraph Adapters["News Adapters (packages/be/src/news/adapters/)"]
        GA["gdelt.ts\nNo key, geopolitical"]
        GN["google-news-rss.ts\nRSS, no key"]
        ND["newsdata.ts\nNEWSDATA_API_KEY"]
        GU["guardian.ts\nGUARDIAN_API_KEY"]
        NY["nyt.ts\nNYT_API_KEY"]
        FC["finnhub-company.ts\nFINNHUB_API_KEY"]
        NAP["newsapi.ts\nNEWSAPI_KEY"]
    end
    subgraph Service["News Service"]
        LS["listNewsSources()"]
        RS["runSourceIngestion()"]
        IAE["ingestAllEnabled()"]
    end
    subgraph Enrichment["Enrichment"]
        EE["entity-extractor.ts\n(shared with social)"]
        LS2["llm-scorer.ts\nClaude Haiku"]
        GEO["geopolitical-events.ts"]
    end
    subgraph Storage["Storage"]
        NST["news_sources\n(config table)"]
        NAT["news_archive\n(deduplicated headlines)"]
        PST["prediction_signals\n(NEWS type)"]
    end
    GA & GN & ND & GU & NY & FC & NAP --> Service
    Service --> |"getSecret()"| PS[(platform_secrets)]
    Service --> Enrichment
    Enrichment --> Storage
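
The getSecret() edge reflects the DB-first key management in platform-secrets.ts: a key stored in platform_secrets takes precedence over an environment variable. The sketch below shows only that precedence. resolveSecret, its parameters, and the in-memory map of database rows are assumptions made for illustration. They are not the module's actual signature.

```typescript
// Hypothetical sketch of DB-first secret resolution (not the actual platform-secrets.ts API).
// Assumption: platform_secrets rows have been loaded into a Map, so lookup is synchronous.
type SecretSource = "db" | "env" | "missing";

interface ResolvedSecret {
  value: string | undefined;
  source: SecretSource;
}

function resolveSecret(
  key: string,
  dbSecrets: ReadonlyMap<string, string>,
  env: Record<string, string | undefined>,
): ResolvedSecret {
  // 1. A non-empty value stored in platform_secrets takes precedence.
  const fromDb = dbSecrets.get(key);
  if (fromDb) return { value: fromDb, source: "db" };
  // 2. Otherwise fall back to the environment (e.g. GUARDIAN_API_KEY).
  const fromEnv = env[key];
  if (fromEnv) return { value: fromEnv, source: "env" };
  // 3. Report the gap so a keyed adapter can skip itself instead of failing mid-run.
  return { value: undefined, source: "missing" };
}
```

In a service, env would typically be process.env. Keyless adapters such as gdelt.ts and google-news-rss.ts would not need this lookup at all.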

News Adapter Files

File | Source | API Key | Rate Limit | Key Features
gdelt.ts | GDELT Project | None | Generous | Global events, conflict tracking
google-news-rss.ts | Google News | None | Generous | RSS fallback, broad coverage
newsdata.ts | NewsData.io | NEWSDATA_API_KEY | 200/day | Real-time, 50K+ sources
guardian.ts | The Guardian | GUARDIAN_API_KEY | 5K/day | UK/international quality
nyt.ts | New York Times | NYT_API_KEY | 500/day | US news, article search
finnhub-company.ts | Finnhub | FINNHUB_API_KEY | 60/min | Per-symbol company news
newsapi.ts | NewsAPI | NEWSAPI_KEY | 100-500/day | 80K+ sources aggregator

4. Social Pipeline

Social Data Flow
sequenceDiagram
    participant SF as social_follows
    participant SCH as Scheduler
    participant RED as Reddit API
    participant TW as Twitter API
    participant DC as Discord
    participant TG as Telegram
    participant BS as Bluesky
    participant EE as Entity Extractor
    participant LEX as Lexicon
    participant LLM as Claude Haiku
    participant SP as social_posts
    participant SH as sentiment_hourly
    Note over SCH: pollDueFollows (every 15-60 min)
    SCH->>SF: Get due follows
    SF-->>SCH: Follows to poll
    par Platform Polling
        SCH->>RED: GET /r/{subreddit}/new
        RED-->>SCH: Posts
    and
        SCH->>TW: GET /2/tweets/search
        TW-->>SCH: Tweets
    and
        SCH->>DC: GET /channels/{id}/messages
        DC-->>SCH: Messages
    and
        SCH->>TG: GET /getUpdates
        TG-->>SCH: Messages
    and
        SCH->>BS: GET /xrpc/app.bsky.feed.getTimeline
        BS-->>SCH: Posts
    end
    SCH->>SCH: Deduplicate (platform + external_id)
    SCH->>EE: Extract entities
    EE-->>SCH: {tickers, companies, themes}
    SCH->>LEX: Lexicon sentiment
    LEX-->>SCH: score (-1 to +1)
    SCH->>SP: INSERT (sentiment_source='lexicon')
    Note over SCH: LLM upgrade (signal-worthy: upvotes>=10 OR comments>=5)
    SCH->>SP: Query signal-worthy posts
    SP-->>SCH: Posts to upgrade
    SCH->>LLM: Batch score
    LLM-->>SCH: sentiment, confidence, reasoning
    SCH->>SP: UPDATE (sentiment_source='llm')
    Note over SCH: rollupSentimentHourly (every 15 min)
    SCH->>SP: Aggregate last 48h
    SP-->>SCH: Grouped by topic/platform/hour
    SCH->>SH: UPSERT hourly buckets

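The final rollupSentimentHourly step groups scored posts by topic, platform, and UTC hour. It produces the sentiment_hourly columns shown in the storage schema. The following is a minimal in-memory sketch with two assumptions: each post carries a list of topics (the derivation of topics is not specified here), and a post counts as positive or negative when its score falls outside a neutral band. The band defaults to 0 and is a hypothetical parameter. The production job presumably does this in SQL and then performs an UPSERT.

```typescript
// Hypothetical in-memory rollup into sentiment_hourly-shaped rows.
interface ScoredPost {
  platform: string;
  topics: string[];       // assumption: topics derived upstream from entities/keywords
  sentimentScore: number; // -1..+1
  postedAt: Date;
}

interface HourlyBucket {
  topic: string;
  platform: string;
  hourBucket: string; // ISO timestamp truncated to the UTC hour
  avgSentiment: number;
  postCount: number;
  positiveCount: number;
  negativeCount: number;
}

function truncateToUtcHour(d: Date): string {
  const t = new Date(d.getTime());
  t.setUTCMinutes(0, 0, 0);
  return t.toISOString();
}

function rollupHourly(posts: ScoredPost[], now: Date, windowHours = 48, neutralBand = 0): HourlyBucket[] {
  const cutoff = now.getTime() - windowHours * 3600000;
  const buckets = new Map<string, HourlyBucket & { sum: number }>();
  for (const p of posts) {
    if (p.postedAt.getTime() < cutoff) continue; // outside the rollup window
    const hour = truncateToUtcHour(p.postedAt);
    for (const topic of p.topics) {
      const key = `${topic}|${p.platform}|${hour}`;
      let b = buckets.get(key);
      if (!b) {
        b = { topic, platform: p.platform, hourBucket: hour, avgSentiment: 0,
              postCount: 0, positiveCount: 0, negativeCount: 0, sum: 0 };
        buckets.set(key, b);
      }
      b.postCount += 1;
      b.sum += p.sentimentScore;
      if (p.sentimentScore > neutralBand) b.positiveCount += 1;
      else if (p.sentimentScore < -neutralBand) b.negativeCount += 1;
    }
  }
  const out: HourlyBucket[] = [];
  buckets.forEach(({ sum, ...rest }) => {
    out.push({ ...rest, avgSentiment: sum / rest.postCount });
  });
  return out;
}
```
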
Social Platform Adapters
flowchart TB
    subgraph Follows["social_follows Table"]
        F1["platform: reddit\ntarget: r/wallstreetbets"]
        F2["platform: twitter\ntarget: @elonmusk"]
        F3["platform: discord\ntarget: channel_id"]
        F4["platform: telegram\ntarget: @cryptosignal"]
        F5["platform: bluesky\ntarget: user.bsky.social"]
        F6["platform: rss\ntarget: https://feed.url"]
    end
    subgraph Adapters["Social Adapters (packages/be/src/social/adapters/)"]
        RA["reddit.ts\nOAuth2 / public fallback"]
        TA["twitter.ts\nBearer token required"]
        DA["discord.ts\nBot token + MESSAGE_CONTENT"]
        TGA["telegram.ts\nBot token / HTML scrape"]
        BA["bluesky.ts\nAT Protocol"]
        RSSA["rss.ts\nNo auth"]
    end
    subgraph Processing["Processing"]
        EE["entity-extractor.ts\nTickers, companies, themes"]
        LS["lexicon scoring"]
        LLM["llm-scorer.ts\nHaiku batch"]
    end
    subgraph Storage["Storage"]
        SP["social_posts\n(raw posts)"]
        SH["sentiment_hourly\n(aggregated)"]
    end
    F1 --> RA
    F2 --> TA
    F3 --> DA
    F4 --> TGA
    F5 --> BA
    F6 --> RSSA
    RA & TA & DA & TGA & BA & RSSA --> Processing
    Processing --> SP
    SP --> |"rollupSentimentHourly"| SH
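
pollDueFollows begins by asking social_follows which targets are due, using the enabled, poll_interval, and next_poll_at columns from the storage schema. The following sketch shows that selection and the rescheduling step after a poll. It assumes poll_interval has already been converted to milliseconds. selectDueFollows, rescheduleAfterPoll, and the batch limit are hypothetical names.

```typescript
// Hypothetical sketch of the "get due follows" step. In SQL this would be roughly
// WHERE enabled AND next_poll_at <= now() ORDER BY next_poll_at LIMIT n.
interface SocialFollow {
  id: string;
  platform: string;
  targetId: string;
  enabled: boolean;
  pollIntervalMs: number; // assumption: Postgres interval converted to milliseconds
  nextPollAt: Date;
}

// Oldest-due first, capped so one tick cannot exhaust a platform's rate limit.
function selectDueFollows(follows: SocialFollow[], now: Date, limit: number): SocialFollow[] {
  return follows
    .filter((f) => f.enabled && f.nextPollAt.getTime() <= now.getTime())
    .sort((a, b) => a.nextPollAt.getTime() - b.nextPollAt.getTime())
    .slice(0, limit);
}

// Schedules the next poll from "now" rather than from the old due time,
// so a backlog does not trigger a burst of catch-up polls.
function rescheduleAfterPoll(follow: SocialFollow, now: Date): SocialFollow {
  return { ...follow, nextPollAt: new Date(now.getTime() + follow.pollIntervalMs) };
}
```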

Social Platform Configuration

Platform | Auth Type | Required Credentials | Features
Reddit | OAuth2 | CLIENT_ID, SECRET, USERNAME, PASSWORD | Subreddits, users, comments
Twitter/X | Bearer Token | TWITTER_BEARER_TOKEN ($100/mo) | Tweets, search, mentions
Discord | Bot Token | DISCORD_BOT_TOKEN (user-provided) | Channel messages, reactions
Telegram | Bot Token (optional) | TELEGRAM_BOT_TOKEN | Public channels (no auth), private groups (bot)
Bluesky | AT Protocol | BLUESKY_IDENTIFIER, APP_PASSWORD | Posts, follows, feeds
Truth Social | None (scraping) | None | Public posts only
RSS | None | None | Blog feeds, newsletters

5. Market Data Pipeline

Price Data Flow
flowchart TB
    subgraph PriceSources["Price Sources"]
        YAHOO["Yahoo Finance\n(stocks, crypto, VIX)"]
        FINN["Finnhub\n(real-time quotes)"]
        POLY["Polygon.io\n(equity ticks, FX OHLC)"]
        CG["CoinGecko\n(crypto OHLC)"]
        FRANK["Frankfurter\n(ECB forex rates)"]
        ALP["Alpaca\n(per-user, stocks+crypto)"]
        BIN["Binance\n(crypto spot+futures)"]
    end
    subgraph PriceService["price-service.ts"]
        GP["getPrice(symbol)"]
        GHP["getHistoricalPrices(symbol, days)"]
        GMP["getMultiplePrices(symbols)"]
    end
    subgraph Fallback["Fallback Chain"]
        D1["1. Imported datasets\n(user/platform)"]
        D2["2. Yahoo Finance\n(primary)"]
        D3["3. CoinGecko\n(crypto)"]
        D4["4. Frankfurter\n(forex)"]
        D5["5. Polygon\n(if key set)"]
    end
    subgraph Storage["Storage"]
        PS["price_snapshots\n(15-min intervals)"]
        MPH["market_price_history\n(daily OHLC)"]
        IMP["imported_datasets\n(user uploads)"]
    end
    subgraph Consumers["Consumers"]
        BST["Backtest Engine"]
        EXEC["Paper/Live Executor"]
        PINN["PINN Predictor"]
        FVG["FVG Detection"]
    end
    PriceSources --> PriceService
    PriceService --> Fallback
    Fallback --> Storage
    Storage --> Consumers

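The fallback chain tries sources in order until one returns a usable price. The following is a minimal sketch built on two assumptions. First, each provider exposes a supports(symbol) predicate, so that Frankfurter, which handles forex only, is skipped for equities. Second, each provider has a fetchPrice that resolves to null on a miss. These interfaces are hypothetical and are not the actual price-service.ts API.

```typescript
// Hypothetical provider interface; the real price-service.ts API may differ.
interface PriceProvider {
  name: string;
  supports: (symbol: string) => boolean;
  fetchPrice: (symbol: string) => Promise<number | null>;
}

interface PriceResult {
  symbol: string;
  price: number;
  source: string;
}

// Walks the chain in order (imported datasets, Yahoo, CoinGecko, Frankfurter, Polygon)
// and returns the first usable price, or null if every source misses.
async function getPriceWithFallback(symbol: string, chain: PriceProvider[]): Promise<PriceResult | null> {
  for (const provider of chain) {
    if (!provider.supports(symbol)) continue;
    try {
      const price = await provider.fetchPrice(symbol);
      if (price !== null && Number.isFinite(price)) {
        return { symbol, price, source: provider.name };
      }
    } catch {
      // A provider error is treated like a miss, so the next source gets a chance.
    }
  }
  return null;
}
```

Because an error is treated the same as a miss, one provider outage cannot fail a whole collectPrices run. The source field records which provider answered.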
Price Collection Scheduler
sequenceDiagram
    participant SCH as Scheduler
    participant PS as price-service
    participant YAHOO as Yahoo Finance
    participant CG as CoinGecko
    participant DB as price_snapshots
    Note over SCH: collectPrices (every 30 seconds)
    SCH->>PS: getMultiplePrices(collection_universe)
    PS->>YAHOO: batch request (stocks, ETFs)
    YAHOO-->>PS: prices
    PS->>CG: batch request (crypto)
    CG-->>PS: prices
    PS-->>SCH: {symbol: price, ...}
    SCH->>DB: UPSERT price_snapshots
    Note right of DB: Indexed for fast\nhistorical lookups

6. Prediction Markets

Prediction Market Integration
flowchart TB
    subgraph Markets["Prediction Market APIs"]
        POLY["Polymarket\nGamma API\ngamma-api.polymarket.com"]
        KALSHI["Kalshi\napi.elections.kalshi.com"]
        META["Metaculus\nmetaculus.com/api"]
        PRED["PredictIt\n(BROKEN)"]
    end
    subgraph Adapters["Feed Adapters"]
        PA["polymarket.ts\nNo auth required"]
        KA["kalshi.ts\nOptional API key"]
        MA["metaculus.ts\nMETACULUS_TOKEN required"]
    end
    subgraph Transform["Transform & Normalize"]
        NORM["Normalize to common schema:\n- event_id\n- title\n- probability\n- volume\n- end_date\n- category"]
    end
    subgraph Storage["Storage"]
        PE["prediction_events\n(market metadata)"]
        PS["prediction_signals\n(probability changes)"]
    end
    subgraph DSL["DSL Primitives"]
        P1["polymarket_probability(slug)"]
        P2["kalshi_probability(ticker)"]
        P3["metaculus_probability(id)"]
        P4["prediction_market_consensus(topic)"]
    end
    Markets --> Adapters
    Adapters --> Transform
    Transform --> Storage
    Storage --> DSL

Prediction Market Data Model

Field | Description | Example
market_id | Unique ID from source | polymarket:0x123abc
source | Market provider | polymarket, kalshi, metaculus
title | Event question | "Will Fed cut rates in June 2026?"
probability | Current YES probability | 0.65 (65%)
volume | Trading volume | $1,234,567
category | Topic category | economics, politics, crypto
end_date | Resolution date | 2026-06-15
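
Each feed adapter maps its provider's payload onto this common schema. The sketch below covers binary YES/NO markets and uses a hypothetical raw shape. Field names such as yesPrice, question, and endDate are illustrative and do not match the exact Gamma, Kalshi, or Metaculus fields. A priceScale parameter handles sources that quote prices on a 0-100 scale instead of 0-1.

```typescript
// Common schema from the data model table above.
interface PredictionEvent {
  market_id: string;       // "<source>:<native id>"
  source: "polymarket" | "kalshi" | "metaculus";
  title: string;
  probability: number;     // YES probability in [0, 1]
  volume: number | null;   // null where the source has no trading volume
  category: string | null;
  end_date: string | null; // YYYY-MM-DD
}

// Hypothetical raw shape; field names are assumptions, not a real provider payload.
interface RawBinaryMarket {
  id: string;
  question: string;
  yesPrice: string | number; // some APIs return numbers as strings
  volume?: string | number;
  category?: string;
  endDate?: string; // ISO 8601
}

function toNumber(v: string | number | undefined): number | null {
  if (v === undefined) return null;
  const n = typeof v === "number" ? v : parseFloat(v);
  return Number.isFinite(n) ? n : null;
}

function normalizeMarket(
  source: PredictionEvent["source"],
  raw: RawBinaryMarket,
  priceScale = 1, // e.g. 100 for a source quoting 0-100
): PredictionEvent | null {
  const p = toNumber(raw.yesPrice);
  if (p === null) return null; // unusable without a probability
  return {
    market_id: `${source}:${raw.id}`,
    source,
    title: raw.question.trim(),
    probability: Math.min(1, Math.max(0, p / priceScale)), // clamp rounding noise
    volume: toNumber(raw.volume),
    category: raw.category ? raw.category.toLowerCase() : null,
    end_date: raw.endDate ? raw.endDate.slice(0, 10) : null,
  };
}
```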

7. Derivatives & Macro Data

Derivatives Data Flow
flowchart TB
    subgraph Sources["Data Sources"]
        VIX["VIX\nvia Yahoo Finance"]
        FG["Fear & Greed\nalternative.me"]
        FRED["FRED\nTreasury, SOFR, TIPS"]
        BF["Binance Funding\nfapi/v1/fundingRate"]
        CGL["Coinglass\nLiquidations, OI"]
        DER["Deribit\nIV, options"]
    end
    subgraph Integration["integrations/derivatives.ts"]
        GV["getVixData()"]
        GFG["getFearGreedIndex()"]
        GYC["getYieldCurve()"]
        GFR["getFundingRates()"]
        GLQ["getLiquidations()"]
    end
    subgraph Storage["Storage"]
        DD["derivatives_data\n(VIX, funding, OI)"]
        OV["overlay_series\n(yield curves)"]
        CI["computed_insights\n(bot detection)"]
    end
    subgraph Analysis["Analysis"]
        REGIME["Regime Detection\n(VIX > 75th pct)"]
        BOT["Bot Detection\n(funding extremity)"]
        YCS["Yield Curve Signals\n(inversion, spreads)"]
    end
    subgraph DSL["DSL Primitives"]
        D1["vix_level()"]
        D2["fear_greed_index()"]
        D3["funding_rate(symbol)"]
        D4["yield_spread(tenor1, tenor2)"]
        D5["is_yield_curve_inverted()"]
    end
    Sources --> Integration
    Integration --> Storage
    Storage --> Analysis
    Analysis --> DSL
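
The regime-detection and bot-detection rules in this diagram reduce to threshold checks. Binance's fapi/v1/fundingRate endpoint returns fundingRate as a decimal string per funding interval ("0.00010000" means 0.01%), so the 0.05% extremity threshold used by the composite score equals 0.0005. The sketch below also computes the VIX 75th percentile with linear interpolation. Both the interpolation method and the lookback window are assumptions, since neither is specified here. The helper names are hypothetical.

```typescript
// Subset of a Binance USD-M futures fundingRate record.
interface FundingRecord {
  symbol: string;
  fundingRate: string; // decimal fraction as a string, e.g. "0.00010000" = 0.01%
  fundingTime: number; // ms since epoch
}

// 0.05% expressed as a fraction, per the composite-score threshold.
const FUNDING_EXTREME = 0.0005;

function isFundingExtreme(rec: FundingRecord, threshold = FUNDING_EXTREME): boolean {
  const rate = parseFloat(rec.fundingRate);
  return Number.isFinite(rate) && Math.abs(rate) > threshold;
}

// Percentile with linear interpolation between closest ranks (NumPy's default method).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty series");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// High-volatility regime: current VIX above the 75th percentile of the lookback series.
function isHighVixRegime(lookback: number[], current: number): boolean {
  return current > percentile(lookback, 75);
}
```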

FRED Data Series

Series | Description | Update Frequency
DGS2, DGS5, DGS10, DGS30 | Treasury yields (2y, 5y, 10y, 30y) | Daily
T10Y2Y | 10Y-2Y spread (inversion indicator) | Daily
DFII10 | 10Y TIPS (real yield) | Daily
T10YIE | 10Y breakeven inflation | Daily
SOFR | Secured Overnight Financing Rate | Daily
MOVE | Bond volatility index | Daily
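
FRED serves these series through its fred/series/observations endpoint. With file_type=json, each observation carries a date and a string value, and "." marks a missing observation such as a bond-market holiday. The sketch below shows one way to compute yield_spread and is_yield_curve_inverted from two responses. It aligns both series on the latest date that both report. The function names are hypothetical and only loosely mirror the DSL primitives.

```typescript
// Shape of FRED's series/observations JSON (file_type=json); missing values are ".".
interface FredObservation {
  date: string;  // YYYY-MM-DD
  value: string; // numeric string, or "." when there is no observation
}

interface FredResponse {
  observations: FredObservation[]; // ascending by date (FRED's default sort order)
}

// Latest date both series report, and long-end minus short-end in percentage points.
function yieldSpread(longEnd: FredResponse, shortEnd: FredResponse): { date: string; spread: number } | null {
  const shortByDate = new Map<string, number>();
  shortEnd.observations.forEach((o) => {
    if (o.value !== ".") shortByDate.set(o.date, parseFloat(o.value));
  });
  for (let i = longEnd.observations.length - 1; i >= 0; i--) {
    const o = longEnd.observations[i];
    const s = shortByDate.get(o.date);
    if (o.value !== "." && s !== undefined) {
      return { date: o.date, spread: parseFloat(o.value) - s };
    }
  }
  return null;
}

// A negative 10Y-2Y spread is the usual inversion signal.
function isInverted(result: { spread: number } | null): boolean {
  return result !== null && result.spread < 0;
}

// Request URL for one series; the API key would come from platform_secrets or env.
function fredObservationsUrl(seriesId: string, apiKey: string): string {
  const params = new URLSearchParams({ series_id: seriesId, api_key: apiKey, file_type: "json" });
  return `https://api.stlouisfed.org/fred/series/observations?${params.toString()}`;
}
```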

8. Enrichment Pipeline

Two-Tier Sentiment Scoring
flowchart TB
    subgraph Input["Raw Content"]
        NEWS["News Headlines"]
        SOCIAL["Social Posts"]
    end
    subgraph Tier1["Tier 1: Lexicon (All Items)"]
        LEX["lexicon-scorer.ts\nRegex patterns:\nPOS: surge, rally, beat...\nNEG: crash, plunge, miss..."]
        L_OUT["Output:\nsentiment: -1 to +1\nsentiment_source: 'lexicon'"]
    end
    subgraph Filter["Signal-Worthy Filter"]
        F1["News: has ticker entity\nOR >=2 topics\nOR tracked sector"]
        F2["Social: upvotes>=10\nOR comments>=5\nAND has keywords\nAND <7 days old"]
    end
    subgraph Tier2["Tier 2: LLM (Selective)"]
        LLM["llm-scorer.ts\nClaude Haiku\ntemperature: 0\nbatch: 8 items/call"]
        L2_OUT["Output:\nsentiment: -1 to +1\nconfidence: 0 to 1\nreasoning: 'Fed rate cut...'"]
    end
    subgraph Budget["Budget Control"]
        BUD["Daily cap: 5000 calls\nResets at UTC midnight"]
    end
    subgraph Storage["Storage"]
        NA["news_archive"]
        SP["social_posts"]
    end
    Input --> Tier1
    Tier1 --> Storage
    Storage --> Filter
    Filter --> |"~5-10% of items"| Tier2
    Tier2 --> Budget
    Budget --> Storage

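The following compact sketch covers the three mechanisms in this diagram: a regex lexicon scorer, the social signal-worthy filter, and the daily LLM budget with 8-item batches. The word lists come from the diagram. Several parts are assumptions rather than the contents of lexicon-scorer.ts or llm-scorer.ts: the (pos - neg) / (pos + neg) score, the reading of the filter as (upvotes >= 10 OR comments >= 5) AND has keywords AND younger than 7 days, and all names.

```typescript
// Tier 1: illustrative lexicon scorer. Word lists are the diagram's examples;
// the (pos - neg) / (pos + neg) formula is an assumption.
const POSITIVE = /\b(surge[sd]?|surging|rall(?:y|ies|ied)|beats?)\b/gi;
const NEGATIVE = /\b(crash(?:es|ed)?|plunge[sd]?|plunging|miss(?:es|ed)?)\b/gi;

function lexiconScore(text: string): number {
  // String.prototype.match with a /g regex returns every match, or null if none.
  const pos = (text.match(POSITIVE) || []).length;
  const neg = (text.match(NEGATIVE) || []).length;
  return pos + neg === 0 ? 0 : (pos - neg) / (pos + neg); // always within [-1, +1]
}

// Signal-worthy social filter, read as
// (upvotes >= 10 OR comments >= 5) AND has keywords AND younger than 7 days.
interface SocialCandidate {
  upvotes: number;
  comments: number;
  keywords: string[];
  postedAt: Date;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function isSignalWorthySocial(p: SocialCandidate, now: Date): boolean {
  const engaged = p.upvotes >= 10 || p.comments >= 5;
  const fresh = now.getTime() - p.postedAt.getTime() < SEVEN_DAYS_MS;
  return engaged && p.keywords.length > 0 && fresh;
}

// Tier 2 budget: at most `cap` LLM calls per UTC day, `batchSize` items per call.
class DailyLlmBudget {
  private day = "";
  private used = 0;

  constructor(private readonly cap = 5000, private readonly batchSize = 8) {}

  // Returns how many of `itemCount` items may be scored now and records the calls used.
  reserve(itemCount: number, now: Date): number {
    const today = now.toISOString().slice(0, 10); // UTC date, so the counter resets at UTC midnight
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
    const callsWanted = Math.ceil(itemCount / this.batchSize);
    const calls = Math.min(callsWanted, this.cap - this.used);
    this.used += calls;
    return Math.min(itemCount, calls * this.batchSize);
  }
}
```
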
Entity Extraction Flow
flowchart LR
    subgraph Input["Input Text"]
        TXT["'NVDA surges 5% after\nJensen Huang announces\nBlackwell AI chip'"]
    end
    subgraph Extractor["entity-extractor.ts"]
        TICK["Ticker Detection\nRegex + symbol lookup"]
        COMP["Company Names\nFuzzy matching"]
        PERSON["Person Names\nNER patterns"]
        THEME["Theme Classification\nKeyword matching"]
    end
    subgraph Output["Extracted Entities"]
        OUT["{\n tickers: ['NVDA'],\n companies: ['NVIDIA'],\n people: ['Jensen Huang'],\n themes: ['ai', 'semiconductors'],\n sectors: ['technology']\n}"]
    end
    Input --> Extractor
    Extractor --> Output
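
Ticker detection combines a permissive regex with a symbol lookup, so capitalised words that are not listed symbols get discarded. The following is a minimal sketch that uses a small hypothetical symbol set. extractTickers is an illustrative name. Company, person, and theme extraction are out of scope here.

```typescript
// Candidate tokens: an optional cashtag followed by 1-5 uppercase letters as a whole word.
const CANDIDATE = /\$?\b[A-Z]{1,5}\b/g;

// Illustrative ticker detection: the regex proposes candidates, the lookup confirms them.
function extractTickers(text: string, knownSymbols: ReadonlySet<string>): string[] {
  const found: string[] = [];
  const matches = text.match(CANDIDATE) || [];
  for (const m of matches) {
    const sym = m.replace(/^\$/, ""); // "$NVDA" and "NVDA" map to the same symbol
    // The lookup rejects capitalised words that are not in the symbol list.
    if (knownSymbols.has(sym) && found.indexOf(sym) === -1) found.push(sym);
  }
  return found;
}
```

Matching bare uppercase words is inherently ambiguous. AI from the example headline is absent from this test set, but AI is also the ticker of C3.ai. A full symbol universe would therefore need cashtags ($AI) or context rules to avoid false positives.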

9. Signal Synthesis

Divergence Engine (Human vs Bot)
flowchart TB
    subgraph Inputs["Inputs (7 days)"]
        SH["sentiment_hourly\n(per-symbol sentiment)"]
        PRICES["Historical Prices\n(daily returns)"]
    end
    subgraph Compute["divergence-engine.ts"]
        CORR["Compute Pearson correlation\n(price_return vs sentiment_delta)"]
        CLASS["Classify by correlation threshold"]
    end
    subgraph States["5 Classification States"]
        S1["HUMAN-DRIVEN\ncorr > 0.30"]
        S2["BOT-LIKELY\nprice moved, sentiment flat"]
        S3["SOCIAL-NOISE\nsentiment moved, price flat"]
        S4["QUIET\nneither moved"]
        S5["INSUFFICIENT\nsparse data"]
    end
    subgraph Storage["computed_insights"]
        CI["kind: 'divergence_humanbot'\nscope_type: 'symbol'\nscope_id: 'AAPL'\nclassification: 'HUMAN-DRIVEN'\nconfidence: 0.75"]
    end
    Inputs --> Compute
    Compute --> States
    States --> Storage

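The five states can be read as a movement check plus a correlation threshold. Only the 0.30 correlation cut-off comes from the diagram. The sketch below assumes the rest:

- a series has "moved" when its standard deviation exceeds a configurable threshold;
- sparse data means fewer than minPoints aligned days;
- the checks run in the order shown;
- the unnamed case where both series moved but are uncorrelated is mapped to BOT-LIKELY.

```typescript
type DivergenceState = "HUMAN-DRIVEN" | "BOT-LIKELY" | "SOCIAL-NOISE" | "QUIET" | "INSUFFICIENT";

interface DivergenceOptions {
  minPoints: number;        // assumption: fewer aligned days => INSUFFICIENT
  corrThreshold: number;    // 0.30 per the diagram
  priceMoveStd: number;     // assumption: price "moved" if stdev of daily returns exceeds this
  sentimentMoveStd: number; // assumption: same test for sentiment deltas
}

function mean(v: number[]): number {
  return v.reduce((s, a) => s + a, 0) / v.length;
}

function stdev(v: number[]): number {
  const m = mean(v);
  return Math.sqrt(mean(v.map((a) => (a - m) * (a - m))));
}

// Pearson correlation; returns 0 when either series is constant.
function pearson(x: number[], y: number[]): number {
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) * (x[i] - mx);
    syy += (y[i] - my) * (y[i] - my);
  }
  return sxx === 0 || syy === 0 ? 0 : sxy / Math.sqrt(sxx * syy);
}

function classifyDivergence(
  priceReturns: number[],
  sentimentDeltas: number[],
  opts: DivergenceOptions,
): { state: DivergenceState; corr: number } {
  if (priceReturns.length !== sentimentDeltas.length || priceReturns.length < opts.minPoints) {
    return { state: "INSUFFICIENT", corr: 0 };
  }
  const corr = pearson(priceReturns, sentimentDeltas);
  const priceMoved = stdev(priceReturns) > opts.priceMoveStd;
  const sentimentMoved = stdev(sentimentDeltas) > opts.sentimentMoveStd;
  if (!priceMoved && !sentimentMoved) return { state: "QUIET", corr };
  if (priceMoved && !sentimentMoved) return { state: "BOT-LIKELY", corr };
  if (!priceMoved && sentimentMoved) return { state: "SOCIAL-NOISE", corr };
  // Both moved: correlated moves read as human-driven. The diagram does not name the
  // uncorrelated case; mapping it to BOT-LIKELY is an assumption.
  return { state: corr > opts.corrThreshold ? "HUMAN-DRIVEN" : "BOT-LIKELY", corr };
}
```
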
Composite Score Calculation
flowchart TB
    subgraph SubInputs["Sub-Inputs"]
        DIV["Divergence Score\n(from divergence engine)\nweight: 1.0"]
        FUND["Crypto Funding Extremity\n|funding_rate| > 0.05%\nweight: 0.8"]
        VIX["VIX Regime\n> 75th percentile\nweight: 0.6"]
    end
    subgraph Calculate["composite-score.ts"]
        NORM["Normalize each input\nto [-1, +1]"]
        WAVG["Weighted average\nΣ(input × weight) / Σ(weight)"]
    end
    subgraph Output["Output"]
        SCORE["Composite Score\n-1 = Bot dominated\n 0 = Ambiguous\n+1 = Human driven"]
    end
    subgraph Storage["computed_insights"]
        CI["kind: 'human_automation'\nscope_type: 'symbol'\nscope_id: 'AAPL'\nvalue: 0.65"]
    end
    SubInputs --> Calculate
    Calculate --> Output
    Output --> Storage
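
The weighted average translates directly into code. The sketch skips any missing sub-input and renormalizes over the remaining weights. That choice is an assumption, because the diagram does not say how gaps are handled. It matters for equities such as AAPL, which have no crypto funding rate. All names are hypothetical.

```typescript
// Weighted average from the diagram: sum(input * weight) / sum(weight).
interface CompositeInput {
  name: string;
  value: number | null; // already normalized to [-1, +1]; null when unavailable
  weight: number;
}

function clampUnit(v: number): number {
  return Math.min(1, Math.max(-1, v));
}

function compositeScore(inputs: CompositeInput[]): number | null {
  let weighted = 0;
  let totalWeight = 0;
  for (const input of inputs) {
    // Assumption: a missing sub-input is skipped and the remaining weights renormalize.
    if (input.value === null || !Number.isFinite(input.value)) continue;
    weighted += clampUnit(input.value) * input.weight;
    totalWeight += input.weight;
  }
  return totalWeight === 0 ? null : weighted / totalWeight;
}

const exampleInputs: CompositeInput[] = [
  { name: "divergence", value: 0.5, weight: 1.0 },
  { name: "crypto_funding_extremity", value: -0.2, weight: 0.8 },
  { name: "vix_regime", value: 0.1, weight: 0.6 },
];
```

With the example values, the score is (0.5 × 1.0 + (-0.2) × 0.8 + 0.1 × 0.6) / (1.0 + 0.8 + 0.6) = 0.4 / 2.4 ≈ 0.167.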

10. Scheduler Orchestration

Scheduler Job Map
flowchart TB
    subgraph Every15Min["Every 15 Minutes"]
        J1["collectPrices\n→ price_snapshots"]
        J2["collectNewsHeadlines\n→ news_archive"]
        J3["rollupSentimentHourly\n→ sentiment_hourly"]
        J4["runSocialPostSentimentPass\n→ social_posts (LLM)"]
        J5["computeDivergence\n→ computed_insights"]
        J6["computeHumanAutomation\n→ computed_insights"]
        J7["scanForWarmingPatterns\n→ pattern_warming_state"]
    end
    subgraph Every60Min["Every 60 Minutes"]
        J8["pollDueFollows\n→ social_posts"]
        J9["ingestAllEnabledNewsSources\n→ prediction_signals"]
        J10["syncAllPredictionMarkets\n→ prediction_events"]
        J11["tickAllActiveRuns\n→ algorithm_trades"]
    end
    subgraph Daily["Daily / Periodic"]
        J12["syncEconomicCalendar (6h)\n→ economic_calendar_events"]
        J13["sync13FFilings (weekly)\n→ institutional_holdings"]
        J14["syncActivistFilings (daily)\n→ activist_filings"]
        J15["refreshCompanyRelationships (weekly)\n→ company_relationships"]
        J16["eodCriticSweep (daily)\n→ algorithm_eod_critic"]
    end
    subgraph PM2["PM2 Process"]
        SCH["agencio-scheduler\nscheduler-runner.ts"]
    end
    PM2 --> Every15Min
    PM2 --> Every60Min
    PM2 --> Daily

Job Frequency Table

Job Name | Interval | Target Table(s) | Purpose
price-collection | 30s | price_snapshots | Real-time price collection
news-archive | 15m | news_archive | Finnhub headlines + LLM scoring
news-ingestion | 15m | prediction_signals | All news adapters
sentiment-rollup | 15m | sentiment_hourly | Aggregate social sentiment
social-poll | 15-60m | social_posts | Platform polling
llm-sentiment-pass | 15m | social_posts | LLM upgrade for signal-worthy posts
divergence-compute | 15m | computed_insights | Human vs bot classification
human-automation | 15m | computed_insights | Composite score
prediction-sync | 60m | prediction_events | Market probability updates
algorithm-tick | 60m | algorithm_trades | Paper/live trading execution

11. Storage Schema

Database Schema Relationships
erDiagram
    news_archive {
        text url PK
        text title
        text summary
        text source_name
        text source_category
        timestamptz published_at
        jsonb topics
        jsonb entities
        float sentiment_score
        text sentiment_source
        float sentiment_confidence
        text sentiment_reasoning
        text[] geopolitical_event_ids
    }
    social_posts {
        uuid id PK
        uuid follow_id FK
        text platform
        text external_id
        text content
        float sentiment_score
        text sentiment_source
        float sentiment_confidence
        jsonb keywords
        int upvotes
        int comments
        timestamptz posted_at
    }
    social_follows {
        uuid id PK
        text platform
        text target_type
        text target_id
        boolean enabled
        interval poll_interval
        timestamptz next_poll_at
    }
    sentiment_hourly {
        uuid id PK
        text topic
        text platform
        timestamptz hour_bucket
        float avg_sentiment
        int post_count
        int positive_count
        int negative_count
    }
    computed_insights {
        uuid id PK
        text kind
        text scope_type
        text scope_id
        float value
        jsonb components
        text inputs_hash
        timestamptz computed_at
        timestamptz validity_until
    }
    prediction_events {
        text id PK
        text market_id
        text source
        text title
        float probability
        float volume
        text category
        timestamptz end_date
    }
    market_price_history {
        uuid id PK
        text symbol
        date bar_date
        float open
        float high
        float low
        float close
        bigint volume
        text source
    }
    prediction_signals {
        uuid id PK
        text source_type
        text source_name
        text signal_type
        text direction
        float magnitude
        float confidence
        text title
        text source_url
        jsonb metadata
    }
    social_follows ||--o{ social_posts : "generates"
    social_posts ||--o{ sentiment_hourly : "aggregates_to"
    news_archive ||--o{ prediction_signals : "feeds"
    sentiment_hourly ||--o{ computed_insights : "inputs_to"
    market_price_history ||--o{ computed_insights : "inputs_to"

Table Growth Estimates

Table | Daily Growth | Annual Growth | Retention
news_archive | ~140 rows | ~50K rows | Indefinite
social_posts | ~3K rows | ~1.1M rows | Indefinite
sentiment_hourly | ~200 rows | ~73K rows | 2+ years
computed_insights | ~500 rows | ~180K rows | Per validity_until
price_snapshots | ~2.8K rows | ~1M rows | 30 days
prediction_events | ~50 rows | ~18K rows | Indefinite
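
Because price_snapshots keeps only 30 days of data, it plateaus well below the annual figure. Using the table's own daily estimate, the steady-state size is roughly:

```latex
N_{\text{steady}} \approx r_{\text{daily}} \times T_{\text{retention}}
  = 2{,}800\ \tfrac{\text{rows}}{\text{day}} \times 30\ \text{days}
  \approx 84{,}000\ \text{rows}
```

Tables with indefinite retention grow linearly instead. Their Annual Growth figures are consistent with daily growth × 365 (for example, 140 × 365 ≈ 51K rows for news_archive).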

12. File Reference

Integration Files

File Path | Purpose
packages/be/src/integrations/finnhub.ts | Finnhub API client (quotes, sentiment, news, calendar)
packages/be/src/integrations/derivatives.ts | VIX, funding rates, OI, liquidations
packages/be/src/integrations/polygon.ts | Polygon.io FX OHLC + equity ticks
packages/be/src/integrations/binance-trades.ts | Binance aggTrades (crypto per-trade)
packages/be/src/integrations/fred.ts | FRED economic data (yields, SOFR, TIPS)
packages/be/src/integrations/platform-secrets.ts | DB-first API key management

News Adapter Files

File Path | Source
packages/be/src/news/adapters/gdelt.ts | GDELT Project
packages/be/src/news/adapters/google-news-rss.ts | Google News RSS
packages/be/src/news/adapters/newsdata.ts | NewsData.io
packages/be/src/news/adapters/guardian.ts | The Guardian
packages/be/src/news/adapters/nyt.ts | New York Times
packages/be/src/news/adapters/finnhub-company.ts | Finnhub Company News
packages/be/src/news/adapters/newsapi.ts | NewsAPI

Social Adapter Files

File Path | Platform
packages/be/src/social/adapters/reddit.ts | Reddit
packages/be/src/social/adapters/twitter.ts | Twitter/X
packages/be/src/social/adapters/discord.ts | Discord
packages/be/src/social/adapters/telegram.ts | Telegram
packages/be/src/social/adapters/bluesky.ts | Bluesky
packages/be/src/social/adapters/rss.ts | RSS Feeds

Scheduler Files

File Path | Purpose
packages/be/src/scheduler/index.ts | Main scheduler, job registration
packages/be/src/scheduler/news-collector.ts | Finnhub news collection + LLM scoring
packages/be/src/scheduler/price-collector.ts | Price snapshot collection
packages/be/src/scheduler/sentiment-rollup.ts | Hourly sentiment aggregation

Enrichment Files

File Path | Purpose
packages/be/src/social/entity-extractor.ts | Ticker/company/person extraction
packages/be/src/sentiment/llm-scorer.ts | Claude Haiku sentiment scoring
packages/be/src/insights/divergence-engine.ts | Human vs bot classification
packages/be/src/insights/composite-score.ts | Weighted composite score

Integration Flow Diagrams - Agencio Predict
Generated: 2026-05-19 | 40+ integrations mapped