Reflexivity in Prediction Markets
Binary Framework + Empirical Channel Evidence + Matching-Based Identification
v2 framework · May 19, 2026 · Replaces the 4-tier scale with binary 0/1; integrates direct evidence on actor engagement
⚠️ What changed in v2 (vs v1)
- L1 (info aggregation) removed: it is trivially true for any active market, so it is not a differentiator
- 4-tier scale (HIGH/MED/LOW/ZERO) replaced by binary (1/0)
- Structural classification replaced by consensus type (T0 mechanical / T1 overall / T2 individual / T3 team)
- Channel status (open/blocked) is an empirical question, not a theoretical one
- Direct evidence incorporated: Trump was in fact not engaged during the campaign; the Fed has zero documented FOMC engagement
1. The Right Question
The user's critique lands: my 4-tier framework treated reflexivity as a continuous gradient, but reflexivity is actually binary. Price either affects the outcome (1) or it does not (0). The intermediate "low / medium" tiers are false precision.
L1 (information aggregation) is almost trivially true for any active market, so it is not a differentiator. The real question collapses to:
Does the belief held by people who read this market actually impact the outcome? Binary answer: 1 or 0.
2. Two-Layer Framework
Layer 1: Consensus type (structural — where outcome comes from)
| Type | Definition | Examples |
|---|---|---|
| T0. Mechanical | No human decides; physics/economy/biology determines | Weather, CPI, climate, earnings, disease cases, scientific measurements |
| T1. Overall consensus | Outcome aggregates many anonymous decisions | Elections, cultural trends, crypto spot prices, market prices |
| T2. Individual consensus | One specific person decides | Trump pardons, Powell resigns, CEO actions, celebrity decisions |
| T3. Team / small group | Small group decides | Fed FOMC, Senate, court rulings, board votes, Oscar voters, sports teams |
Layer 2: Channel status (binary — is the channel open?)
For each market in T1/T2/T3, channel is OPEN iff:
- Decision-maker(s) actually AWARE of the market — verifiable via citations / public statements
- Decision-makers don't face norm / institutional blocking (sports anti-betting rules, judicial shielding, etc.)
- Decision-makers show responsiveness to belief signals from market
Reflexivity = (consensus_type ≠ T0) AND (channel open)
Both conditions must hold. If T0, reflexivity = 0 (no actor). If T1/T2/T3 but the channel is blocked, reflexivity = 0 (actor exists but doesn't engage).
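The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the names `ConsensusType`, `ChannelEvidence`, and `reflexivity` are hypothetical, and the three boolean flags simply mirror the three channel-open conditions listed in Layer 2.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConsensusType(Enum):
    T0_MECHANICAL = "T0"
    T1_OVERALL = "T1"
    T2_INDIVIDUAL = "T2"
    T3_TEAM = "T3"


@dataclass(frozen=True)
class ChannelEvidence:
    """Hypothetical container for the three Layer 2 conditions."""
    aware: bool       # decision-makers verifiably aware of the market
    unblocked: bool   # no norm / institutional blocking
    responsive: bool  # decision-makers respond to belief signals from the market

    def is_open(self) -> bool:
        # The channel is open only if all three conditions hold.
        return self.aware and self.unblocked and self.responsive


def reflexivity(ctype: ConsensusType, evidence: Optional[ChannelEvidence]) -> int:
    """Return 1 iff consensus type is not T0 AND the channel is open."""
    if ctype is ConsensusType.T0_MECHANICAL or evidence is None:
        return 0  # no actor, or no channel evidence at all
    return int(evidence.is_open())


# Illustrative inputs only; the flag values are assumptions, not coded data.
elections = reflexivity(ConsensusType.T1_OVERALL, ChannelEvidence(True, True, True))
fomc = reflexivity(ConsensusType.T3_TEAM, ChannelEvidence(False, True, False))
weather = reflexivity(ConsensusType.T0_MECHANICAL, None)
print(elections, fomc, weather)
```

Treating missing evidence as a closed channel is a design choice in this sketch; rows marked "?" in the classification table would need a third state if they are to be kept distinct from 0.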
3. Empirical Bombshell — Most Channels Are Blocked
Key insight: "channel open" is an empirical question. We can check it directly via textual analysis of decision-makers' public statements. This has been done for the major actor types:
🚨 Headline finding
Most assumed-reflexive markets are actually NOT reflexive because the decision-makers don't publicly engage with prediction markets during the relevant decision window.
This contradicts the common narrative (Wolfers/Hanson/Sethi-style discussions of "PolyMarket as focal point"). Direct evidence on actor engagement shows the focal-point story is largely empirically unsupported for the actors that would matter most.
3.1 Trump did NOT cite PolyMarket during 2024 campaign
Source: Web research May 2026 · Confidence: HIGH
Comprehensive search of Truth Social (trumpstruth.org), press conferences, speeches, and news archives 2023-2026. Zero documented instances of Trump naming "Polymarket", "Kalshi", or "prediction markets" before Nov 5, 2024.
First documented mention: March 22-23, 2026 phone interview with NYU's Max Raskin (published WaPo Mar 25, 2026), "16 months after his win". Quote: PMs "predicted me pretty right ... by a landslide", but he did not name Polymarket / Kalshi.
Implication: For "Will Trump do X" markets during 2024 campaign, channel was CLOSED. Trump-action market reflexivity claims (Mitts-Ofir 2026, etc.) need re-examination — the actor wasn't reading the market during the relevant decisions.
3.2 Fed has ZERO documented FOMC engagement with PMs
Source: Web research May 2026 · Confidence: HIGH
Search across FOMC minutes 2022-2026, Powell press conferences, Fed Chair speeches, governor speeches (Waller, Jefferson, Cook, Bowman, Kugler), and regional Fed bank publications (NY, SF, Chicago).
Only documented engagement: Diercks-Katz-Wright FEDS 2026-010 / NBER 34702 (Feb 2026) — staff research paper, explicit disclaimer that views do not necessarily reflect the FOMC. No FOMC principal has cited Kalshi / Polymarket in any public document.
Implication: Paper C (Fed-reads-Kalshi) test result is largely negative. Fed channel = blocked at institutional level. Diercks-Katz-Wright is one Fed researcher arguing for adoption, not evidence that the FOMC has adopted.
3.3 Where the channel IS open
Direct evidence supports open channels in only a few categories:
| Market | Channel evidence | Mechanism |
|---|---|---|
| Elections (2024 cycle) | Elon Musk repeatedly tweeted PolyMarket odds during 2024 campaign; media coverage | Surrogate amplification → voter / donor reach |
| Crypto BTC near expiration | Documented spot-price manipulation incentives (CME study); arbitrage flow | Settlement-price manipulation profitable |
| VP/Surrogate actions | JD Vance "Marco Polymarket bet" (Apr 2026); Don Jr. paid Kalshi advisor (Jan 2025+) | Direct family / advisor channel |
| CEO succession (some cases) | Anecdotal; board members track market signals | Public-confidence proxy |
Almost everything else: channel closed due to (a) actor not engaging publicly, or (b) institutional norms blocking, or (c) mechanical determination.
4. Binary Classification of All Market Types
| Market Type | Consensus Type | Channel Evidence | Reflexivity |
|---|---|---|---|
| Election outcomes | T1 Overall | Musk amplification documented; donor flow responsive to PolyMarket coverage | 1 |
| Crypto BTC near expiry | T1 Overall | Spot manipulation incentive at expiration; documented in crypto literature | 1 |
| Trump-action (pardons, nominations) | T2 Individual | ⚠️ NO documented engagement during decision-making window; first mention Mar 2026 (16 months post-election) | 0 ⚠️ |
| Fed FOMC decisions | T3 Team | ⚠️ Zero documented FOMC engagement; only Diercks-Katz-Wright staff paper | 0 ⚠️ |
| CEO resignation | T2 Individual | ~ Plausible but not systematically documented; board exposure variable | ? |
| Geopolitical (Iran, Maduro) | T2/T3 | ~ State actors may signal via PMs (Iran-strike whale case); limited evidence | ? |
| Senate confirmations | T3 Team | ✗ Senators don't systematically cite PMs | 0 |
| Sports — game outcomes | T3 Team | ✗ Blocked by anti-betting professional norms / contracts | 0 |
| Oscar winners | T3 Team | ✗ Voting secrecy norms; no documented voter cite | 0 |
| Court rulings | T3 Team | ✗ Judicial ethics; explicit shielding | 0 |
| Crypto ETF approval | T3 Team | ✗ SEC shielded from market signals | 0 |
| Mergers / acquisitions completion | T3 Team | ✗ Decisions locked in by agreements | 0 |
| Weather — temperature, snow | T0 Mechanical | N/A (no actor) | 0 |
| CPI / NFP / GDP prints | T0 Mechanical | N/A (economy mechanical) | 0 |
| Climate — hurricane season | T0 Mechanical | N/A | 0 |
| Earnings beat | T0 Mechanical | N/A (sales mechanical) | 0 |
| Disease cases | T0 Mechanical | N/A (biology mechanical) | 0 |
| Astronomical events | T0 Mechanical | N/A | 0 |
Striking conclusion: of the 18 market types above, only 2 are clearly reflexive (= 1), 2 are questionable (?), and the remaining 14 are 0. Reflexivity is the exception, not the rule. This is itself a major empirical finding.
5. Econometric Framework — Matching-Based Identification
5.1 Strategy: Matching + Heterogeneity + Falsification
Given we can't run field experiments on real-money platforms, use matched comparison + triangulation:
- Binary classify each market using consensus-type + channel-evidence rules above (Section 4 table)
- Match reflexive (=1) markets to non-reflexive (=0) within similar consensus types where possible; otherwise across types with stringent observable matching
- Compare microstructure / outcome metrics between matched pairs
- Heterogeneity within reflexive: stronger channel evidence → stronger reflexivity footprint
- Falsification: T0 (mechanical) markets — no channel possible — show zero reflexivity footprint regardless of salience / volume / etc.
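As a rough illustration of the matching and comparison steps, the sketch below pairs each reflexive market with its nearest non-reflexive neighbor on standardized covariates and reports the outcome difference per pair. It is a simplified stand-in: it uses greedy covariate nearest-neighbor matching with a caliper rather than the propensity-score matching named in Section 5.2, the function names and the caliper value are assumptions, and the four demo markets are made-up numbers.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each covariate column; constant columns are left at 0."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def match_pairs(markets, caliper=1.0):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    markets: list of dicts with keys 'id', 'reflexive' (0/1),
             'x' (list of covariates), 'y' (outcome metric).
    Returns a list of (reflexive_id, control_id, y_difference).
    """
    z = standardize([m["x"] for m in markets])
    treated = [i for i, m in enumerate(markets) if m["reflexive"] == 1]
    controls = [i for i, m in enumerate(markets) if m["reflexive"] == 0]
    pairs = []
    for t in treated:
        if not controls:
            break
        best = min(controls, key=lambda c: math.dist(z[t], z[c]))
        # Drop the pair if the closest control is still too far away.
        if math.dist(z[t], z[best]) <= caliper:
            controls.remove(best)
            diff = markets[t]["y"] - markets[best]["y"]
            pairs.append((markets[t]["id"], markets[best]["id"], diff))
    return pairs


# Made-up demo data: x = [log volume, days to resolution], y = some metric.
demo = [
    {"id": "A", "reflexive": 1, "x": [10.0, 5.0], "y": 0.30},
    {"id": "B", "reflexive": 0, "x": [10.1, 5.0], "y": 0.20},
    {"id": "C", "reflexive": 0, "x": [2.0, 60.0], "y": 0.25},
    {"id": "D", "reflexive": 1, "x": [2.2, 58.0], "y": 0.35},
]
pairs = match_pairs(demo)
print(pairs)
print("mean matched difference:", mean(d for _, _, d in pairs))
```

Greedy matching depends on the order in which reflexive markets are processed; an optimal-matching routine or a fitted propensity score would be the natural replacement once the real covariates are assembled.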
5.2 Main specifications
Where Y is microstructure / outcome metric, Reflexive_m ∈ {0, 1}, X is matching variables (volume, time-to-resolution, salience). Run on propensity-score-matched sample.
Where ChannelStrength is a continuous measure of how documented the actor engagement is (citation count, Musk-amplification volume, etc.). Test: β > 0 for reflexive markets, β = 0 for non-reflexive.
For T0 (mechanical) markets, β should be ~0 regardless of salience. If β > 0 here, our matching is picking up generic attention effects, not reflexivity-specific.
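The three paragraphs above each describe a regression whose displayed equation is not present in this copy. One form consistent with the stated definitions would be the following. The intercept, the coefficient vector on the matching variables, and the error term are assumptions, as is the choice to reuse a ChannelStrength-style attention proxy as the regressor in the T0 falsification test.

```latex
\begin{align}
% (1) Baseline, estimated on the propensity-score-matched sample
Y_m &= \alpha + \beta\,\mathrm{Reflexive}_m + \gamma^{\top} X_m + \varepsilon_m \\
% (2) Heterogeneity, estimated separately for Reflexive_m = 1 and Reflexive_m = 0
Y_m &= \alpha + \beta\,\mathrm{ChannelStrength}_m + \gamma^{\top} X_m + \varepsilon_m \\
% (3) Falsification, same form restricted to T0 (mechanical) markets
Y_m &= \alpha + \beta\,\mathrm{ChannelStrength}_m + \gamma^{\top} X_m + \varepsilon_m,
  \qquad m \in \mathrm{T0}
\end{align}
```

Under this reading, the tests are β > 0 in (2) for the reflexive subsample, β = 0 in (2) for the non-reflexive subsample, and β ≈ 0 in (3).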
5.3 Outcome variables Y
- Brier score — calibration of market vs realized outcome
- Late-stage price drift — does price systematically move toward focal points (50%, 90%) near resolution?
- Wash trading share — using Sirolly-Sethi style detection; higher in reflexive markets (manipulation incentive)
- Spread / liquidity — bid-ask asymmetries at behavioral thresholds
- Price-volume elasticity — different curvature in reflexive vs non-reflexive
- Tail behavior — fat tails in dp/dt distribution suggesting feedback amplification
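Two of the metrics listed above have simple closed forms, sketched here under stated assumptions: the Brier score is taken as the mean squared error between market probability and the 0/1 outcome, and tail behavior is proxied by the excess kurtosis of first differences of the price series. The function names are hypothetical and the inputs are toy numbers.

```python
from statistics import mean


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return mean((p - o) ** 2 for p, o in zip(probs, outcomes))


def excess_kurtosis(prices):
    """Excess kurtosis of price changes; 0 for a normal distribution,
    positive values indicate fatter tails."""
    dp = [b - a for a, b in zip(prices, prices[1:])]
    mu = mean(dp)
    m2 = mean((d - mu) ** 2 for d in dp)
    m4 = mean((d - mu) ** 4 for d in dp)
    if m2 == 0:
        raise ValueError("price changes have zero variance")
    return m4 / m2 ** 2 - 3.0


# Toy inputs: final market prices vs realized outcomes, and one price path.
print(brier_score([0.9, 0.2], [1, 0]))
print(excess_kurtosis([0.50, 0.60, 0.50, 0.60, 0.50]))
```

The kurtosis estimate is the plain population-moment version; with short price histories a bias-corrected estimator may be preferable.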
5.4 Matching variables X
- Log cumulative volume at time of measurement
- Time-to-resolution (days)
- Number of distinct traders (proxy via on-chain wallet count for PolyMarket)
- Salience: Google Trends + news mentions (NewsBank / GDELT)
- Platform (Kalshi vs PolyMarket)
- Topic category (within same category preferred)
- Time period (FE)
6. Data Preparation Plan (Realistic)
6.1 PolyMarket data — Cong dataset access status
Status: ⚠️ Not directly downloadable
Per arXiv 2604.20421, Cong et al. dataset is described in paper but no bulk download URL published. Only access: web UI at polymonitor.club (JS SPA, requires browser inspection for backend API).
Realistic options:
- Email corresponding author Huaiyu Jia (hjia351@connect.hkust-gz.edu.cn) requesting bulk export
- Use the alternative SII-WANGZJ/Polymarket_data on HuggingFace (107GB, 1.1B records; different group but comparable scale)
- Self-collect via Polygon RPC + The Graph subgraph (most rigorous, slower)
- Use Dune Analytics SQL access (subset for development)
6.2 Kalshi data
Status: Academic API available — apply Day 1
Email research@kalshi.com with institutional affiliation + research statement. Typical 2-4 week turnaround. Backup: public market data via web scraping (rate-limited).
What we need: All market-level OHLCV + (if granted) trade-level data 2022-present. Particularly Weather, CPI/NFP/GDP, Sports for T0/T3 baseline; Politics post-Oct 2024 for T1; Fed FOMC for T3 test.
6.3 Actor engagement evidence (textual analysis)
Status: ✅ Verified available + preliminary results in
Preliminary searches already done (May 2026):
- Trump Truth Social: trumpstruth.org searchable archive. Zero pre-Nov-2024 PM mentions. First mention Mar 2026.
- FOMC minutes: federalreserve.gov public, all years. Zero "Polymarket / Kalshi" mentions.
- Fed speeches: federalreserve.gov public. Zero.
- Musk tweets: archive.org Musk X archive. Multiple PolyMarket mentions documented during 2024 campaign.
- Don Jr. (Kalshi advisor since Jan 2025): public statements documented.
- Other actors (Senate, judges, CEOs): spot-check shows minimal engagement; can systematize.
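To systematize these searches, a dated-statement scan along the following lines could be applied to each archive once the text is exported. This is a minimal sketch: the search terms are the three named in Section 3.1, the function names are hypothetical, and the two statements are placeholders, not real quotes.

```python
import re
from datetime import date

# Terms taken from Section 3.1; matching is case-insensitive.
PM_PATTERN = re.compile(r"\b(polymarket|kalshi|prediction markets?)\b", re.IGNORECASE)


def mentions_before(statements, cutoff):
    """Statements dated strictly before `cutoff` that mention a PM term.

    statements: iterable of (date, text) tuples.
    """
    return [(d, t) for d, t in statements if d < cutoff and PM_PATTERN.search(t)]


def first_mention(statements):
    """Date of the earliest statement mentioning a PM term, or None."""
    hits = sorted(d for d, t in statements if PM_PATTERN.search(t))
    return hits[0] if hits else None


# Placeholder corpus for one actor (not real quotes).
corpus = [
    (date(2024, 9, 1), "Placeholder statement about polls and rallies."),
    (date(2026, 3, 25), "Placeholder statement that mentions prediction markets."),
]
print(mentions_before(corpus, date(2024, 11, 5)))
print(first_mention(corpus))
```

A zero count from such a scan only shows absence in the searched archive, so coverage of each source should be recorded alongside the result.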
6.4 Salience data (matching variables)
Status: ✅ Easily accessible
- GDELT (free) — news event tracking per topic per day
- Google Trends (free) — search interest per query
- NewsBank (academic) — newspaper article search
- Twitter / X archive — for major posts referencing PM markets
6.5 Realistic timeline
| Phase | Week | Output |
|---|---|---|
| Apply Kalshi API + email Cong group | 1 | Access requests submitted |
| Set up PolyMarket alternative pipeline (SII-WANGZJ or Polygon RPC) | 1-2 | Working PolyMarket data pull |
| Textual analysis: systematize actor engagement evidence | 2 | Channel-open / blocked classification per market |
| Salience data: GDELT + Google Trends per market | 3 | Matching variables ready |
| Binary classification of all markets | 3 | Reflexive {0,1} per market |
| Propensity score matching | 4 | Matched sample for analysis |
| Main specs + heterogeneity + falsification | 5-6 | First-pass results |
| Write up | 7-8 | Working paper draft |
7. The Paper
7.1 Title
"Reflexivity in Prediction Markets: A Binary Framework and Empirical Evidence Against the Focal-Point Hypothesis"
7.2 Story
Hypothesis under test: PolyMarket / Kalshi prices causally affect outcomes via decision-maker awareness (Nechepurenko 2026 "focal point" hypothesis; Hanson manipulability framework; common policy narrative around election integrity).
Our two main contributions:
- Conceptual: Reflexivity is binary, not gradient. Classify by (consensus-type, channel-status). Mechanical markets fail by T0 structure; norm-blocked markets fail by institutional rules; low-engagement markets fail by empirical channel-closure.
- Empirical: Direct textual evidence shows most assumed-reflexive markets have closed channels. Trump did not engage with PolyMarket during 2024 decisions. FOMC has zero documented engagement. The focal-point hypothesis is empirically overstated.
7.3 Section structure
- § 1 — Introduction (binary framing vs gradient framing)
- § 2 — Conceptual framework (consensus type + channel status)
- § 3 — Empirical channel evidence (Trump non-engagement; FOMC non-engagement; sports norm-block; mechanical determination)
- § 4 — Binary classification table (all market types)
- § 5 — Matched microstructure comparison (reflexive vs non-reflexive)
- § 6 — Heterogeneity within reflexive (channel-strength gradient)
- § 7 — Falsification on T0 markets (mechanical baseline)
- § 8 — Discussion: implications for regulation, manipulation, market design
7.4 Target journals
Top finance: J Finance / RFS / JFE — empirical microstructure + a strong policy claim
Management Science: regulatory framework angle
AEJ Applied: clean empirical with surprising negative results
AEJ Policy: suits the "reflexivity is overstated" policy implication
8. Honest Limits — What Matching Can & Can't Do
⚠️ Matching alone won't prove causality
After matching on observable characteristics, residual differences could reflect reflexivity OR unobserved confounders (topic complexity, trader pool composition, and behavioral biases all differ across market types).
We protect against this via:
- Within-type matching when possible (compare reflexive T1 to non-reflexive T1 if any exist)
- Heterogeneity within reflexive — channel-strength gradient is harder to attribute to confounders
- T0 falsification — mechanical markets have no reflexivity by construction; if matched differences persist with T0 vs others, our identification is biased
- Direct evidence as primary identification — textual evidence of channel-status is the cleanest piece
8.1 Better identification approaches for the future
- Field experiment on Manifold — extend Rasooly-Rozzi (2025) to test reflexivity specifically (manipulate prices, measure outcome change). Play money but cleanest causal ID. Possible companion paper.
- Within-actor shock events — if a specific politician starts citing PMs at a known date, before/after comparison for their decisions. Limited by sample size.
- Regulatory natural experiments — CFTC ruling Oct 2, 2024 enabled Kalshi election markets. Pre vs post within Kalshi politics gives some leverage (but main treatment is actor reach, not market existence).
- Trump's Mar 2026 Raskin interview — Trump finally engaged with PMs publicly. Trump-action markets pre vs post Mar 2026 → potential reflexivity activation test.
8.2 The honest bottom line
Matching is good enough to publish a strong paper if:
- We frame it as "establishing the binary classification + measuring footprints", not "proving causal reflexivity"
- The direct textual evidence on channel status (Trump, Fed, etc.) carries most of the empirical weight
- Matched microstructure comparison is positioned as "consistent with the binary classification", not "proves reflexivity"
- T0 falsification + within-reflexive heterogeneity provide secondary identification
This is honest, defensible, and publishable. The headline finding "most reflexivity claims are empirically unsupported" is novel and surprising enough for a strong submission target.
Reflexivity Framework v2 · May 19, 2026 · Replaces v1 4-tier framework