Reflexivity in Prediction Markets

Binary Framework + Empirical Channel Evidence + Matching-Based Identification

v2 framework · May 19, 2026 · replaces the 4-tier scale with binary 0/1; integrates direct evidence on actor engagement

⚠️ What changed in v2 (vs v1)

L1 (information aggregation) removed: it is trivially true for any active market, so it is not a differentiator
4-tier scale (HIGH/MED/LOW/ZERO) replaced by binary (1/0)
Structural classification replaced by consensus type (T0 mechanical / T1 overall / T2 individual / T3 team)
Channel status (open/blocked) is an empirical question, not a theoretical one
Direct evidence incorporated: Trump was not engaged during the campaign; the Fed has zero documented FOMC engagement

1. The Right Question

The user's critique is on target: my 4-tier framework treated reflexivity as a continuous gradient, but reflexivity is actually binary. Price either affects the outcome (1) or it does not (0). The intermediate "low / medium" tiers are false precision.

L1 (information aggregation) is almost trivially true for any active market, so it is not a differentiator. The real question collapses to:

Does the belief held by people who read this market actually impact the outcome? Binary answer: 1 or 0.

2. Two-Layer Framework

Layer 1: Consensus type (structural — where outcome comes from)

Type	Definition	Examples
T0. Mechanical	No human decides; physics/economy/biology determines	Weather, CPI, climate, earnings, disease cases, scientific measurements
T1. Overall consensus	Outcome aggregates many anonymous decisions	Elections, cultural trends, crypto spot prices, market prices
T2. Individual consensus	One specific person decides	Trump pardons, Powell resigns, CEO actions, celebrity decisions
T3. Team / small group	Small group decides	Fed FOMC, Senate, court rulings, board votes, Oscar voters, sports teams

Layer 2: Channel status (binary — is the channel open?)

For each market in T1/T2/T3, the channel is OPEN iff all three of the following hold:

Decision-maker(s) actually AWARE of the market — verifiable via citations / public statements
Decision-makers don't face norm / institutional blocking (sports anti-betting rules, judicial shielding, etc.)
Decision-makers show responsiveness to belief signals from market

Reflexivity = (consensus_type ≠ T0) AND (channel open)

Both conditions must hold. If T0 → reflexivity = 0 (no actor). If T1/T2/T3 but channel blocked → reflexivity = 0 (actor exists but doesn't engage).
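The two-layer rule can be written as a small predicate. This is a minimal sketch, assuming the three Layer 2 conditions must all hold for a channel to count as open; the names `ConsensusType`, `ChannelEvidence`, `aware`, `unblocked`, and `responsive` are illustrative and do not come from the framework text, and the boolean inputs in the example are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class ConsensusType(Enum):
    T0_MECHANICAL = "T0"
    T1_OVERALL = "T1"
    T2_INDIVIDUAL = "T2"
    T3_TEAM = "T3"


@dataclass(frozen=True)
class ChannelEvidence:
    aware: bool       # decision-maker is verifiably aware of the market
    unblocked: bool   # no norm or institutional blocking applies
    responsive: bool  # decision-maker responds to belief signals from the market

    @property
    def is_open(self) -> bool:
        # Assumption: all three conditions are required for an open channel.
        return self.aware and self.unblocked and self.responsive


def reflexivity(consensus_type: ConsensusType, channel: ChannelEvidence) -> int:
    """Return 1 if the market is reflexive under the two-layer rule, else 0."""
    if consensus_type is ConsensusType.T0_MECHANICAL:
        return 0  # no actor exists, so no channel can be open
    return 1 if channel.is_open else 0


# Illustrative inputs only; the booleans are assumed, not measured.
weather = reflexivity(ConsensusType.T0_MECHANICAL, ChannelEvidence(False, True, False))
blocked_team = reflexivity(ConsensusType.T3_TEAM, ChannelEvidence(True, False, False))
open_overall = reflexivity(ConsensusType.T1_OVERALL, ChannelEvidence(True, True, True))
```

Keeping the channel conditions as separate fields makes it possible to record which condition failed for each market, rather than only the final 0/1 code.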

3. Empirical Bombshell — Most Channels Are Blocked

Key insight: whether a channel is open is an empirical question. We can check it directly via textual analysis of decision-makers' public statements. This has been done for the major actor types:

🚨 Headline finding

Most assumed-reflexive markets are actually NOT reflexive because the decision-makers don't publicly engage with prediction markets during the relevant decision window.

This contradicts the common narrative (Wolfers/Hanson/Sethi-style discussions of "PolyMarket as focal point"). Direct evidence on actor engagement shows the focal-point story is largely empirically unsupported for the actors that would matter most.

3.1 Trump did NOT cite PolyMarket during 2024 campaign

Source: Web research May 2026 · Confidence: HIGH

Comprehensive search of Truth Social (trumpstruth.org), press conferences, speeches, and news archives 2023-2026. Zero documented instances of Trump naming "Polymarket", "Kalshi", or "prediction markets" before Nov 5, 2024.

First documented mention: March 22-23, 2026 phone interview with NYU's Max Raskin (published WaPo Mar 25, 2026), "16 months after his win". Quote: PMs "predicted me pretty right ... by a landslide", but he did not name Polymarket / Kalshi.

Implication: For "Will Trump do X" markets during 2024 campaign, channel was CLOSED. Trump-action market reflexivity claims (Mitts-Ofir 2026, etc.) need re-examination — the actor wasn't reading the market during the relevant decisions.

3.2 Fed has ZERO documented FOMC engagement with PMs

Source: Web research May 2026 · Confidence: HIGH

Search across FOMC minutes 2022-2026, Powell press conferences, Fed Chair speeches, governor speeches (Waller, Jefferson, Cook, Bowman, Kugler), and regional Fed bank publications (NY, SF, Chicago).

Only documented engagement: Diercks-Katz-Wright FEDS 2026-010 / NBER 34702 (Feb 2026) — staff research paper, explicit disclaimer that views do not necessarily reflect the FOMC. No FOMC principal has cited Kalshi / Polymarket in any public document.

Implication: Paper C (Fed-reads-Kalshi) test result is largely negative. Fed channel = blocked at institutional level. Diercks-Katz-Wright is a single staff research paper arguing for adoption, not evidence that the FOMC has adopted it.

3.3 Where the channel IS open

Direct evidence supports open channels in only a few categories:

Market	Channel evidence	Mechanism
Elections (2024 cycle)	Elon Musk repeatedly tweeted PolyMarket odds during 2024 campaign; media coverage	Surrogate amplification → voter / donor reach
Crypto BTC near expiration	Documented spot-price manipulation incentives (CME study); arbitrage flow	Settlement-price manipulation profitable
VP/Surrogate actions	JD Vance "Marco Polymarket bet" (Apr 2026); Don Jr. paid Kalshi advisor (Jan 2025+)	Direct family / advisor channel
CEO succession (some cases)	Anecdotal; board members track market signals	Public-confidence proxy

Almost everything else: channel closed due to (a) actor not engaging publicly, or (b) institutional norms blocking, or (c) mechanical determination.

4. Binary Classification of All Market Types

Market Type	Consensus Type	Channel Evidence	Reflexivity
Election outcomes	T1 Overall	Musk amplification documented; donor flow responsive to PolyMarket coverage	1
Crypto BTC near expiry	T1 Overall	Spot manipulation incentive at expiration; documented in crypto literature	1
Trump-action (pardons, nominations)	T2 Individual	⚠️ NO documented engagement during decision-making window; first mention Mar 2026 (16 months post-election)	0 ⚠️
Fed FOMC decisions	T3 Team	⚠️ Zero documented FOMC engagement; only Diercks-Katz-Wright staff paper	0 ⚠️
CEO resignation	T2 Individual	~ Plausible but not systematically documented; board exposure variable	?
Geopolitical (Iran, Maduro)	T2/T3	~ State actors may signal via PMs (Iran-strike whale case); limited evidence	?
Senate confirmations	T3 Team	✗ Senators don't systematically cite PMs	0
Sports — game outcomes	T3 Team	✗ Blocked by anti-betting professional norms / contracts	0
Oscar winners	T3 Team	✗ Voting secrecy norms; no documented voter cite	0
Court rulings	T3 Team	✗ Judicial ethics; explicit shielding	0
Crypto ETF approval	T3 Team	✗ SEC shielded from market signals	0
Mergers / acquisitions completion	T3 Team	✗ Decisions locked in by agreements	0
Weather — temperature, snow	T0 Mechanical	N/A (no actor)	0
CPI / NFP / GDP prints	T0 Mechanical	N/A (economy mechanical)	0
Climate — hurricane season	T0 Mechanical	N/A	0
Earnings beat	T0 Mechanical	N/A (sales mechanical)	0
Disease cases	T0 Mechanical	N/A (biology mechanical)	0
Astronomical events	T0 Mechanical	N/A	0

Striking conclusion: Of the 18 major market types above, only 2 are clearly reflexive (= 1), 2 are uncertain (?), and the remaining 14 are 0. Reflexivity is the exception, not the rule. This is itself a major empirical finding.
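The tally can be checked mechanically from the table. The sketch below transcribes the Consensus Type and Reflexivity columns of the Section 4 table and counts the codes; the variable names and the "mechanical" versus "channel blocked" split of the zeros are my own labels.

```python
from collections import Counter

# (market type, consensus type, reflexivity code), transcribed from the table above
ROWS = [
    ("Election outcomes", "T1", "1"),
    ("Crypto BTC near expiry", "T1", "1"),
    ("Trump-action (pardons, nominations)", "T2", "0"),
    ("Fed FOMC decisions", "T3", "0"),
    ("CEO resignation", "T2", "?"),
    ("Geopolitical (Iran, Maduro)", "T2/T3", "?"),
    ("Senate confirmations", "T3", "0"),
    ("Sports game outcomes", "T3", "0"),
    ("Oscar winners", "T3", "0"),
    ("Court rulings", "T3", "0"),
    ("Crypto ETF approval", "T3", "0"),
    ("Mergers / acquisitions completion", "T3", "0"),
    ("Weather", "T0", "0"),
    ("CPI / NFP / GDP prints", "T0", "0"),
    ("Climate hurricane season", "T0", "0"),
    ("Earnings beat", "T0", "0"),
    ("Disease cases", "T0", "0"),
    ("Astronomical events", "T0", "0"),
]

# Count markets per reflexivity code
by_code = Counter(code for _, _, code in ROWS)

# Split the zeros by why they are zero: no actor (T0) vs actor with a blocked channel
zero_by_reason = Counter(
    "mechanical (T0)" if ctype == "T0" else "channel blocked"
    for _, ctype, code in ROWS
    if code == "0"
)

print(dict(by_code))
print(dict(zero_by_reason))
```

Of the 14 zeros, 6 are mechanical and 8 are actor-based markets coded as blocked, so the headline count depends about equally on the structural layer and on the empirical channel coding.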

5. Econometric Framework — Matching-Based Identification

5.1 Strategy: Matching + Heterogeneity + Falsification

Given we can't run field experiments on real-money platforms, use matched comparison + triangulation:

Binary classify each market using consensus-type + channel-evidence rules above (Section 4 table)
Match reflexive (=1) markets to non-reflexive (=0) within similar consensus types where possible; otherwise across types with stringent observable matching
Compare microstructure / outcome metrics between matched pairs
Heterogeneity within reflexive: stronger channel evidence → stronger reflexivity footprint
Falsification: T0 (mechanical) markets — no channel possible — show zero reflexivity footprint regardless of salience / volume / etc.

5.2 Main specifications

Spec 1 (matched difference): Y_m = α + β · Reflexive_m + γ · X_m + ε_m

Where Y is microstructure / outcome metric, Reflexive_m ∈ {0, 1}, X is matching variables (volume, time-to-resolution, salience). Run on propensity-score-matched sample.
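To make spec 1 concrete, here is a minimal sketch of the regression step using only the standard library. The helper `ols` is a hypothetical name, the single covariate stands in for the full X vector, and the numbers are invented with known coefficients purely so the estimator can be checked. In practice a statistics package with robust standard errors would be the natural choice.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Synthetic rows: (Reflexive dummy, log volume). Invented values, not real data.
markets = [(0, 10.0), (0, 12.0), (0, 11.0), (1, 10.5), (1, 12.5), (1, 11.5)]

# Outcome generated as Y = 1 + 2 * Reflexive + 0.5 * log_volume (no noise)
y = [1.0 + 2.0 * r + 0.5 * v for r, v in markets]

# Design matrix columns: intercept, Reflexive_m, X_m
design = [[1.0, float(r), v] for r, v in markets]
alpha, beta, gamma = ols(design, y)
```

Here `beta` is the matched-difference coefficient of interest; on the noise-free synthetic data it recovers the value used to generate `y`.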

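Spec 2 (heterogeneity within reflexive): Y_m = α + β · ChannelStrength_m · Reflexive_m + γ · X_m + ε_m

Where ChannelStrength is a continuous measure of how well documented the actor engagement is (citation count, Musk-amplification volume, etc.). Test: β > 0. Because the interaction term is zero whenever Reflexive_m = 0, β is identified only from reflexive markets; testing for a zero slope among non-reflexive markets would require adding a ChannelStrength_m main effect.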
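A small sketch of the identification point in spec 2: the interaction ChannelStrength_m · Reflexive_m has no variation among non-reflexive markets, so any slope comparison has to be run by subsample (or with a separate main effect). The helper `slope`, the omission of controls X, and all numbers below are assumptions made for illustration.

```python
def slope(x, y):
    """OLS slope of y on x, with an intercept and no other controls."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variation in x: slope is not identified")
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


# Synthetic rows: (Reflexive dummy, channel strength, outcome metric Y). Invented values.
rows = [
    (1, 0.0, 0.10), (1, 1.0, 0.40), (1, 2.0, 0.70), (1, 3.0, 1.00),
    (0, 0.5, 0.20), (0, 1.5, 0.20), (0, 2.5, 0.20),
]

# The interaction regressor is identically zero for non-reflexive markets
interaction_nonreflexive = [cs * r for r, cs, _ in rows if r == 0]

# Subsample slopes of Y on channel strength
beta_reflexive = slope(
    [cs for r, cs, _ in rows if r == 1], [y for r, _, y in rows if r == 1]
)
beta_nonreflexive = slope(
    [cs for r, cs, _ in rows if r == 0], [y for r, _, y in rows if r == 0]
)
```

The framework's prediction corresponds to `beta_reflexive` being positive while `beta_nonreflexive` stays near zero.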

Spec 3 (falsification on T0): Y_m,T0 = α + β · Salience_m + γ · X_m + ε_m

For T0 (mechanical) markets, β should be ~0 regardless of salience. Here X_m excludes salience, since salience enters as its own regressor. If β > 0, the matched comparison is picking up generic attention effects rather than reflexivity-specific ones.
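One way to check "β ≈ 0" on T0 markets without distributional assumptions is a permutation test on the salience slope. This is a swapped-in technique rather than the regression test named in spec 3, it ignores the controls X, and the function names, seed, and data below are all assumptions for illustration.

```python
import random


def slope(x, y):
    """OLS slope of y on x, with an intercept and no other controls."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variation in x: slope is not identified")
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of zero slope."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between salience and outcome
        if abs(slope(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)


# Synthetic T0 markets: salience index and outcome metric. Invented values.
salience = [float(i) for i in range(12)]
y_flat = [0.2] * 12                              # no salience effect
y_trending = [0.1 + 0.05 * s for s in salience]  # strong salience effect

p_flat = permutation_pvalue(salience, y_flat)
p_trending = permutation_pvalue(salience, y_trending)
```

A small p-value on real T0 data would be the warning sign described above: the footprint tracks attention, not reflexivity.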

5.3 Outcome variables Y

Brier score — calibration of market vs realized outcome
Late-stage price drift — does price systematically move toward focal points (50%, 90%) near resolution?
Wash trading share — using Sirolly-Sethi style detection; higher in reflexive markets (manipulation incentive)
Spread / liquidity — bid-ask asymmetries at behavioral thresholds
Price-volume elasticity — different curvature in reflexive vs non-reflexive
Tail behavior — fat tails in dp/dt distribution suggesting feedback amplification
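Two of these metrics have simple closed forms. The sketch below computes the Brier score and the excess kurtosis of price changes (one common summary of fat tails); the function names and inputs are illustrative, and excess kurtosis is my assumed choice of tail statistic, not one fixed by the list above.

```python
def brier_score(prices, outcomes):
    """Mean squared error between market probabilities and realized 0/1 outcomes."""
    if len(prices) != len(outcomes) or not prices:
        raise ValueError("prices and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(prices, outcomes)) / len(prices)


def excess_kurtosis(changes):
    """Population excess kurtosis of price changes; 0 for a normal distribution."""
    n = len(changes)
    mean = sum(changes) / n
    m2 = sum((c - mean) ** 2 for c in changes) / n
    m4 = sum((c - mean) ** 4 for c in changes) / n
    if m2 == 0:
        raise ValueError("no variation in price changes")
    return m4 / m2 ** 2 - 3.0


# Invented example: two markets priced at 0.9 and 0.2 that resolved YES and NO
example_brier = brier_score([0.9, 0.2], [1, 0])

# Invented example: price changes alternating between -1 and +1 (thin tails)
example_kurtosis = excess_kurtosis([-1.0, 1.0, -1.0, 1.0])
```

Lower Brier scores indicate better calibration; positive excess kurtosis in dp/dt is what a feedback-amplification story would predict for reflexive markets.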

5.4 Matching variables X

Log cumulative volume at time of measurement
Time-to-resolution (days)
Number of distinct traders (proxy via on-chain wallet count for PolyMarket)
Salience: Google Trends + news mentions (NewsBank / GDELT)
Platform (Kalshi vs PolyMarket)
Topic category (within same category preferred)
Time period (FE)
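As a simpler stand-in for propensity-score matching, the sketch below does greedy 1:1 nearest-neighbor matching on standardized covariates, without replacement. It is not a propensity-score estimator; the function names, the two example covariates (log volume, days to resolution), and the numbers are assumptions for illustration.

```python
import math


def standardize(rows):
    """Z-score each column so that no covariate dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column has zero variance
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def match_pairs(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching; returns (treated_idx, control_idx) pairs."""
    z = standardize(treated + controls)
    zt, zc = z[: len(treated)], z[len(treated):]
    available = set(range(len(controls)))
    pairs = []
    for i, t in enumerate(zt):
        if not available:
            break
        j = min(available, key=lambda c: math.dist(t, zc[c]))
        # Optionally discard matches that are too far apart
        if caliper is None or math.dist(t, zc[j]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Invented covariates: [log cumulative volume, days to resolution]
reflexive_markets = [[10.0, 30], [14.0, 5]]
nonreflexive_markets = [[13.8, 6], [10.2, 28], [12.0, 60]]

pairs = match_pairs(reflexive_markets, nonreflexive_markets)
```

Categorical variables such as platform or topic category would typically be handled by exact matching within cells before applying a distance-based step like this one.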

6. Data Preparation Plan (Realistic)

6.1 PolyMarket data — Cong dataset access status

Status: ⚠️ Not directly downloadable

Per arXiv 2604.20421, Cong et al. dataset is described in paper but no bulk download URL published. Only access: web UI at polymonitor.club (JS SPA, requires browser inspection for backend API).

Realistic options:

Email corresponding author Huaiyu Jia (hjia351@connect.hkust-gz.edu.cn) requesting bulk export
Use the alternative SII-WANGZJ/Polymarket_data on HuggingFace (107GB, 1.1B records — different group but comparable scale)
Self-collect via Polygon RPC + The Graph subgraph (most rigorous, slower)
Use Dune Analytics SQL access (subset for development)

6.2 Kalshi data

Status: Academic API available — apply Day 1

Email research@kalshi.com with institutional affiliation + research statement. Typical 2-4 week turnaround. Backup: public market data via web scraping (rate-limited).

What we need: All market-level OHLCV + (if granted) trade-level data 2022-present. Particularly Weather, CPI/NFP/GDP, Sports for T0/T3 baseline; Politics post-Oct 2024 for T1; Fed FOMC for T3 test.

6.3 Actor engagement evidence (textual analysis)

Status: ✅ Verified available + preliminary results in

Preliminary searches already run (May 2026):

Trump Truth Social: trumpstruth.org searchable archive. Zero pre-Nov-2024 PM mentions. First mention Mar 2026.
FOMC minutes: federalreserve.gov public, all years. Zero "Polymarket / Kalshi" mentions.
Fed speeches: federalreserve.gov public. Zero.
Musk tweets: archive.org Musk X archive. Multiple PolyMarket mentions documented during 2024 campaign.
Don Jr. (Kalshi advisor since Jan 2025): public statements documented.
Other actors (Senate, judges, CEOs): spot-check shows minimal engagement; can systematize.
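The spot checks above can be systematized as a dated keyword scan over each actor's statement archive. The sketch below assumes the archive has already been exported as (date, text) pairs; the pattern, function names, and placeholder statements are hypothetical, and only the Nov 5, 2024 cutoff comes from Section 3.1.

```python
import re
from datetime import date

# Terms searched for in Section 3.1: "Polymarket", "Kalshi", "prediction markets"
PM_PATTERN = re.compile(r"\b(polymarket|kalshi|prediction markets?)\b", re.IGNORECASE)


def mentions_before(statements, cutoff, pattern=PM_PATTERN):
    """Statements dated strictly before `cutoff` that match the pattern."""
    return [(d, t) for d, t in statements if d < cutoff and pattern.search(t)]


def first_mention(statements, pattern=PM_PATTERN):
    """Earliest matching (date, text) pair, or None if nothing matches."""
    hits = [(d, t) for d, t in statements if pattern.search(t)]
    return min(hits, key=lambda h: h[0]) if hits else None


# Placeholder corpus: invented text and dates, not real statements by anyone
statements = [
    (date(2024, 3, 1), "placeholder text A mentioning polls"),
    (date(2024, 10, 20), "placeholder text B mentioning a rally"),
    (date(2026, 1, 15), "placeholder text C mentioning prediction markets"),
]

decision_window_hits = mentions_before(statements, cutoff=date(2024, 11, 5))
earliest = first_mention(statements)
```

An empty `decision_window_hits` list is what a "channel closed during the decision window" coding would rest on, subject to how complete the underlying archive is.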

6.4 Salience data (matching variables)

Status: ✅ Easily accessible

GDELT (free) — news event tracking per topic per day
Google Trends (free) — search interest per query
NewsBank (academic) — newspaper article search
Twitter / X archive — for major posts referencing PM markets

6.5 Realistic timeline

Phase	Week	Output
Apply Kalshi API + email Cong group	1	Access requests submitted
Set up PolyMarket alternative pipeline (SII-WANGZJ or Polygon RPC)	1-2	Working PolyMarket data pull
Textual analysis: systematize actor engagement evidence	2	Channel-open / blocked classification per market
Salience data: GDELT + Google Trends per market	3	Matching variables ready
Binary classification of all markets	3	Reflexive {0,1} per market
Propensity score matching	4	Matched sample for analysis
Main specs + heterogeneity + falsification	5-6	First-pass results
Write up	7-8	Working paper draft

7. The Paper

7.1 Title

"Reflexivity in Prediction Markets: A Binary Framework and Empirical Evidence Against the Focal-Point Hypothesis"

7.2 Story

Hypothesis under test: PolyMarket / Kalshi prices causally affect outcomes via decision-maker awareness (Nechepurenko 2026 "focal point" hypothesis; Hanson manipulability framework; common policy narrative around election integrity).

Our two main contributions:

Conceptual: Reflexivity is binary, not gradient. Classify by (consensus-type, channel-status). Mechanical (T0) markets fail by structure; norm-blocked markets fail by institutional rules; low-engagement markets fail by empirical channel-closure.
Empirical: Direct textual evidence shows most assumed-reflexive markets have closed channels. Trump did not engage with PolyMarket during 2024 decisions. FOMC has zero documented engagement. The focal-point hypothesis is empirically overstated.

7.3 Section structure

§ 1 — Introduction (binary framing vs gradient framing)
§ 2 — Conceptual framework (consensus type + channel status)
§ 3 — Empirical channel evidence (Trump non-engagement; FOMC non-engagement; sports norm-block; mechanical determination)
§ 4 — Binary classification table (all market types)
§ 5 — Matched microstructure comparison (reflexive vs non-reflexive)
§ 6 — Heterogeneity within reflexive (channel-strength gradient)
§ 7 — Falsification on T0 markets (mechanical baseline)
§ 8 — Discussion: implications for regulation, manipulation, market design

7.4 Target journals

Top finance: J Finance / RFS / JFE — empirical microstructure + a strong policy claim
Management Science: regulatory framework angle
AEJ Applied: clean empirical with surprising negative results
AEJ Policy: a fit for the "reflexivity is overstated" policy implication

8. Honest Limits — What Matching Can & Can't Do

⚠️ Matching alone won't prove causality

After matching on observable characteristics, residual differences could reflect reflexivity OR unobserved confounders (topic complexity, trader pool composition, and behavioral biases differ across market types).

We protect against this via:

Within-type matching when possible (compare reflexive T1 to non-reflexive T1 if any exist)
Heterogeneity within reflexive — channel-strength gradient is harder to attribute to confounders
T0 falsification — mechanical markets have no reflexivity by construction; if matched differences persist with T0 vs others, our identification is biased
Direct evidence as primary identification — textual evidence of channel-status is the cleanest piece

8.1 Better identification approaches for the future

Field experiment on Manifold — extend Rasooly-Rozzi (2025) to test reflexivity specifically (manipulate prices, measure outcome change). Play money but cleanest causal ID. Possible companion paper.
Within-actor shock events — if a specific politician starts citing PMs at a known date, before/after comparison for their decisions. Limited by sample size.
Regulatory natural experiments: the D.C. Circuit's Oct 2, 2024 order, which denied the CFTC's request for a stay, allowed Kalshi to list election markets. Pre vs post within Kalshi politics gives some leverage (but the main treatment is actor reach, not market existence).
Trump's Mar 2026 Raskin interview — Trump finally engaged with PMs publicly. Trump-action markets pre vs post Mar 2026 → potential reflexivity activation test.

8.2 The honest bottom line

Matching is good enough to publish a strong paper if:

We frame it as "establishing the binary classification + measuring footprints", not "proving causal reflexivity"
The direct textual evidence on channel status (Trump, Fed, etc.) carries most of the empirical weight
Matched microstructure comparison is positioned as "consistent with the binary classification", not "proves reflexivity"
T0 falsification + within-reflexive heterogeneity provide secondary identification

This is honest, defensible, and publishable. The headline finding "most reflexivity claims are empirically unsupported" is novel and surprising enough for a strong submission target.
