
Self-Fulfilling Prophecy in Prediction Markets

PolyMarket Pricing, Donor Flows, Campaign Strategy & Voter Turnout in the 2024 US Election

Working draft research proposal · May 19, 2026 · Does the market price cause the political outcome?

📋 Executive Summary

Paper A asks whether asset prices respond to PolyMarket. Paper B asks the deeper question: does PolyMarket pricing cause real political outcomes? Specifically, did the perception that Trump was the clear favorite in Oct-Nov 2024 (PolyMarket Trump > 60%) cause donor flight from Harris, depress Democratic turnout, and shift Republican campaign strategy?

Three causal channels: (i) donor flow via FEC daily filings, (ii) candidate strategic behavior via Vivvix TV ad data and campaign-stop logs, (iii) voter turnout via county-level outcomes interacted with PolyMarket media coverage variation. Identification combines (a) whale-induced PolyMarket shifts as an IV for the "effective Trump probability" perceived by political actors, (b) DiD comparing media markets that prominently covered PolyMarket with those that did not, and (c) salience-timing shocks (debate, assassination attempt, Biden's withdrawal).

The political-economy stakes: if PolyMarket pricing causally affects election outcomes, that is a first-order policy concern, and election-integrity lawsuits could cite the findings. The Slate "Trump's America" framing of the Maduro case is exactly the kind of high-political-stakes scenario top-5 editors love. Target: AER / QJE / JPE. Timeline: submission in Months 12-15 of the combined project, once Paper A's data infrastructure is in place (§10). Uses ~80% of Paper A's pipeline.

1. Research Question

When PolyMarket and Kalshi prices move, do political actors respond, and does this aggregate response affect electoral outcomes?

Three sub-questions, one per channel:

Q1 (Donor Channel). Does a 1-pp increase in PolyMarket Trump probability cause: (i) increased Trump-PAC daily donations, (ii) decreased Harris-PAC donations, (iii) shift in down-ballot money toward winning vs losing party?

Q2 (Candidate Strategic Channel). Does PolyMarket pricing cause: (i) reallocation of TV ad dollars across DMAs/states, (ii) shifts in candidate campaign-stop frequency and location, (iii) endorsement timing for vulnerable down-ballot candidates?

Q3 (Voter Turnout Channel). In media markets where PolyMarket pricing was prominently reported, did the 2020-to-2024 change in turnout differ from that in media markets where PolyMarket was barely covered? If so, does mobilization or depression dominate?

The single headline statistic we are after:

A quantified "PolyMarket reflexivity coefficient" — the causal partial-derivative of donor flows / ad spending / turnout with respect to PolyMarket Trump probability, holding contemporaneous polls and news constant. The headline-grabbing claim: "a 10-pp PolyMarket Trump-favoring shock causes X% reduction in Harris donor flows, Y% shift in swing-state ad spending, Z-pp change in turnout differential."
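To make the headline statement concrete: when the outcome enters in logs, as in the donation specifications, a coefficient β on PolyMarket probability converts to a percent effect of a shock of size s as 100·(exp(β·s) − 1). The sketch below does that arithmetic. It assumes PM_t is measured in percentage points, and the coefficient value is purely hypothetical, not an estimate.

```python
import math

def pct_effect(beta_per_pp: float, shock_pp: float) -> float:
    """Percent change in the outcome implied by a log-linear coefficient.

    beta_per_pp: coefficient on PM_t in a log(outcome) regression,
                 with PM_t in percentage points (an assumption here).
    shock_pp:    size of the PolyMarket shock in percentage points.
    """
    return 100.0 * (math.exp(beta_per_pp * shock_pp) - 1.0)

# Hypothetical coefficient for Harris donations, NOT an estimate.
hypothetical_beta_harris = -0.01
effect = pct_effect(hypothetical_beta_harris, 10.0)
print(f"10-pp shock -> {effect:.2f}% change in donations")
```

For small coefficients the exact conversion is close to the linear approximation 100·β·s, but the two diverge as the shock grows, so the exact form is the safer one to report.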

2. Positioning vs Existing Literature

Paper / Strand	What they did	What we add
Nechepurenko arXiv 2604.24147 "Price as Focal Point" (Apr 2026)	Theoretical frame: PMs as coordination devices producing common knowledge. Argues "social force" varies with persistence + trader breadth + cross-platform consensus.	The empirical complement. Nechepurenko opens the theoretical question; we close the empirical one with causal identification.
Della Vedova SSRN 6191618 (Feb 2026)	Documents PolyMarket whales make ~$133M via execution, not information. Reframes "wisdom of crowds."	Companion question: do those whale-driven moves CAUSE downstream effects on actual political behavior?
Self-fulfilling-prophecy theory (Morris-Shin 1998, global games tradition)	Theoretical models of how a market price can become a focal coordinating device, with comparative statics across information regimes.	First empirical implementation at scale ($5B 2024 election data).
Campaign finance literature (Ansolabehere-Snyder, Bombardini-Trebbi)	Studies of how electoral expectations affect donor behavior, but typically using polls or fundamentals as the expectations variable.	Substitute PolyMarket prices — much higher frequency, market-based, with on-chain identification of exogenous shocks.
Bandwagon/turnout literature (Bartels 1988, McAllister 2007)	Mixed evidence on whether early-leading candidates gain from bandwagon effects vs lose from voter complacency.	First clean test using exogenous shocks to perceived front-runner probability via prediction-market mechanism.
Snowberg-Wolfers-Zitzewitz QJE 2007	Showed that election-market moves CAUSALLY shifted equity valuations.	We extend: do election-market moves causally shift the ELECTION ITSELF (donors, ads, turnout)?
Wesleyan Media Project (Fowler, Franz, Ridout)	Real-time TV ad data + creative content coding for political races.	Use as outcome variable. Match minute-level Vivvix data to daily PolyMarket prices.
Gentzkow-Shapiro media-effects literature	Standard ID strategy: variation in media availability + content across markets to identify media effects on political behavior.	Use same ID logic but apply to PolyMarket coverage variation (Nielsen DMAs × Newsbank PolyMarket mentions).

One-paragraph positioning: This is the empirical complement to Nechepurenko's "Price as Focal Point" framing. It also extends Snowberg-Wolfers-Zitzewitz 2007 in a fundamentally different direction than Paper A: instead of "do asset prices respond," we ask "does the election itself respond?" The methodological combo — Wesleyan Media Project + FEC daily + PolyMarket on-chain + IV — is novel in political-economy empirics.

3. Data Architecture

3.1 PolyMarket data (from Paper A pipeline)

Same Cong et al. dataset + Polygon on-chain feed used in Paper A. Daily aggregate Trump probability time series, plus identification of whale-induced shifts via Tsang-Yang + Chainalysis + Sirolly-Sethi.

3.2 FEC daily donations (free, public)

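FEC.gov publishes committee filings for bulk download on a rolling basis: Form 3 (House and Senate candidate committees), Form 3P (presidential committees), and Form 3X (PACs and party committees). We construct: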

~$5B	total 2024 federal political donations
~50M	individual contribution records (post-aggregation)
Daily resolution	by filing date, per candidate, per PAC
Geographic	zip-code-level donor location
Cross-cycle compare	2020 + 2016 + 2024 for historical baselines

3.3 Vivvix / Wesleyan Media Project (TV ad spending)

Vivvix (formerly Kantar/CMAG) tracks essentially all TV political ads. Wesleyan Media Project does the academic data licensing + creative content coding.

~5M+	ads broadcast in 2024 cycle
DMA-level	spending by candidate per Nielsen DMA per week (or day)
Creative content	WMP code-classified by topic (economy, immigration, abortion, etc.)
Cost	~$3-5K academic license; reasonable for top-5 paper

3.4 Campaign-stop logs

Manually compiled from Trump / Harris campaign-press-pool reports + media coverage. ~200 stops per candidate over Sep-Nov 2024. State + city + date resolution.

3.5 Voter turnout

CountyState.com aggregated certified turnout 2024 + 2020 + 2016. ~3,100 US counties. Compare turnout differential to county-level PolyMarket media exposure.

3.6 PolyMarket media coverage measurement

The KEY identifying variation. Three measures:

Newsbank / NexisUni: count of newspaper articles mentioning "Polymarket" or "Kalshi" by week × DMA. Major papers per DMA (NYT in NY DMA, LA Times in LA DMA, etc.)
Nielsen TV monitoring: mentions of PolyMarket on cable + local TV news, weighted by viewership
Google Trends: search interest for "Polymarket" by DMA week-by-week (sanity check for awareness measure)
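A minimal sketch of how the article-count measure could be assembled, assuming each article has already been assigned to a DMA and carries a publication date and full text. The record layout, the field names (`dma`, `date`, `text`), and the sample articles are hypothetical; real Newsbank or Nexis Uni exports will need their own parsing.

```python
import re
from collections import Counter
from datetime import date

# Case-insensitive whole-word match on either platform name.
MENTION = re.compile(r"\b(polymarket|kalshi)\b", re.IGNORECASE)

def coverage_counts(articles):
    """Count mentioning articles, keyed by (DMA, ISO year, ISO week)."""
    counts = Counter()
    for art in articles:
        if MENTION.search(art["text"]):
            iso = art["date"].isocalendar()
            counts[(art["dma"], iso[0], iso[1])] += 1
    return counts

# Hypothetical sample records for illustration only.
articles = [
    {"dma": "New York", "date": date(2024, 10, 21), "text": "Polymarket odds moved overnight."},
    {"dma": "New York", "date": date(2024, 10, 23), "text": "Kalshi lists new election contracts."},
    {"dma": "New York", "date": date(2024, 10, 24), "text": "County clerks prepare for early voting."},
    {"dma": "Los Angeles", "date": date(2024, 10, 22), "text": "Bettors on PolyMarket pile in."},
]
counts = coverage_counts(articles)
for key, n in sorted(counts.items()):
    print(key, n)
```

Raw counts would still need normalizing (for example by total articles per DMA-week) before they are comparable across large and small markets.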

3.7 Controls

Daily polls (538 + Silver Bulletin aggregates) as control for "actual" Trump probability. Bloomberg news count for general news flow control. RealClearPolitics + NateSilver state polls for state-level baseline.

4. Identification Strategy

The fundamental endogeneity: news → both PolyMarket prices AND political outcomes. Polls also move both. We address with three identification strategies, two adapted from Paper A and one new (media-coverage DiD).

4.1 Whale-induced exogenous PolyMarket shifts as IV

Identical to Paper A §4.2 — use whale-trade timing as IV for PolyMarket price moves that political actors observe.

First stage: ΔP_PM,t = π · WhaleTrade_t + δ · X_t + ν_t

Second stage (Q1 donor): log(Donations_c,t) = α_c + λ_t + β_c · ΔP̂_PM,t + γ_c · X_c,t + ε_c,t

Exclusion restriction: Whale trades affect donor decisions only through PolyMarket prices, not through any direct channel (whales don't separately call donors). The restriction is plausible because whale trades were typically not reported in mainstream media until the WSJ identified Théo in Nov 2024, and we focus on whale events before the WSJ revelation.
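The logic of the two stages can be illustrated with the just-identified case: one instrument, one endogenous regressor, no controls. There the IV estimate reduces to cov(z, y) / cov(z, x). The sketch below uses fully synthetic data in which a "news" confounder moves both the price and donations while the whale indicator moves only the price. All series and parameter values are invented for illustration, and the real specification adds fixed effects and controls.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def iv_wald(z, x, y):
    """Just-identified IV estimate of dy/dx with a single instrument z."""
    return cov(z, y) / cov(z, x)

def first_stage_f(z, x):
    """F-statistic (squared t-statistic) on z in the regression of x on z."""
    n = len(z)
    pi_hat = cov(z, x) / cov(z, z)
    intercept = mean(x) - pi_hat * mean(z)
    rss = sum((xi - intercept - pi_hat * zi) ** 2 for zi, xi in zip(z, x))
    s_zz = n * cov(z, z)
    return pi_hat ** 2 * s_zz / (rss / (n - 2))

# Synthetic data: whale indicator is orthogonal to news by construction.
whale = [0, 0, 0, 0, 1, 1, 1, 1] * 25
news = [1, -1] * 100
d_price = [2.0 * w + u for w, u in zip(whale, news)]            # first stage
donations = [0.5 * p + 3.0 * u for p, u in zip(d_price, news)]  # true effect 0.5

ols = cov(d_price, donations) / cov(d_price, d_price)
iv = iv_wald(whale, d_price, donations)
f_stat = first_stage_f(whale, d_price)
print(f"OLS={ols:.2f}  IV={iv:.2f}  first-stage F={f_stat:.0f}")
```

In this toy setup OLS is pulled away from the true effect by the confounder, while the IV estimate recovers it. The first-stage F-statistic is the quantity to check against the weak-instrument concern.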

4.2 Media-coverage DiD

A different identification angle. Some DMAs got heavy PolyMarket coverage (NYT-dominant markets); others got minimal coverage (rural local-news-dominant markets). Use this variation as treatment intensity.

DiD spec: Y_d,t − Y_d,baseline = α + β · PMCoverage_d × Post_t + γ_d + λ_t + ε_d,t

where Y_d,t is outcome (donations, ads, turnout) in DMA d at time t, PMCoverage_d is pre-determined PolyMarket coverage intensity in DMA d, and Post_t indicates the post-debate (post-Jun 27) salience-shift period.

Parallel trends assumption: Pre-debate, high-PMCoverage and low-PMCoverage DMAs trend in donations / ads / turnout similarly. We test using 2023 placebo period.
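A minimal sketch of the continuous-treatment DiD on a toy balanced panel. On a balanced panel the two-way fixed-effects estimate of β can be obtained by subtracting DMA means and period means from both the outcome and the interaction term, then taking a simple slope. The DMA effects, period effects, coverage values, and the "true" β below are all invented, and the data are noise-free so that the mechanics are easy to check.

```python
def two_way_demean(panel):
    """Remove unit and period means from a balanced panel (panel[i][t])."""
    n, n_periods = len(panel), len(panel[0])
    unit_means = [sum(row) / n_periods for row in panel]
    time_means = [sum(panel[i][t] for i in range(n)) / n for t in range(n_periods)]
    grand = sum(unit_means) / n
    return [
        [panel[i][t] - unit_means[i] - time_means[t] + grand for t in range(n_periods)]
        for i in range(n)
    ]

def did_beta(y, coverage, post):
    """TWFE coefficient on coverage_d * post_t for a balanced panel."""
    x = [[c * p for p in post] for c in coverage]
    y_dm, x_dm = two_way_demean(y), two_way_demean(x)
    num = sum(a * b for ry, rx in zip(y_dm, x_dm) for a, b in zip(ry, rx))
    den = sum(b * b for rx in x_dm for b in rx)
    return num / den

# Hypothetical inputs: 4 DMAs, 4 periods, treatment starts in period 2.
coverage = [0.0, 0.2, 0.5, 1.0]
post = [0, 0, 1, 1]
dma_effect = [10.0, 12.0, 9.0, 11.0]
period_effect = [0.0, 0.5, 1.0, 1.5]
TRUE_BETA = -0.8

y = [
    [dma_effect[i] + period_effect[t] + TRUE_BETA * coverage[i] * post[t]
     for t in range(len(post))]
    for i in range(len(coverage))
]
beta_hat = did_beta(y, coverage, post)
print(round(beta_hat, 6))
```

The same function can be pointed at a pre-period with a fake treatment date to run the placebo check on parallel trends.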

4.3 Salience-timing shocks

PolyMarket became politically salient in distinct phases: (i) pre-Iowa caucus (low), (ii) post-debate June 27 (high), (iii) post-assassination-attempt July 13 (very high), (iv) election eve (peak). Use these salience shifts to identify the "PM-aware" period vs "PM-unaware" period.

Standard interrupted-time-series + event-study DiD where the events are exogenous salience shocks rather than news shocks.
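A small sketch of how each calendar day could be tagged with a salience regime for the interrupted-time-series design. June 27 and July 13 come from the text above. Treating November 4, 2024 as "election eve" and labelling every day before the debate as "low" are assumptions of this sketch, as are the label strings.

```python
from datetime import date

# (regime start date, label), in chronological order.
REGIME_STARTS = [
    (date(2024, 6, 27), "high"),       # first presidential debate
    (date(2024, 7, 13), "very_high"),  # assassination attempt
    (date(2024, 11, 4), "peak"),       # election eve (assumed date)
]

def salience_regime(day):
    """Return the salience label in force on a given day."""
    label = "low"  # assumption: everything before the debate is low salience
    for start, name in REGIME_STARTS:
        if day >= start:
            label = name
    return label

def days_since(day, event):
    """Event time in days (negative before the event)."""
    return (day - event).days

print(salience_regime(date(2024, 7, 20)), days_since(date(2024, 7, 20), date(2024, 7, 13)))
```

The event-time variable is what an event-study version would bin into leads and lags around each salience shock.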

Triangulation: If all three identification strategies yield similar β̂ estimates, we have unusually strong evidence. Each makes a different identifying assumption — whale-IV assumes exclusion through PM prices only; DiD assumes parallel trends across DMA exposure; salience-timing assumes the time-of-shift is exogenous. Convergence is reassuring.

5. Main Specifications

5.1 Donor Channel (Q1)

Spec 1a (candidate): log(Donations_c,t) = α_c + λ_t + β_c · PM_t + γ · Poll_t + δ · X_c,t + ε_c,t

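where c ∈ {Trump, Harris, key down-ballot candidates}, t = day, PM_t is daily PolyMarket Trump probability, Poll_t is the Silver Bulletin average. Because PM_t and Poll_t vary only over time, day-level fixed effects λ_t would absorb them. λ_t must therefore be coarser than daily (e.g., week and day-of-week effects); otherwise only cross-candidate differences in β_c are identified. Hypothesis: β_Trump > 0, β_Harris < 0 (donors flow toward winners).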

Spec 1b (donor type): log(Donations_c,d,t) = α_c,d + λ_t + β_c,d · PM_t + ...

where d ∈ {small individual <$200, large individual ≥$200, PAC} indexes donor type (not the DMA index d of §4.2). The test is whether the bandwagon effect is stronger for small donors (less informed, more bandwagon-prone) than for large donors (more strategic).
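A minimal sketch of building the dependent variable for the donor-type specification: classify each contribution using the $200 threshold from the text, sum by candidate, donor type, and day, then take logs. The record fields (`candidate`, `amount`, `is_pac`, `date`) and the sample rows are hypothetical and do not follow the FEC file schema.

```python
import math
from collections import defaultdict
from datetime import date

def donor_type(record):
    """Map a contribution to small / large / pac using the $200 cut-off."""
    if record["is_pac"]:
        return "pac"
    return "small" if record["amount"] < 200 else "large"

def daily_log_donations(records):
    """log of total dollars per (candidate, donor type, day)."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["candidate"], donor_type(r), r["date"])] += r["amount"]
    return {key: math.log(total) for key, total in totals.items()}

# Hypothetical sample records for illustration only.
day = date(2024, 10, 1)
records = [
    {"candidate": "Harris", "amount": 25.0, "is_pac": False, "date": day},
    {"candidate": "Harris", "amount": 50.0, "is_pac": False, "date": day},
    {"candidate": "Harris", "amount": 500.0, "is_pac": False, "date": day},
    {"candidate": "Trump", "amount": 5000.0, "is_pac": True, "date": day},
]
panel = daily_log_donations(records)
for key, value in sorted(panel.items(), key=lambda kv: str(kv[0])):
    print(key, round(value, 3))
```

Cells with no donations simply do not appear here, because the log of zero is undefined. How to treat those days (dropping them, log(1 + x), or a count model) is a choice the real panel has to make explicitly.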

5.2 Campaign Strategic Channel (Q2)

Spec 2a (TV ads): log(AdSpend_c,m,t) = α_c,m + λ_t + β_c,m · PM_t + δ · X + ε

where m = Nielsen DMA (210 total). Hypothesis: Rational allocation → β_Trump,swing > 0 (Trump pours more into swing states when winning), β_Harris,swing ambiguous (Harris may withdraw from hopeless DMAs OR concentrate on remaining ones).

Spec 2b (campaign stops): P(CampaignStop_c,m,t = 1) = Λ(α + β · PM_t + γ · SwingState_m + ...)

Logit / Poisson model of campaign-stop frequency.
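In the logit version, Λ is the logistic function, and the marginal effect of PM_t on the stop probability is β·Λ·(1 − Λ), so it depends on where the index sits. A short sketch with hypothetical coefficient values (none are estimates) and PM_t assumed to be in percentage points:

```python
import math

def logistic(z):
    """Logistic CDF: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def stop_probability(alpha, beta, gamma, pm, swing_state):
    """P(CampaignStop = 1) for one candidate-DMA-day under the logit spec."""
    return logistic(alpha + beta * pm + gamma * swing_state)

def marginal_effect_pm(alpha, beta, gamma, pm, swing_state):
    """Change in stop probability per unit of PM_t: beta * p * (1 - p)."""
    p = stop_probability(alpha, beta, gamma, pm, swing_state)
    return beta * p * (1.0 - p)

# Hypothetical coefficients chosen only to make the arithmetic visible.
ALPHA, BETA, GAMMA = -4.0, 0.04, 2.0
p_swing = stop_probability(ALPHA, BETA, GAMMA, pm=50.0, swing_state=1)
p_safe = stop_probability(ALPHA, BETA, GAMMA, pm=50.0, swing_state=0)
me_swing = marginal_effect_pm(ALPHA, BETA, GAMMA, pm=50.0, swing_state=1)
print(round(p_swing, 3), round(p_safe, 3), round(me_swing, 4))
```

Because the marginal effect peaks where the predicted probability is one half, the same β implies a larger response in DMAs that are already plausible stop locations than in DMAs that are rarely visited.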

5.3 Voter Turnout Channel (Q3)

Spec 3 (county turnout DiD): ΔTurnout_c,2024 = α + β · PMCoverage_c + γ · X_c + ε_c

where ΔTurnout_c,2024 = Turnout_c,2024 − Turnout_c,2020 is measured at the county level (c indexes counties here, not candidates), and PMCoverage_c is PolyMarket media coverage intensity in the county's DMA. Hypothesis: ambiguous sign. A bandwagon effect implies β > 0 for Republican-leaning counties; a complacency effect implies β < 0 if voters think the outcome is decided. The net effect is an empirical question.
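A minimal sketch of the turnout specification without controls: map each county to its DMA's coverage intensity, form the 2020-to-2024 turnout change, and take the bivariate slope. County names, DMA assignments, coverage values, and turnout rates are all invented, and the toy data are constructed to lie exactly on a line.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical inputs for illustration only.
county_dma = {"A": "dma1", "B": "dma1", "C": "dma2", "D": "dma3"}
dma_coverage = {"dma1": 0.1, "dma2": 0.5, "dma3": 0.9}
turnout_2020 = {"A": 0.600, "B": 0.550, "C": 0.700, "D": 0.650}
turnout_2024 = {"A": 0.615, "B": 0.565, "C": 0.695, "D": 0.625}

counties = sorted(county_dma)
pm_coverage = [dma_coverage[county_dma[c]] for c in counties]
delta_turnout = [turnout_2024[c] - turnout_2020[c] for c in counties]

beta_hat = ols_slope(pm_coverage, delta_turnout)
print(round(beta_hat, 4))
```

Since coverage varies only at the DMA level, counties within a DMA share the same regressor value, which is why inference for this specification should cluster at the DMA level or above.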

5.4 Sub-sample tests

Run all specs separately on:

Pre-Sep 10 debate (Trump and Harris near parity) vs post-Sep 10 (Trump favored)
Swing states (AZ, GA, MI, NC, PA, NV, WI) vs safe states
By media-environment (cable-heavy DMAs vs newspaper-heavy)
By age cohort (younger donors / voters more PolyMarket-aware)

6. Mechanism Tests

Distinguish between competing accounts of WHY PolyMarket affects political behavior:

Mechanism	Test	Distinguishing prediction
Common-knowledge / focal-point (Nechepurenko 2026; Morris-Shin)	Effect should be stronger in DMAs with HIGHER PolyMarket coverage (signal propagation requires media)	Coefficient on the PM_t × PMCoverage_d interaction should be positive
Substitute-for-polls (PolyMarket as superior poll aggregator)	Effect should be similar before and after polls become noisy (pre vs post-debate)	If true, β stable across regimes
Bandwagon (Bartels 1988)	Effect should be strong specifically for low-information donors / undecided voters	β_{small donor} > β_{large donor}; β_{marginal voter} > β_{committed voter}
Strategic / rational (campaigns reallocating)	Effect on AdSpend should follow optimal-allocation theory: more in winnable, less in lost	Nonlinear response: ↑ in marginal states, ↓ in safe states
Complacency / depression (early-leader hypothesis)	Effect on turnout should be NEGATIVE for the winning side's high-confidence supporters	β_{turnout, Trump-leaning, post-Sep-10} < 0

7. Robustness

Alternative PolyMarket variable: use Kalshi probability instead, or median of PM/Kalshi/IEM
Alternative donor measure: dollar amount vs count of donations; per-zip-code vs per-state aggregation
Alternative coverage measure: Newsbank vs Nielsen vs Google Trends as PMCoverage_d
Lag structures: contemporaneous vs 1-day, 3-day, 7-day lags on the PM → outcome regression
Including / excluding extreme events: drop assassination attempt, drop election eve
Outlier-robust regressions: trim 1% tails, M-estimators
State-level fixed effects vs DMA-level vs county-level
Standard errors: two-way cluster by DMA and week; block bootstrap; Romano-Wolf for multiple hypothesis adjustment
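The lag-structure check in the list above can be sketched as a loop that shifts the PolyMarket series by 0, 1, 3, and 7 days and re-estimates a simple slope each time. The series below are synthetic: the outcome is built to respond to the price with exactly a 3-day delay, so the lag-3 slope recovers the built-in coefficient. Function names and data are illustrative only.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def lagged(series, k):
    """Shift a daily series back by k days; the first k entries are None."""
    if k == 0:
        return list(series)
    return [None] * k + list(series[:-k])

def lag_slopes(pm, outcome, lags=(0, 1, 3, 7)):
    """Slope of outcome_t on pm_{t-k} for each lag k, dropping missing days."""
    slopes = {}
    for k in lags:
        pairs = [(x, y) for x, y in zip(lagged(pm, k), outcome) if x is not None]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        slopes[k] = ols_slope(xs, ys)
    return slopes

# Synthetic daily series: outcome_t = 2 * pm_{t-3} after the first three days.
pm = [50.0, 52.0, 51.0, 55.0, 58.0, 57.0, 60.0, 62.0, 61.0, 59.0, 63.0, 64.0, 66.0, 65.0]
outcome = [100.0, 100.0, 100.0] + [2.0 * p for p in pm[:-3]]
slopes = lag_slopes(pm, outcome)
for k in sorted(slopes):
    print(k, round(slopes[k], 3))
```

With real data the slopes at neighbouring lags will be correlated because the price series is persistent, so a distributed-lag regression with all lags entered jointly is the natural next step.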

8. Falsification

F1 — Pre-2024 placebo. Run all specs on the 2020 election. PolyMarket existed but was much smaller (~$60M total volume vs $3B in 2024). Effects should be MUCH smaller. If not, our coefficients are picking up something other than the unique 2024-PolyMarket-salience effect.

F2 — Sports placebo. Use PolyMarket sports-market prices as a "placebo treatment." They should not predict political donations or ads. If they do, our identification is spurious.

F3 — Down-ballot decoupling. PolyMarket priced presidential markets heavily but down-ballot Senate/House markets thinly. Senate-race outcomes should respond LESS to presidential PolyMarket movements than presidential outcomes do. If they respond equally, our channel is too broad.

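F4 (DMA-pair test). Use pairs of geographically adjacent counties that fall in different Nielsen DMAs and therefore different media environments, e.g., counties along the MO/KS border vs counties along the IA/MO border. Within each pair, regional conditions are similar but exposure to PolyMarket coverage varies. If effects persist within pairs, regional confounders are unlikely to explain them.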

9. Risks & Mitigations

Risk	Severity	Mitigation
Reverse causality: donor flows feed back to PolyMarket via betting	High	High-frequency identification + whale-IV isolate PolyMarket moves that are not donor-driven. Test: donor flows should not Granger-cause PolyMarket if our story is right.
Omitted variable: news drives BOTH PolyMarket and donations	High	Whale-IV (exclusion: whales don't directly contact donors). Salience-DiD (parallel trends). Polls as control.
Weak first stage: whale shocks may be small relative to news shocks	Medium	Multiple whales documented (Tsang-Yang, Chainalysis, Sirolly-Sethi). Whale-attributable moves total ~3-5pp in Trump probability, which should give adequate instrument strength; to be confirmed with first-stage F-statistics.
Media coverage measurement noisy	Medium	Triangulate Newsbank + Nielsen + Google Trends. Hand-validate top 20 DMAs.
Mechanism overlap: bandwagon vs strategic vs complacency hard to separate	Medium	Multiple distinguishing tests (donor type, swing-vs-safe, age cohort).
Insufficient power on turnout (county-level Δ noisy)	Medium	Use sub-county precincts where available. Pre-power calculation.
Crowded field: someone publishes first	Medium	Speed (build on Paper A pipeline → 6-9 month build). Methodological novelty: triangulated IV.
Wesleyan Media Project license delay	Low	3-month standard turnaround. Plan accordingly.

10. Timeline & Resources

Phase	Month	Deliverable
Setup	1	Apply WMP / Vivvix license. Pull FEC bulk data. Build daily PolyMarket aggregation.
Coverage measure	2	Newsbank + Nielsen + Google Trends → PMCoverage_d by DMA-week.
Data integration	3	Join PolyMarket × FEC × Vivvix × turnout into master panel.
Main specs	4-5	Specs 1a, 1b, 2a, 2b, 3 with all three identification strategies.
Mechanism + Falsification	5-6	5-mechanism horse race + 4 falsification tests.
First draft	6-7	Working paper ready. Internal seminar.
Polish + submit	8	Submit to AER (or QJE).

Note: Paper B's timeline starts ~6 months after Paper A's data pipeline is built (since 80% of infrastructure is shared). Total bundle: Paper A submitted Month 6-9, Paper B submitted Month 12-15.

11. Policy & Litigation Relevance

Paper B is the more politically explosive of the bundle. If we find evidence that PolyMarket pricing causally affected the 2024 election outcome — through donor depression, candidate retreat, or voter complacency — it would be:

Cited by election-integrity lawsuits: Both Democratic-funded post-mortems and Republican defenses would invoke our findings.
Direct input to CFTC rulemaking: If prediction markets affect elections, that's a fundamentally different regulatory question than "are they accurate forecasts."
Congressional testimony fodder: Both Brookings (Aaron Klein) and CFR have already framed this in their 2025-2026 events. We'd be the empirical foundation.
Foreign-government and election-defense use: Foreign election authorities increasingly worry about US-style prediction markets affecting their elections.
Litigation against PolyMarket / Kalshi as election influence: Potential plaintiff-side expert-witness work on whether a platform's pricing dynamics constitute election interference.

The political-economy stakes are first-order. If reflexivity is small or null, that's a useful finding too — it neutralizes one of the loudest concerns about PMs. If reflexivity is large, it reshapes the entire policy debate. Either way, this paper gets cited.

