
Paper A Pre-Registration

Locked Industry Exposure Measure + Analysis Plan for OSF Submission

Draft pre-registration · May 19, 2026 · OSF-style format · Lock before any 2024 data access

📋 Purpose & OSF Submission

The single biggest reviewer concern for Paper A will be: "the industry exposure measure was constructed to maximize the cross-sectional fit on 2024 data." Researcher degrees of freedom in measure construction are a classic source of eroded credibility in cross-sectional asset pricing.

The solution: pre-register the industry exposure measure construction at OSF BEFORE accessing 2024 PolyMarket data. This pre-registration documents:

The primary composite exposure measure (5-channel, equal-weighted)
4 pre-committed alternative constructions
The 40 event timestamps to be analyzed (verified independently before lock)
The 2016 election placebo validation procedure
Specific decision rules for what counts as positive vs negative evidence

Recommended platform: OSF Registries (osf.io/registries/aspredicted/new). Doable in 1 sitting once this document is finalized.

1. Locked Hypothesis

H1 — Cross-asset response to PolyMarket shocks. Within 30-minute windows around major 2024 election news events, US asset prices respond systematically to PolyMarket Trump probability changes. Specifically, for asset class k:

H1.SPX: β_SPX > 0 (Trump favorable for equity aggregate)
H1.MXN: β_USDMXN > 0 (Mexican peso depreciates on Trump shocks)
H1.10Y: β_10Y-yield > 0 (yields rise on Trump shocks)
H1.VIX: β_VIX > 0 (vol rises on Trump shocks)
H1.BTC: β_BTC > 0 (crypto rises on Trump deregulation expectation)

H2 — Industry cross-section. Within S&P 500 firms, the firm-specific coefficient β̂_i is increasing in pre-determined TrumpExposure_i (defined in §3):

β̂_i = δ₀ + δ₁ · TrumpExposure_i + γ · Z_i + u_i

Primary test: δ₁ > 0 (p < 0.05), with heteroskedasticity-robust SEs clustered by industry (GICS-2, per §6.2).

2. Locked Sample Selection

Firms: S&P 500 constituents as of Jan 1, 2024, excluding financials with high government exposure (defined as >30% of revenue from the US government); their Trump-positive exposure is pre-determined but mechanical. Expected total: N ≈ 470 firms.

Events: The 40 events listed in §5, with timestamps verified against ≥ 2 of the sources named in §5 ({Wikipedia, WSJ, NYT, Bloomberg, Reuters, AP}).

Sample period: Jan 1, 2024 - Nov 30, 2024 (PolyMarket Trump prob > 5% throughout).

Event window: 30 minutes (primary). Robustness windows of 5, 15, 60 min, and 1 day pre-committed.

Asset cross-section: all S&P 500 + 11 SPDR sectors + 8 FX pairs + 4 Treasury tenors + IG/HY spreads + VIX + 6 commodities + 5 crypto, giving a panel of ~520 units × 40 events ≈ 21K observations.

3. Locked Industry Exposure Measure

The composite measure for each firm i, locked as follows:

Primary measure: TrumpExposure_i = 0.2 · TaxExposure_i + 0.2 · RegExposure_i − 0.2 · TradeExposure_i − 0.2 · ImmigrationExposure_i + 0.2 · GeopoliticalExposure_i

where each component is standardized to mean-0, SD-1 over the S&P 500 cross-section using 2017-2019 data only.
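
As an illustration of the locked formula, a minimal Python sketch of the composite follows. The names `zscore` and `trump_exposure` and the toy inputs are hypothetical; the real inputs would be the five channel series built as described in §3.1-3.5. The use of the population SD and the absence of missing-value handling are assumptions of this sketch, not part of the locked definition.

```python
import numpy as np

# Signs and weights copied from the locked formula: equal weights of 0.2,
# with Trade and Immigration entering negatively.
WEIGHTS = {
    "tax": 0.2,
    "reg": 0.2,
    "trade": -0.2,
    "immigration": -0.2,
    "geopolitical": 0.2,
}

def zscore(x):
    """Standardize to mean 0, SD 1 across firms (population SD; an assumption)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def trump_exposure(channels):
    """channels: dict mapping channel name -> raw values, one per firm."""
    return sum(w * zscore(channels[name]) for name, w in WEIGHTS.items())

# Toy inputs for 4 hypothetical firms (not real data).
toy = {
    "tax": [0.21, 0.25, 0.18, 0.30],
    "reg": [1.0, 3.0, 2.0, 0.5],
    "trade": [0.10, 0.40, 0.05, 0.20],
    "immigration": [0.02, 0.08, 0.01, 0.05],
    "geopolitical": [0.0, 0.1, 0.6, 0.2],
}
exposure = trump_exposure(toy)
```

Because every channel is standardized before weighting, the composite has mean zero by construction, but its SD is generally not 1 and depends on the cross-channel correlations.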

3.1 TaxExposure_i

Definition: Firm i's 2017-2019 average effective tax rate (ETR = total tax / pretax income from Compustat).

Construction: Standardize ETR across S&P 500 firms. Higher TaxExposure = more room to gain from corporate tax cuts.

Compustat fields: txt (income taxes) / pi (pretax income), averaged over fiscal years 2017, 2018, 2019.

Robustness alt: 5-year average (2015-2019) or 3-year median.
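
The ETR construction can be sketched as below. The record layout (a list of dicts with `fyear`, `txt`, `pi`) and the function name `average_etr` are hypothetical. Skipping firm-years with non-positive pretax income is an assumption of this sketch; the locked definition above does not say how such years are treated.

```python
from statistics import mean

def average_etr(records, years=(2017, 2018, 2019)):
    """Average txt / pi over the given fiscal years for one firm.

    records: list of dicts with keys 'fyear', 'txt', 'pi'.
    ASSUMPTION: firm-years with pi <= 0 are skipped, since the ratio
    is not meaningful there. Returns None if no usable year remains.
    """
    ratios = [
        r["txt"] / r["pi"]
        for r in records
        if r["fyear"] in years and r["pi"] > 0
    ]
    return mean(ratios) if ratios else None

# Toy firm (not real Compustat values).
toy_firm = [
    {"fyear": 2016, "txt": 40.0, "pi": 100.0},   # outside the window, ignored
    {"fyear": 2017, "txt": 21.0, "pi": 100.0},
    {"fyear": 2018, "txt": 25.0, "pi": 100.0},
    {"fyear": 2019, "txt": 18.0, "pi": 100.0},
]
etr = average_etr(toy_firm)
```

The resulting firm-level averages would then be standardized across the S&P 500 cross-section to give TaxExposure_i.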

3.2 RegExposure_i

Definition: Firm i's industry-level regulatory intensity, measured as the sum of:

(a) 2017-2019 industry lobbying spend on federal agencies (from OpenSecrets, summed over Senate + House + Executive + relevant agencies), and

(b) Industry-level Federal Register mentions of regulatory actions affecting the firm's NAICS-4 industry (2017-2019).

Standardize both components, equal-weighted average.

Higher RegExposure = more to gain from deregulation.

3.3 TradeExposure_i

Definition: Firm i's exposure to import competition from China + Mexico, measured as 2017-2019 industry import share from those countries weighted by tariff differential (column 2 minus column 1 from US HTS).

Source: Atkin & Khandelwal (NBER WP) trade-exposure dataset, or replicate from Census trade data + USITC HTS.

Higher TradeExposure = more to lose from tariffs. NEGATIVE coefficient in composite.

3.4 ImmigrationExposure_i

Definition: Average of two components, both standardized:

(a) H-1B dependency: industry's 2017-2019 share of H-1B visa applications relative to industry employment (USCIS data).

(b) Unauthorized-worker share: industry's 2017-2019 share of unauthorized workers (Borjas-Tienda estimates by NAICS-4, via BLS).

Higher ImmigrationExposure = more to lose from restriction. NEGATIVE coefficient in composite.

3.5 GeopoliticalExposure_i

Definition: Average of the following components, each standardized:

(a) Defense contract revenue share (Bloomberg Government 2017-2019 contract awards, scaled by firm revenue)

(b) International risk score from 10-K Item 1A "geographic risk" text (NLP-based scoring of Russia/Ukraine/Iran/China mentions in 2018-2019 10-Ks)

Higher GeopoliticalExposure = more sensitive to policy regime change.

4. Locked Alternative Constructions

Four pre-committed alternative constructions to be reported in the paper alongside the primary measure:

Alt 1: PCA composite. First principal component of the 5 channels (after standardization).

Alt 2: Lobbying-weighted composite. Same 5 channels but weighted by industry political activity (lobbying spend share).

Alt 3: 10-K text-mining composite. Pure NLP-based: score each firm's 2018-2019 10-Ks for "political risk" language using a pre-specified lexicon.

Alt 4: Individual channels. Estimate β̂_i = δ₀ + δ₁ · Channel_i + γ · Z_i + u_i for each of the 5 channels SEPARATELY (no composite).
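
For Alt 1, a minimal sketch of extracting the first principal component with NumPy is shown here. The function name `first_pc_scores` is hypothetical. One detail worth locking is the sign of the component, which PCA leaves arbitrary; the sketch normalizes it so that the first column (assumed here to be TaxExposure) loads positively. That sign convention is an assumption of this sketch and is not stated in the text above.

```python
import numpy as np

def first_pc_scores(X):
    """X: (n_firms, n_channels) matrix of raw channel values.

    Returns (scores, loadings) for the first principal component of the
    column-standardized matrix.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # ASSUMPTION: fix the arbitrary sign so column 0 loads positively.
    if loadings[0] < 0:
        loadings = -loadings
    return Z @ loadings, loadings

# Toy input: 50 hypothetical firms, 5 channels of simulated data.
rng = np.random.default_rng(0)
scores, loadings = first_pc_scores(rng.normal(size=(50, 5)))
```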

Decision rule: Robustness is established if (i) the primary composite has δ₁ > 0 with p < 0.05, AND (ii) at least 3 of 4 alternatives produce qualitatively similar conclusions.

5. Locked Event List (40 Events)

Timestamp standard: all UTC, second resolution. Verified independently against ≥2 of {Wikipedia, WSJ, NYT, Bloomberg, Reuters, AP}. Locked before 2024 PolyMarket data access.
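
Since most sources report event times in US local time, a small conversion helper may reduce transcription errors when locking UTC timestamps. The sketch below uses Python's standard `zoneinfo` module; the helper name `to_utc` is hypothetical and the two inputs are illustrative wall-clock times. Two pitfalls it makes visible: evening Eastern-time events fall on the next calendar day in UTC, and the Eastern offset changes from UTC-4 to UTC-5 when daylight saving time ends (November 3 in 2024).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(local_str, tz_name="America/New_York"):
    """Parse a 'YYYY-MM-DD HH:MM' wall-clock time in tz_name and return it in UTC."""
    naive = datetime.strptime(local_str, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)

fmt = "%Y-%m-%d %H:%M"
print(to_utc("2024-06-27 21:00").strftime(fmt))  # 2024-06-28 01:00 (EDT, UTC-4)
print(to_utc("2024-11-06 02:00").strftime(fmt))  # 2024-11-06 07:00 (EST, UTC-5)
```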

5.1 Nine major events (locked)

#	Event	Date / UTC	Direction prior
1	Iowa caucus: Trump wins	2024-01-16 01:31	Trump+
2	Trump conviction (NY hush money)	2024-05-30 21:00	Harris+
3	Biden-Trump debate (Atlanta)	2024-06-28 01:00	Trump+++
4	Trump assassination attempt #1 (Butler PA)	2024-07-13 22:11	Trump+++
5	Biden drops out (X post)	2024-07-21 17:46	Harris++ (Biden-)
6	Harris picks Walz as VP	2024-08-06 13:00	Harris+
7	Harris-Trump debate (Philadelphia)	2024-09-11 01:00	Harris++
8	Trump assassination attempt #2 (West Palm Beach)	2024-09-15 18:00	Trump+
9	Election Night (decisive call ~2 AM ET)	2024-11-06 07:00	Trump+++

5.2 Minor events

31 additional events (Super Tuesday, court rulings, major polling shifts > 3pp daily, VP debate, sub-presidential primaries, etc.); the full list is in the OSF appendix. Locked at pre-registration time.

5.3 Decision rule for adding events

Once locked, no events may be added or removed except for the following pre-committed reasons:

Timestamp correction (must reconcile against existing locked sources)
Discovery of an event independently classified as "major" by ≥3 of {WSJ, NYT, Bloomberg, FT} in 2024 retrospective summaries — but must be disclosed as a deviation

6. Locked Primary Specifications

6.1 Spec 1 (asset-class)

ΔY_k,e = α_k + β_k · ΔP_PM,e + γ_k · X_e + ε_k,e

X_e controls: lagged 1-min VIX, lagged 1-min S&P, lagged 1-min DXY. SE clustering: by event. Window: [t_e − 5min, t_e + 30min].

6.2 Spec 2 (industry cross-section)

β̂_i = δ₀ + δ₁ · TrumpExposure_i + γ · Z_i + u_i

Z_i controls: log market cap, book-to-market ratio (B/M), profitability (OP), asset growth (Inv), pre-2024 idiosyncratic vol. SE clustering: by GICS-2 industry. Sample: S&P 500 firms with ≥ 30 events of non-missing returns.
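
A minimal sketch of OLS with cluster-robust standard errors, as called for in Spec 2 (and, with event labels as clusters, in Spec 1), is given below. The function name `ols_cluster` is hypothetical, and the CR1 small-sample correction is an assumption: the text above does not specify which correction will be used.

```python
import numpy as np

def ols_cluster(y, X, clusters):
    """OLS coefficients and cluster-robust (sandwich) standard errors.

    X must already contain a constant column. `clusters` holds one label
    per observation (e.g. an industry code). Uses the CR1 correction
    G/(G-1) * (n-1)/(n-k), which is an assumption of this sketch.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat" of the sandwich: outer products of within-cluster score sums.
    meat = np.zeros((k, k))
    labels = np.unique(clusters)
    for g in labels:
        idx = clusters == g
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)

    G = len(labels)
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    cov = correction * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))

# Toy usage on simulated data: 200 hypothetical firms in 20 clusters.
rng = np.random.default_rng(0)
exposure = rng.normal(size=200)
beta_hat_i = 0.5 * exposure + rng.normal(size=200)
X = np.column_stack([np.ones(200), exposure])
delta, se = ols_cluster(beta_hat_i, X, np.repeat(np.arange(20), 10))
```

With few clusters the sandwich estimator can understate uncertainty, so the number of distinct cluster labels is worth checking before relying on these standard errors.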

6.3 First-stage / IV

ΔP_PM,t = π · WhaleTrade_t + δ · X_t + ν_t

WhaleTrade_t: net Trump-direction trade flow (in $) from the 11 Chainalysis-identified whale wallets (Fredi9999 cluster) in minute t. First-stage F-stat > 10 required for IV results to be reported.
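
The F > 10 reporting rule can be checked with a sketch like the following. The function name `first_stage_f` is hypothetical. With a single excluded instrument, the first-stage F-statistic equals the squared t-statistic on π. This sketch computes the homoskedastic version, which is an assumption: the text above does not say whether a classical or a robust F-statistic will be used.

```python
import numpy as np

def first_stage_f(dp, whale, controls=None):
    """Homoskedastic first-stage F for one excluded instrument (equals t^2).

    dp: changes in the prediction-market probability.
    whale: the instrument (net whale trade flow).
    controls: optional array of control variables, shape (n,) or (n, p).
    """
    dp = np.asarray(dp, dtype=float)
    whale = np.asarray(whale, dtype=float)
    n = dp.shape[0]
    cols = [np.ones(n), whale]
    if controls is not None:
        cols.append(np.asarray(controls, dtype=float))
    X = np.column_stack(cols)

    beta, *_ = np.linalg.lstsq(X, dp, rcond=None)
    resid = dp - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    var_pi = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
    return beta[1] ** 2 / var_pi

# Toy usage on simulated data (not real trade flows).
rng = np.random.default_rng(1)
whale = rng.normal(size=500)
dp = 0.5 * whale + rng.normal(size=500)
report_iv = first_stage_f(dp, whale) > 10
```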

7. Sample Size & Power

Effect size benchmarked against Snowberg-Wolfers-Zitzewitz (QJE 2007):

SWZ 2007 effect: 1-pp shift in Republican (Bush) win probability → 0.02-0.03% equity return (intraday).
2024 expected effect: larger (more concentrated industries; more polarized policy platforms). Conservative target: 0.04% per 1-pp PM shock for SPX.
Sample size: 40 events × 30-min windows × ~520 assets ≈ 21,000 observations.
Power at 5% α: > 0.99 for effect sizes of 0.01% or larger (assuming intraday return SD ~ 0.05%).
Pre-committed null: If the 95% CI for β̂_SPX lies within [-0.01%, 0.01%], we report a "well-identified null" finding rather than a positive result.
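
The power figure can be reproduced under explicit assumptions with a normal-approximation sketch such as the one below. The function name `power_two_sided` and the shock SD of 2 pp are hypothetical; the return SD of 0.05% and the effect size of 0.01% per pp come from the list above. The effective sample size is left as a parameter because, with standard errors clustered by event, the relevant count for a single asset such as SPX may be closer to the 40 events than to the pooled panel size.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(beta, sd_resid, sd_shock, n_eff, alpha=0.05):
    """Approximate power of a two-sided test of H0: beta = 0.

    Uses SE(beta_hat) ~= sd_resid / (sd_shock * sqrt(n_eff)) for a
    univariate regression and a normal approximation to the test statistic.
    """
    nd = NormalDist()
    se = sd_resid / (sd_shock * sqrt(n_eff))
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(beta) / se
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

# beta in % return per 1-pp shock; sd_shock = 2.0 pp is a HYPOTHETICAL value.
power_events_only = power_two_sided(0.01, 0.05, 2.0, n_eff=40)
power_pooled = power_two_sided(0.01, 0.05, 2.0, n_eff=21_000)
```

Comparing the two values shows how strongly the stated power depends on whether observations within an event are treated as independent.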

8. Decision Rules

Primary finding criterion:

(a) Spec 1 yields β̂_k with sign matching H1 prediction at p < 0.05 for at least 3 of 5 named asset classes (SPX, MXN, 10Y, VIX, BTC). AND

(b) Spec 2 yields δ̂₁ > 0 at p < 0.05 for the primary composite measure. AND

If (a), (b), and (c) all satisfied: claim positive evidence for SWZ-redux. If only (a) and (b): claim weaker evidence with caveats. If only (a): report null on cross-section. If none: report null overall.

Multiple testing correction: Bonferroni-corrected α = 0.05/5 = 0.01 for the named-asset-class headlines. Romano-Wolf step-down for industry-level cross-section.
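
The named-asset-class part of the decision rule can be written down mechanically, which helps remove ambiguity at analysis time. In the sketch below, the function name `h1_criterion` and the toy estimates are hypothetical. Applying the Bonferroni threshold of 0.01 to the "at least 3 of 5" count is an assumption of this sketch; criterion (a) above is stated at p < 0.05, and passing `bonferroni=False` reproduces that reading.

```python
H1_ASSETS = ("SPX", "MXN", "10Y", "VIX", "BTC")

def h1_criterion(estimates, alpha=0.05, min_hits=3, bonferroni=True):
    """Evaluate the named-asset-class criterion.

    estimates: dict mapping asset -> (beta_hat, p_value).
    H1 predicts beta > 0 for every named asset class.
    Returns (passes, list_of_assets_that_count).
    """
    threshold = alpha / len(H1_ASSETS) if bonferroni else alpha
    hits = [
        a for a in H1_ASSETS
        if estimates[a][0] > 0 and estimates[a][1] < threshold
    ]
    return len(hits) >= min_hits, hits

# Toy estimates (not real results).
toy_estimates = {
    "SPX": (0.040, 0.001),
    "MXN": (0.020, 0.004),
    "10Y": (0.010, 0.030),
    "VIX": (-0.005, 0.200),
    "BTC": (0.080, 0.009),
}
passes, hits = h1_criterion(toy_estimates)
```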

P-hacking guardrails: we commit to NOT reporting any specifications, asset classes, or industries beyond those listed in §6 in the main paper. Additional exploratory analyses go in the online appendix and are clearly labeled as exploratory.

9. 2016 Election Placebo Validation

Before any 2024 data analysis, validate the industry exposure measure on the 2016 election:

Step 1. Run Spec 1 + Spec 2 on 2016 election event windows using IEM + PredictIt data.

Step 2. The 2016 industry exposure measure should be constructed from 2011-2015 firm data (5-year lookback, no 2016 overlap).

Step 3. If 2016 cross-sectional δ₁ is significantly > 0, the measure has external validity. Proceed to 2024.

Step 4. If 2016 δ₁ is null or negative, we have evidence the measure is poorly constructed. Investigate why and revise BEFORE locking 2024 analysis. Document the revision publicly.

10. Pre-Committed Allowed Deviations

Pre-registrations should anticipate the legitimate reasons for deviation. We commit to disclosing deviations on these grounds:

Data unavailability: if a specific Compustat/Bloomberg/Atkin-Khandelwal field is not available as expected, we may substitute the closest available alternative AND disclose
Methodological discovery during pipeline build: if we discover a measurement error or improvement, we apply it AND disclose the original and revised version
Reviewer requests: standard for any pre-registered analysis
Identification refinement: if the whale-trade-IV exclusion restriction is violated (per a placebo test we will conduct), we transparently report and use alternative IDs

NOT allowed without disclosure: exposure measure tweaking, event list extension, sample period changes, decision rule changes after observing 2024 data.

🔒 Action: Submit to OSF

Once this document is finalized, the action is to submit to OSF Registries as an "AsPredicted" or "OSF Standard" pre-registration. Typical OSF lock-and-publish: ~30 minutes once the document is ready. Receive OSF DOI; cite the DOI in Paper A and Paper B drafts.

Estimated time to finalize: 1-2 days of careful review + minor edits. Submit ~2 weeks before any PolyMarket 2024 data access begins.

Pre-registration draft · Generated May 19, 2026
