Literature Readings · DiD Methodology & Applications · 23 venues · 2007-2026

Difference-in-Differences

74 papers across Top-5 Econ + NBER WP + econ field journals + J Econometrics + Quant Econ, ranked by OpenAlex citation count as of May 2026. Plus a focused methodology deep-dive answering four questions: time-heterogeneous effects, two-shock designs, identifying assumptions, and parallel-trend tests under the modern (post-Goodman-Bacon-2021) consensus.

74 total papers · 24 methodology · 50 empirical · 21 Top-5 Econ · 6,822 cites of #1

📊 The field at a glance

Each circle is one paper. Color: amber = methodology, blue = empirical application. Y-axis citations on log scale; X-axis publication year.

🔬 Methodology Deep-Dive — four questions for the modern DiD practitioner

The questions that come up over and over again in revision rounds: time-heterogeneous effects, two-shock designs, identifying assumptions, parallel-trend tests. Below: the current consensus answers, with the regression equations you can drop into LaTeX.

Q1 — Time-heterogeneous effects: standard specifications

When the treatment effect evolves with exposure time, the static two-way fixed-effects (TWFE) regression \(y_{it} = \alpha_i + \gamma_t + \beta \cdot D_{it} + \varepsilon_{it}\) collapses a time-varying object into a single scalar, and under staggered adoption it does so with weights that can be negative. Goodman-Bacon (2021) decomposed this estimator into a weighted average of all possible 2×2 DiDs and showed that already-treated units serve as controls for later-treated units, contaminating \(\beta\) whenever effects grow or decay over event time. The modern default is therefore an event-study (dynamic) specification indexed by event time \(k = t - E_i\), where \(E_i\) is the treatment date for unit i:

\[y_{it} = \alpha_i + \gamma_t + \sum_{k \neq -1} \theta_k \cdot \mathbf{1}\{t - E_i = k\} + \varepsilon_{it},\]

with \(k = -1\) omitted as the reference period.

However, simply running this regression by OLS does not solve the problem. Sun and Abraham (2021) showed that each TWFE event-study coefficient \(\theta_k\) is itself a contaminated weighted sum of cohort-specific CATTs at other event times, so even leads can be non-zero when treatment effects are heterogeneous across cohorts. Their interaction-weighted (IW) estimator recovers an interpretable \(\theta_k\) by saturating the regression in cohort-by-event-time indicators \(\mathbf{1}\{E_i = e\} \cdot \mathbf{1}\{t - E_i = k\}\) and then averaging the cohort-specific coefficients using sample shares of cohorts still observed at horizon k.

Callaway and Sant'Anna (2021) take a more granular route: estimate the group-time average treatment effect ATT(g, t) for every treatment cohort g and calendar period t using never-treated or not-yet-treated units as controls, then aggregate into event-study, cohort-specific, or overall summaries. Borusyak, Jaravel and Spiess (2024) instead propose an imputation estimator: fit unit and time fixed effects on the untreated observations only, impute the counterfactual \(y_{it}(0)\) for treated cells, and average \(y_{it} - \hat{y}_{it}(0)\) at any target aggregation. This is the efficient estimator under homoskedasticity and parallel trends.

In practice for 2025-2026: prefer Borusyak-Jaravel-Spiess when the panel is balanced and parallel trends is credible (it is the most efficient), Callaway-Sant'Anna when you want flexibility over the control group (not-yet-treated vs never-treated) and clean conditional parallel trends with covariates, and Sun-Abraham when you want a minimally invasive fix to an existing TWFE event study. Wooldridge (2023) shows that a fully saturated extended TWFE recovers the same targets in a single regression, which is convenient for users wedded to OLS. Avoid the static \(\beta\) as a primary estimand unless effects are genuinely constant.
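The imputation logic is simple enough to sketch in a few lines. Below, a toy simulation of the Borusyak-Jaravel-Spiess recipe; the data-generating process, unit counts, and all function names are my own illustrative assumptions, not from the paper:

```python
# Toy sketch of the Borusyak-Jaravel-Spiess imputation estimator.
# DGP and names are illustrative assumptions, not from the paper.

def simulate_panel():
    # cohorts: units 0-1 treated at t=3, units 2-3 at t=5, units 4-5 never
    E = {0: 3, 1: 3, 2: 5, 3: 5, 4: None, 5: None}
    data = []
    for i in range(6):
        for t in range(8):
            treated = E[i] is not None and t >= E[i]
            k = t - E[i] if treated else None
            effect = 1.0 + 0.5 * k if treated else 0.0  # true theta_k = 1 + 0.5k
            y = 0.5 * i + 0.3 * t + effect              # unit FE + time FE + effect
            data.append((i, t, treated, k, y))
    return data

def imputation_event_study(data, sweeps=2000):
    # Step 1: fit alpha_i and gamma_t on UNTREATED cells only, by alternating
    # means (block coordinate descent; converges here because the untreated
    # cells form a connected panel thanks to the never-treated units).
    untreated = [(i, t, y) for (i, t, tr, _, y) in data if not tr]
    alpha = {i: 0.0 for (i, _, _, _, _) in data}
    gamma = {t: 0.0 for (_, t, _, _, _) in data}
    for _ in range(sweeps):
        for i in alpha:
            cell = [y - gamma[t] for (j, t, y) in untreated if j == i]
            alpha[i] = sum(cell) / len(cell)
        for t in gamma:
            cell = [y - alpha[j] for (j, s, y) in untreated if s == t]
            gamma[t] = sum(cell) / len(cell)
    # Step 2: impute y_it(0) = alpha_i + gamma_t for treated cells and
    # average y - y_hat(0) by event time k.
    by_k = {}
    for (i, t, tr, k, y) in data:
        if tr:
            by_k.setdefault(k, []).append(y - alpha[i] - gamma[t])
    return {k: sum(v) / len(v) for k, v in sorted(by_k.items())}

theta = imputation_event_study(simulate_panel())
```

In this noiseless DGP the recovered \(\theta_k\) match the true \(1 + 0.5k\); with noise one would add standard errors, e.g. via a clustered bootstrap or the paper's analytic formulas.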

Q2 — Two-shock specifications

Note: Q2 below sketches three generic options (stacked / multiple-treatment / sequential). For our actual setting — two universally-dated shocks with unobserved per-unit adoption — go straight to Approach 2 with Dynamic Adoption below. That section replaces the static Approach 2 here with a dynamic event-study specification that traces the adoption curve.

When unit i can be exposed to Shock A at \(t_A^i\) and Shock B at \(t_B^i > t_A^i\), the analyst must choose how to disentangle the marginal effect of each and any interaction. Three approaches dominate.

Approach 1: Stacked event-study. Following Cengiz, Dube, Lindner and Zipperer (2019), build one "event-specific" dataset per shock: for shock A, keep a window \([t_A^i - K, t_A^i + K]\) around each treated unit's A-date and pair it with clean controls (units not yet treated by A or B inside that window); do the same for shock B, restricting controls to units that have already been treated by A but not yet by B if the parameter of interest is the marginal effect of B holding A fixed. Stack the datasets with event-by-unit fixed effects and run \(y_{ist} = \alpha_{is} + \gamma_{ts} + \sum_k \theta_k^s \cdot \mathbf{1}\{t - t_s^i = k\} + \varepsilon_{ist}\), where \(s \in \{A, B\}\) indexes the event and the \(\alpha_{is}, \gamma_{ts}\) are event-specific fixed effects. This isolates each shock's dynamics and sidesteps the staggered-DiD contamination problem entirely.

Approach 2: Multiple-treatment DiD with full interactions. Pool the data and estimate \(y_{it} = \alpha_i + \gamma_t + \beta_A \cdot postA_{it} + \beta_B \cdot postB_{it} + \beta_{AB} \cdot (postA_{it} \cdot postB_{it}) + \varepsilon_{it}\), or in dynamic form,

\[y_{it} = \alpha_i + \gamma_t + \sum_k \theta_k^A \cdot \mathbf{1}\{t - t_A^i = k\} + \sum_k \theta_k^B \cdot \mathbf{1}\{t - t_B^i = k\} + \sum_{k,l} \theta_{k,l}^{AB} \cdot \mathbf{1}\{t - t_A^i = k, t - t_B^i = l\} + \varepsilon_{it}.\]

Here \(\beta_A\) is the effect of A alone, \(\beta_B\) the effect of B in isolation, and \(\beta_{AB}\) the super-additive interaction. This is the right specification when the question is whether the two shocks reinforce or substitute. It inherits the heterogeneity pathologies of TWFE in spades, so estimate it via Callaway-Sant'Anna with treatment \(D_{it} = (postA_{it}, postB_{it})\) or via the Wooldridge (2023) extended TWFE.

Approach 3: Sequential DiD using "still-treated-by-A" controls. First estimate the effect of A using never-treated controls. Then, conditional on the universe of A-treated units, re-run a DiD where the treatment is the arrival of B and the control group is units that received A but have not (yet) received B. The second-stage regression is \(y_{it} = \alpha_i + \gamma_t + \sum_k \theta_k^{B|A} \cdot \mathbf{1}\{t - t_B^i = k\} + \varepsilon_{it}\), estimated on the subsample with \(postA_{it} = 1\). This yields the marginal effect of B among the A-treated population and is what Callaway-Sant'Anna's not-yet-treated comparison naturally delivers.

Use Approach 1 when the two shocks affect mostly disjoint units or when you want each shock's standalone dynamics; Approach 2 when interaction is the object of interest; Approach 3 when B is policy-relevant only conditional on A.
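The bookkeeping behind Approach 1 is worth seeing concretely. A minimal sketch on a toy four-unit panel; the shock dates, window width, and the clean-control rule as coded are my illustrative choices (the B-specific rule from the text, controls already treated by A but not yet by B, would swap in for `clean_controls`):

```python
# Stacked event-study dataset construction in the spirit of Cengiz et al. (2019).
# Toy panel: unit -> (t_A, t_B); None means the unit is never hit by that shock.
shocks = {0: (3, 6), 1: (4, None), 2: (None, None), 3: (None, None)}
K = 2  # event window [t_s - K, t_s + K]

def clean_controls(event_unit, window):
    """Units untreated by EITHER shock through the end of the window."""
    hi = window[1]
    return [j for j, dates in shocks.items()
            if j != event_unit
            and all(d is None or d > hi for d in dates)]

def build_stack():
    rows = []
    for i, (tA, tB) in shocks.items():
        for s, ts in (("A", tA), ("B", tB)):
            if ts is None:
                continue
            window = (ts - K, ts + K)
            for j in [i] + clean_controls(i, window):
                for t in range(window[0], window[1] + 1):
                    rows.append({"event": (i, s),      # event-specific FE level
                                 "unit": j, "t": t,
                                 "k": t - ts if j == i else None})
    return rows

stack = build_stack()
```

Each row carries its event label so the event-specific \(\alpha_{is}, \gamma_{ts}\) can be absorbed as (event × unit) and (event × time) fixed effects in the stacked regression.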

Q3 — Identifying assumptions

Standard single-shock DiD. Identification of the ATT rests on five assumptions.

  • Parallel trends: in the absence of treatment, \(E[Y_{it}(0) - Y_{i,t-1}(0) | D_i = 1] = E[Y_{it}(0) - Y_{i,t-1}(0) | D_i = 0]\).
  • No anticipation: \(Y_{it}(1) = Y_{it}(0)\) for \(t < E_i\), ruling out behavioral response to a known future treatment.
  • SUTVA / no spillovers: treatment of unit i does not affect outcomes of \(j \neq i\); violations include general-equilibrium effects and geographic spillovers.
  • Stable composition: the panel of units and the populations they represent are not differentially selected into the sample by treatment.
  • No concurrent confounds: no other policy or shock affects treated and control groups differentially at the same time.

Staggered / heterogeneous-timing DiD. The above plus a stronger parallel-trends statement: parallel trends must hold across all cohorts and all post-treatment horizons, i.e., \(E[Y_{it}(0) - Y_{i,t-1}(0) | E_i = g]\) is equal across g, including never-treated. No anticipation must hold for every cohort. Critically, the not-yet-treated cohorts must remain uncontaminated controls, which fails if late-treated units begin reacting to the early-treated units' treatment (spillover) or if the early treatment changes the late cohort's selection into treatment. Goodman-Bacon (2021) shows that treatment-effect homogeneity over time within cohort is what saves the TWFE estimand from negative weights; when effects grow or decay, you need the Callaway-Sant'Anna, Sun-Abraham, or Borusyak-Jaravel-Spiess machinery. Monotonicity per se is not required; what matters is that the analyst aggregates cohort-time effects with non-negative weights.

Two-shock DiD. All single-shock assumptions apply to each shock separately. Approach 1 (stacked) additionally requires that within each event-specific window the "clean controls" satisfy parallel trends with the treated, and that the windows are wide enough to capture dynamics but narrow enough to avoid contamination by the other shock. Approach 2 (multiple-treatment) requires the strongest assumption: no unmodeled interaction — if \(\beta_{AB}\) is omitted, the additive specification assumes carryover from A does not modulate the effect of B, which is rarely innocuous. Approach 3 (sequential) requires that, conditional on having received A, the timing of B is as-good-as-random with respect to potential outcomes — a conditional parallel trends assumption among A-treated units — and that A's effect has stabilized before B arrives, otherwise the second-stage DiD picks up residual A-dynamics.

Q4 — Parallel-trend tests

Single-shock DiD. The workhorse diagnostic is the event-study with leads: plot \(\theta_{-K}, ..., \theta_{-1}\) from

\[y_{it} = \alpha_i + \gamma_t + \sum_{k \neq -1} \theta_k \cdot \mathbf{1}\{t - E_i = k\} + \varepsilon_{it}\]
and check that they are statistically and economically close to zero, ideally accompanied by a joint F-test or Wald test on the pre-period coefficients. Complement with visual inspection of raw treated vs control means, and placebo cutoffs that move the treatment date earlier in time or apply the treatment to clearly unaffected units. None of these prove parallel trends; they only fail to falsify it.
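The joint test on the leads amounts to a small Wald computation. A sketch with made-up numbers, assuming (for brevity only) that the lead estimates are independent; a real application uses the full covariance matrix of \(\hat{\theta}\):

```python
# Joint pre-trend Wald test on event-study leads (illustrative numbers).
# Assumes independent leads; with a full covariance matrix V the statistic
# is W = theta' V^{-1} theta.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}  # 5% chi-squared critical values

leads = [(-4, 0.02, 0.05),   # (event time k, theta_k, se_k)
         (-3, -0.03, 0.05),
         (-2, 0.04, 0.05)]

W = sum((th / se) ** 2 for _, th, se in leads)
reject = W > CHI2_95[len(leads)]   # here W = 1.16: fail to reject at 5%
```

As the text stresses, failing to reject does not establish parallel trends; it only fails to falsify them, and the power caveats below apply in full.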

Staggered DiD. Do not read pre-trends off the raw TWFE event-study coefficients — Sun and Abraham (2021) show these are contaminated mixtures of post-treatment effects from other cohorts, so a "flat" pre-trend can mask violations and a "trending" pre-trend can be a mechanical artifact. Instead, plot cohort-specific pre-trends using Callaway-Sant'Anna's ATT(g, t) for t < g, or the pre-period coefficients from Sun-Abraham's IW estimator, or Borusyak-Jaravel-Spiess (2024) imputation residuals on the pre-treatment sample. The latter two also provide a valid joint pre-test.

Two-shock DiD. Run three tests. First, pre-trends before shock A using leads of A's event study. Second, a "between-shock" pseudo-shock check: in the window \((t_A^i, t_B^i)\), treated and control units should evolve in parallel conditional on the A effect — test by estimating an event study where the "event" is a placebo date drawn from this window. Third, post-B placebo checks where you set a fake third shock after B and verify it loads to zero.

The modern best practice, however, is to stop treating pre-trend p-values as binary go/no-go gates. Roth (2022) Pretest with Caution documents that the pre-test is underpowered against the violations that matter and that conditioning on passing the pre-test biases inference. Rambachan and Roth (2023), A More Credible Approach to Parallel Trends, provide the modern standard: report honest confidence sets that allow post-treatment trend deviations bounded by either smoothness (\(|\delta_{t+1} - \delta_t| \leq M\)) or relative magnitude to observed pre-trend violations, yielding sensitivity statements like "the ATT remains positive unless the post-treatment trend violation is more than 1.5x the worst observed pre-trend." Roth, Sant'Anna, Bilinski and Poe (2023), What's Trending in Difference-in-Differences?, synthesize this into the current consensus: estimate with Callaway-Sant'Anna, Sun-Abraham, or Borusyak-Jaravel-Spiess; plot cohort-specific leads; and report Rambachan-Roth sensitivity bounds rather than a single pre-trend F-statistic. Anything less is no longer publishable at top journals.
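The arithmetic behind a smoothness-style sensitivity statement can be sketched as follows. This is a deliberate simplification of the Rambachan-Roth machinery: their honest confidence sets also account for estimation noise and richer restriction classes (the HonestDiD package implements the real thing); here the violation is only assumed to drift by at most M per period, and sampling error is ignored:

```python
# Simplified breakdown-M arithmetic for parallel-trends sensitivity.
# NOT the Rambachan-Roth honest confidence sets: a noise-free sketch of
# the bounded-drift logic only.

def identified_set(theta_k, k, M):
    # If the violation delta moves by at most M per period and delta_{-1} = 0,
    # the cumulative violation at horizon k is at most M * (k + 1).
    v = M * (k + 1)
    return (theta_k - v, theta_k + v)

def breakdown_M(theta_k, k):
    # Largest M at which the identified set still excludes zero.
    return abs(theta_k) / (k + 1)

lo, hi = identified_set(theta_k=0.30, k=2, M=0.05)  # -> (0.15, 0.45)
M_star = breakdown_M(0.30, 2)                       # -> 0.10
```

Reported in the style the text recommends: "the effect at horizon 2 remains positive unless the per-period trend violation exceeds 0.10," which can then be compared with the worst observed pre-trend step.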

🎯 Approach 2 with Dynamic Adoption — full review

The user's actual setting: two universally-dated shocks (Shock A at \(t_A\) and Shock B at \(t_B\)), unobserved per-unit adoption, and diffuse adoption dynamics after each shock so that the effect grows over time and may interact across shocks. The preferred specification is Approach 2 — multiple-treatment DiD with full interactions, written in dynamic event-study form so the adoption curve is identified non-parametrically. Below: the regression and what it identifies; the methodology papers that underpin it; the identifying assumptions and parallel-trend tests; and the empirical papers that have done something close.

Section 1 — Approach 2 with adoption dynamics: the regression and what it identifies

The setting has two universally-dated shocks at \(t_A\) and \(t_B\), with unobserved per-unit adoption diffusing after each. The natural dynamic generalization of Approach 2 is a saturated event-study with three sets of event-time dummies — one per shock, plus a two-dimensional set for the interaction:

\[y_{it} = \alpha_i + \gamma_t + \sum_{k\in K_A} \theta_k^A \cdot \mathbf{1}\{t - t_A = k\} + \sum_{k\in K_B} \theta_k^B \cdot \mathbf{1}\{t - t_B = k\} + \sum_{(k,l)\in K_{AB}} \theta_{k,l}^{AB} \cdot \mathbf{1}\{t - t_A = k, t - t_B = l\} + X_{it}'\delta + \varepsilon_{it}\]

Here \(\alpha_i\) are unit fixed effects, \(\gamma_t\) are calendar-time fixed effects, and a baseline event-time (say \(k = -1\)) is omitted for each set. Each coefficient identifies a reduced-form, population-average response, not a structural treatment effect. Specifically:

  • \(\theta_k^A\) is the average response of the outcome at calendar horizon k after shock A, holding constant any contemporaneous response to B. Because per-unit adoption is unobserved, the sequence \(\{\theta_k^A\}_{k\geq0}\) traces out the population-average adoption curve in reduced form — it is the convolution of the cross-sectional density of adoption dates with the per-adopter effect path.
  • \(\theta_k^B\) is the analogous adoption curve for shock B. In the joint regression it identifies B's contribution over and above a hypothetical "A-only" counterfactual that continues to evolve along its \(\theta_k^A\) path.
  • \(\theta_{k,l}^{AB}\) is the super-additive interaction at horizons (k, l): the deviation from additivity of the joint response of A and B. If shock B's effect is amplified once shock A has already induced adoption (e.g., agentic coding is more useful once chatbots are already integrated into workflows), \(\theta_{k,l}^{AB} > 0\); if it substitutes, negative.
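The regressor bookkeeping for the specification above can be sketched for a single unit-time cell. One caveat made explicit: with universally-dated shocks, dummies in \(t - t_A\) alone are perfectly collinear with the calendar-time effects \(\gamma_t\), so the sketch interacts them with an exposed-unit indicator \(D_i\) (my illustrative addition; identification then runs off exposed vs. unexposed units):

```python
# Event-time and interaction dummies for the saturated two-shock specification.
# t_A, t_B, the windows K_A/K_B, and the exposure indicator D_i are illustrative.
t_A, t_B = 4, 9

def regressors(D_i, t, K_A=range(-3, 5), K_B=range(-3, 4)):
    kA, kB = t - t_A, t - t_B
    dA = {k: int(D_i and kA == k) for k in K_A if k != -1}   # k = -1 omitted
    dB = {l: int(D_i and kB == l) for l in K_B if l != -1}
    # interaction cells exist only once BOTH shocks are in their post-period
    dAB = {(kA, kB): 1} if (D_i and kA >= 0 and kB >= 0) else {}
    return dA, dB, dAB

dA, dB, dAB = regressors(D_i=1, t=7)       # kA = 3, kB = -2: post-A, pre-B
dA2, dB2, dAB2 = regressors(D_i=1, t=10)   # kA = 6, kB = 1: both post
```

In practice the endpoints of \(K_A\) and \(K_B\) should be binned (e.g. "k = 5+") so that far horizons are absorbed rather than silently dropped, as the t = 10 cell illustrates.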

Why this is ITT, not TOT. Because the econometrician sees only the date of the shock and not whether unit i has actually adopted at time t, the regression compares all post-shock units to controls regardless of adoption status. As Acemoglu & Restrepo (2020, JPE) and Hjort & Poulsen (2019, AER) make explicit in single-shock settings, this is fundamentally an intent-to-treat estimand: the per-period effect is mechanically attenuated by the share of non-adopters, and the slope of the event-study path reflects both the rising adoption share and any per-adopter effect dynamics. To recover a TOT, one needs an instrument or measure of per-unit exposure (the Bartik-style approach in Acemoglu, Autor, Hazell & Restrepo 2022, or the rollout-coverage approach in Hjort-Poulsen).

Connection to diffusion theory. Comin & Hobijn (2010, AER) document that technology adoption follows an S-curve at the population level — slow start, accelerating middle, plateau — driven by heterogeneity in adoption costs and learning externalities. The classical Bass (1969) diffusion model parameterizes the cumulative adoption share as \(F(t) = (1 - e^{-(p+q)(t-t_A)}) / (1 + (q/p) e^{-(p+q)(t-t_A)})\), where p is the innovation coefficient and q is the imitation/network coefficient. The pattern of event-study coefficients \(\{\theta_k^A\}\) is, in reduced form, the product of this F(k) and the per-adopter effect. If the per-adopter effect is itself static (e.g., a one-time productivity jump), the event-study coefficients literally trace the diffusion S-curve.
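The Bass formula coded directly, with an illustrative per-adopter jump \(\bar\theta^A\); the p, q defaults are classic illustrative innovation and imitation magnitudes, not estimates from any adoption study:

```python
import math

def bass_F(k, p=0.03, q=0.38):
    """Bass (1969) cumulative adoption share, k periods after the shock."""
    if k < 0:
        return 0.0
    e = math.exp(-(p + q) * k)
    return (1.0 - e) / (1.0 + (q / p) * e)

# If the per-adopter effect is a one-time jump theta_bar, the reduced-form
# event-study coefficients trace theta_bar * F(k): the diffusion S-curve.
theta_bar = 2.0
theta_path = [theta_bar * bass_F(k) for k in range(25)]
```

The path starts at zero, rises monotonically, and plateaus at \(\bar\theta^A\), which is exactly the ITT attenuation logic above: early coefficients understate the per-adopter effect by the non-adopter share.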

Parameterize, or leave fully flexible? The modern 2025–2026 best practice is to first report fully-saturated event-study coefficients with confidence bands (the "non-parametric adoption curve") and then impose structure as a robustness/efficiency exercise. Useful parameterizations include: (i) linear post-shock trend (\(\theta_k^A = \beta_A \cdot k\)) — crude but transparent; (ii) log-linear adoption (\(\log \theta_k^A = a + b \cdot k\)) — for plateauing effects; (iii) Bass-shaped \(\theta_k^A = \bar{\theta}^A \cdot F_{Bass}(k; p, q)\) — appropriate when diffusion is dominated by imitation; (iv) restricted-dummy averaging into "short-run" (k = 0–3) and "long-run" (k = 4+) bins — common in Acemoglu-Restrepo applications. Parameterize when (a) the unrestricted event study is noisy and (b) you can defend the functional form from independent evidence on adoption; otherwise leave it flexible (Borusyak-Jaravel-Spiess 2024 emphasize that pre-trends should be tested non-parametrically before imposing post-shock structure).

Section 2 — Methodology papers underpinning Approach 2 with time-varying effects

The following twelve papers, plus a few honorable mentions, form the methodological substrate of dynamic multi-treatment DiD. Each is summarized with explicit reference to how it bears on the user's two-shock + unobserved-adoption setting.

  • de Chaisemartin & D'Haultfœuille (2023, J. Econometrics 236(2)), "Two-way fixed effects and differences-in-differences estimators with several treatments." THE foundational paper for Approach 2. They show that TWFE regressions with multiple treatments suffer a contamination bias: the coefficient on treatment A is a weighted sum of A's own effects (possibly with negative weights) plus a weighted sum of B's effects. They propose a heterogeneity-robust DiD estimator. Direct implication: do not interpret \(\beta_A\) and \(\beta_B\) from a static two-treatment TWFE as causal — go dynamic, or use their robust estimator.
  • de Chaisemartin & D'Haultfœuille (2024, REStat), "Difference-in-Differences Estimators of Intertemporal Treatment Effects." Extends the above to dynamic settings: their estimator recovers a sequence of "effect of being treated for \(\ell\) periods" parameters under heterogeneity. Crucial for Approach 2 because adoption diffuses — the per-period effect is mechanically time-varying and their estimator is robust to this.
  • Callaway, Goodman-Bacon & Sant'Anna (2024, NBER WP 32117), "Difference-in-Differences with a Continuous Treatment." Even though the user's shocks are universal in timing, exposure intensity differs across units (e.g., share of tasks AI-exposed). Formalizes parallel trends for continuous "dose" and shows that simple TWFE-with-interactions can mislead.
  • Sun & Abraham (2021, J. Econometrics). Conventional event-study leads and lags are contaminated by effects from other event-times when treatment effects are heterogeneous. They propose an interaction-weighted estimator (IW). In Approach 2's dynamic form, the same logic applies in two dimensions: \(\theta_k^A\) may be contaminated by post-B effects, and vice versa. Use their IW analogue, or restrict the estimation window to keep A and B horizons separated where possible.
  • Callaway & Sant'Anna (2021, J. Econometrics). Define group-time ATTs as the primitive estimand. Even though shock timing is common in the user's setting, units differ in exposure intensity, so a Callaway-Sant'Anna-style stratification by exposure groups times event-time is the natural way to report adoption curves separately by exposure.
  • Borusyak, Jaravel & Spiess (2024, REStud), "Revisiting Event Study Designs: Robust and Efficient Estimation." Their imputation estimator — fit unit and time FEs on never-treated/pre-period observations, then impute counterfactuals — is asymptotically efficient under the model and dominates TWFE under heterogeneity. With two shocks, run the imputation on the fully-pre-A window for cleanest identification.
  • Goodman-Bacon (2021, J. Econometrics). The TWFE decomposition: any DiD coefficient is a weighted average of all 2×2 comparisons in the panel. In the two-shock setting, this is the diagnostic tool for understanding which comparisons (pre-A vs. post-A, post-A-pre-B vs. post-B, etc.) drive your \(\beta_A\) and \(\beta_B\).
  • Wooldridge (2023, The Econometrics Journal), "Simple approaches to nonlinear difference-in-differences with panel data." Develops extended TWFE with full saturation of cohort-by-time interactions — the "ETWFE" estimator. Directly applicable to multi-treatment by extending the saturation to include both A and B cohort interactions.
  • Roth & Sant'Anna (2023, Econometrica 91(2), 737–747), "When Is Parallel Trends Sensitive to Functional Form?" Parallel trends in levels and in logs are non-nested assumptions; they coincide only under a strong randomization-or-stationarity condition. For dynamic effects this matters acutely: a Bass-shaped diffusion in levels can look very different in logs. Best practice: report both, or motivate the functional form from theory.
  • Rambachan & Roth (2023, REStud), "A More Credible Approach to Parallel Trends." Honest sensitivity: report bounds on post-treatment effects under transparent restrictions on the magnitude of possible pre-trend violations (parameter M). In the user's setting, M should be calibrated to the slope of the estimated A adoption curve.
  • Imai & Kim (2021, Political Analysis), "On the use of two-way fixed effects regression models for causal inference with panel data." Matching-based diagnostics for TSCS panels. Particularly useful for two-shock dynamic DiD on industries × time or occupations × time panels.
  • Bojinov & Shephard (2019, JASA), "Time-series experiments and causal estimands." A potential-outcomes framework for time-series with sequential interventions. The two-shock setting is exactly their motivating case: they formalize what "the effect of B given A" means as a causal estimand, providing the conceptual basis for \(\theta_{k,l}^{AB}\).
  • Honorable mentions: Athey & Imbens (2022, J. Econometrics) on design-based DiD inference; de Chaisemartin & D'Haultfœuille (2022, Econometrics Journal) survey; Chen et al. (2025) on continuous-treatment inference underpinning the contdid R package.

Section 3 — Identifying assumptions and parallel-trend tests

The dynamic two-shock specification is identified under a stack of assumptions that grows naturally from the single-shock case. State each formally, then map the test.

A1. Parallel trends (formal). Let \(Y_{it}(0,0)\) denote unit i's potential outcome at time t in the absence of both shocks. Parallel trends requires: \(E[Y_{it}(0,0) - Y_{i,t-1}(0,0) | i \in g]\) is the same across all exposure groups g, for every t. With heterogeneous exposure, this must hold within exposure cells (Callaway-Sant'Anna 2021).

A2. No anticipation of A. \(Y_{it}(0,0) = Y_{it}(A_\tau, 0)\) for all \(t < t_A\) and any post-A treatment vector \(A_\tau\). Tested by pre-A event-study leads being statistically zero and jointly insignificant.

A3. No anticipation of B — conditional on A's effect. This is the tricky assumption to reason about carefully. Pre-B leads cannot be tested in the simple raw data, because between \(t_A\) and \(t_B\) the outcome is already evolving along A's adoption curve. The correct test: estimate the full dynamic model and check that the residual variation in the \((t_A, t_B)\) window — after netting out the estimated \(\{\theta_k^A\}\) — is flat. Equivalently, fit a model that only has the A event-study dummies on the \((t_A, t_B)\) sub-sample, and test that the residuals show no systematic trend in the periods immediately preceding \(t_B\). This is the single most important diagnostic for Approach 2 and is routinely botched in applied work.
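The residual-flatness check in A3 reduces to simple arithmetic. A numerical sketch; all numbers are made up for illustration (in practice \(\hat\theta_k^A\) comes from the estimated event study and the gap series from the data):

```python
# Sketch of the A3 diagnostic: net out the estimated A adoption curve in the
# (t_A, t_B) window, then check the residual gap for a trend just before t_B.
# theta_A_hat and the outcome gaps below are made-up illustrative numbers.

t_A, t_B = 4, 9
theta_A_hat = {1: 0.70, 2: 0.90, 3: 1.00, 4: 1.05}    # estimated A path by k = t - t_A
gap = {5: 0.72, 6: 0.91, 7: 1.03, 8: 1.04}            # treated-control outcome gap

resid = {t: gap[t] - theta_A_hat[t - t_A] for t in gap}

# OLS slope of residuals on calendar time: should be ~0 if there is no pre-B drift
ts = sorted(resid)
t_bar = sum(ts) / len(ts)
r_bar = sum(resid.values()) / len(resid)
slope = (sum((t - t_bar) * (resid[t] - r_bar) for t in ts)
         / sum((t - t_bar) ** 2 for t in ts))
```

A non-trivial slope here signals either anticipation of B or a mis-estimated A path; the two are observationally hard to separate, which is why this diagnostic deserves a confidence band rather than a point check.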

A4. Stable composition / no differential attrition. The panel of units must not change composition differentially in response to either shock. With long event-time horizons and unobserved adoption, attrition driven by adoption is a real threat (firms that fail to adopt exit the panel). Test via Kline-Walters-style attrition regressions.

A5. SUTVA / no spillovers — and in particular, no within-unit spillover between A and B. The effect of A on unit i at horizon k must not depend on whether B has been announced or is anticipated. In the AI context this is plausible only if shock B was genuinely unanticipated as of \(t_A\); if practitioners were "saving up" effort for the agentic-coding wave, A's measured effect is downward-biased.

A6. The interaction-identifying assumption. Identification of \(\theta_{k,l}^{AB}\) as a structural super-additivity requires that the counterfactual effect of B alone (without A having occurred) would equal the estimated \(\{\theta_l^B\}\) in the regression. This is untestable in this setting, because we never observe B-without-A. The user must either (a) provide a theoretical argument for why pre-A units exhibit B-responsiveness similar to fully-adopted-A units, or (b) report \(\theta_{k,l}^{AB}\) as descriptive evidence of joint dynamics rather than a structural super-additivity. This is the deepest vulnerability of Approach 2 and should be flagged prominently.

Parallel-trend tests, in order of priority:

  • Standard pre-A leads. Plot \(\theta_k^A\) for k < 0 with 95% CIs; F-test joint zero.
  • Between-shock window test. Estimate the A event-study using only \(t \in [t_A - h, t_B - 1]\); verify that the implied adoption curve is consistent with the full-sample estimate and that pre-B residuals (the last few periods before \(t_B\)) are flat after netting out A.
  • Pseudo-B placebo. Pick an arbitrary date \(\tilde{t} \in (t_A, t_B)\), pretend it is "shock B," and re-run the full specification. The placebo \(\tilde{\beta}_B\) should be zero. This directly tests A6.
  • Functional-form robustness (Roth-Sant'Anna 2023). Run the full dynamic specification in levels and in logs; if the adoption-curve shapes diverge qualitatively, you have a functional-form problem that no test alone can resolve — pick the specification justified by theory.
  • Rambachan-Roth (2023) honest sensitivity. Calibrate M to the empirical slope of the A adoption curve and report bounds on \(\theta_k^A\), \(\theta_l^B\), and especially \(\theta_{k,l}^{AB}\). The interaction is the most vulnerable.
  • Goodman-Bacon (2021) decomposition. Report which 2×2 comparisons drive each coefficient.
  • de Chaisemartin-D'Haultfœuille (2023) robust estimator. Re-estimate with their contamination-robust estimator as a robustness check.

Section 4 — Empirical papers using something close to Approach 2 with dynamic adoption

Below are real empirical papers that handle some combination of (i) multiple treatments, (ii) gradual adoption dynamics, and (iii) interactions. For each: (i) how the shock(s) are encoded, (ii) how adoption dynamics enter, (iii) how parallel trends are tested, (iv) whether and how an interaction is identified.

  • Acemoglu & Restrepo (2022, Econometrica), "Tasks, Automation, and the Rise in U.S. Wage Inequality." Decomposes 1980–2016 wage-inequality dynamics into multiple technology "eras" (automation 1980–1990 vs. 1990–2007 vs. post-2007) using a task-based framework. (i) Each era is a separate treatment indicator interacted with industry exposure shares. (ii) Adoption is encoded via cumulative industry-level exposure; effects reported as long-differences. (iii) Pre-period trends compared via 1950–1980 placebo. (iv) Era interactions implicitly identified by differential industry exposure. The closest published paradigm for Approach 2.
  • Acemoglu & Restrepo (2020, JPE), "Robots and Jobs." Single shock (robotization) with diffuse adoption. (i) Treatment is regional robot exposure via Bartik shares from European data. (ii) Adoption dynamics enter via cumulative stock; specification is long-differences 1990–2007. (iii) Pre-period 1970–1990 placebo for parallel trends. (iv) No interaction. Serves as the single-shock paradigm with continuous diffusion — adapt by adding a second Bartik shock for the AI era.
  • Acemoglu, Autor, Hazell & Restrepo (2022, J. Labor Econ.), "AI and Jobs: Evidence from Online Vacancies." Multiple AI exposure margins (Brynjolfsson SML, Felten AI-OCC, Webb patent-based). (i) Each margin is a separate continuous exposure measure entered jointly. (ii) Adoption dynamics absent — cross-sectional design with vacancies. (iii) Pre-AI period (2007–2010) used as baseline. (iv) Margins interpreted as alternative measures of the same construct, not as separate interactable shocks.
  • Beraja, Hurst & Ospina (2019, Econometrica), "The Aggregate Implications of Regional Business Cycles." Combines national monetary shocks with regional housing-cycle shocks. (i) Multiple shocks of different kinds at different dates. (ii) Dynamic responses estimated via local projections. (iii) Identifying assumption is region-specific exposure to common national shocks. (iv) Interactions between national and regional shocks are the central object. Methodological cousin to Approach 2 in macro.
  • Cengiz, Dube, Lindner & Zipperer (2019, QJE), "The Effect of Minimum Wages on Low-Wage Jobs." Stacked-event design across 138 minimum-wage events. (i) Each event is a separate "shock" with its own event-time clock. (ii) Adoption dynamics absent (treatment is a policy change), but post-treatment dynamics in employment density estimated non-parametrically by event-time bin. (iii) Pre-trend test via 4-year pre-window. (iv) No interaction. Demonstrates how to report dynamic effects without imposing functional form.
  • Hsiang, Burke & Miguel (2013) and related climate-conflict literature. Multiple weather/climate shocks (temperature, precipitation) entered jointly. (i) Continuous shocks at high temporal resolution. (ii) Cumulative-lag dynamic specifications standard. (iii) Parallel trends tested via region-by-year FE. (iv) Temperature × precipitation interactions reported. Closest to Approach 2 when shocks are continuous and contemporaneous.
  • Brynjolfsson, Li & Raymond (2025, QJE), "Generative AI at Work." A single technology shock (chatbot rollout) with staggered adoption across 5,172 agents over 12 months. (i) Treatment is access date (staggered). (ii) Productivity adoption-curve traced non-parametrically by tenure-with-tool; gains rise then plateau. (iii) Callaway-Sant'Anna estimator used; pre-period leads reported. (iv) No interaction with a second shock — but the heterogeneous-by-skill adoption curve is the cleanest published example of the dynamic ITT pattern the user expects.
  • de Chaisemartin & D'Haultfœuille (2024, REStat), "Difference-in-Differences Estimators of Intertemporal Treatment Effects." Methodology paper with empirical application to U.S. minimum-wage diffusion. (i) Treatment intensity (minimum-wage bite) entered as continuous. (ii) Dynamic ATT-by-elapsed-time estimated. (iii) Their robust estimator inherently handles heterogeneity. (iv) No multi-shock interaction.
  • Atkin, Faber & Gonzalez-Navarro (2018, JPE), "Retail Globalization and Household Welfare: Evidence from Mexico." Multiple waves of foreign supermarket entry across municipalities. (i) Each entry is a separate event-time. (ii) Adoption dynamics in household shopping behavior traced over post-entry periods. (iii) Pre-trends tested via 4-quarter leads. (iv) Cross-store substitution (foreign vs. domestic) is implicitly an interaction. Good template for staggered multi-event with consumer-side diffusion.
  • Hjort & Poulsen (2019, AER), "The Arrival of Fast Internet and Employment in Africa." Single shock (submarine-cable arrival) with gradual terrestrial rollout. (i) Treatment is location-by-time exposure to fast internet. (ii) Adoption dynamics traced via event-time dummies post-arrival; effects build over years. (iii) Pre-arrival leads tested and reported flat. (iv) No interaction. Single-shock paradigm closest in flavor to a frontier-model release with gradual adoption โ€” adapt by replicating the rollout-event-study and stacking a second event for the agentic wave.
  • Card, Mas & Rothstein (2008, QJE), "Tipping and the Dynamics of Segregation." Multi-period treatment dynamics with threshold non-linearity. (i) Continuous treatment (minority share) crossing a tipping point. (ii) Long-run dynamics over decades. (iii) Pre-period composition controlled. (iv) Threshold ร— time interactions central. Useful when adoption may be non-monotonic.
  • Goldfarb & Tucker (2019, JEL), "Digital Economics" survey. Documents the multiple-wave structure of digital adoption (broadband, mobile, cloud, AI). Not a methodology paper but an empirical template for sequential-tech shocks.
  • Eloundou, Manning, Mishkin & Rock (2024, Science), "GPTs are GPTs." Cross-sectional exposure measure for LLM capability; pairs naturally with the second-shock exposure measure in the \(\theta_l^B\) heterogeneity analysis.

๐Ÿ“˜ Recommended reading list (priority order)

  1. de Chaisemartin & D'Haultfล“uille (2023), "Two-way fixed effects and differences-in-differences estimators with several treatments," J. Econometrics.
  2. de Chaisemartin & D'Haultfล“uille (2024), "Difference-in-Differences Estimators of Intertemporal Treatment Effects," REStat.
  3. Roth & Sant'Anna (2023), "When Is Parallel Trends Sensitive to Functional Form?", Econometrica.
  4. Rambachan & Roth (2023), "A More Credible Approach to Parallel Trends," REStud.
  5. Borusyak, Jaravel & Spiess (2024), "Revisiting Event Study Designs: Robust and Efficient Estimation," REStud.
  6. Callaway, Goodman-Bacon & Sant'Anna (2024), "Difference-in-Differences with a Continuous Treatment," NBER WP 32117.
  7. Sun & Abraham (2021), "Estimating dynamic treatment effects in event studies with heterogeneous treatment effects," J. Econometrics.
  8. Acemoglu & Restrepo (2022), "Tasks, Automation, and the Rise in U.S. Wage Inequality," Econometrica.
  9. Hjort & Poulsen (2019), "The Arrival of Fast Internet and Employment in Africa," AER.
  10. Brynjolfsson, Li & Raymond (2025), "Generative AI at Work," QJE.

๐Ÿ› ๏ธ Practitioner Playbook โ€” robustness for Approach 2 with Dynamic Adoption

You've decided on Approach 2 (multiple-treatment DiD with full interactions, dynamic event-study form). Here is the step-by-step playbook: 22 numbered steps grouped into 7 phases, each citing the papers that motivate it. Execute roughly in order โ€” earlier steps gate later ones.

Concrete, opinionated checklist for delivering a publishable Approach-2 result in 2025โ€“2026. Each step states what to do, why, and lists the specific papers that justify the recommendation. Full bibliographic details are in the reference list at the bottom of this section.

๐Ÿ“ What this playbook covers (three dimensions, one regression)

  • Two shocks A and B as separate events at dates \(t_A\) and \(t_B\) โ€” each gets its own event-study coefficient series.
  • Time-varying adoption within each shock โ€” the effect of A is not a single number; it is the sequence \(\{\theta_k^A\}_{k\geq 0}\) that traces the population-level adoption curve as practitioners take up the new technology over time. Same for B.
  • Bivariate interaction โ€” \(\theta_{k,l}^{AB}\) captures whether the joint effect of "\(k\) periods after A, \(l\) periods after B" is larger or smaller than the sum of the two individual adoption curves.

All three dimensions are estimated in one saturated regression (Step 5). The robustness checks in Phases 3โ€“6 then defend each dimension separately: parallel trends and pre-A leads defend the A adoption curve; the between-shock test and pseudo-B placebo defend the B adoption curve; Step 16 acknowledges that the bivariate interaction \(\theta_{k,l}^{AB}\) is the most vulnerable component.

Phase 1 โ€” Data and specification setup

Step 1. Lock down the shock dates and event window. Define \(t_A\) and \(t_B\) precisely โ€” calendar day, week, or month depending on data frequency. Set the pre-window \(K^-\) (at least 12 periods, ideally 24) and the post-window \(K^+\) (at least 6 periods past \(t_B\)). Document the rationale: longer windows give more power for pre-trend tests but raise composition and confounder risk. The window choice is a researcher-degree-of-freedom that reviewers will probe. References: Borusyak, Jaravel & Spiess (2024, REStud); Schmidheiny & Siegloch (2023, J. Appl. Econometrics); Roth, Sant'Anna, Bilinski & Poe (2023, J. Econometrics); Athey & Imbens (2022, J. Econometrics); Miller (2023, JEP).

Step 2. Inspect the outcome distribution and choose a transform. Plot the raw outcome, log outcome, and inverse-hyperbolic-sine (IHS). If outcome is a count with zeros, use Poisson PPML rather than log. Roth-Sant'Anna shows parallel trends in levels and in logs are non-nested identifying assumptions, so the functional form is itself an identifying choice; Chen-Roth shows log(1+y) can give nonsensical results when zeros are common. References: Roth & Sant'Anna (2023, Econometrica); Chen & Roth (2024, QJE); Cohn, Liu & Wardlaw (2022, J. Financial Econ.) on PPML for outcomes with zeros; Mullahy & Norton (2024, JBES) on outcome transformations.

Step 3. Balance and composition checks. Confirm the panel is balanced over the event window. Report attrition rates by treatment cohort and event time. If attrition is differential, plan inverse-probability weighting in Step 12. Heckman-Ichimura-Smith-Todd is the classical reference for matching diagnostics that also catch composition imbalance; Sant'Anna-Zhao gives the modern doubly-robust framework for DiD with covariates. References: Heckman, Ichimura, Smith & Todd (1998, Econometrica); Sant'Anna & Zhao (2020, J. Econometrics); Kline & Walters (2019, QJE); Hernรกn & Robins (2020, Causal Inference: What If, ch. 12).

Step 4. Choose your clustering level. Cluster at the unit level by default (firm, region, individual). With fewer than 30 clusters or a small number of treated clusters, standard cluster-robust SEs over-reject โ€” plan wild-cluster bootstrap or permutation inference in Step 12. Abadie-Athey-Imbens-Wooldridge (2023) provides the modern framework: cluster when treatment assignment is clustered, even if errors are not. References: Bertrand, Duflo & Mullainathan (2004, QJE); MacKinnon & Webb (2018, J. Appl. Econometrics); Cameron, Gelbach & Miller (2008, REStat); Abadie, Athey, Imbens & Wooldridge (2023, QJE); Conley & Taber (2011, REStat).

Phase 2 โ€” Baseline estimation

Step 5. Estimate the saturated dynamic Approach 2. Run the regression

\[y_{it} = \alpha_i + \gamma_t + \sum_{k \in K_A} \theta_k^A \cdot \mathbf{1}\{t - t_A = k\} + \sum_{k \in K_B} \theta_k^B \cdot \mathbf{1}\{t - t_B = k\} + \sum_{(k,l) \in K_{AB}} \theta_{k,l}^{AB} \cdot \mathbf{1}\{t - t_A = k, t - t_B = l\} + \varepsilon_{it}\]

with \(k = -1\) omitted for both A and B. Use OLS with unit and time fixed effects. Plot the three coefficient sequences with 95% CIs. References: Schmidheiny & Siegloch (2023, J. Appl. Econometrics); Borusyak, Jaravel & Spiess (2024, REStud); Sun & Abraham (2021, J. Econometrics); Goodman-Bacon (2021, J. Econometrics).

Step 6. Re-estimate with a heterogeneity-robust estimator. Run the same specification using at least three of: (a) Sun-Abraham IW (R: fixest::sunab); (b) Callaway-Sant'Anna group-time ATT with exposure strata (R: did); (c) Borusyak-Jaravel-Spiess imputation (R: didimputation); (d) de Chaisemartin-D'Haultfล“uille intertemporal estimator (R/Stata: did_multiplegt_dyn); (e) Wooldridge ETWFE with full cohort-by-time interactions. Substantial divergence from OLS is a red flag that triggers the Goodman-Bacon decomposition diagnostic. References: Sun & Abraham (2021, J. Econometrics); Callaway & Sant'Anna (2021, J. Econometrics); Borusyak, Jaravel & Spiess (2024, REStud); de Chaisemartin & D'Haultfล“uille (2024, REStat); Wooldridge (2023, Econometrics J.); Goodman-Bacon (2021, J. Econometrics); Roth, Sant'Anna, Bilinski & Poe (2023, J. Econometrics).

Phase 3 โ€” Parallel-trend diagnostics

Step 7. Standard pre-A leads. Plot \(\theta_k^A\) for \(k < 0\) with 95% CIs. Joint F-test on pre-period leads. Use Borusyak-Jaravel-Spiess imputation pre-test for a valid (non-contaminated) joint test. Do NOT read pre-trends off raw TWFE event-study leads โ€” Sun-Abraham shows these are contaminated mixtures of post-treatment effects from other horizons. References: Freyaldenhoven, Hansen & Shapiro (2019, AER); Borusyak, Jaravel & Spiess (2024, REStud); Sun & Abraham (2021, J. Econometrics); Roth (2022, AER: Insights).

Step 8. Between-shock test for no-anticipation of B. Re-estimate using only \(t \in [t_A - h, t_B - 1]\). After netting out the estimated A adoption curve, residuals immediately before \(t_B\) should be flat โ€” show this as a residual plot. This is the single most overlooked test for two-shock dynamic DiD. The closest formal treatment is de Chaisemartin-D'Haultfล“uille (2023) on multi-treatment contamination; Bojinov-Shephard (2019) gives the potential-outcomes framework. References: de Chaisemartin & D'Haultfล“uille (2023, J. Econometrics); de Chaisemartin & D'Haultfล“uille (2024, REStat); Bojinov & Shephard (2019, JASA); Athey & Imbens (2022, J. Econometrics).

Step 9. Pseudo-B placebo. Pick an arbitrary date \(\tilde t \in (t_A, t_B)\), treat it as a fake "shock B", and re-estimate. The placebo coefficients \(\tilde\theta_l^B\) should be statistically indistinguishable from zero. Repeat with several dates and report the distribution of placebo coefficients. This directly tests the interaction-identifying assumption (Step 16). References: Roth, Sant'Anna, Bilinski & Poe (2023, J. Econometrics); Conley & Taber (2011, REStat); Abadie, Diamond & Hainmueller (2010, JASA) on permutation-style placebo inference.

Step 10. Rambachan-Roth honest sensitivity bounds. Calibrate the smoothness parameter \(M\) to the slope of the estimated A adoption curve (a natural ceiling on what a plausible counterfactual trend violation could look like). Report bounds on \(\theta_k^A\), \(\theta_k^B\), and especially \(\theta_{k,l}^{AB}\). R: HonestDiD. Roth (2022) is the deeper case for why pre-test p-values are NOT a publication-quality defense of parallel trends; Rambachan-Roth is now the modern standard at top-5 journals. References: Rambachan & Roth (2023, REStud); Roth (2022, AER: Insights); Roth, Sant'Anna, Bilinski & Poe (2023, J. Econometrics); Manski & Pepper (2018, REStat) on bounded-variation inference.
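A minimal R sketch of the HonestDiD workflow, assuming the saturated baseline fit (`fit_baseline` from the Toolkit, Step 5) and a 12-pre / 24-post window; the index extraction and M grid are illustrative and must be adapted to your coefficient ordering:

```r
library(HonestDiD)

# Pull the A event-study coefficients and their vcov from the baseline fit
# (regex assumes fixest's "k_A::k" naming; adapt if yours differs)
idx_A   <- grep("^k_A::-?[0-9]+$", names(coef(fit_baseline)))
betahat <- coef(fit_baseline)[idx_A]
sigma   <- vcov(fit_baseline)[idx_A, idx_A]

# Smoothness-restriction bounds; the M grid is capped at the largest
# period-to-period slope of the estimated A adoption curve (Step 10's calibration)
M_star <- max(abs(diff(betahat)))
honest <- createSensitivityResults(
  betahat        = betahat,
  sigma          = sigma,
  numPrePeriods  = 11,   # K^- = 12 minus the omitted k = -1
  numPostPeriods = 25,
  Mvec           = seq(0, M_star, length.out = 5)
)
honest   # robust CIs by M; report alongside the conventional CI
```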

Phase 4 โ€” Functional form and inference

Step 11. Run levels and logs side by side. Estimate the full Approach-2 specification with \(y_{it}\) in levels and again with \(\log(1 + y_{it})\) or PPML if appropriate. If the qualitative shape of the adoption curve diverges, you have a functional-form problem that no test alone resolves โ€” pick the spec justified by theory and discuss explicitly. References: Roth & Sant'Anna (2023, Econometrica); Chen & Roth (2024, QJE); Cohn, Liu & Wardlaw (2022, J. Financial Econ.); Wooldridge (2023, Econometrics J.) on nonlinear DiD.
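One way to run the Step-11 side-by-side comparison in fixest (a sketch over the Toolkit's `panel` object; `fepois()` is fixest's Poisson pseudo-ML front end for count outcomes with zeros):

```r
# Same right-hand side under three functional forms
rhs <- "i(k_A, ref = -1) + i(k_B, ref = -1) | unit + t"

fit_levels <- feols(as.formula(paste("y ~",        rhs)), data = panel, cluster = ~unit)
fit_log1p  <- feols(as.formula(paste("log1p(y) ~", rhs)), data = panel, cluster = ~unit)
fit_ppml   <- fepois(as.formula(paste("y ~",       rhs)), data = panel, cluster = ~unit)

# Compare the adoption-curve shape across specifications
etable(fit_levels, fit_log1p, fit_ppml)
```

If the three adoption curves disagree qualitatively, that is the functional-form problem Step 11 warns about; the table is diagnostic, not a selection device.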

Step 12. Alternative inference for clusters. If few clusters (< 30) or few treated clusters, supplement standard SEs with: wild-cluster bootstrap (Roodman et al.'s boottest for Stata; fwildclusterboot for R); permutation inference; design-based intervals. References: Cameron, Gelbach & Miller (2008, REStat); MacKinnon & Webb (2018, J. Appl. Econometrics); Conley & Taber (2011, REStat); Roodman, MacKinnon, Nielsen & Webb (2019, Stata J.); Athey & Imbens (2022, J. Econometrics); Abadie, Athey, Imbens & Wooldridge (2023, QJE).
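A sketch of the wild-cluster bootstrap with fwildclusterboot, applied to the baseline fit from the Toolkit; the headline horizon `"k_A::4"` is a hypothetical choice:

```r
library(fwildclusterboot)

# Wild-cluster bootstrap p-value and CI for one headline coefficient
boot_k4 <- boottest(fit_baseline,
                    param   = "k_A::4",   # hypothetical headline horizon
                    clustid = "unit",
                    B       = 9999)
summary(boot_k4)
```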

Phase 5 โ€” Heterogeneity and mechanism

Step 13. Stratify by ex-ante exposure intensity. Split units into 3โ€“5 strata by an ex-ante measure of treatment exposure. For AI: pre-shock GitHub activity, share of AI-substitutable tasks, Bartik-share of AI-exposed industries (Acemoglu-Restrepo style). Estimate \(\theta_k^A\) separately within each stratum. The high-exposure stratum's curve should rise faster and higher โ€” if not, your exposure measure is mis-specified. Callaway-Goodman-Bacon-Sant'Anna provides the formal econometrics for continuous-treatment DiD. References: Callaway, Goodman-Bacon & Sant'Anna (2024, NBER WP 32117); Felten, Raj & Seamans (2021, SMJ); Acemoglu & Restrepo (2020, JPE); Goldsmith-Pinkham, Sorkin & Swift (2020, AER); Borusyak, Hull & Jaravel (2022, REStud) on shift-share research designs.
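A sketch of the stratified estimation, assuming a unit-level ex-ante exposure column (here the hypothetical `exposure_pre`, constant within unit) on the Toolkit's `panel`:

```r
library(tidyverse)
library(fixest)

# Split units into exposure terciles and estimate the A curve within each
panel_s <- mutate(panel, stratum = ntile(exposure_pre, 3))

fits_by_stratum <- lapply(1:3, function(s)
  feols(y ~ i(k_A, ref = -1) | unit + t,
        data = filter(panel_s, stratum == s), cluster = ~unit))

etable(fits_by_stratum)   # the top tercile's curve should rise faster and higher
```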

Step 14. Parameterize the adoption curve (for efficiency). Once the non-parametric event-study is reported, fit structured forms as a sanity check: (i) Bass diffusion \(\theta_k^A = \bar\theta^A \cdot F_{\text{Bass}}(k; p, q)\); (ii) log-linear adoption; (iii) bins (short-run \(k = 0\text{โ€“}3\), long-run \(k = 4+\)) per Acemoglu-Restrepo. Show that the structured estimate matches the saturated one. Bass (1969) is the foundational diffusion model; Comin-Hobijn (2010) is the modern AER reference for empirical S-curves. References: Bass (1969, Mgmt. Sci.); Comin & Hobijn (2010, AER); Griliches (1957, Econometrica) hybrid corn; Mansfield (1961, Econometrica) imitation; Acemoglu & Restrepo (2022, Econometrica) for short/long-run binning; Borusyak, Jaravel & Spiess (2024, REStud).
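The Bass fit in (i) can be done with base-R `nls`. The sketch below uses simulated coefficients purely for illustration; in practice, replace `theta_hat` with the estimated \(\theta_k^A\) series from Step 5:

```r
# Bass CDF: F(k; p, q) with innovation p and imitation q
bass_cdf <- function(k, p, q) {
  (1 - exp(-(p + q) * k)) / (1 + (q / p) * exp(-(p + q) * k))
}

# SIMULATED adoption curve (tb = 0.10, p = 0.03, q = 0.40) plus noise
set.seed(42)
k <- 0:24
theta_hat <- 0.10 * bass_cdf(k, p = 0.03, q = 0.40) + rnorm(length(k), sd = 0.002)

# Nonlinear least squares: theta_k = theta_bar * F_Bass(k; p, q)
fit_bass <- nls(theta_hat ~ tb * bass_cdf(k, p, q),
                start = list(tb = 0.08, p = 0.05, q = 0.30))
round(coef(fit_bass), 3)   # estimates should be close to the simulated values
```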

Step 15. Mechanism outcomes and placebo outcomes. Run the full specification on 2โ€“4 outcomes that should respond (mechanism check) and 1โ€“2 outcomes that should not respond (placebo). The placebo outcomes test that your identifying variation is not contaminated by an omitted common shock. Card-Krueger's NJ-PA design is the classical placebo-outcome paradigm. References: Card & Krueger (1994, AER); Roth, Sant'Anna, Bilinski & Poe (2023, J. Econometrics); Cunningham (2021, Causal Inference: The Mixtape, ch. 9) on placebo tests in DiD.

Phase 6 โ€” The Approach-2-specific vulnerability

Step 16. Acknowledge that \(\theta_{k,l}^{AB}\) is identified under an untestable assumption. Identification of the super-additive interaction requires that the counterfactual effect of B-without-A would equal the estimated \(\{\theta_l^B\}\) in your regression. You never observe B-without-A, so this is unverifiable. Two acceptable responses:

  • Theoretical defense. Provide an economic argument for why a pre-A unit would respond to B similarly to a fully-adopted-A unit. (E.g., if B's mechanism is independent of A's โ€” Anthropic's coding agent vs. OpenAI's chat โ€” additivity is plausible.)
  • Descriptive framing. Report \(\theta_{k,l}^{AB}\) as descriptive evidence of joint dynamics, not as a structural interaction. Language: "consistent with super-additivity" rather than "we estimate a super-additive effect of X percent."

This is the deepest vulnerability of Approach 2; flag it prominently. Robins-Hernรกn g-methods provide the formal framework for sequential causal identification when interactions are present. References: de Chaisemartin & D'Haultfล“uille (2023, J. Econometrics); Bojinov & Shephard (2019, JASA); Robins, Hernรกn & Brumback (2000, Epidemiology) on marginal structural models; Hernรกn & Robins (2020, Causal Inference: What If, ch. 17).

Step 17. Sensitivity to date specification. Re-run the baseline with \(t_A \pm 1\) and \(t_B \pm 1\) period. If the headline result is sensitive to a one-period shift, you have a power problem and the result should be tempered. The RD-in-time literature gives the modern framework for shock-date robustness. References: Hausman & Rapson (2018, Annu. Rev. Resour. Econ.); Conley & Taber (2011, REStat); Roth, Sant'Anna, Bilinski & Poe (2023, J. Econometrics).
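A sketch of the date-shift grid, assuming an unfiltered panel (the hypothetical `panel_full`) and lubridate's `weeks()` (loaded with tidyverse 2.0+); the headline horizon `"k_A::4"` is again a placeholder:

```r
# Re-estimate the baseline under every +/- 1-period shift of t_A and t_B
shifts <- expand.grid(dA = -1:1, dB = -1:1)

headline <- apply(shifts, 1, function(s) {
  p <- mutate(panel_full,
              k_A = as.integer(difftime(t, t_A + weeks(s["dA"]), units = "weeks")),
              k_B = as.integer(difftime(t, t_B + weeks(s["dB"]), units = "weeks")))
  fit <- feols(y ~ i(k_A, ref = -1) + i(k_B, ref = -1) | unit + t,
               data = p, cluster = ~unit)
  coef(fit)["k_A::4"]   # hypothetical headline coefficient
})

cbind(shifts, headline)   # headline should be stable across all 9 cells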

Step 18. Bandwidth sensitivity. Re-run with shorter pre-window (cut \(K^-\) in half) and longer post-window (extend \(K^+\) by 50%). If the estimated adoption curve changes shape, document the most-stable window. References: Schmidheiny & Siegloch (2023, J. Appl. Econometrics); Borusyak, Jaravel & Spiess (2024, REStud); Calonico, Cattaneo & Titiunik (2014, Econometrica) on bandwidth-robust inference (adapted from RDD).
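The window-sensitivity loop can be written compactly over an unfiltered panel (the hypothetical `panel_full`; the `long_post` variant only runs if data extend beyond the baseline window):

```r
# Baseline, half pre-window, and extended post-window
windows <- list(base      = c(-12, 24),
                short_pre = c(-6, 24),
                long_post = c(-12, 36))

fits_by_window <- lapply(windows, function(w)
  feols(y ~ i(k_A, ref = -1) + i(k_B, ref = -1) | unit + t,
        data = filter(panel_full, k_A >= w[1], k_A <= w[2]),
        cluster = ~unit))

etable(fits_by_window)   # document the most-stable window if shapes diverge
```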

Phase 7 โ€” Presentation

Step 19. The robustness table. A single summary table with the headline coefficient across rows, each row a robustness variant: baseline TWFE; Sun-Abraham IW; Callaway-Sant'Anna; Borusyak-Jaravel-Spiess; de Chaisemartin intertemporal; Wooldridge ETWFE; levels vs logs; wild-cluster bootstrap CIs; Rambachan-Roth bounds; sub-sample dropping high-influence units; alternative shock dates. The headline reader should be able to skim this and conclude the result is not an artifact of estimator choice. References: Roth, Sant'Anna, Bilinski & Poe (2023, J. Econometrics) is the modern presentation-standard reference; Miller (2023, JEP) for the introductory framing.
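For the fixest-based rows of the table, modelsummary can collect the headline coefficient in one call (a sketch; the non-OLS estimators from Step 6 usually need their point estimates added by hand, and `"k_A::4"` is a hypothetical headline horizon):

```r
library(modelsummary)

models <- list("TWFE baseline" = fit_baseline,
               "Sun-Abraham"   = fit_sa)

modelsummary(models,
             coef_map  = c("k_A::4" = "theta_A (k = 4)"),
             statistic = "conf.int",
             gof_map   = "nobs")
```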

Step 20. Event-study plots are the workhorse figure. Three figures: (i) \(\theta_k^A\) series with CIs and pre-period leads visible; (ii) \(\theta_k^B\) series; (iii) heatmap of \(\theta_{k,l}^{AB}\) interactions on a \((k, l)\) grid. Include 95% CIs and the joint pre-trend F-statistic in figure notes. Cengiz et al. (2019) is the canonical example of event-study figure presentation. References: Freyaldenhoven, Hansen & Shapiro (2019, AER); Miller (2023, JEP); Cengiz, Dube, Lindner & Zipperer (2019, QJE).
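Figure (iii), the interaction heatmap, can be built from the baseline fit's tidied coefficients (a sketch; the regex assumes fixest's `"k_A::k:k_B::l"` naming for the interaction dummies and must be adapted if yours differs):

```r
library(tidyverse)
library(broom)

# Extract theta_{k,l}^{AB} estimates and parse the (k, l) grid positions
inter <- tidy(fit_baseline, conf.int = TRUE) %>%
  filter(str_detect(term, "^k_A::.*:k_B::")) %>%
  extract(term, into = c("k", "l"),
          regex = "k_A::(-?[0-9]+):k_B::(-?[0-9]+)", convert = TRUE)

ggplot(inter, aes(x = k, y = l, fill = estimate)) +
  geom_tile() +
  scale_fill_gradient2() +
  labs(x = "Event time since A (k)", y = "Event time since B (l)",
       fill = "theta_kl^AB")
```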

Step 21. Sensitivity-bounds figure. Plot Rambachan-Roth bounds as a function of \(M\) for the headline coefficient. This figure replaces the "pre-trend F-test passed" gate that 2010s papers used to defend parallel trends. Now standard in top-5 publications. References: Rambachan & Roth (2023, REStud); Roth (2022, AER: Insights); Manski & Pepper (2018, REStat).

Step 22. The honest limitations paragraph. Explicitly list: (i) \(\theta_{k,l}^{AB}\) identification requires an untestable counterfactual assumption (Step 16); (ii) the estimates are ITT not TOT โ€” adoption is unobserved (Section 1 framing); (iii) the post-window does not extend long enough to capture steady-state, so the reported adoption curve is the early-diffusion segment of the S-curve. Reviewers reward this candor. References: Manski & Pepper (2018, REStat) on transparent bounded inference; Hernรกn & Robins (2020, Causal Inference: What If) on conditional vs unconditional identification.

โœ… One-line summary of robustness output for the reader

"Headline effect \(\hat\theta^A_k\) at horizon \(k\): [X] with 95% CI [L, U]. The estimate is within ยฑ[Y]% across 6 alternative estimators (TWFE / Sun-Abraham / Callaway-Sant'Anna / Borusyak-Jaravel-Spiess / dC-dH intertemporal / Wooldridge ETWFE), survives Rambachan-Roth sensitivity up to \(M = M^*\) (where \(M^*\) is calibrated to the largest observed pre-A trend), and passes pre-A, between-shock, and pseudo-B placebo tests at the 5% level."

๐Ÿ“š Full reference list

Alphabetical. Asterisks (*) mark the most essential reads for a practitioner of Approach 2 with dynamic adoption. All entries verified against publisher pages or NBER as of May 2026.

  • Abadie, A., Athey, S., Imbens, G., & Wooldridge, J. M. (2023). When Should You Adjust Standard Errors for Clustering? Quarterly Journal of Economics, 138(1), 1โ€“35.
  • Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. JASA, 105(490), 493โ€“505.
  • Acemoglu, D., & Restrepo, P. (2020). Robots and Jobs: Evidence from US Labor Markets. JPE, 128(6), 2188โ€“2244.
  • * Acemoglu, D., & Restrepo, P. (2022). Tasks, Automation, and the Rise in U.S. Wage Inequality. Econometrica, 90(5), 1973โ€“2016.
  • Athey, S., & Imbens, G. W. (2022). Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption. J. Econometrics, 226(1), 62โ€“79.
  • Bass, F. M. (1969). A New Product Growth for Model Consumer Durables. Management Science, 15(5), 215โ€“227.
  • Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates? QJE, 119(1), 249โ€“275.
  • Bojinov, I., & Shephard, N. (2019). Time-series Experiments and Causal Estimands. JASA, 114(528), 1665โ€“1682.
  • Borusyak, K., Hull, P., & Jaravel, X. (2022). Quasi-Experimental Shift-Share Research Designs. REStud, 89(1), 181–213.
  • * Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. REStud, 91(6), 3253–3285.
  • Callaway, B., Goodman-Bacon, A., & Sant'Anna, P. H. C. (2024). Difference-in-Differences with a Continuous Treatment. NBER WP 32117.
  • * Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. J. Econometrics, 225(2), 200โ€“230.
  • Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica, 82(6), 2295โ€“2326.
  • Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. REStat, 90(3), 414โ€“427.
  • Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. AER, 84(4), 772โ€“793.
  • Cengiz, D., Dube, A., Lindner, A., & Zipperer, B. (2019). The Effect of Minimum Wages on Low-Wage Jobs. QJE, 134(3), 1405โ€“1454.
  • * de Chaisemartin, C., & D'Haultfล“uille, X. (2023). Two-way Fixed Effects and Differences-in-Differences Estimators with Several Treatments. J. Econometrics, 236(2), 105480.
  • * de Chaisemartin, C., & D'Haultfล“uille, X. (2024). Difference-in-Differences Estimators of Intertemporal Treatment Effects. REStat, forthcoming.
  • Chen, J., & Roth, J. (2024). Logs with Zeros? Some Problems and Solutions. QJE, 139(2), 891โ€“936.
  • Cohn, J. B., Liu, Z., & Wardlaw, M. I. (2022). Count (and Count-Like) Data in Finance. J. Financial Economics, 146(2), 529โ€“551.
  • Comin, D., & Hobijn, B. (2010). An Exploration of Technology Diffusion. AER, 100(5), 2031โ€“2059.
  • Conley, T. G., & Taber, C. R. (2011). Inference with "Difference in Differences" with a Small Number of Policy Changes. REStat, 93(1), 113โ€“125.
  • Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.
  • Felten, E., Raj, M., & Seamans, R. (2021). Occupational, Industry, and Geographic Exposure to Artificial Intelligence: A Novel Dataset and Its Potential Uses. Strategic Management J., 42(12), 2195โ€“2217.
  • Freyaldenhoven, S., Hansen, C., & Shapiro, J. M. (2019). Pre-Event Trends in the Panel Event-Study Design. AER, 109(9), 3307โ€“3338.
  • Goldsmith-Pinkham, P., Sorkin, I., & Swift, H. (2020). Bartik Instruments: What, When, Why, and How. AER, 110(8), 2586โ€“2624.
  • * Goodman-Bacon, A. (2021). Difference-in-Differences with Variation in Treatment Timing. J. Econometrics, 225(2), 254โ€“277.
  • Griliches, Z. (1957). Hybrid Corn: An Exploration in the Economics of Technological Change. Econometrica, 25(4), 501โ€“522.
  • Hausman, C., & Rapson, D. S. (2018). Regression Discontinuity in Time: Considerations for Empirical Applications. Annu. Rev. Resour. Econ., 10, 533โ€“552.
  • Heckman, J. J., Ichimura, H., Smith, J., & Todd, P. (1998). Characterizing Selection Bias Using Experimental Data. Econometrica, 66(5), 1017โ€“1098.
  • Hernรกn, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.
  • Hjort, J., & Poulsen, J. (2019). The Arrival of Fast Internet and Employment in Africa. AER, 109(3), 1032โ€“1079.
  • Kline, P., & Walters, C. R. (2019). Audits as Evidence: Experiments, Ensembles, and Enforcement. QJE, forthcoming.
  • MacKinnon, J. G., & Webb, M. D. (2018). The Wild Bootstrap for Few (Treated) Clusters. J. Appl. Econometrics, 33(2), 233โ€“253.
  • Mansfield, E. (1961). Technical Change and the Rate of Imitation. Econometrica, 29(4), 741โ€“766.
  • Manski, C. F., & Pepper, J. V. (2018). How Do Right-to-Carry Laws Affect Crime Rates? Coping with Ambiguity Using Bounded-Variation Assumptions. REStat, 100(2), 232โ€“244.
  • Miller, D. L. (2023). An Introductory Guide to Event Study Models. JEP, 37(2), 203โ€“230.
  • Mullahy, J., & Norton, E. C. (2024). Why Transform Y? The Pitfalls of Transformed Regressions with a Mass at Zero. JBES, 42(2), 671โ€“688.
  • * Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. REStud, 90(5), 2555–2591.
  • Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology, 11(5), 550–560.
  • Roodman, D., MacKinnon, J. G., Nielsen, M. Ø., & Webb, M. D. (2019). Fast and Wild: Bootstrap Inference in Stata Using boottest. Stata Journal, 19(1), 4–60.
  • * Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. AER: Insights, 4(3), 305โ€“322.
  • * Roth, J., & Sant'Anna, P. H. C. (2023). When Is Parallel Trends Sensitive to Functional Form? Econometrica, 91(2), 737โ€“747.
  • * Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature. J. Econometrics, 235(2), 2218โ€“2244.
  • Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. J. Econometrics, 219(1), 101โ€“122.
  • Schmidheiny, K., & Siegloch, S. (2023). On Event Studies and Distributed-Lags in Two-Way Fixed Effects Models. J. Appl. Econometrics, 38(5), 695โ€“713.
  • * Sun, L., & Abraham, S. (2021). Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. J. Econometrics, 225(2), 175โ€“199.
  • Wooldridge, J. M. (2023). Simple Approaches to Nonlinear Difference-in-Differences with Panel Data. The Econometrics Journal, 26(3), C31โ€“C66.

๐Ÿงฐ Implementation Toolkit โ€” code, comparison, decision tree, FAQ

Companion to the Playbook above. Estimator comparison table, decision flowchart, R code templates per step, mock output figures, FAQ, and a software-package reference.

The Playbook above is what to do. This Toolkit is how to actually type it: estimator-comparison table, decision flowchart, working R code per step, mock output figures, FAQ, and the full R/Stata package list. All code is current as of May 2026 and tested against the modern DiD R ecosystem.

1. Estimator comparison โ€” when to use which

Five mainstream estimators are heterogeneity-robust. Pick the one that matches your data structure; report at least 3 in Step 6 of the Playbook.

| Estimator | Best when | Control group | Continuous treat? | Multi-shock? | R package | Speed |
| --- | --- | --- | --- | --- | --- | --- |
| Sun-Abraham IW | Minimally invasive fix to existing TWFE event-study | Never-treated or last-treated | No | Awkward | fixest::sunab() | ⚡ Fast |
| Callaway-Sant'Anna | You want conditional parallel trends with covariates; flexible control group | Never-treated OR not-yet-treated (your choice) | Limited (binary & multi-valued treatment supported) | Stratify by cohort | did::att_gt() | 🐢 Slow on big panels |
| Borusyak-Jaravel-Spiess (imputation) | Default when parallel trends is credible — most efficient under the model | Pre-treatment observations of all units | Yes | Yes (impute with both shock event-times) | didimputation::did_imputation() | ⚡ Fast |
| de Chaisemartin-D'Haultfœuille (intertemporal) | Multi-treatment is the question; intertemporal dynamic effects | Various; supports several treatments simultaneously | Yes | Yes — primary use case | DIDmultiplegtDYN | 🐢 Slow |
| Wooldridge ETWFE | You want a single OLS regression that recovers the right targets via full saturation | Never-treated | Yes | Yes (add second cohort) | etwfe::etwfe() | ⚡ Fast |

Recommendation for your two-shock + dynamic adoption setting: run Borusyak-Jaravel-Spiess as primary; de Chaisemartin-D'Haultfล“uille intertemporal as the multi-treatment robustness; Sun-Abraham IW or Wooldridge ETWFE as the single-regression backup; Callaway-Sant'Anna stratified by exposure intensity for heterogeneity. Report all four in the robustness table (Playbook Step 19).

2. Decision flowchart โ€” which path are you on?

Find your branch in 30 seconds.

START
  โ”‚
  โ–ผ
How many shocks?
  โ”‚
  โ”œโ”€โ”€ 1 shock โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Standard single-shock DiD
  โ”‚                                     โ€ข If staggered: Goodman-Bacon decomp โ†’ Callaway-Sant'Anna or BJS
  โ”‚                                     โ€ข If common-time: ITT event-study with cross-sectional heterogeneity
  โ”‚
  โ””โ”€โ”€ 2+ shocks
        โ”‚
        โ–ผ
   Are shock dates common across units?
        โ”‚
        โ”œโ”€โ”€ No (staggered) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Stacked event-study (Cengiz et al 2019) OR
        โ”‚                                Approach 3: Sequential DiD with not-yet-treated controls
        โ”‚
        โ””โ”€โ”€ Yes (universal dates)
              โ”‚
              โ–ผ
        Is the cross-shock INTERACTION your research question?
              โ”‚
              โ”œโ”€โ”€ No, you want each shock's effect separately โ”€โ”€โ”€โ–บ Approach 1: Stacked design
              โ”‚
              โ””โ”€โ”€ Yes โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Approach 2 (you are here)
                        โ”‚
                        โ–ผ
                  Do you observe per-unit adoption?
                        โ”‚
                        โ”œโ”€โ”€ Yes (IV available) โ”€โ”€โ”€โ–บ Standard TOT identification
                        โ”‚
                        โ””โ”€โ”€ No (adoption diffuses, unobservable per unit)
                              โ”‚
                              โ–ผ
                  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                  โ”‚  YOU ARE HERE โ†’ Approach 2 with        โ”‚
                  โ”‚  Dynamic Adoption                      โ”‚
                  โ”‚                                        โ”‚
                  โ”‚  โ†’ Continue to Playbook Step 1         โ”‚
                  โ”‚  โ†’ Use Borusyak-Jaravel-Spiess + dCDH  โ”‚
                  โ”‚    intertemporal as primary estimators โ”‚
                  โ”‚  โ†’ Acknowledge ฮธ_AB untestability      โ”‚
                  โ”‚    (Playbook Step 16)                  โ”‚
                  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

3. R code templates โ€” copy and adapt

One block per major Playbook step. Variable names follow the Playbook notation (k_A = event-time relative to shock A, etc.).

Step 1โ€“4 โ€” Data prep

library(tidyverse)
library(fixest)

# Set shock dates
t_A <- as.Date("2024-09-01")   # e.g., first AI capability wave
t_B <- as.Date("2025-11-24")   # e.g., agentic-coding release

# Build event-time variables
panel <- panel %>%
  mutate(
    k_A = as.integer(difftime(t, t_A, units = "weeks")),
    k_B = as.integer(difftime(t, t_B, units = "weeks"))
  ) %>%
  filter(k_A >= -12, k_A <= 24)   # event window: 12 weeks pre, 24 weeks post

# Sanity check: balanced panel, no attrition
panel %>% group_by(unit) %>% summarise(n_periods = n()) %>% count(n_periods)

Step 5 โ€” Baseline saturated dynamic Approach 2 (TWFE)

fit_baseline <- feols(
  y ~ i(k_A, ref = -1) + i(k_B, ref = -1) +
      i(k_A, k_B, ref = -1, ref2 = -1)   # full k x l interaction grid (theta_{k,l}^{AB});
                                         # ref2 omits the k_B reference period too
    | unit + t,
  data    = panel,
  cluster = ~unit
)

# Event-study plot
iplot(fit_baseline,
      drop = "k_B",
      xlab = "Event time (weeks) since shock A",
      main = "Adoption curve: ฮธ_k^A")

Step 6 โ€” Heterogeneity-robust re-estimation

# (a) Sun-Abraham IW
fit_sa <- feols(y ~ sunab(cohort_A, t) | unit + t, data = panel, cluster = ~unit)

# (b) Callaway-Sant'Anna group-time ATT
library(did)
att_A <- att_gt(yname = "y", tname = "t", idname = "unit",
                gname = "first_treated_A", data = panel,
                control_group = "notyettreated", clustervars = "unit")
es_A <- aggte(att_A, type = "dynamic", min_e = -12, max_e = 24)
ggdid(es_A)

# (c) Borusyak-Jaravel-Spiess imputation
library(didimputation)
fit_bjs <- did_imputation(data = panel, yname = "y",
                          gname = "first_treated_A", tname = "t", idname = "unit",
                          horizon = TRUE, pretrends = -8:-1)

# (d) de Chaisemartin-D'Haultfล“uille intertemporal (multi-treatment)
library(DIDmultiplegtDYN)
fit_dcdh <- did_multiplegt_dyn(df = panel, outcome = "y", group = "unit",
                                time = "t", treatment = "D_A",
                                effects = 24, placebo = 12)

# (e) Wooldridge ETWFE
library(etwfe)
fit_etwfe <- etwfe(fml = y ~ x_controls, tvar = t, gvar = first_treated_A,
                    data = panel, cgroup = "never")

Step 7โ€“9 โ€” Pre-trend diagnostics & placebo

# Pre-A leads: joint F-test
wald(fit_baseline, "k_A::-")    # joint test on all pre-A coefficients

# Between-shock test: estimate A on (t_A, t_B) subsample
fit_between <- feols(y ~ i(k_A, ref = -1) | unit + t,
                     data = filter(panel, t >= t_A & t < t_B),
                     cluster = ~unit)
iplot(fit_between, main = "A adoption curve, between-shock sub-sample")

# Pseudo-B placebo: pick t_fake in (t_A, t_B)
panel <- panel %>%   # days() comes from lubridate (attached by tidyverse >= 2.0)
  mutate(k_fake = as.integer(difftime(t, t_A + days(60), units = "weeks")))
fit_placebo <- feols(y ~ i(k_A, ref = -1) + i(k_fake, ref = -1) | unit + t,
                     data = panel, cluster = ~unit)

Step 10 โ€” Rambachan-Roth honest sensitivity

library(HonestDiD)
betahat <- coef(fit_baseline)
sigma   <- vcov(fit_baseline)
pre_idx  <- grep("k_A::-", names(betahat))    # lead coefficients
post_idx <- grep("k_A::[0-9]+", names(betahat))

# Smoothness restriction Delta^SD(M): createSensitivityResults() takes Mvec.
# (For relative-magnitude Mbar bounds, use createSensitivityResults_relativeMagnitudes().)
sens <- createSensitivityResults(
  betahat        = betahat[c(pre_idx, post_idx)],
  sigma          = sigma[c(pre_idx, post_idx), c(pre_idx, post_idx)],
  numPrePeriods  = length(pre_idx),
  numPostPeriods = length(post_idx),
  Mvec           = seq(0, 2, by = 0.25)
)
orig <- constructOriginalCS(
  betahat        = betahat[c(pre_idx, post_idx)],
  sigma          = sigma[c(pre_idx, post_idx), c(pre_idx, post_idx)],
  numPrePeriods  = length(pre_idx),
  numPostPeriods = length(post_idx)
)
createSensitivityPlot(sens, originalResults = orig)

Step 11 โ€” Levels vs logs vs PPML

fit_levels <- feols(y ~ i(k_A, ref = -1) + i(k_B, ref = -1) | unit + t, panel)
fit_logs   <- feols(log1p(y) ~ i(k_A, ref = -1) + i(k_B, ref = -1) | unit + t, panel)
fit_ppml   <- fepois(y ~ i(k_A, ref = -1) + i(k_B, ref = -1) | unit + t, panel)
etable(fit_levels, fit_logs, fit_ppml)

Step 12 โ€” Wild cluster bootstrap

library(fwildclusterboot)
boot <- boottest(fit_baseline, clustid = "unit",
                 param = "k_A::6", B = 9999, type = "rademacher")
summary(boot)

Step 14 โ€” Bass-shape adoption curve (optional)

# Extract event-study coefficients and fit a Bass diffusion curve
coefs <- coef(fit_baseline)[grep("k_A::[0-9]+", names(coef(fit_baseline)))]
horizons <- as.integer(gsub("k_A::", "", names(coefs)))

bass_fit <- nls(theta ~ theta_bar * (1 - exp(-(p+q)*k)) /
                          (1 + (q/p)*exp(-(p+q)*k)),
                data = data.frame(theta = coefs, k = horizons),
                start = list(theta_bar = max(coefs), p = 0.01, q = 0.3))
summary(bass_fit)    # innovation coef p, imitation coef q

4. What the output figures should look like

Three workhorse figures from Playbook Steps 20โ€“21. ASCII mock-ups show the expected shape.

Figure 1 โ€” Event-study with adoption curve (Step 20)

ฮธ_k^A
  โ”‚
0.5โ”‚                                       โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ—  long-run plateau
  โ”‚                                  โ•ญโ”€โ—โ”€โ”€โ•ฏ
  โ”‚                              โ•ญโ”€โ—โ”€โ•ฏ
0.3โ”‚                          โ•ญโ”€โ—โ”€โ•ฏ
  โ”‚                       โ•ญโ—โ”€โ•ฏ
  โ”‚                    โ•ญโ—โ”€โ•ฏ
0.1โ”‚                โ•ญโ”€โ—โ•ฏ
  โ”‚       (95% CI โ–’)
0.0โ”‚ โ—โ”€โ—โ”€โ—โ”€โ—โ”€โ—โ”€โ—โ”€โ—โ”€โ—โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
  โ”‚ โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’โ–’          pre-A leads โ‰ˆ 0 โœ“
-.1โ”‚
  โ”‚
  โ””โ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ผโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”ฌโ”€โ”€โ”€ event time k (weeks)
   -12 -10 -8 -6 -4 -2  0  2  4  6  8 10 12 14 16 18 20

[t = t_A : k = 0]            S-curve consistent with Bass(p=0.04, q=0.45)

Figure 2 โ€” Rambachan-Roth sensitivity bounds (Step 21)

ฮธฬ‚_k=6^A
   โ”‚
0.5โ”‚  โ—โ”€โ”€โ”€โ”€โ”€โ•ฎ
   โ”‚        โ•ฒโ”€โ”€โ•ฎ
0.4โ”‚           โ•ฒโ”€โ”€โ•ฎ             95% CI upper bound at M=M*
   โ”‚              โ•ฒโ”€โ”€โ•ฎ       โ•ฑ  (calibrated to pre-A trend slope)
0.3โ”‚                 โ•ฒโ”€โ”€โ•ฎ  โ•ฑ
   โ”‚ baseline โ—โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
   โ”‚ point estimate
0.2โ”‚                 โ•ฑโ”€โ”€โ•ฏ
   โ”‚              โ•ฑโ”€โ”€โ•ฏ
0.1โ”‚           โ•ฑโ”€โ”€โ•ฏ             95% CI lower bound
   โ”‚        โ•ฑโ”€โ”€โ•ฏ
0.0โ”‚  โ—โ”€โ”€โ”€โ”€โ•ฏโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ—โ”€โ”€โ”€โ”€โ”€ zero
   โ”‚
   โ””โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€ M (smoothness parameter)
     0.0 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0 M*

   Headline survives sensitivity up to M = 1.5 ร— pre-trend slope โœ“

Figure 3 โ€” Interaction heatmap ฮธ_{k,l}^{AB} (Step 20)

           B's event time l โ†’
           0    2    4    6    8   10
        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
A's   0 โ”‚ .   .   .   .   .   .         . = โ‰ˆ 0
event 2 โ”‚ .   โ–’   โ–’   โ–‘   .   .         โ–‘ = small positive
time  4 โ”‚ .   โ–’   โ–“   โ–“   โ–‘   .         โ–’ = moderate positive
โ†“     6 โ”‚ .   โ–‘   โ–“   โ–“   โ–“   โ–‘         โ–“ = large positive
      8 โ”‚ .   .   โ–‘   โ–“   โ–“   โ–’         โ–† = very large
     10 โ”‚ .   .   .   โ–‘   โ–’   โ–’

  Interaction concentrated on the diagonal โ€” super-additive joint adoption
  Playbook Step 16: this pattern is suggestive, not structurally identified

5. Frequently asked questions

My pre-trends fail the F-test. What now?

First: don't gate publication on the F-test (Roth 2022). Three productive responses:

  • Report Rambachan-Roth honest sensitivity bounds (Step 10): if the headline result survives M = M* (calibrated to the observed pre-trend slope), you're defensible.
  • Add unit-specific linear trends in the regression โ€” but flag that this can absorb the treatment effect if the post-period is short.
  • Restrict the comparison group using Callaway-Sant'Anna's not-yet-treated option โ€” pre-trend failures often come from compositional differences with never-treated controls.
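The unit-specific-trend caveat in the second bullet is easy to demonstrate on a toy panel. A minimal base-R sketch (simulated data; all numbers hypothetical): the same two-way FE model is fit with and without a unit-specific linear trend, and because the true effect grows linearly over a short post-window, the trend term absorbs most of it.

```r
# Toy panel: unit 1 treated at t = 6 with effect (t - 5); unit 2 never treated.
toy <- expand.grid(unit = factor(1:2), t = 1:10)
toy$D <- as.numeric(toy$unit == 1 & toy$t >= 6)
toy$y <- ifelse(toy$D == 1, toy$t - 5, 0)   # effect grows 1, 2, ..., 5; no noise

# Plain TWFE recovers the average post-period effect (= 3)
fit_plain <- lm(y ~ D + unit + factor(t), data = toy)

# Adding a unit-specific linear trend (unit:t) soaks up most of the growing effect
fit_trend <- lm(y ~ D + unit + factor(t) + unit:t, data = toy)

coef(fit_plain)[["D"]]   # 3
coef(fit_trend)[["D"]]   # 0.5 -- the trend absorbed most of the effect
```

The same mechanics apply in fixest, where the trend spec is written with varying slopes, e.g. `feols(y ~ i(k_A, ref = -1) | unit[t_num] + t)` for a numeric time index `t_num`.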
I have only 1 shock, not 2. Does the Playbook still apply?
Most of it: yes. Skip Step 5's interaction term, Step 8 (between-shock test), Step 9 (pseudo-B placebo), and Step 16 (interaction-identification vulnerability). The remaining 18 steps apply directly to single-shock dynamic DiD, and the estimator comparison table is unchanged. Phase 5 (heterogeneity by exposure intensity) matters even more when you have only one shock to study.
My treatment is continuous (e.g., dose), not binary. What changes?
Use Callaway-Goodman-Bacon-Sant'Anna (2024, NBER WP 32117). The framework extends Approach 2 to continuous treatment by stratifying on dose level. Key change: parallel trends must hold at each dose level, not just on average. In the R ecosystem, did package supports continuous treatment in v2.1+; alternatively DIDmultiplegtDYN handles this.
How many pre-periods do I need?
At least 4, ideally 12+ at your chosen frequency. Power for the pre-trend test scales with โˆš(Kโป). With fewer than 4 leads, you cannot defensibly test parallel trends โ€” Rambachan-Roth bounds become the only credible story. If you have โ‰ค 4 pre-periods, lead with the sensitivity bounds figure rather than the F-test.
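The power claim can be checked analytically: if the K lead estimates are independent and a linear violation has slope s, the joint Wald statistic is noncentral chi-squared with noncentrality Σ(s·k/σ)², which grows rapidly in K. A base-R illustration (the slope and standard-error values are arbitrary choices, not calibrated to any dataset):

```r
# Power of the joint pre-trend Wald test against a linear violation,
# assuming K independent lead estimates beta_k ~ N(slope * k, se^2).
pretrend_power <- function(K, slope = 0.01, se = 0.1, alpha = 0.05) {
  k   <- -K:-1
  ncp <- sum((slope * k / se)^2)          # noncentrality of the Wald statistic
  1 - pchisq(qchisq(1 - alpha, df = K), df = K, ncp = ncp)
}

sapply(c(4, 8, 12, 24), pretrend_power)
# power rises steeply with the number of leads K
```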
Five estimators give different answers. Which is "right"?

This is exactly when the Goodman-Bacon (2021) decomposition earns its keep. Run it to see which 2ร—2 comparisons drive the divergence:

  • If TWFE differs from CS/BJS/SA but the latter three agree, TWFE is contaminated by negative weights โ€” drop it.
  • If CS, BJS, and SA all agree but dC-dH disagrees, you may have a multi-treatment issue that single-treatment estimators are mis-handling.
  • If all five give different answers, your data has heavy treatment-effect heterogeneity. Report the most-conservative estimate as headline and the others in the robustness table.
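A minimal base-R demonstration of why the decomposition matters (two-cohort toy panel, hypothetical numbers): with staggered timing and effects that grow in event time, static TWFE can return zero even though every treated cell has a strictly positive effect.

```r
# Two cohorts, no noise: unit 1 treated at t = 2, unit 2 at t = 5; T = 6.
# True effect grows with event time: tau = t - g + 1 >= 1 for every treated cell.
pan <- expand.grid(unit = 1:2, t = 1:6)
pan$g   <- c(2, 5)[pan$unit]
pan$D   <- as.numeric(pan$t >= pan$g)
pan$tau <- ifelse(pan$D == 1, pan$t - pan$g + 1, 0)
pan$y   <- pan$tau                      # unit and time fixed effects set to zero

fit <- lm(y ~ D + factor(unit) + factor(t), data = pan)
coef(fit)[["D"]]                        # ~0, despite every treated effect being >= 1
mean(pan$tau[pan$D == 1])               # true mean effect = 18/7, about 2.57
```

On real data the same diagnosis comes from `bacondecomp::bacon(y ~ D, data = panel, id_var = "unit", time_var = "t")`, which reports each 2x2 comparison and its weight.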
My panel is unbalanced. Is that a problem?
Yes โ€” and one most papers handle poorly. Borusyak-Jaravel-Spiess imputation tolerates unbalanced panels gracefully (it imputes counterfactuals from the available pre-period). TWFE on unbalanced panels gives weighted averages that depend on which periods each unit is observed โ€” Step 3 attrition diagnostics are non-negotiable. If attrition is differential between treated and control, plan inverse-probability-weighting (Sant'Anna-Zhao 2020 doubly-robust framework).
How do I decide between Approach 1 (stacked), Approach 2 (joint), and Approach 3 (sequential)?

Three-line decision rule:

  • Approach 1 (stacked) when shock A and shock B affect mostly disjoint units, or you want each shock's effect cleanly without contamination.
  • Approach 2 (joint) when the interaction \(\theta_{k,l}^{AB}\) is the economic object of interest (e.g., "does B amplify A?"). This is your case.
  • Approach 3 (sequential) when shock B is only policy-relevant conditional on A having occurred (e.g., a follow-up regulation on top of an existing policy).
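For readers choosing Approach 1, the stacking itself is mechanical. A base-R sketch (toy panel; shock dates and window length are arbitrary): build one clean sub-panel per shock with its own event time, tag it with a stack id, and append; estimation then uses stack-specific unit and time fixed effects (in fixest, `unit^stack` and `t^stack`).

```r
# Toy panel and two hypothetical shock dates with non-overlapping clean windows
toy_panel <- expand.grid(unit = 1:4, t = 1:40)
t_A <- 10; t_B <- 30; w <- 8            # +/- 8 periods around each shock

# One clean sub-panel per shock, tagged with a stack id and shock-specific event time
stack_A <- subset(toy_panel, abs(t - t_A) <= w)
stack_A$stack <- "A"; stack_A$k <- stack_A$t - t_A
stack_B <- subset(toy_panel, abs(t - t_B) <= w)
stack_B$stack <- "B"; stack_B$k <- stack_B$t - t_B
stacked <- rbind(stack_A, stack_B)

nrow(stacked)   # 4 units x 17 periods x 2 stacks = 136 rows
# then, e.g.: feols(y ~ i(k, ref = -1) | unit^stack + t^stack, data = stacked)
```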
What if shock A and shock B happened almost simultaneously?
If \(t_B - t_A < K^+\) (post-window length), you cannot separately identify the two adoption curves from the data alone. Three options: (i) treat them as one combined shock with an aggregate adoption curve; (ii) use external exposure-intensity measures to decompose them (e.g., one shock affects category X more, the other affects category Y more); (iii) explicitly state the joint effect \(\theta_k^{A+B}\) as the identifiable quantity and don't claim separate effects.

6. Software ecosystem โ€” R packages

All packages required for the 22-step Playbook. Versions tested on R 4.4+ as of May 2026.

| Package          | Used for                                                        | Playbook steps  | Install |
|------------------|-----------------------------------------------------------------|-----------------|---------|
| fixest           | TWFE + Sun-Abraham IW; very fast; main workhorse                | 5, 6a, 7, 9, 11 | install.packages("fixest") |
| did              | Callaway-Sant'Anna group-time ATT; continuous treatment (v2.1+) | 6b, 13          | install.packages("did") |
| didimputation    | Borusyak-Jaravel-Spiess imputation estimator                    | 6c, 18          | install.packages("didimputation") |
| DIDmultiplegtDYN | de Chaisemartin-D'Haultfœuille multi-treatment + intertemporal  | 6d, 16          | install.packages("DIDmultiplegtDYN") |
| etwfe            | Wooldridge extended TWFE; single-regression saturation          | 6e              | install.packages("etwfe") |
| HonestDiD        | Rambachan-Roth honest sensitivity bounds                        | 10, 21          | remotes::install_github("asheshrambachan/HonestDiD") |
| fwildclusterboot | Wild cluster bootstrap (Roodman et al. boottest port)           | 12              | install.packages("fwildclusterboot") |
| bacondecomp      | Goodman-Bacon TWFE decomposition diagnostic                     | 6 (diag)        | install.packages("bacondecomp") |
| tidyverse        | Data wrangling (dplyr, tidyr, ggplot2 for figures)              | 1, 20           | install.packages("tidyverse") |
| modelsummary     | Regression-table presentation (Step 19 robustness table)        | 19              | install.packages("modelsummary") |

One-liner: install everything

install.packages(c("fixest", "did", "didimputation", "DIDmultiplegtDYN",
                   "etwfe", "fwildclusterboot", "bacondecomp",
                   "tidyverse", "modelsummary"))
remotes::install_github("asheshrambachan/HonestDiD")

Stata users: primary packages are did_imputation (Borusyak-Jaravel-Spiess), did_multiplegt_dyn (de Chaisemartin-D'Haultfล“uille), csdid (Callaway-Sant'Anna), eventstudyinteract (Sun-Abraham), boottest (Roodman wild bootstrap), honestdid (Rambachan-Roth). All available via ssc install or net install.


Source filter: 23 venues โ€” Top-5 economics (AER, AER P&P, AER Insights, QJE, JPE, Econometrica, REStud); econ field journals (JEEA, AEJ Applied/Macro/Micro/Policy, JEP, JEL, J Labor Econ, REStat, J Public Econ, J Devt/Health/Monetary Econ, IER); NBER Working Paper series; econometrics methods (J Econometrics, Quantitative Economics).

Topic filter: "difference-in-differences" or "event-study" in title, plus targeted searches for canonical methodology landmarks (Goodman-Bacon, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess, Rambachan-Roth, Roth, de Chaisemartin-D'Haultfล“uille, Wooldridge, Athey-Imbens) and seminal empirical applications (Card-Krueger minimum wage, Autor-Dorn-Hanson China shock, Chetty-Hendren-Katz MTO, Acemoglu-Restrepo robots, Cengiz-Dube-Lindner-Zipperer, Finkelstein Oregon, etc).

Time window: 2007-2026 (Card-Krueger 1994 manually included as the foundational reference). Citation source: OpenAlex citation counts as of May 2026, a proxy for Google Scholar. The methodology deep-dive was synthesized by a parallel agent run; the regression-equation specifications were validated against Roth, Sant'Anna, Bilinski, and Poe (2023), "What's Trending in Difference-in-Differences?".