Publications (2014 onward)

Hazlett, C., Ramos, Antonio P., Smith, S. (2023). Better individual-level risk models can improve the targeting and life-saving potential of early-mortality interventions. Nature: Scientific Reports. [paper]

Infant mortality remains high and uneven in much of sub-Saharan Africa. Even low-cost, highly effective therapies can only save lives in proportion to how successfully they can be targeted to those children who, absent the treatment, would have died. This places great value on maximizing the accuracy of any targeting or means-testing algorithm. Yet, the interventions that countries deploy in hopes of reducing mortality are often targeted based on simple models of wealth or income or a few additional variables. Examining 22 countries in sub-Saharan Africa, we illustrate the use of flexible (machine learning) risk models employing up to 25 generally available pre-birth variables from the Demographic and Health Surveys. Using these models, we construct risk scores such that the 10 percent of the population at highest risk account for 15-30 percent of infant mortality, depending on the country. Successful targeting in these models turned on several variables other than wealth, while models that employ only wealth data perform little or no better than chance. Consequently, employing such data and flexible models to predict high-risk births in the countries studied could substantially improve the targeting and thus the life-saving potential of existing interventions.

Hazlett, C., Parente, F. (2023). From "Is it unconfounded?" to "How much confounding would it take?": Applying the sensitivity-based approach to assess causes of support for peace in Colombia. Journal of Politics, 85(3). [paper]

Attention to the credibility of causal claims has increased tremendously in recent years. When relying on observational data, debate often centers on whether investigators have ruled out any bias due to confounding. However, the relevant scientific question is generally not whether bias is precisely zero, but whether it is problematic enough to alter one’s research conclusion. We argue that sensitivity analyses would improve research practice by showing how results would change under plausible degrees of confounding, or equivalently, by revealing what one must argue about the strength of confounding to sustain a research conclusion. This would improve scrutiny of studies in which non-zero bias is expected, and of those where authors argue for zero bias but results may be fragile to confounding too weak to be ruled out. We illustrate this using off-the-shelf sensitivity tools to examine two potential influences on support for the FARC peace agreement in Colombia.

Fabbe, K., Hazlett, C., Sinmazdemir, T. (2023). Threat perceptions, loyalties and attitudes towards peace: The effects of civilian victimization among Syrian refugees in Turkey. Conflict Management and Peace Science. [paper]

For refugees who have fled civil conflict, do experiences of victimization by one armed group push them to support the opposing armed groups? Or, does victimization cause refugees to revoke their support for all armed groups, whatever side they are on, and call instead for peace? This paper studies the effect of civilian victimization on threat perceptions, loyalties, and attitudes toward peace in the context of Syrian refugees in Turkey, many of whom faced regime-caused violence prior to their departure. Our research strategy leverages variation in home destruction caused by barrel bombs to examine the effect of violence on refugees’ views. We find that refugees who lose their home to barrel bombs withdraw support from armed actors and are more supportive of ending the war and finding peace. Suggestive evidence shows that while victims do not disengage from issues in Syria, they do show less optimism about an opposition victory.

Cesar B. Martinez-Alvarez, Chad Hazlett, Paasha Mahdavi, and Michael L. Ross. (2022). Political Leadership Has Limited Impact on Fossil Fuel Taxes and Subsidies. Proceedings of the National Academy of Sciences, 119(47). [paper]

For countries to rapidly decarbonize they need strong leadership, according to both academic studies and popular accounts. But leadership on climate issues is difficult to measure and its importance is unclear. We use original data to investigate the role of 623 presidents, prime ministers, and monarchs in 155 countries in their countries' climate policies, focusing on the reform of gasoline taxes and subsidies. Our findings suggest that the role of leadership is surprisingly limited and often ephemeral. In most countries, leader tenures fail to explain variation in gasoline taxes and subsidies. This holds true regardless of the leader's age, gender, education, or political ideology. Rulers who govern during an economic crisis perform no better or worse than other rulers. Even presidents and prime ministers who were recognized by the United Nations for environmental leadership had no more success than other leaders in reducing subsidies or raising fuel taxes. Where leaders appear to play an important role, primarily in countries with large subsidies, their reforms often failed, with subsidies returning to pre-reform levels within the first 12 months 62% of the time, and within five years 87% of the time. Our findings suggest that leaders of all types find it exceptionally hard to have a lasting impact on gasoline taxes and subsidies.

Ablai Akhazhanov, Anupreeta More, Arash Amini, Chad Hazlett, Tomaso Treu, Simon Birrer, and the DES Collaborative. (2022). Finding quadruply imaged quasars with machine learning -- I. Methods. Monthly Notices of the Royal Astronomical Society. 513(2). [paper]

Strongly lensed quadruply imaged quasars (quads) are extraordinary objects. They are very rare in the sky and yet they provide unique information about a wide range of topics, including the expansion history and the composition of the Universe, the distribution of stars and dark matter in galaxies, the host galaxies of quasars, and the stellar initial mass function. Finding them in astronomical images is a classic ‘needle in a haystack’ problem, as they are outnumbered by other (contaminant) sources by many orders of magnitude. To solve this problem, we develop state-of-the-art deep learning methods and train them on realistic simulated quads based on real images of galaxies taken from the Dark Energy Survey, with realistic source and deflector models, including the chromatic effects of microlensing. The performance of the best methods on a mixture of simulated and real objects is excellent, yielding area under the receiver operating curve in the range of 0.86–0.89. Recall is close to 100 per cent down to total magnitude i∼21 indicating high completeness, while precision declines from 85 per cent to 70 per cent in the range i∼17–21. The methods are extremely fast: training on 2 million samples takes 20 h on a GPU machine, and 10^8 multiband cut-outs can be evaluated per GPU-hour. The speed and performance of the method pave the way to apply it to large samples of astronomical sources, bypassing the need for photometric pre-selection that is likely to be a major cause of incompleteness in current samples of known quads.

Graeme Blair, Mohammed Bukar, Rebecca Littman, Elizabeth R. Nugent, Rebecca Wolfe, Benjamin Crisman, Anthony Etim, Chad Hazlett, Jiyoung Kim. (2021). Trusted authorities can change minds and shift norms during conflict. Proceedings of the National Academy of Sciences, 118(42). [paper]

The reintegration of former members of violent extremist groups is a pressing policy challenge. Governments and policymakers often have to change minds among reticent populations and shift perceived community norms in order to pave the way for peaceful reintegration. How can they do so on a mass scale? Previous research shows that messages from trusted authorities can be effective in creating attitude change and shifting perceptions of social norms. In this study, we test whether messages from religious leaders—trusted authorities in many communities worldwide—can change minds and shift norms around an issue related to conflict resolution: the reintegration of former members of violent extremist groups. Our study takes place in Maiduguri, Nigeria, the birthplace of the violent extremist group Boko Haram. Participants were randomly assigned to listen to either a placebo radio message or to a treatment message from a religious leader emphasizing the importance of forgiveness, announcing the leader’s forgiveness of repentant fighters, and calling on followers to forgive. Participants were then asked about their attitudes, intended behaviors, and perceptions of social norms surrounding the reintegration of an ex–Boko Haram fighter. The religious leader message significantly increased support for reintegration and willingness to interact with the ex-fighter in social, political, and economic life (8 to 10 percentage points). It also shifted people’s beliefs that others in their community were more supportive of reintegration (6 to 10 percentage points). Our findings suggest that trusted authorities such as religious leaders can be effective messengers for promoting peace.

Chang, T.S., Ding, Y., Freund, M.K., Johnson, R., Schwarz, T., Yabu, J.M., Hazlett, C., Chiang, J.N., Wulf, D.A., Antonio, A.L. and Ariannejad, M., (2021). Pre-existing conditions in Hispanics/Latinxs that are COVID-19 risk factors. Iscience, 24(3), p.102188. [paper]

Coronavirus disease 2019 (COVID-19) has exposed health care disparities in minority groups including Hispanics/Latinxs (HL). Studies of COVID-19 risk factors for HL have relied on county-level data. We investigated COVID-19 risk factors in HL using individual-level, electronic health records in a Los Angeles health system between March 9, 2020, and August 31, 2020. Of 9,287 HL tested for SARS-CoV-2, 562 were positive. HL constituted an increasing percentage of all COVID-19 positive individuals as disease severity escalated. Multiple risk factors identified in Non-Hispanic/Latinx whites (NHL-W), like renal disease, also conveyed risk in HL. Pre-existing nonrheumatic mitral valve disorder was a risk factor for HL hospitalization but not for NHL-W COVID-19 or HL influenza hospitalization, suggesting it may be a specific HL COVID-19 risk. Admission laboratory values also suggested that HL presented with a greater inflammatory response. COVID-19 risk factors for HL can help guide equitable government policies and identify at-risk populations.

Blum, A., Hazlett, C., Posner, D. (2021). Measuring ethnic bias: Can misattribution-based tools from social psychology reveal group biases that economics games cannot? Political Analysis. [manuscript]

Economics games such as the Dictator and Public Goods Games have been widely used to measure ethnic bias in political science and economics. Yet these tools may fail to measure bias as intended because they are vulnerable to self-presentational concerns and/or fail to capture bias rooted in more automatic associative and affective reactions. We examine a set of misattribution-based approaches, adapted from social psychology, that may sidestep these concerns. Participants in Nairobi, Kenya completed a series of common economics games alongside versions of these misattribution tasks adapted for this setting, each designed to detect bias towards non-coethnics relative to coethnics. Several of the misattribution tasks show clear evidence of (expected) bias, arguably reflecting differences in positive/negative affect and heightened threat perception toward non-coethnics. The Dictator and Public Goods Games, by contrast, are unable to detect any bias in behavior towards non-coethnics versus coethnics. We conclude that researchers of ethnic and other biases may benefit from including misattribution-based procedures in their tool kits to widen the set of biases to which their investigations are sensitive.

Hazlett, C., Wainstein, L. (2020). Understanding, choosing, and unifying multilevel and fixed effect approaches. Political Analysis. [manuscript]

When working with grouped data, investigators may choose between “fixed effects” models (FE) with specialized (e.g., cluster-robust) standard errors, or “multilevel models” (MLMs) employing “random effects”. We review the claims given in published works regarding this choice, then clarify how these approaches work and compare by showing that: (i) random effects employed in MLMs are simply “regularized” fixed effects; (ii) unmodified MLMs are consequently susceptible to bias, but there is a longstanding remedy; and (iii) the “default” MLM standard errors rely on narrow assumptions that can lead to undercoverage in many settings. Our review of over 100 papers using MLM in political science, education, and sociology shows that these “known” concerns have been widely ignored in practice. We describe how to debias MLM’s coefficient estimates, and provide an option to more flexibly estimate their standard errors. Most illuminating, once MLMs are adjusted in these two ways the point estimate and standard error for the target coefficient are exactly equal to those of the analogous FE model with cluster-robust standard errors. For investigators working with observational data and who are interested only in inference on the target coefficient, either approach is equally appropriate and preferable to uncorrected MLM.


Hazlett, C., Maokola, W., Wulf, D. (2020). Inference without randomization or ignorability: A stability-controlled quasi-experiment on the prevention of tuberculosis. Statistics in Medicine, 39(28), 4149-4186. DOI: 10.1002/sim.8717 [manuscript]

When determining the effectiveness of a new treatment, randomized trials are not always possible or ethical, or we may wish to estimate the effect a treatment has actually had, among a population that already received it, through an unknown selection process. The stability-controlled quasi-experiment (SCQE) (Hazlett, 2019) replaces randomization with an assumption on the outcome’s “baseline trend,” or more precisely, the change in average non-treatment potential outcome across successive cohorts. We describe and extend this method, and provide its first direct application: examining the real world effectiveness of isoniazid preventive therapy (IPT) to reduce tuberculosis (TB) incidence among people living with HIV in Tanzania. Since IPT became available in the clinics we studied, 27% of new patients received it, selected through an unknown process. Within a year, 16% of those not on IPT developed TB, compared to fewer than 1% of those taking IPT. We find that (i) despite this compelling naive comparison, if the baseline trend is assumed to be flat, the effect of IPT on TB incidence would be -2 percentage points (pp) with a confidence interval of [-10, 5]; (ii) to argue that IPT was beneficial requires believing that the (non-treatment) incidence rate would have risen by at least 0.5pp per year in the absence of the treatment; and (iii) to argue IPT was not harmful requires arguing that the baseline trend did not fall by more than 1pp per year. We also find that those who were given treatment may have been less likely to develop TB anyway. This illustrates how the SCQE approach extracts valid causal information from observational data while protecting against over-confidence.

See also: this manuscript on assessing the effect of remdesivir on COVID-19 outside of randomized trials, or this manuscript on assessing the effects of hydroxychloroquine and dexamethasone (or these slides).


Hazlett, C., Mildenberger, M. (2020). Wildfire Exposure Increases Pro-Climate Political Behaviors. American Political Science Review. Online 15 July 2020. [paper]

One political barrier to climate reforms is the temporal mismatch between short-term policy costs and long-term policy benefits. Will public support for climate reforms increase as climate-related disasters make the short-term costs of inaction more salient? Leveraging variation in the timing of Californian wildfires, we evaluate how exposure to a climate-related hazard influences political behavior, rather than self-reported attitudes or behavioral intentions. We show that wildfires increased support for costly, climate-related ballot measures by 5 to 6 percentage points for those living within 5km of a recent wildfire, decaying to near zero beyond a distance of 15km. This effect is concentrated in Democratic-voting areas, and nearly zero in Republican-dominated areas. We conclude that experienced climate threats can enhance willingness-to-act but largely in places where voters are known to believe in climate change.

Conley, B., Hazlett, C. (2020). How very massive atrocities end: A dataset and typology. Journal of Peace Research, 30. [manuscript] [additional tables] [dataset] [data manual] [link to case studies]

Understanding how the most severe mass atrocities have historically come to an end may aid in designing policy interventions to more rapidly terminate future episodes. To facilitate research in this area, we construct a new dataset covering all 43 very large mass atrocities perpetrated by governments or non-governments since 1945 with at least 50,000 civilian fatalities. This article introduces and summarizes these data, including an inductively generated typology of three major ending types: those in which (i) violence is carried out to its intended conclusion (37%); (ii) the perpetrating force is driven out of power militarily (26%); or (iii) the perpetrators shift to a different strategy no longer involving mass atrocities against civilians (37%). We find that international actors play a range of important roles in endings, often involving encouragement and support for changes in strategy that reduce mass killings. Endings could be attributed principally to armed foreign interventions in only four cases, three of which involved regime change. Within the cases we study, no ending was attributable to a neutral peacekeeping mission.

Hazlett, C. (2020). Kernel Balancing: A flexible non-parametric weighting procedure for estimating causal effects. Statistica Sinica, 30, 1155-1189. [paper] [supplement] [R package]

Matching and weighting methods are widely used to estimate causal effects when adjusting for a set of observables is required. Matching is appealing for its non-parametric nature, but with continuous variables, is not guaranteed to remove bias. Weighting techniques choose weights on units to ensure pre-specified functions of the covariates have equal (weighted) means for the treated and control group. This assures unbiased effect estimation only when the potential outcomes are linear in those pre-specified functions of the observables. Kernel balancing begins by assuming the expectation of the non-treatment potential outcome conditional on the covariates falls in a large, flexible space of functions associated with a kernel. It then constructs linear bases for this function space and achieves approximate balance on these bases. A worst-case bound on the bias due to this approximation is given and is the target of minimization. Relative to current practice, kernel balancing offers one reasoned solution to the long-standing question of which functions of the covariates investigators should attempt to achieve (and check) balance on. Further, these weights are also those that would make the estimated multivariate density of covariates approximately the same for the treated and control groups, when the same choice of kernel is used to estimate those densities. The approach is fully automated up to the choice of a kernel and smoothing parameter, for which default options and guidelines are provided. An R package, KBAL, implements this approach.

            For R users, a recently updated kbal package can be installed from my GitHub repository:

> devtools::install_github("chadhazlett/kbal")

Hazlett, C., Campos, E., Tan, P., Truong, H., Loo, S., DiStefano, C., Jeste, S., & Senturk, D. (2020). Principle ERP reduction and analysis: Estimating and using principle ERP waveforms underlying ERPs across tasks, subjects and electrodes. NeuroImage, 212, 116630. [paper] [slides]

Event-related potential (ERP) waveforms are the summation of many overlapping signals. Changes in the peak or mean amplitude of a waveform over a given time period, therefore, cannot reliably be attributed to a particular ERP component of ex ante interest, as is the standard approach to ERP analysis. Though this problem is widely recognized, it is not well addressed in practice. Our approach begins by presuming that any observed ERP waveform — at any electrode, for any trial type, and for any participant — is approximately a weighted combination of signals from an underlying set of what we refer to as principle ERPs, or pERPs. We propose an accessible approach to analyzing complete ERP waveforms in terms of their underlying pERPs. First, we propose the principle ERP reduction (pERP-RED) algorithm for investigators to estimate a suitable set of pERPs from their data, which may span multiple tasks. Next, we provide tools and illustrations of pERP-space analysis, whereby observed ERPs are decomposed into the amplitudes of the contributing pERPs, which can be contrasted across conditions or groups to reveal which pERPs differ (substantively and/or significantly) between conditions/groups. Differences on all pERPs can be reported together rather than selectively, providing complete information on all components in the waveform, thereby avoiding selective reporting or user discretion regarding the choice of which components or windows to use. The scalp distribution of each pERP can also be plotted for any group/condition. We demonstrate this suite of tools through simulations and on real data collected from multiple experiments on participants diagnosed with Autism Spectrum Disorder and Attention Deficit Hyperactivity Disorder. Software for conducting these analyses is provided in the pERPred package for R.

Cinelli, C., Hazlett, C. (2020). Making Sense of Sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society, Series B, 82, Part 1, 39-67. [manuscript] [Software: R, STATA, Python] [Shiny app]

In this paper we extend the familiar "omitted variable bias" framework, creating a suite of tools for sensitivity analysis of regression coefficients and their standard errors to unobserved confounders that: (i) do not require assumptions about the functional form of the treatment assignment mechanism nor the distribution of the unobserved confounder(s); (ii) can be used to assess the sensitivity to multiple confounders, whether they influence the treatment or the outcome linearly or not; (iii) facilitate the use of expert knowledge to judge the plausibility of sensitivity parameters; and, (iv) can be easily and intuitively displayed, either in concise regression tables or more elaborate graphs. More precisely, we introduce two novel measures for communicating the sensitivity of regression results that can be used for routine reporting. The "robustness value" describes the association unobserved confounding would need to have with both the treatment and the outcome to change the research conclusions. The partial R-squared of the treatment with the outcome shows how strongly confounders explaining all of the outcome would have to be associated with the treatment to eliminate the estimated effect. Next, we provide intuitive graphical tools that allow researchers to make more elaborate arguments about the sensitivity of not only point estimates but also t-values (or p-values and confidence intervals). We also provide graphical tools for exploring extreme sensitivity scenarios in which all or much of the residual variance is assumed to be due to confounders. Finally, we note that a widespread informal "benchmarking" practice can be widely misleading, and introduce a novel alternative that allows researchers to formally bound the strength of unobserved confounders "as strong as" certain covariate(s) in terms of the explained variance of the treatment and/or the outcome. 
We illustrate these methods with a running example that estimates the effect of exposure to violence in western Sudan on attitudes toward peace. 
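As a rough illustration of the two routine-reporting measures described above, the following Python sketch computes the partial R-squared of the treatment with the outcome and the robustness value from a regression's t-statistic and residual degrees of freedom. The function names and the input values are hypothetical, chosen only for illustration, and the robustness value is written in terms of the partial Cohen's f of the treatment. This is a sketch of the arithmetic only; the software linked above should be preferred for real analyses.

```python
import math


def partial_r2(t_stat: float, dof: int) -> float:
    """Partial R-squared of the treatment with the outcome.

    t_stat: t-statistic of the treatment coefficient.
    dof:    residual degrees of freedom of the regression.
    """
    return t_stat**2 / (t_stat**2 + dof)


def robustness_value(t_stat: float, dof: int, q: float = 1.0) -> float:
    """Robustness value RV_q for reducing the point estimate by 100*q percent.

    Uses f_q = q * |f|, where f = |t| / sqrt(dof) is the partial Cohen's f
    of the treatment with the outcome, and
    RV_q = 0.5 * (sqrt(f_q^4 + 4 * f_q^2) - f_q^2).
    """
    f_q = q * abs(t_stat) / math.sqrt(dof)
    return 0.5 * (math.sqrt(f_q**4 + 4 * f_q**2) - f_q**2)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    t_stat, dof = 4.18, 783
    print(f"partial R2 of treatment with outcome: {partial_r2(t_stat, dof):.3f}")
    print(f"robustness value (q = 1):             {robustness_value(t_stat, dof):.3f}")
```

Read informally, a robustness value of x means that confounders explaining x of the residual variance of both the treatment and the outcome would be strong enough to bring the point estimate to zero, while weaker confounders would not.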

Hazlett, C. (2019). Angry or Weary? The effect of personal violence on attitudes towards peace in Darfur. Journal of Conflict Resolution, 64(5), 844-870. [manuscript]

Does exposure to violence motivate individuals to support further violence, or to seek peace? Such questions are central to our understanding of how conflicts evolve, terminate, and recur. Yet, convincing empirical evidence as to which response dominates, even in a specific case, has been elusive, owing to the inability to rule out confounding biases. This paper employs a natural experiment based on the indiscriminacy of violence within villages in Darfur to examine how refugees' experiences of violence affect their attitudes toward peace. The results are consistent with a pro-peace or "weary" response: individuals directly harmed by violence were more likely to report that peace is possible, and less likely to demand execution of their enemies. This provides micro-level evidence supporting earlier country-level work on "war-weariness," and extends the growing literature on the effects of violence on individuals by including attitudes toward peace as an important outcome. These findings suggest that victims harmed by violence during war can play a positive role in settlement and reconciliation processes.

Hazlett, C. (2019). Estimating causal effects of new treatments despite self-selection: The case of experimental medical treatments. Journal of Causal Inference, 7(1). [paper]

Providing terminally ill patients with access to experimental treatments, as allowed by recent “right to try” laws and “expanded access” programs, poses a variety of ethical questions. While practitioners and investigators may assume it is impossible to learn the effects of these treatments without randomized trials, this paper describes a simple tool to estimate the effects of these experimental treatments on those who take them, despite the problem of selection into treatment, and without assumptions about the selection process. The key assumption is that the average outcome, such as survival, would remain stable over time in the absence of the new treatment. Such an assumption is unprovable, but can often be credibly judged by reference to historical data and by experts familiar with the disease and its treatment. Further, where this assumption may be violated, the result can be adjusted to account for a hypothesized change in the non-treatment outcome, or to conduct a sensitivity analysis. The method is simple to understand and implement, requiring just four numbers to form a point estimate. Such an approach can be used not only to learn which experimental treatments are promising, but also to warn us when treatments are actually harmful – especially when they might otherwise appear to be beneficial, as illustrated by example here. While this note focuses on experimental medical treatments as a motivating case, more generally this approach can be employed where a new treatment becomes available or has a large increase in uptake, where selection bias is a concern, and where an assumption on the change in average non-treatment outcome over time can credibly be imposed.
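The point estimate described above can be sketched in a few lines of Python. This sketch assumes the four numbers are the average outcome and the fraction treated in an earlier and a later cohort; that reading, the function name, and the example values are all hypothetical and not taken from the paper. The parameter `delta` stands for the assumed change in the average non-treatment outcome between cohorts, with `delta = 0` corresponding to the stability assumption.

```python
def scqe_point_estimate(y_pre: float, y_post: float,
                        pi_pre: float, pi_post: float,
                        delta: float = 0.0) -> float:
    """Sketch of a stability-based estimate of the effect on the treated.

    y_pre, y_post:   average observed outcome in the earlier / later cohort.
    pi_pre, pi_post: fraction of each cohort that took the treatment.
    delta:           assumed change in the average non-treatment outcome
                     from the earlier to the later cohort (0 = stable).

    The change in average outcome not explained by delta is attributed to
    the change in treatment uptake.
    """
    if pi_post == pi_pre:
        raise ValueError("treatment uptake must differ between cohorts")
    return (y_post - y_pre - delta) / (pi_post - pi_pre)


if __name__ == "__main__":
    # Hypothetical example: the outcome is one-year mortality, nobody is
    # treated in the earlier cohort, and 30% are treated in the later one.
    for delta in (-0.03, 0.0, 0.03):
        est = scqe_point_estimate(0.40, 0.34, 0.0, 0.30, delta=delta)
        print(f"delta = {delta:+.2f} -> estimated effect on the treated: {est:+.3f}")
```

Varying `delta` over a plausible range, as in the loop above, is one simple way to carry out the sensitivity analysis the abstract mentions.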

Fabbe, K., Hazlett, C., Sinmazdemir, T. (2019). A Persuasive Peace: Syrian refugees' attitudes towards compromise and civil war termination. Journal of Peace Research, 56(1), 103-117. [paper]

Civilians who have fled violent conflict and settled in neighboring countries are integral to processes of civil war termination. Contingent on their attitudes, they can either back peaceful settlements or support warring groups and continued fighting. Attitudes toward peaceful settlement are expected to be especially obdurate for civilians who have been exposed to violence. In a survey of 1,120 Syrian refugees in Turkey conducted in 2016, we use experiments to examine attitudes towards two critical phases of conflict termination -- a ceasefire and a peace agreement. We examine the rigidity/flexibility of refugees' attitudes to see if subtle changes in how wartime losses are framed or in who endorses a peace process can shift willingness to compromise with the incumbent Assad regime.  Our results show, first, that refugees are far more likely to agree to a ceasefire proposed by a civilian as opposed to one proposed by armed actors from either the Syrian government or the opposition. Second, simply describing the refugee community's wartime experience as suffering rather than sacrifice substantially increases willingness to compromise with the regime to bring about peace. This effect remains strong among those who experienced greater violence. Together, these results show that even among a highly pro-opposition population that has experienced severe violence, willingness to settle and make peace are remarkably flexible and dependent upon these cues.

Fong, C., Hazlett, C., Imai, K. (2018). Covariate Balancing Propensity Score for a Continuous Treatment: Application to the efficacy of political advertisements. Annals of Applied Statistics, 12(1), 156-177. [paper] [R package]

Propensity score matching and weighting are popular methods when estimating causal effects in observational studies. Beyond the assumption of unconfoundedness, however, these methods also require the model for the propensity score to be correctly specified. The recently proposed covariate balancing propensity score (CBPS) methodology increases the robustness to model misspecification by directly optimizing sample covariate balance between the treatment and control groups. In this paper, we extend the CBPS to a continuous treatment. We propose the covariate balancing generalized propensity score (CBGPS) methodology, which minimizes the association between covariates and the treatment. We develop both parametric and nonparametric approaches and show their superior performance over the standard maximum likelihood estimation in a simulation study. The CBGPS methodology is applied to an observational study, whose goal is to estimate the causal effects of political advertisements on campaign contributions. We also provide open-source software that implements the proposed methods.

           For R users, CBPS can be installed from CRAN:

> install.packages("CBPS")

Hazlett, C., Berinsky, A. (2017). Stress-testing the affect misattribution procedure: Heterogeneous control of affect misattribution procedure effects under incentives.  British Journal of Social Psychology, 57(1), 61-74. [paper]

The affect misattribution procedure (AMP) is widely used to measure sensitive attitudes towards classes of stimuli, by estimating the effect that affectively charged prime images have on subsequent judgements of neutral target images. We test its resistance to efforts to conceal one’s attitudes, by replicating the standard AMP design while offering small incentives to conceal attitudes towards the prime images. We find that although the average AMP effect remains positive, it decreases significantly in magnitude. Moreover, this reduction in the mean AMP effect under incentives masks large heterogeneity: one subset of individuals continues to experience the "full" AMP effect, while another reduces their effect to approximately zero. The AMP thus appears to be resistant to efforts to conceal one’s attitudes for some individuals but is highly controllable for others. We further find that those individuals with high self-reported effort to avoid the influence of the prime are more often able to eliminate their AMP effect. We conclude by discussing possible mechanisms.

Ross, M.L, Hazlett, C., & Mahdavi P. (2017). Global progress and backsliding on gasoline taxes and subsidies. Nature Energy, 2(1), 1-6. [paper]

To reduce greenhouse gas emissions in the coming decades, many governments will have to reform their energy policies. These policies are difficult to measure with any precision. As a result, it is unclear whether progress has been made towards important energy policy reforms, such as reducing fossil fuel subsidies. We use new data to measure net taxes and subsidies for gasoline in almost all countries at the monthly level and find evidence of both progress and backsliding. From 2003 to 2015, gasoline taxes rose in 83 states but fell in 46 states. During the same period, the global mean gasoline tax fell by 13.3% due to faster consumption growth in countries with lower taxes. Our results suggest that global progress towards fossil fuel price reform has been mixed, and that many governments are failing to exploit one of the most cost-effective policy tools for limiting greenhouse gas emissions.

Ferwerda, J., Hainmueller, J., Hazlett, C. (2017). KRLS: A Stata package for kernel-based regularized least squares. Journal of Statistical Software, 55(2). [paper]

       For R users, KRLS can be installed from CRAN:

   >install.packages("KRLS")

       For Stata users, it can be installed from the SSC repository:

   >ssc install krls, all replace

de Waal, A., Davenport, C., Hazlett, C., Kennedy, J. (2014). The Epidemiology of Lethal Violence in Darfur: Using micro-data to explore complex patterns of ongoing armed conflict. Social Science & Medicine, 120.

This article describes and analyzes patterns of lethal violence in Darfur, Sudan, during 2008–09, drawing upon a uniquely detailed dataset generated by the United Nations–African Union hybrid operation in Darfur (UNAMID), combined with data generated through aggregation of reports from open-source venues. These data enable detailed analysis of patterns of perpetrator/victim and belligerent groups over time, and show how violence changed over the four years following the height of armed conflict in 2003–05. During the reference period, violent incidents were sporadic and diverse and included: battles between the major combatants; battles among subgroups of combatant coalitions that were ostensibly allied; inter-tribal conflict; incidents of one-sided violence against civilians by different parties; and incidents of banditry. The conflict as a whole defies easy categorization. The exercise illustrates the limits of existing frameworks for categorizing armed violence and underlines the importance of rigorous microlevel data collection and improved models for understanding the dynamics of collective violence. By analogy with the use of the epidemiological data for infectious diseases to help design emergency health interventions, we argue for improved use of data on lethal violence in the design and implementation of peacekeeping, humanitarian and conflict resolution interventions.

Hainmueller, J., Hazlett, C. (2014). Kernel Regularized Least Squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis, 22(2). [paper] [R package] [appendix]

​We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space that uses kernels as radial basis functions and finds the best-fitting surface in this space by minimizing a complexity-penalized least squares problem. We argue that the method is well-suited for social science inquiry because it avoids strong parametric assumptions, yet allows interpretation in ways analogous to generalized linear models while also permitting more complex interpretation to examine non-linearities, interactions, and heterogeneous effects. We also extend the method in several directions to make it more effective for social inquiry, by (1) deriving estimators for the pointwise marginal effects and their variances, (2) establishing unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) proposing a simple automated rule for choosing the kernel bandwidth, and (4) providing companion software. We illustrate the use of the method through simulations and empirical examples.
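
The core fitting step described in this abstract, a complexity-penalized least squares problem over a Gaussian kernel, can be sketched in a few lines. This is a minimal illustration and not the KRLS package itself: the function names are hypothetical, the penalty `lam` is fixed by hand rather than tuned, the covariates are not standardized, and setting the bandwidth to the number of covariates is an assumption made here for simplicity. Marginal effects and variance estimates are omitted.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / sigma2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)

def krls_fit(X, y, lam=0.1, sigma2=None):
    """Solve min_c ||y - K c||^2 + lam * c'K c, giving c = (K + lam*I)^{-1} y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if sigma2 is None:
        # Assumption for this sketch: bandwidth equal to the number of covariates.
        sigma2 = X.shape[1]
    K = gaussian_kernel(X, X, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return c, sigma2

def krls_predict(X_train, c, sigma2, X_new):
    """Fitted surface at new points: a weighted sum of kernels centered on the training points."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    return gaussian_kernel(X_new, X_train, sigma2) @ c

# Usage on simulated data: recover a smooth non-linear function without specifying its form.
X_demo = np.linspace(-3, 3, 50).reshape(-1, 1)
y_demo = np.sin(X_demo[:, 0])
c_demo, s2_demo = krls_fit(X_demo, y_demo, lam=0.01)
fitted_demo = krls_predict(X_demo, c_demo, s2_demo, X_demo)
```

Larger values of `lam` shrink the fitted surface toward a smoother function, while smaller values track the data more closely.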

Work in Progress / Under Review

Cinelli, C., Hazlett, C. An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables. [draft]

We develop an "omitted variable bias" framework for sensitivity analysis of instrumental variable (IV) estimates that is immune to "weak instruments," naturally handles multiple "side-effects" (violations of the exclusion restriction assumption) and "confounders" (violations of the ignorability of the instrument assumption), exploits expert knowledge to bound sensitivity parameters, and can be easily implemented with standard software. Conveniently, many pivotal conclusions regarding the sensitivity of the IV estimate (e.g. tests against the null hypothesis of zero causal effect) can be reached simply through separate sensitivity analyses of the effect of the instrument on the treatment (the "first stage") and the effect of the instrument on the outcome (the "reduced form"). More specifically, we introduce sensitivity statistics for routine reporting, such as robustness values for IV estimates, describing the minimum strength that omitted variables need to have to change the conclusions of an IV study. Next we provide visual displays that fully characterize the sensitivity of IV point-estimates and confidence intervals to violations of the standard IV assumptions. Finally, we offer formal bounds on the worst possible bias under the assumption that the maximum explanatory power of omitted variables is no stronger than a multiple of the explanatory power of observed variables. We apply our methods in a running example that uses instrumental variables to estimate the returns to schooling.


Hartman, E., Hazlett, C., Sterbenz, C. KPop: A kernel balancing approach for reducing specification assumptions in survey weighting. (R&R) [draft]

Response rates to surveys have declined precipitously. Some researchers have responded by relying more heavily on convenience-based internet samples. This leaves researchers asking not if, but how, to weight survey results to represent their target population. Though practitioners often call upon expert knowledge in constructing their auxiliary vector, X, to use in weighting methods, they face difficult, feasibility-constrained choices regarding which variables to choose, how to coarsen them, and what interactions or other functions of those variables to include in X. Most approaches seek weights on the sampled units that make X have the same mean in the sample as in the population. However, such weights ensure that an outcome variable of interest Y is correctly weighted only if the expectation of Y is linear in X, an unrealistic assumption. We describe kernel balancing for population weighting (KPop) to make samples more similar to populations on the distribution of X, beyond the first moment margin. This approach effectively replaces the design matrix X with a kernel matrix, K, that encodes high-order information about X via the “kernel trick”. We then weight the sampled units so that their average row of K is approximately equal to that of the population, working through a spectral decomposition. This produces good calibration on a wide range of smooth functions of X, without relying on the user to select those functions. We describe the method and illustrate its use in weighting political survey samples, including from the 2016 American presidential election.

Chad Hazlett, Ami Wulf, Bogdan Pasaniuc, Onyebuchi Arah, Kristine Erlandson, Brian Montague. Credible learning of hydroxychloroquine and dexamethasone effects on covid-19 mortality outside of randomized trials. [draft] [slides] [shiny app for method]

Objectives: To investigate the effectiveness of hydroxychloroquine and dexamethasone on coronavirus disease (covid-19) mortality using patient data outside of randomized trials.

Design: Phenotypes derived from electronic health records were analyzed using the stability-controlled quasi-experiment (SCQE) to provide a range of possible causal effects of hydroxychloroquine and dexamethasone on covid-19 mortality.

Setting and participants: Data from 2,007 covid-19 positive patients hospitalized at a large university hospital system over the course of 200 days and not enrolled in randomized trials were analyzed using SCQE. For hydroxychloroquine, we examine a high-use cohort (n=766, days 1 to 43) and a later, low-use cohort (n=548, days 44 to 82). For dexamethasone, we examine a low-use cohort (n=614, days 44 to 101) and high-use cohort (n=622, days 102 to 200).

Outcome measure: 14-day mortality, with a secondary outcome of 28-day mortality.

Results: Hydroxychloroquine could only have been significantly (p<0.05) beneficial if baseline mortality was at least 6.4 percentage points (55%) lower among patients in the later low-use than the earlier high-use cohort. Hydroxychloroquine instead proves significantly harmful if baseline mortality rose from one cohort to the next by just 0.3 percentage points. Dexamethasone significantly reduced mortality risk if baseline mortality in the later (high-use) cohort (days 101-200) was higher than, the same as, or up to 1.5 percentage points lower than that in the earlier (low-use) cohort (days 44-100). It could only prove significantly harmful if mortality improved from one cohort to the next by 6.8 percentage points due to other causes -- an assumption implying an unlikely 94% reduction in mortality due to other causes, leaving an in-hospital mortality rate of just 0.4%.

Conclusions: The assumptions required for a beneficial effect of hydroxychloroquine on 14-day mortality are difficult to sustain, while the assumptions required for hydroxychloroquine to be harmful are difficult to reject with confidence. Dexamethasone, by contrast, was beneficial under a wide range of plausible assumptions, and was only harmful if a nearly impossible assumption is met. More broadly, the SCQE provides a useful tool for making reasoned, limited and credible inferences from non-randomized uses of experimental therapies, when randomized trials are still ongoing and will take a long time, or to provide corroborative evidence from different populations.

David Ami Wulf, Brian L Hill, Jeffrey N Chiang, Onyebuchi A. Arah, David Goodman-Meza, & Chad Hazlett. Safely learning from new non-randomized treatments: Assessing the effect of remdesivir on COVID-19 mortality. [draft]

Background: Investigation of treatment effects using patient data from outside of randomized trials is common, but can leave readers overconfident in point estimates that rely on improbable assumptions.


Methods: We analyzed the electronic health records of 136 COVID-19 positive patients hospitalized at a large university hospital system over the course of 110 days early in the pandemic. Through the stability-controlled quasi-experiment (SCQE), we utilized rapid changes in remdesivir usage over this period to provide a range of possible causal effects of remdesivir on COVID-19 mortality (within 28 days).

Results: Remdesivir use was initially high, then dropped during a no-use period (which saw 24.5% mortality), and then returned to high-usage. Remdesivir significantly (p<0.05) reduced mortality risk among those who took it if baseline mortality was at least 1.9 percentage points (8%) higher among patients in the combined high-use period (that is, had remdesivir not been used in this period either) than those in the middle no-use period. It could only have been harmful if baseline mortality dropped to 0% in the high-use period.


Conclusions: The assumptions required for a beneficial effect of remdesivir are plausible, but not defensible with confidence, while the assumptions required to declare it harmful are implausible. More broadly, without any assumption of unconfoundedness, the SCQE reveals what inferences can be credibly supported by evidence from observational data, making it useful when randomized trials have not yet produced clear evidence or in providing estimates for different populations.

Cinelli, C., Ferwerda, J., Hazlett, C. sensemakr: Sensitivity Analysis Tools for OLS in R and Stata. [manuscript]

This paper introduces the package sensemakr for R and Stata, which implements a suite of sensitivity analysis tools for regression models developed in Cinelli and Hazlett (2020a). Given a regression model, sensemakr can compute sensitivity statistics for routine reporting, such as the robustness value, which describes the minimum strength that unobserved confounders need to have to overturn a research conclusion. The package also provides plotting tools that visually demonstrate the sensitivity of point estimates and t-values to hypothetical confounders. Finally, sensemakr implements formal bounds on sensitivity parameters by means of comparison with the explanatory power of observed variables. All these tools are based on the familiar "omitted variable bias" framework, do not require assumptions regarding the functional form of the treatment assignment mechanism nor the distribution of the unobserved confounders, and naturally handle multiple, non-linear confounders. With sensemakr, users can transparently report the sensitivity of their causal inferences to unobserved confounding, thereby enabling a more precise, quantitative debate as to what can be concluded from imperfect observational studies.
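
As a rough illustration of the robustness value mentioned above, the point-estimate version can be computed from a regression's t-statistic and residual degrees of freedom alone. The sketch below is not the sensemakr package: the function name and arguments are hypothetical, and it assumes the closed-form expression RV_q = (sqrt(f_q^4 + 4 f_q^2) - f_q^2) / 2 with f_q = q|t| / sqrt(df), covering only the point estimate and not the version adjusted for statistical significance.

```python
import math

def robustness_value(t_stat, dof, q=1.0):
    """Point-estimate robustness value (illustrative sketch).

    t_stat : t-statistic of the treatment coefficient
    dof    : residual degrees of freedom of the regression
    q      : proportional reduction of the estimate considered problematic
             (q = 1 means reducing the estimate all the way to zero)
    """
    if dof <= 0:
        raise ValueError("dof must be positive")
    # Partial Cohen's f of the treatment with the outcome, scaled by q.
    f_q = q * abs(t_stat) / math.sqrt(dof)
    return 0.5 * (math.sqrt(f_q ** 4 + 4 * f_q ** 2) - f_q ** 2)

# Usage with hypothetical inputs: a larger t-statistic gives a larger robustness value.
rv_weak = robustness_value(t_stat=2.0, dof=100)
rv_strong = robustness_value(t_stat=4.0, dof=100)
```

The returned value lies between 0 and 1 and is read as the share of residual variance, in both the treatment and the outcome, that unobserved confounders would need to explain to reduce the estimate by the fraction `q`.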

Hazlett, C., Xu, Yiqing. Trajectory Balancing: A general reweighting approach to causal inference with time-series cross-sectional data. [draft]

We introduce trajectory balancing, a general weighting approach to causal inference with time-series cross-sectional (TSCS) data. We focus on settings where one or more units are exposed to treatment at a given time, while a set of control units remain untreated. First, we show that many commonly used TSCS methods imply an assumption that each unit’s non-treatment potential outcomes in the post-treatment period are linear in that unit’s pre-treatment outcomes and its time-invariant covariates. Under this assumption, we introduce the mean balancing method, which weights control units such that the averages of the pre-treatment outcomes and covariates are approximately equal between the treatment and (weighted) control groups. Second, we relax the linearity assumption and propose kernel balancing, which seeks approximate balance on a kernel-based feature expansion of the pre-treatment outcomes and covariates. The resulting approach inherits the ability of synthetic control and latent factor models to tolerate time-varying confounders, but (1) improves feasibility and stability with reduced user discretion; (2) accommodates both short and long pre-treatment time periods with many or few treated units; and (3) balances on the high-order “trajectory” of pre-treatment outcomes rather than their period-wise average. We illustrate this method with simulations and two empirical examples.
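
The mean balancing idea can be sketched as a small optimization problem: find positive weights on the control units, summing to one, so that the weighted control averages of the pre-treatment outcomes and covariates match the treated averages. The example below is an illustration under stated assumptions and not the authors' implementation. It uses exponential tilting (in the style of entropy balancing) solved by Newton's method, which is a choice made here, and the function and variable names are hypothetical. It also imposes exact rather than approximate balance and assumes the target means lie inside the range spanned by the control units.

```python
import numpy as np

def _tilted_weights(Xc, lam):
    """Weights proportional to exp(Xc @ lam), normalized to sum to one."""
    s = Xc @ lam
    w = np.exp(s - s.max())  # subtract the max for numerical stability
    return w / w.sum()

def mean_balancing_weights(X_control, target_means, n_iter=50, tol=1e-10):
    """Weights on control units whose weighted covariate means equal target_means.

    X_control    : (n_controls, p) array of pre-treatment outcomes and covariates
    target_means : length-p array, e.g. the treated group's column means
    """
    X_control = np.asarray(X_control, dtype=float)
    target_means = np.asarray(target_means, dtype=float)
    Xc = X_control - target_means  # center so that balance means a zero weighted mean
    lam = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        w = _tilted_weights(Xc, lam)
        grad = Xc.T @ w  # current imbalance in each column
        if np.max(np.abs(grad)) < tol:
            break
        # Hessian of log-sum-exp: weighted covariance of the centered covariates.
        hess = (Xc * w[:, None]).T @ Xc - np.outer(grad, grad)
        lam -= np.linalg.solve(hess + 1e-10 * np.eye(len(lam)), grad)
    return _tilted_weights(Xc, lam)

# Usage on simulated data: 200 control units, 3 pre-treatment columns.
rng = np.random.default_rng(0)
controls_demo = rng.normal(size=(200, 3))
treated_means_demo = np.array([0.2, -0.1, 0.3])
weights_demo = mean_balancing_weights(controls_demo, treated_means_demo)
```

Kernel balancing, as described in the abstract, would apply the same kind of weighting to a kernel-based expansion of these columns rather than to the raw columns.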
