2025-07-11 | Total: 12
This paper introduces a definition of ideological polarization of an electorate around a particular central point. Because the location and width of the center are flexible, this measure enables the researcher to analyze polarization around any point of interest. The paper then applies this approach to US voter survey data between 2004 and 2020, showing that polarization between right-of-center voters and the rest of the electorate increased gradually, while polarization between left-wingers and the rest was initially constant and then rose steeply. It also shows that, following elections, polarization around left-wing positions decreased while polarization around right-wing positions increased. Furthermore, the paper shows how this measure can be used to find cleavage points around which polarization changed the most. Finally, the paper shows how ideological polarization as defined here is related to other phenomena, such as affective polarization and increased salience of divisive issues.
This paper studies matching markets where institutions are matched with possibly more than one individual. The matching market contains some couples who view the pair of jobs as complements. First, we show by means of an example that a stable matching may fail to exist even when both couples and institutions have responsive preferences. Next, we provide conditions on couples' preferences that are necessary and sufficient to ensure a stable matching for every preference profile where institutions may have any responsive preference. Finally, we do the same with respect to institutions' preferences, that is, we provide conditions on institutions' preferences that are necessary and sufficient to ensure a stable matching for every preference profile where couples may have any responsive preference.
We study many-to-one matching problems between institutions and individuals, where each institution may be matched to multiple individuals. The matching market includes couples, who view pairs of institutions as complementary. Institutions' preferences over sets of individuals are assumed to satisfy responsiveness, whereas couples' preferences over pairs of institutions may violate responsiveness. In this setting, we first assume that all institutions share a common preference ordering over individuals, and we establish: (i) a complete characterization of all couples' preference profiles for which a stable matching exists, under the additional assumption that couples violate responsiveness only to ensure co-location at the same institution, and (ii) a necessary and sufficient condition on the common institutional preference such that a stable matching exists when couples may violate responsiveness arbitrarily. Next, we relax the common preference assumption, requiring institutions to share a common ranking only over the members of each couple. Under this weaker assumption, we provide: (i) a complete characterization of all couples' preferences for which a stable matching exists, and (ii) a sufficient condition on individuals' preferences that guarantees the existence of a stable matching.
With stakeholder-level in-market data, we conduct a comparative analysis of machine learning (ML) for forecasting electricity prices in Singapore, spanning 15 individual models and 4 ensemble approaches. Our empirical findings support three virtues of ML models: (1) the virtue of capturing non-linearity, (2) the virtue of complexity (Kelly et al., 2024), and (3) the virtue of l2-norm and bagging techniques in a weak factor environment (Shen and Xiu, 2024). Simulation also supports the first virtue. Penalizing prediction correlation improves ensemble performance when individual models are highly correlated. The predictability can be translated into sizable economic gains under the mean-variance framework. We also reveal significant patterns of time-series heterogeneous predictability across macro regimes: predictability is clustered in expansion, volatile-market, and extreme geopolitical risk periods. Our feature importance results agree with the complex dynamics of Singapore's electricity market after deregulation, yet highlight its relatively supply-driven nature with the continued presence of strong regulatory influences.
This paper develops a high-frequency economic indicator using a Bayesian Dynamic Factor Model estimated with mixed-frequency data. The model incorporates weekly, monthly, and quarterly official indicators, and allows for dynamic heterogeneity and stochastic volatility. To ensure temporal consistency and avoid irregular aggregation artifacts, we introduce a pseudo-week structure that harmonizes the timing of observations. Our framework integrates dispersed and asynchronous official statistics into a unified High-Frequency Economic Index (HFEI), enabling real-time economic monitoring even in environments characterized by severe data limitations. We apply this framework to construct a high-frequency indicator for Ecuador, a country where official data are sparse and highly asynchronous, and compute pseudo-weekly recession probabilities using a time-varying mean regime-switching model fitted to the resulting index.
Which level of voting costs is optimal in a democracy? This paper argues that intermediate voting costs - what we term a "Midcost democracy" - should be avoided, as they fail to ensure that electoral outcomes reflect the preferences of the majority. We study a standard binary majority decision in which a majority of the electorate prefers alternative A over alternative B. The population consists of partisan voters, who always participate, and non-partisan voters, who vote only when they believe their participation could be pivotal, given that voting entails a cost. We show that the probability of the majority-preferred alternative A winning is non-monotonic in the level of voting costs. Specifically, when voting costs are either high or negligible, alternative A wins in all equilibria. However, at intermediate cost levels, this alignment breaks down. These findings suggest that democratic systems should avoid institutional arrangements that lead to moderate voting costs, as they may undermine the majority principle.
We study the identification of dynamic discrete choice models with sophisticated, quasi-hyperbolic time preferences under exclusion restrictions. We consider both standard finite horizon problems and empirically useful infinite horizon ones, which we prove to always have solutions. We reduce identification to finding the present-bias and standard discount factors that solve a system of polynomial equations with coefficients determined by the data and use this to bound the cardinality of the identified set. The discount factors are usually identified, but hard to precisely estimate, because exclusion restrictions do not capture the defining feature of present bias, preference reversals, well.
Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society's most important questions. In this work, we take a step toward that goal by analyzing the work activities people do with AI and how successfully and broadly those activities are done, and by combining that with data on which occupations perform those activities. We analyze a dataset of 200k anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot, a publicly available generative AI system. We find that the most common work activities people seek AI assistance for involve gathering information and writing, while the most common activities that AI itself performs are providing information and assistance, writing, teaching, and advising. Combining these activity classifications with measurements of task success and scope of impact, we compute an AI applicability score for each occupation. We find the highest AI applicability scores for knowledge work occupation groups such as computer and mathematical, and office and administrative support, as well as occupations such as sales whose work activities involve providing and communicating information. Additionally, we characterize the types of work activities performed most successfully, how wage and education correlate with AI applicability, and how real-world usage compares to predictions of occupational AI impact.
We propose a novel multi-task neural network approach for estimating distributional treatment effects (DTE) in randomized experiments. While DTE provides more granular insights into the experiment outcomes over conventional methods focusing on the Average Treatment Effect (ATE), estimating it with regression adjustment methods presents significant challenges. Specifically, precision in the distribution tails suffers due to data imbalance, and computational inefficiencies arise from the need to solve numerous regression problems, particularly in large-scale datasets commonly encountered in industry. To address these limitations, our method leverages multi-task neural networks to estimate conditional outcome distributions while incorporating monotonic shape constraints and multi-threshold label learning to enhance accuracy. To demonstrate the practical effectiveness of our proposed method, we apply our method to both simulated and real-world datasets, including a randomized field experiment aimed at reducing water consumption in the US and a large-scale A/B test from a leading streaming platform in Japan. The experimental results consistently demonstrate superior performance across various datasets, establishing our method as a robust and practical solution for modern causal inference applications requiring a detailed understanding of treatment effect heterogeneity.
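For orientation, the quantity being estimated above can be illustrated with the simplest possible estimator. The sketch below is not the multi-task neural network method of the abstract: it is the plain, unadjusted difference of empirical CDFs between treatment and control at a set of thresholds. The function names `ecdf` and `dte` are hypothetical and chosen only for this illustration.

```python
def ecdf(sample, y):
    """Empirical CDF of `sample` evaluated at threshold y: share of values <= y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def dte(treated, control, thresholds):
    """Unadjusted distributional treatment effect at each threshold.

    DTE(y) = F_treated(y) - F_control(y), using empirical CDFs only.
    This is a baseline illustration, not the regression-adjusted or
    neural-network estimator described in the abstract.
    """
    return [ecdf(treated, y) - ecdf(control, y) for y in thresholds]


# Toy data (made up for illustration): control outcomes are shifted up by one.
treated = [1, 2, 3, 4]
control = [2, 3, 4, 5]
effects = dte(treated, control, thresholds=[1, 3, 5])
print(effects)
```

Each threshold requires its own CDF comparison, which hints at why regression-adjusted versions that solve one problem per threshold can become expensive on large datasets.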
The Pandora's box problem (Weitzman 1979) is a core model in economic theory that captures an agent's (Pandora's) search for the best alternative (box). We study an important generalization of the problem where the agent can either fully open boxes for a certain fee to reveal their exact values or partially open them at a reduced cost. This introduces a new tradeoff between information acquisition and cost efficiency. We establish a hardness result and employ an array of techniques in stochastic optimization to provide a comprehensive analysis of this model. This includes (1) the identification of structural properties of the optimal policy that provide insights about optimal decisions; (2) the derivation of problem relaxations and provably near-optimal solutions; (3) the characterization of the optimal policy in special yet non-trivial cases; and (4) an extensive numerical study that compares the performance of various policies, and which provides additional insights about the optimal policy. Throughout, we show that intuitive threshold-based policies that extend the Pandora's box optimal solution can effectively guide search decisions.
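As background for the threshold-based policies mentioned above, the classical problem (full opening only) assigns each box a reservation value sigma solving E[max(V - sigma, 0)] = c, where c is the opening fee. The following is a minimal sketch for a discrete value distribution, solved by bisection. It covers only the classical case, not the partial-opening generalization studied in the paper, and the function names are hypothetical.

```python
def expected_excess(values, probs, sigma):
    """E[max(V - sigma, 0)] for a discrete distribution over `values`."""
    return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))


def reservation_value(values, probs, cost, tol=1e-9):
    """Solve E[max(V - sigma, 0)] = cost for sigma by bisection.

    Assumes cost > 0 and that probs sum to 1. The expected excess is
    continuous and non-increasing in sigma, so bisection applies.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    lo = min(values) - cost  # here the expected excess is E[V] - min + cost >= cost
    hi = max(values)         # here the expected excess is 0 < cost
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy box (made up for illustration): value 0 or 10 with equal probability, fee 1.
sigma = reservation_value([0.0, 10.0], [0.5, 0.5], cost=1.0)
print(round(sigma, 6))
```

In the classical solution, boxes are opened in decreasing order of reservation value, and search stops once the best value found exceeds every remaining reservation value.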
Time-series models like ARIMA remain widely used for forecasting but are limited by linear assumptions and high computational cost on large and complex datasets. We propose Galerkin-ARIMA, which generalizes the AR component of ARIMA by replacing it with a flexible spline-based function estimated by Galerkin projection. This enables the model to capture nonlinear dependencies in lagged values while retaining the MA component and the Gaussian noise assumption. We derive a closed-form OLS estimator for the Galerkin coefficients and show that it is asymptotically unbiased and consistent under standard conditions. Our method bridges classical time-series modeling and nonparametric regression, offering improved forecasting performance and computational efficiency.
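The idea of replacing a linear AR term with a basis expansion fitted by least squares can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: it uses a simple additive polynomial basis as a stand-in for splines, omits the MA component entirely, and all function names and the simulated series are hypothetical.

```python
import numpy as np


def lag_matrix(y, p):
    """Return (X, target) where row t of X holds (y[t-1], ..., y[t-p])."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[p - k: len(y) - k] for k in range(1, p + 1)])
    return X, y[p:]


def basis(X, degree):
    """Additive polynomial basis: intercept plus powers 1..degree of each lag.

    A stand-in for a spline basis, used here only to keep the sketch short.
    """
    cols = [np.ones(len(X))]
    for j in range(X.shape[1]):
        for d in range(1, degree + 1):
            cols.append(X[:, j] ** d)
    return np.column_stack(cols)


def fit_basis_ar(y, p, degree):
    """Least-squares fit of y[t] on basis functions of its p lags.

    Returns the coefficient vector and the in-sample sum of squared residuals.
    """
    X, target = lag_matrix(y, p)
    Phi = basis(X, degree)
    coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    resid = target - Phi @ coef
    return coef, float(resid @ resid)


def simulate_nonlinear_ar(n, seed=0):
    """Toy nonlinear AR(1) series (made up): y[t] = sin(2*y[t-1]) + 0.1*noise."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = np.sin(2.0 * y[t - 1]) + 0.1 * e[t]
    return y


y = simulate_nonlinear_ar(300)
_, sse_linear = fit_basis_ar(y, p=1, degree=1)  # ordinary linear AR(1)
_, sse_cubic = fit_basis_ar(y, p=1, degree=3)   # richer basis nests the linear one
print(sse_linear, sse_cubic)
```

Because the cubic basis contains the linear one, its in-sample residual sum of squares cannot be larger. Whether the richer basis forecasts better out of sample is a separate question that this sketch does not address.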
We introduce a novel extension of the influential changes-in-changes (CiC) framework [Athey and Imbens, 2006] to estimate the average treatment effect on the treated (ATT) and distributional causal estimands in panel data settings with unmeasured confounding. While CiC relaxes the parallel trends assumption inherent in difference-in-differences (DiD), existing approaches typically accommodate only a single scalar unobserved confounder and rely on monotonicity assumptions between the confounder and the outcome. Moreover, current formulations lack inference procedures and theoretical guarantees that accommodate continuous covariates. Motivated by the intricate nature of confounding in empirical applications and the need to incorporate continuous covariates in a principled manner, we make two key contributions in this technical report. First, we establish nonparametric identification under a novel set of assumptions that permit high-dimensional unmeasured confounders and non-monotonic relationships between confounders and outcomes. Second, we construct efficient estimators that are Neyman orthogonal to infinite-dimensional nuisance parameters, facilitating valid inference even in the presence of high-dimensional continuous or discrete covariates and flexible machine learning-based nuisance estimation.
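For readers unfamiliar with the baseline being extended, the classical CiC estimator of Athey and Imbens (2006) maps each pre-period treated outcome y to the counterfactual F_01^{-1}(F_00(y)), where F_00 and F_01 are the control group's outcome distributions before and after treatment. The sketch below implements only that classical, covariate-free version with empirical distributions. It is not the extension proposed in the abstract, and the function names are hypothetical.

```python
def cic_counterfactual(y, y00, y01_sorted):
    """Map a pre-period treated outcome y to F_01^{-1}(F_00(y)).

    F_00 is the empirical CDF of control pre-period outcomes (y00), and
    F_01^{-1} is the empirical quantile function of control post-period
    outcomes. Integer arithmetic avoids floating-point rank errors.
    """
    k = sum(1 for v in y00 if v <= y)          # rank of y within y00
    n00, n01 = len(y00), len(y01_sorted)
    idx = max(-(-k * n01 // n00) - 1, 0)       # ceil(k * n01 / n00) - 1, clamped at 0
    return y01_sorted[idx]


def cic_att(y00, y01, y10, y11):
    """Classical CiC estimate of the ATT without covariates.

    y00, y01: control outcomes before and after; y10, y11: treated before and after.
    """
    y01_sorted = sorted(y01)
    counterfactual = [cic_counterfactual(y, y00, y01_sorted) for y in y10]
    return sum(y11) / len(y11) - sum(counterfactual) / len(counterfactual)


# Toy data (made up): controls shift up by 1 over time, treated shift by 1 plus an effect of 2.
att = cic_att(y00=[1, 2, 3, 4], y01=[2, 3, 4, 5], y10=[2, 3], y11=[5, 6])
print(att)
```

The reliance on ranks within the control distribution reflects the scalar-confounder and monotonicity assumptions that the abstract describes as limitations of existing approaches.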