Applications

2025-11-14 | Total: 8

#1 Heuristic Solutions for the Best Secretary Problem

Author: Eugene Seong

This paper introduces a heuristic framework for the Best Secretary Problem, where one item must be selected using rank information only. We develop five data-responsive rules extending classical fixed-cutoff methods: an expected-record threshold, an adaptive deviation correction, a probabilistic early-accept rule, a two-phase relaxation, and a local dynamic programming approximation. These rules adjust thresholds sequentially as information accumulates. Simulations across diverse sample sizes, distributions, and autocorrelated settings show that the heuristics match or exceed traditional optimal rules in stability and efficiency. The expected-record rule remains strong despite its simplicity, the adaptive correction performs well under asymmetry, and the adaptive and probabilistic rules reduce average stopping times. An ensemble combining multiple rules yields the most stable performance. Overall, a few intuitive parameters achieve near-optimal results, demonstrating that data-responsive heuristics can effectively extend rank-based optimal stopping to dynamic decision environments.
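For orientation, the classical fixed-cutoff baseline that the abstract says these heuristics extend can be sketched as follows. This is a minimal illustration of the textbook rule only, not the paper's five data-responsive rules; the function names, the default cutoff of roughly n/e, and the simulation settings are assumptions made for this example.

```python
import math
import random


def classical_cutoff_rule(values, cutoff=None):
    """Return the index chosen by the textbook fixed-cutoff rule.

    The first `cutoff` items are only observed. Afterwards the first item
    that beats every item seen so far is accepted; if none does, the last
    item is taken. Only comparisons (relative ranks) are used.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    if cutoff is None:
        cutoff = int(n / math.e)  # assumption: classical ~n/e observation phase
    best_seen = max(values[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        if values[i] > best_seen:
            return i
    return n - 1  # forced to take the final candidate


def success_rate(n=50, trials=20000, seed=0):
    """Estimate how often the rule picks the overall best item."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        chosen = classical_cutoff_rule(values)
        hits += values[chosen] == max(values)
    return hits / trials


if __name__ == "__main__":
    print(f"estimated success rate: {success_rate():.3f}")
```

For large n the success probability of this baseline approaches 1/e, about 0.37, which is the reference point against which adaptive thresholds such as those described above would be compared.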

Subject: Applications

Publish: 2025-11-13 11:25:42 UTC


#2 Two Americas of Well-Being: Divergent Rural-Urban Patterns of Life Satisfaction and Happiness from 2.6 B Social Media Posts

Authors: Stefano Maria Iacus, Giuseppe Porro

Using 2.6 billion geolocated social-media posts (2014-2022) and a fine-tuned generative language model, we construct county-level indicators of life satisfaction and happiness for the United States. We document an apparent rural-urban paradox: rural counties express higher life satisfaction while urban counties exhibit greater happiness. We reconcile this by treating the two as distinct layers of subjective well-being, evaluative vs. hedonic, showing that each maps differently onto place, politics, and time. Republican-leaning areas appear more satisfied in evaluative terms, but partisan gaps in happiness largely flatten outside major metros, indicating context-dependent political effects. Temporal shocks dominate the hedonic layer: happiness falls sharply during 2020-2022, whereas life satisfaction moves more modestly. These patterns are robust across logistic and OLS specifications and align with well-being theory. Interpreted as associations for the population of social-media posts, the results show that large-scale, language-based indicators can resolve conflicting findings about the rural-urban divide by distinguishing the type of well-being expressed, offering a transparent, reproducible complement to traditional surveys.

Subjects: Social and Information Networks , Machine Learning , Applications

Publish: 2025-11-13 17:41:11 UTC


#3 Estimating the true number of principal components under the random design

Author: Yasuyuki Matsumura

Principal component analysis (PCA) is frequently employed as a dimension reduction tool when the number of covariates is large. However, the number of principal components to be retained in PCA is typically determined in a researcher-dependent manner. To mitigate the subjectivity in PCA, this paper proposes a data-driven testing procedure to estimate the number of underlying principal components. While existing work such as G'Sell et al. (2016), Taylor et al. (2016) and Choi et al. (2017) discusses similar tests under fixed design, this paper investigates an extension of their framework to a more general econometric setup with random design. The proposed test is proved to achieve asymptotically exact type I error control under a locally defined null hypothesis, with simulation examples indicating the asymptotic validity of our test.

Subjects: Econometrics , Applications

Publish: 2025-11-13 15:40:38 UTC


#4 A tutorial for propensity score weighting methods under violations of the positivity assumption

Authors: Yi Liu, Yuan Wang, Ying Gao, Tonia Poteat, Roland A. Matsouaka

Violations of the positivity assumption can render conventional causal estimands unidentifiable, including the average treatment effect (ATE), the average treatment effect on the treated (ATT), and the average treatment effect on the controls (ATC). Shifting the inferential focus to their alternative counterparts -- the weighted ATE (WATE), the weighted ATT (WATT), and the weighted ATC (WATC) -- offers valuable insights into treatment effects while preserving internal validity. In this tutorial, we provide a comprehensive review of recent advances in propensity score (PS) weighting methods, along with practical guidance on how to select a primary target estimand (while other estimands serve as supplementary analyses), implement the corresponding PS-weighted estimators, and conduct post-weighting diagnostic assessments. The tutorial is accompanied by a user-friendly R package, ChiPS. We demonstrate the pertinence of various estimators through extensive simulation studies. We illustrate the flow of the tutorial on two real-world case studies: (i) Effect of smoking on blood lead level using data from the 2007-2008 National Health and Nutrition Examination Survey (NHANES); and (ii) Impact of history of sex work on HIV status among transgender women in South Africa.
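To make the idea of a weighted estimand concrete, the sketch below uses the general balancing-weights form, in which a treated unit with propensity score e gets weight h(e)/e and a control gets h(e)/(1-e) for a tilting function h. Taking h(e) = 1 gives inverse probability weights (targeting the ATE), while h(e) = e(1-e) gives overlap weights. This is not the ChiPS package API; all function names and the toy data are hypothetical, and the propensity scores are assumed to have been estimated beforehand.

```python
def balancing_weights(ps, treated, tilt):
    """Weight h(e)/e for treated units and h(e)/(1-e) for controls."""
    return [
        tilt(e) / e if z else tilt(e) / (1.0 - e)
        for e, z in zip(ps, treated)
    ]


def weighted_mean_difference(y, treated, ps, tilt):
    """Normalized (Hajek-type) weighted difference in mean outcomes."""
    w = balancing_weights(ps, treated, tilt)
    num1 = sum(wi * yi for wi, yi, z in zip(w, y, treated) if z)
    den1 = sum(wi for wi, z in zip(w, treated) if z)
    num0 = sum(wi * yi for wi, yi, z in zip(w, y, treated) if not z)
    den0 = sum(wi for wi, z in zip(w, treated) if not z)
    return num1 / den1 - num0 / den0


def ipw_tilt(e):
    """h(e) = 1: inverse probability weighting."""
    return 1.0


def overlap_tilt(e):
    """h(e) = e(1 - e): overlap weighting."""
    return e * (1.0 - e)


# Hypothetical toy data: two units have propensity scores near 0 or 1,
# which is where positivity is close to being violated.
ps = [0.02, 0.30, 0.50, 0.70, 0.98, 0.40]
treated = [1, 0, 1, 0, 0, 1]
y = [2.0 if z else 0.0 for z in treated]  # constant effect of 2 by construction

if __name__ == "__main__":
    for name, tilt in [("IPW", ipw_tilt), ("overlap", overlap_tilt)]:
        w = balancing_weights(ps, treated, tilt)
        est = weighted_mean_difference(y, treated, ps, tilt)
        print(f"{name}: max weight = {max(w):.2f}, estimate = {est:.2f}")
```

With overlap weights every unit's weight stays between 0 and 1, whereas the inverse probability weights grow without bound as the propensity score approaches 0 or 1. That boundedness is one reason weighted estimands are attractive when positivity is in doubt.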

Subjects: Methodology , Applications

Publish: 2025-11-13 08:31:04 UTC


#5 A Clustering Approach for Basket Trials Based on Treatment Response Trajectories

Authors: Masahiro Kojima, Keisuke Hanada, Atsuya Sato

Heterogeneity in efficacy is sometimes observed across baskets in basket trials. In this study, we propose a model-free clustering framework that groups baskets based on transition probabilities derived from the trajectories of treatment response, rather than relying solely on a single efficacy endpoint such as the objective response rate. The number of clusters is not predetermined but is automatically determined in a data-driven manner based on the similarity structure among baskets. After clustering, baskets within the same cluster are analyzed using a hierarchical Bayesian model. This framework aims to improve the estimation precision of efficacy endpoints and enhance statistical power while maintaining the type I error rate at the nominal level. The performance of the proposed method was evaluated through simulation studies. The results demonstrated that the proposed method can accurately identify cluster structures in heterogeneous settings and, even under such conditions, maintain the type I error rate at the nominal level while improving statistical power.

Subjects: Methodology , Applications

Publish: 2025-11-13 02:51:25 UTC


#6 Modelos Empiricos de Pos-Dupla Selecao por LASSO: Discussoes para Estudos do Transporte Aereo (Empirical Post-Double-Selection LASSO Models: Discussions for Air Transport Studies)

Author: Alessandro V. M. Oliveira

This paper presents and discusses forms of estimation by regularized regression and model selection using the LASSO method - Least Absolute Shrinkage and Selection Operator. LASSO is recognized as one of the main supervised learning methods applied to high-dimensional econometrics, allowing work with large volumes of data and multiple correlated controls. Conceptual issues related to the consequences of high dimensionality in modern econometrics and the principle of sparsity, which underpins regularization procedures, are addressed. The study examines the main post-double selection and post-regularization models, including variations applied to instrumental variable models. A brief description of the lassopack routine package, its syntaxes, and examples of HD, HDS (High-Dimension Sparse), and IV-HDS models, with combinations involving fixed effects estimators, is also presented. Finally, the potential application of the approach in research focused on air transport is discussed, with emphasis on an empirical study on the operational efficiency of airlines and aircraft fuel consumption.

Subjects: Methodology , Machine Learning , General Economics , Systems and Control , Applications

Publish: 2025-11-12 22:00:35 UTC


#7 Masked Mineral Modeling: Continent-Scale Mineral Prospecting via Geospatial Infilling

Authors: Sujay Nair, Evan Coleman, Sherrie Wang, Elsa Olivetti

Minerals play a critical role in the advanced energy technologies necessary for decarbonization, but characterizing mineral deposits hidden underground remains costly and challenging. Inspired by recent progress in generative modeling, we develop a learning method which infers the locations of minerals by masking and infilling geospatial maps of resource availability. We demonstrate this technique using mineral data for the conterminous United States, and train performant models, with the best achieving Dice coefficients of $0.31 \pm 0.01$ and recalls of $0.22 \pm 0.02$ on test data at 1$\times$1 mi$^2$ spatial resolution. One major advantage of our approach is that it can easily incorporate auxiliary data sources for prediction which may be more abundant than mineral data. We highlight the capabilities of our model by adding input layers derived from geophysical sources, along with a nation-wide ground survey of soils originally intended for agronomic purposes. We find that employing such auxiliary features can improve inference performance, while also enabling model evaluation in regions with no recorded minerals.
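As a reading aid for the reported metrics, the Dice coefficient and recall of a binary prediction mask can be computed as in the sketch below. It uses the standard definitions, Dice = 2TP / (2TP + FP + FN) and recall = TP / (TP + FN), on flattened masks. The function name, the toy masks, and the choice to return 1.0 when a denominator is zero are assumptions for this example, not details taken from the paper.

```python
def dice_and_recall(pred, truth):
    """Compute (Dice, recall) for two equal-length binary masks.

    `pred` and `truth` are flat sequences of 0/1 values, for example a
    gridded map of predicted and recorded mineral occurrences.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    dice_den = 2 * tp + fp + fn
    recall_den = tp + fn
    # Assumption: treat an empty denominator as a perfect score.
    dice = 2 * tp / dice_den if dice_den else 1.0
    recall = tp / recall_den if recall_den else 1.0
    return dice, recall


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1]   # hypothetical predicted cells
    truth = [1, 0, 0, 1, 1]  # hypothetical recorded cells
    dice, recall = dice_and_recall(pred, truth)
    print(f"Dice = {dice:.3f}, recall = {recall:.3f}")
```

Because Dice penalizes both false positives and false negatives, it is a demanding metric when positive cells are sparse, as mineral occurrences on a fine spatial grid typically are.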

Subjects: Machine Learning , Machine Learning , Applications

Publish: 2025-11-12 20:28:40 UTC


#8 Bayesian inference for precise and uncertainty-quantified single-shot widefield interferometric geometrical nanometrology

Authors: Damian Suski, Maria Cywinska, Julianna Winnik, Michal Jozwik, Piotr Zdankowski, Azeem Ahmad, Balpreet S. Ahluwalia, Maciej Trusiak

Advanced geometrical nanometrology is critical for process control in semiconductor manufacturing, supporting applications in, e.g., photonic integrated circuits, nanoelectronics, and emerging quantum and optoelectronic technologies. Widefield interferometric approaches provide a cost-effective, non-destructive solution for characterizing semiconductor optical waveguides, which are fundamental to nanophotonic devices. This work presents a Bayesian inference framework, implemented using Dynamic Nested Sampling, for estimating geometric parameters - such as width and height - of a semiconductor optical waveguide from a single widefield interferogram. The proposed framework reduces the need for near-field scanning microscopy methods for measurements. The notable advantage is that Bayesian statistics not only provide the estimated parameter values but also quantify the uncertainty of the inference results and the fit of the model used. The proposed full-field, single-shot interferometric approach, supported by Bayesian-based data analysis, achieves high accuracy and sensitivity - down to successful measurement of an 8 nm rib waveguide - while remaining resilient to noise. Thus, the demonstrated methodology provides a cost-effective, robust, and scalable tool for semiconductor fabrication monitoring and process verification, as confirmed by both numerical simulations and experimental validation on optical waveguides. This method contributes to high-precision nanometrology by integrating advanced statistical modeling and inference techniques.

Subjects: Optics , Applications

Publish: 2025-11-12 19:13:35 UTC