2025-05-23 | Total: 2
In this study, we propose a novel machine-learning-based measure of stock price crash risk built on the minimum covariance determinant (MCD) methodology. Using this measure as the dependent variable, we predict stock price crash risk through cross-sectional regression analysis. The findings confirm that the proposed measure effectively captures stock price crash risk, and the model performs strongly in terms of both statistical significance and economic relevance. Furthermore, using a newly developed firm-specific investor sentiment index, the analysis identifies a positive correlation between stock price crash risk and firm-specific investor sentiment: higher sentiment is associated with higher crash risk. This relationship remains robust across firm sizes and when the detoned version of the firm-specific investor sentiment index is used, further supporting the reliability of the proposed approach.
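To make the idea of an MCD-based crash flag concrete, the following is a minimal sketch and not the authors' procedure. It assumes a single series of firm-specific returns, uses the exact univariate form of MCD (the contiguous window of sorted observations with the smallest variance), and flags observations that fall far below the robust location. The function names, the 0.75 support fraction, and the 3-scale threshold are all hypothetical choices; the study may well use a multivariate MCD with different inputs and cutoffs.

```python
import math
import statistics


def univariate_mcd(values, support_fraction=0.75):
    """Exact 1-D MCD: among all windows of h consecutive sorted points,
    pick the one with the smallest variance and return its (mean, std).

    support_fraction is an assumed tuning parameter; no consistency
    correction is applied, so the scale is biased low for Gaussian data.
    """
    xs = sorted(values)
    n = len(xs)
    h = max(2, math.ceil(support_fraction * n))
    if h > n:
        raise ValueError("need at least two observations")
    best_var, best_loc = None, None
    for start in range(n - h + 1):
        window = xs[start:start + h]
        var = statistics.pvariance(window)
        if best_var is None or var < best_var:
            best_var, best_loc = var, statistics.fmean(window)
    return best_loc, math.sqrt(best_var)


def flag_crashes(returns, threshold=3.0, support_fraction=0.75):
    """Flag returns lying more than `threshold` robust scales below the
    robust location (downside outliers only). Threshold is an assumption."""
    loc, scale = univariate_mcd(returns, support_fraction)
    cutoff = loc - threshold * scale
    return [r < cutoff for r in returns]


# Hypothetical weekly firm-specific returns with one large negative shock.
weekly_returns = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, -0.005, 0.005, -0.30, 0.01]
crash_flags = flag_crashes(weekly_returns)
```

Because the location and scale come from the most concentrated subset of the data, a single extreme observation does not inflate the scale and mask itself, which is the usual motivation for a robust estimator here.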
This study introduces an interpretable machine learning (ML) framework to extract macroeconomic alpha from global news sentiment. We process the worldwide news feed of the Global Database of Events, Language, and Tone (GDELT) Project using FinBERT, a Bidirectional Encoder Representations from Transformers (BERT)-based model pretrained on finance-specific language, to construct daily sentiment indices incorporating mean tone, dispersion, and event impact. These indices drive an XGBoost classifier, benchmarked against logistic regression, to predict next-day returns for EUR/USD, USD/JPY, and 10-year U.S. Treasury futures (ZN). Rigorous out-of-sample (OOS) backtesting (5-fold expanding-window cross-validation; OOS period: c. 2017 to April 2025) demonstrates exceptional cost-adjusted performance for the XGBoost strategy: Sharpe ratios reach 5.87 (EUR/USD), 4.65 (USD/JPY), and 4.65 (Treasuries), with compound annual growth rates (CAGRs) exceeding 50% in foreign exchange (FX) and 22% in bonds. Shapley Additive Explanations (SHAP) affirm that sentiment dispersion and article impact are key predictive features. Our findings establish that integrating domain-specific Natural Language Processing (NLP) with interpretable ML offers a potent and explainable source of macro alpha.
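As an illustration of the backtesting vocabulary used above, here is a minimal sketch of expanding-window fold construction and an annualized Sharpe ratio computed from daily returns. It is not the authors' code: the equal-sized blocks, the 252-day annualization factor, the zero risk-free rate, and both function names are assumptions made for the example.

```python
import math
import statistics


def expanding_window_splits(n_samples, n_folds=5):
    """Yield (train_range, test_range) pairs in time order.

    The series is cut into n_folds + 1 blocks (an assumed scheme). Fold k
    trains on everything before block k and tests on block k, so the
    training window only ever grows and never sees future observations.
    """
    block = n_samples // (n_folds + 1)
    if block < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder so no observation is dropped.
        test_end = n_samples if k == n_folds else train_end + block
        yield range(0, train_end), range(train_end, test_end)


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation of daily returns, scaled by
    sqrt(periods_per_year). Assumes a zero risk-free rate."""
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)
    return mean / std * math.sqrt(periods_per_year)


# Hypothetical usage on 12 daily observations.
folds = list(expanding_window_splits(12, n_folds=5))
sharpe = annualized_sharpe([0.01, -0.01, 0.02, 0.0])
```

Keeping every test block strictly after its training window is what distinguishes this scheme from shuffled k-fold cross-validation, which would leak future information into a time-series model.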