Computational Engineering, Finance, and Science

2026-01-21 | | Total: 9

#1 TransMode-LLM: Feature-Informed Natural Language Modeling with Domain-Enhanced Prompting for Travel Behavior Modeling [PDF] [Copy] [Kimi] [REL]

Authors: Meijing Zhang, Ying Xu

Understanding traveler behavior and accurately predicting travel mode choice are at the heart of transportation planning and policy-making. This study proposes TransMode-LLM, an innovative framework that integrates statistical methods with LLM-based techniques to predict travel modes from travel survey data. The framework operates through three phases: (1) statistical analysis identifies key behavioral features, (2) natural language encoding transforms structured data into contextual descriptions, and (3) LLM adaptation predicts travel mode through multiple learning paradigms including zero-shot and one/few-shot learning and domain-enhanced prompting. We evaluate TransMode-LLM using both general-purpose models (GPT-4o, GPT-4o-mini) and reasoning-focused models (o3-mini, o4-mini) with varying sample sizes on real-world travel survey data. Extensive experiment results demonstrate that the LLM-based approach achieves competitive accuracy compared to state-of-the-art baseline classifiers models. Moreover, few-shot learning significantly improves prediction accuracy, with models like o3-mini showing consistent improvements of up to 42.9\% with 5 provided examples. However, domain-enhanced prompting shows divergent effects across LLM architectures. In detail, it is helpful to improve performance for general-purpose models with GPT-4o achieving improvements of 2.27% to 12.50%. However, for reasoning-oriented models (o3-mini, o4-mini), domain knowledge enhancement does not universally improve performance. This study advances the application of LLMs in travel behavior modeling, providing promising and valuable insights for both academic research and transportation policy-making in the future.

Subject: Computational Engineering, Finance, and Science

Publish: 2026-01-20 09:14:20 UTC


#2 Text2Structure3D: Graph-Based Generative Modeling of Equilibrium Structures with Diffusion Transformers [PDF] [Copy] [Kimi] [REL]

Authors: Lazlo Bleker, Zifeng Guo, Kaleb Smith, Kam-Ming Mark Tam, Karla Saldaña Ochoa, Pierluigi D'Acunto

This paper presents Text2Structure3D, a graph-based Machine Learning (ML) model that generates equilibrium structures from natural language prompts. Text2Structure3D is designed to support new intuitive ways of design exploration and iteration in the conceptual structural design process. The approach combines latent diffusion with a Variational Graph Auto-Encoder (VGAE) and graph transformers to generate structural graphs that are close to an equilibrium state. Text2Structure3D integrates a residual force optimization post-processing step that ensures generated structures fully satisfy static equilibrium. The model was trained and validated using a cross-typological dataset of funicular form-found and statically determinate bridge structures, paired with text descriptions that capture the formal and structural features of each bridge. Results demonstrate that Text2Structure3D generates equilibrium structures with strong adherence to text-based specifications and greatly improves generalization capabilities compared to parametric model-based approaches. Text2Structure3D represents an early step toward a general-purpose foundation model for structural design, enabling the integration of generative AI into conceptual design workflows.

Subject: Computational Engineering, Finance, and Science

Publish: 2026-01-19 09:25:00 UTC


#3 A Model Fusion Approach for Enhancing Credit Approval Decision Making [PDF] [Copy] [Kimi] [REL]

Authors: Yuanhong Wu, Jingyan Xu, Wei Ye, Christina Schweikert, D. Frank Hsu

Credit default poses significant challenges to financial institutions and consumers, resulting in substantial financial losses and diminished trust. As such, credit default risk management has been a critical topic in the financial industry. In this paper, we present Combinatorial Fusion Analysis (CFA), a model fusion framework, that combines multiple machine learning algorithms to detect and predict credit card approval with high accuracy. We present the design methodology and implementation using five pre-trained models. The CFA results show an accuracy of 89.13% which is better than conventional machine learning and ensemble methods.

Subject: Computational Engineering, Finance, and Science

Publish: 2026-01-19 03:07:58 UTC


#4 Traffic Collisions: Temporal Patterns and Severity-Weighted Hotspot Analysis [PDF] [Copy] [Kimi] [REL]

Authors: Nael Alsaleh, Noura Falis, Tareq Alsaleh, Farah Ba Fakih

Understanding traffic collision patterns is of high importance for effective road safety planning in fast-growing urban environments. This study examines the temporal and spatial patterns of traffic collisions in Dubai, UAE, with a particular focus on collision severity. To this end, traffic collision records from November 2024 to June 2025 were analyzed to examine hourly, daily, and monthly variations in collision frequency and severity for both overall traffic collisions and pedestrian-related accidents. Temporal associations with severity were evaluated using chi-square tests and Cramer's V, while spatial patterns were analyzed using severity-weighted hotspot analysis based on the Getis-Ord Gi* statistic, complemented by inverse distance weighting (IDW) interpolation. The results show a clear temporal variation in overall collision frequency and severity, with higher collision frequencies during evening and nighttime periods with 44% higher probability of high-severity outcomes at night compared to the afternoon. On the other hand, pedestrian-related accidents showed a distinct temporal profile, characterized by higher occurrence during late-evening hours and relatively limited variation across days of the week and months. Spatial analysis identified statistically significant severity hotspots for overall collisions in the northern and northwestern parts of Dubai and along the Al Ain-Dubai Highway, while pedestrian severity hotspots were concentrated near industrial areas in the southwestern region. Several policy measures are proposed based on the findings including, reducing nighttime speed limits, enhancing automated enforcement, improving roadway lighting, and implementing pedestrian-focused treatments in statistically significant hotspots.

Subject: Computational Engineering, Finance, and Science

Publish: 2026-01-18 19:22:14 UTC


#5 The Limits of Conditional Volatility: Assessing Cryptocurrency VaR under EWMA and IGARCH Models [PDF] [Copy] [Kimi] [REL]

Author: Ekleen Kaur

The application of the standard static Geometric Brownian Motion (GBM) model for cryptocurrency risk management resulted in a systemic failure, evidenced by a 80.67% chance of loss in the 5% value-at-risk benchmark. This study addresses a critical literature gap by comparatively testing three conditional volatility models the EWMA/IGARCH baseline, an IGARCH model augmented with explicit mean reversion (IGARCH + MR), and a modified EGARCH-style asymmetric shock model within a correlated Monte Carlo VaR framework. Crucially, the analysis is applied specifically to high-beta altcoins (XRP, SOL, ADA), an asset class largely neglected by mainstream GARCH literature. Our results demonstrate that imposing stationarity (IGARCH + MR) drastically underestimates downside risk (5 percent value-at-risk reduced by 50%), while the asymmetric model (Model 3) leads to severe over-penalization. The EWMA/IGARCH baseline, characterized by infinite volatility persistence (alpha + beta = 1), provided the only robust conditional volatility estimate. This finding constitutes a formal rejection of the conventional financial hypotheses of volatility mean reversion and the asymmetric leverage effect in the altcoin asset class, establishing that non-stationary frameworks are a prerequisite for regulatory-grade risk modeling in this domain.

Subjects: Cryptography and Security , Computational Engineering, Finance, and Science

Publish: 2026-01-20 09:11:24 UTC


#6 RAG: A Random-Forest-Based Generative Design Framework for Uncertainty-Aware Design of Metamaterials with Complex Functional Response Requirements [PDF] [Copy] [Kimi] [REL]

Authors: Bolin Chen, Dex Doksoo Lee, Wei "Wayne'' Chen, Wei Chen

Metamaterials design for advanced functionality often entails the inverse design on nonlinear and condition-dependent responses (e.g., stress-strain relation and dispersion relation), which are described by continuous functions. Most existing design methods focus on vector-valued responses (e.g., Young's modulus and bandgap width), while the inverse design of functional responses remains challenging due to their high-dimensionality, the complexity of accommodating design requirements in inverse-design frameworks, and non-existence or non-uniqueness of feasible solutions. Although generative design approaches have shown promise, they are often data-hungry, handle design requirements heuristically, and may generate infeasible designs without uncertainty quantification. To address these challenges, we introduce a RAndom-forest-based Generative approach (RAG). By leveraging the small-data compatibility of random forests, RAG enables data-efficient predictions of high-dimensional functional responses. During the inverse design, the framework estimates the likelihood through the ensemble which quantifies the trustworthiness of generated designs while reflecting the relative difficulty across different requirements. The one-to-many mapping is addressed through single-shot design generation by sampling from the conditional likelihood. We demonstrate RAG on: 1) acoustic metamaterials with prescribed partial passbands/stopbands, and 2) mechanical metamaterials with targeted snap-through responses, using 500 and 1057 samples, respectively. Its data-efficiency is benchmarked against neural networks on a public mechanical metamaterial dataset with nonlinear stress-strain relations. Our framework provides a lightweight, trustworthy pathway to inverse design involving functional responses, expensive simulations, and complex design requirements, beyond metamaterials.

Subjects: Artificial Intelligence , Computational Engineering, Finance, and Science

Publish: 2026-01-19 17:06:12 UTC


#7 FutureX-Pro: Extending Future Prediction to High-Value Vertical Domains [PDF] [Copy] [Kimi] [REL]

Authors: Jiashuo Liu, Siyuan Chen, Zaiyuan Wang, Zhiyuan Zeng, Jiacheng Guo, Liang Hu, Lingyue Yin, Suozhi Huang, Wenxin Hao, Yang Yang, Zerui Cheng, Zixin Yao, Lingyue Yin, Haoxin Liu, Jiayi Cheng, Yuzhen Li, Zezhong Ma, Bingjie Wang, Bingsen Qiu, Xiao Liu, Zeyang Zhang, Zijian Liu, Jinpeng Wang, Mingren Yin, Tianci He, Yali Liao, Yixiao Tian, Zhenwei Zhu, Anqi Dai, Ge Zhang, Jingkai Liu, Kaiyuan Zhang, Wenlong Wu, Xiang Gao, Xinjie Chen, Zhixin Yao, Zhoufutu Wen, B. Aditya Prakash, Jose Blanchet, Mengdi Wang, Nian Si, Wenhao Huang et al. (12 additional authors not shown)

Building upon FutureX, which established a live benchmark for general-purpose future prediction, this report introduces FutureX-Pro, including FutureX-Finance, FutureX-Retail, FutureX-PublicHealth, FutureX-NaturalDisaster, and FutureX-Search. These together form a specialized framework extending agentic future prediction to high-value vertical domains. While generalist agents demonstrate proficiency in open-domain search, their reliability in capital-intensive and safety-critical sectors remains under-explored. FutureX-Pro targets four economically and socially pivotal verticals: Finance, Retail, Public Health, and Natural Disaster. We benchmark agentic Large Language Models (LLMs) on entry-level yet foundational prediction tasks -- ranging from forecasting market indicators and supply chain demands to tracking epidemic trends and natural disasters. By adapting the contamination-free, live-evaluation pipeline of FutureX, we assess whether current State-of-the-Art (SOTA) agentic LLMs possess the domain grounding necessary for industrial deployment. Our findings reveal the performance gap between generalist reasoning and the precision required for high-value vertical applications.

Subjects: Artificial Intelligence , Computational Engineering, Finance, and Science , Machine Learning

Publish: 2026-01-18 04:44:49 UTC


#8 CoSMeTIC: Zero-Knowledge Computational Sparse Merkle Trees with Inclusion-Exclusion Proofs for Clinical Research [PDF] [Copy] [Kimi] [REL]

Authors: Mohammad Shahid, Paritosh Ramanan, Mohammad Fili, Guiping Hu, Hillel Haim

Analysis of clinical data is a cornerstone of biomedical research with applications in areas such as genomic testing and response characterization of therapeutic drugs. Maintaining strict privacy controls is essential because such data typically contains personally identifiable health information of patients. At the same time, regulatory compliance often requires study managers to demonstrate the integrity and authenticity of participant data used in analyses. Balancing these competing requirements, privacy preservation and verifiable accountability, remains a critical challenge. In this paper, we present CoSMeTIC, a zero-knowledge computational framework that proposes computational Sparse Merkle Trees (SMTs) as a means to generate verifiable inclusion and exclusion proofs for individual participants' data in clinical studies. We formally analyze the zero-knowledge properties of CoSMeTIC and evaluate its computational efficiency through extensive experiments. Using the Kolmogorov-Smirnov and likelihood-ratio hypothesis tests, along with logistic-regression-based genomic analyses on real-world Huntington's disease datasets, we demonstrate that CoSMeTIC achieves strong privacy guarantees while maintaining statistical fidelity. Our results suggest that CoSMeTIC provides a scalable and practical alternative for achieving regulatory compliance with rigorous privacy protection in large-scale clinical research.

Subjects: Cryptography and Security , Computational Engineering, Finance, and Science

Publish: 2026-01-17 18:47:17 UTC


#9 Utilizing Metadata for Better Retrieval-Augmented Generation [PDF] [Copy] [Kimi] [REL]

Authors: Raquib Bin Yousuf, Shengzhe Xu, Mandar Sharma, Andrew Neeser, Chris Latimer, Naren Ramakrishnan

Retrieval-Augmented Generation systems depend on retrieving semantically relevant document chunks to support accurate, grounded outputs from large language models. In structured and repetitive corpora such as regulatory filings, chunk similarity alone often fails to distinguish between documents with overlapping language. Practitioners often flatten metadata into input text as a heuristic, but the impact and trade-offs of this practice remain poorly understood. We present a systematic study of metadata-aware retrieval strategies, comparing plain-text baselines with approaches that embed metadata directly. Our evaluation spans metadata-as-text (prefix and suffix), a dual-encoder unified embedding that fuses metadata and content in a single index, dual-encoder late-fusion retrieval, and metadata-aware query reformulation. Across multiple retrieval metrics and question types, we find that prefixing and unified embeddings consistently outperform plain-text baselines, with the unified at times exceeding prefixing while being easier to maintain. Beyond empirical comparisons, we analyze embedding space, showing that metadata integration improves effectiveness by increasing intra-document cohesion, reducing inter-document confusion, and widening the separation between relevant and irrelevant chunks. Field-level ablations show that structural cues provide strong disambiguating signals. Our code, evaluation framework, and the RAGMATE-10K dataset are publicly hosted.

Subjects: Information Retrieval , Artificial Intelligence , Computational Engineering, Finance, and Science , Computation and Language

Publish: 2026-01-17 01:11:03 UTC