Applications | Cool Papers - Immersive Paper Discovery

#1 FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering

Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduces three key innovations: (1) a hierarchical knowledge forest for structured domain knowledge representation; (2) a multi-dimensional user profiling mechanism for comprehensive learner modeling; and (3) an interactive prompt engineering scheme for generating precise and tailored learning guidance. We showcase FOKE's application in programming education, homework assessment, and learning path planning, demonstrating its effectiveness and practicality. Additionally, we implement Scholar Hero, a real-world instantiation of FOKE. Our research highlights the potential of integrating foundation models, knowledge graphs, and prompt engineering to revolutionize intelligent education practices, ultimately benefiting learners worldwide. FOKE provides a principled and unified approach to harnessing cutting-edge AI technologies for personalized, interactive, and explainable educational services, paving the way for further research and development in this critical direction.

#2 Return to Office and the Tenure Distribution

Authors: David Van Dijcke ; Florian Gunsilius ; Austin Wright

With the official end of the COVID-19 pandemic, debates about the return to office have taken center stage among companies and employees. Despite their ubiquity, the economic implications of return to office policies are not fully understood. Using 260 million resumes matched to company data, we analyze the causal effects of such policies on employees' tenure and seniority levels at three of the largest US tech companies: Microsoft, SpaceX, and Apple. Our estimation procedure is nonparametric and captures the full heterogeneity of tenure and seniority of employees in a distributional synthetic controls framework. We estimate a reduction in counterfactual tenure that increases for employees with longer tenure. Similarly, we document a leftward shift in the seniority distribution towards positions below the senior level. These shifts appear to be driven by employees leaving for larger firms that are direct competitors. Our results suggest that return to office policies can lead to an outflow of senior employees, posing a potential threat to the productivity, innovation, and competitiveness of the wider firm.

#3 Scalable Amortized GPLVMs for Single Cell Transcriptomics Data

Authors: Sarah Zhao ; Aditya Ravuri ; Vidhi Lalchand ; Neil D. Lawrence

Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets and effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures as we demonstrate on an innate immunity dataset.

#4 UQ state-dependent framework for seismic fragility assessment of industrial components

Authors: C. Nardin ; S. Marelli ; O. S. Bursi ; B. Sudret ; M. Broccardo

Recently, there has been increased interest in assessing the seismic fragility of industrial plants and process equipment. This is reflected in the growing number of studies, community-funded research projects and experimental campaigns on the matter. Nonetheless, the complexity of the problem and its inherent modelling, coupled with a general scarcity of available data on process equipment, has limited the development of risk assessment methods. In fact, these limitations have led to the creation of simplified and quick-to-run models. In this context, we propose an innovative framework for developing state-dependent fragility functions. This new methodology combines limited data with the power of metamodelling and statistical techniques, namely polynomial chaos expansions (PCE) and bootstrapping. We first validated the framework on a simplified and inexpensive-to-run MDoF system endowed with Bouc-Wen hysteresis. Then, we tested it on a real nonstructural industrial process component. Specifically, we applied the state-dependent fragility framework to a critical vertical tank of a multicomponent full-scale 3D steel braced frame (BF). The seismic performance of the BF endowed with process components was captured by means of a shake table campaign within the European SPIF project. Finally, we derived state-dependent fragility functions based on the combination of PCE and bootstrap at a greatly reduced computational cost.
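To make the bootstrapping ingredient concrete, here is a minimal sketch of how a percentile bootstrap can attach an uncertainty band to a single empirical fragility value, that is, the probability that a response quantity exceeds a damage threshold at one intensity level. It is not the authors' framework: the PCE surrogate and the state dependence are omitted, and the function name `bootstrap_exceedance`, the sample values, and the threshold are all hypothetical.

```python
import numpy as np


def bootstrap_exceedance(demands, threshold, n_boot=2000, alpha=0.05, seed=0):
    """Estimate P(demand > threshold) with a percentile bootstrap interval.

    `demands` stands in for response samples at one intensity level
    (hypothetical data; in a surrogate-based workflow they could come
    from a metamodel rather than from costly simulations).
    """
    demands = np.asarray(demands, dtype=float)
    rng = np.random.default_rng(seed)
    n = demands.size

    # Point estimate: fraction of samples exceeding the threshold.
    point = float(np.mean(demands > threshold))

    # Resample the data with replacement n_boot times and recompute
    # the exceedance fraction for each bootstrap replicate.
    idx = rng.integers(0, n, size=(n_boot, n))
    replicates = (demands[idx] > threshold).mean(axis=1)

    # Percentile interval at confidence level 1 - alpha.
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return point, float(lower), float(upper)


# Hypothetical peak responses (arbitrary units) and damage threshold.
samples = [0.8, 1.4, 0.6, 2.1, 1.7, 0.9, 1.1, 2.5, 0.4, 1.9]
estimate, low, high = bootstrap_exceedance(samples, threshold=1.5)
print(f"P(exceed) = {estimate:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

Repeating this at several intensity levels would give a pointwise band around an empirical fragility curve; with so few samples the band is wide, which is the situation the abstract's data-scarcity remark describes.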

#5 Stochastic behavior of an n-node blockchain under cyber attacks from multiple hackers with random re-setting times

Authors: Xiufeng Xu ; Liang Hong

This paper investigates the stochastic behavior of an n-node blockchain which is continuously monitored and faces non-stop cyber attacks from multiple hackers. The blockchain will start being re-set once hacking is detected, forfeiting previous efforts of all hackers. It is assumed the re-setting process takes a random amount of time. Multiple independent hackers will keep attempting to hack into the blockchain until one of them succeeds. For arbitrary distributions of the hacking times, detecting times, and re-setting times, we derive the instantaneous functional probability, the limiting functional probability, and the mean functional time of the blockchain. Moreover, we establish that these quantities are increasing functions of the number of nodes, formalizing the intuition that the more nodes a blockchain has the more secure it is.

#6 Non-locality and Spillover Effects of Residential Flood Damage on Community Recovery: Insights from High-resolution Flood Claim and Mobility Data

Authors: Junwei Ma ; Russell Blessing ; Samuel Brody ; Ali Mostafavi

Examining the relationship between vulnerability of the built environment and community recovery is crucial for understanding disaster resilience. Yet, this relationship is rather neglected in the existing literature due to previous limitations in the availability of empirical datasets needed for such analysis. In this study, we combine fine-resolution flood damage claims data (composed of both insured and uninsured losses) and human mobility data (composed of millions of movement trajectories) during the 2017 Hurricane Harvey in Harris County, Texas, to specify the extent to which vulnerability of the built environment (i.e., flood property damage) affects community recovery (based on the speed of human mobility recovery) locally and regionally. We examine this relationship using spatial lag, spatial reach, and spatial decay models to measure the extent of spillover effects of residential damage on community recovery. The findings show that: first, the severity of residential damage significantly affects the speed of community recovery. A greater extent of residential damage suppresses community recovery not only locally but also in the surrounding areas. Second, the spatial spillover effect of residential damage on community recovery speed decays with distance from the highly damaged areas. Third, spatial areas display heterogeneous spatial decay coefficients, which are associated with urban structure features such as the density of points-of-interest facilities and roads. These findings provide a novel data-driven characterization of the spatial diffusion of residential flood damage effects on community recovery and move us closer to a better understanding of complex spatial processes that shape community resilience to hazards. This study also provides valuable insights for emergency managers and public officials seeking to mitigate the non-local effects of residential damage.

#7 An Analysis of Sea Level Spatial Variability by Topological Indicators and $k$-means Clustering Algorithm

Authors: Zixin Lin ; Nur Fariha Syaqina Zulkepli ; Mohd Shareduwan Mohd Kasihmuddin ; R. U. Gobithaasan

The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level; however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. We combined a clustering technique, used to categorize the data, with a topological data analysis method that enhances the performance of the clustering analysis. Specifically, our approach utilized persistent homology and $k$-means/$k$-means++ clustering. The fourteen datasets from fourteen tide gauge stations were categorized into classes based on a prior categorization determined by topological information and on the probability, yielded by $k$-means/$k$-means++ clustering, that data points belong to certain groups. Our results demonstrated that our method significantly improves the performance of traditional clustering techniques.
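For readers unfamiliar with the clustering half of this approach, the following is a minimal sketch of $k$-means with $k$-means++ seeding. It covers only the standard algorithm: the persistent homology step and the paper's prior categorization are not reproduced, and the feature vectors below are made-up stand-ins for per-station summaries, not the tide gauge data.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """Choose k initial centers using k-means++ seeding."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center: uniform at random
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        if total == 0:
            idx = rng.integers(n)  # all points coincide with a center
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choice(n, p=d2 / total)
        centers.append(X[idx])
    return np.array(centers, dtype=float)


def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm started from k-means++ centers."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


# Hypothetical two-feature summaries for ten stations (illustrative only).
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
            [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1], [10.05, 10.05]]
labels, centers = kmeans(features, k=2)
print(labels)
```

The seeding step is where plain $k$-means and $k$-means++ differ: spreading the initial centers apart reduces, but does not remove, the sensitivity to initialization that the abstract alludes to.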

#8 New allometric models for the USA create a step-change in forest carbon estimation, modeling, and mapping

Authors: Lucas K. Johnson ; Michael J. Mahoney ; Grant Domke ; Colin M. Beier

The United States national forest inventory (NFI) serves as the foundation for forest aboveground biomass (AGB) and carbon accounting across the nation. These data enable design-based estimates of forest carbon stocks and stock-changes at state and regional levels, but also serve as inputs to model-based approaches for characterizing forest carbon stocks and stock-changes at finer resolutions. Although NFI tree and plot-level data are often treated as truth in these models, they are in fact estimates based on regional species-group models known collectively as the Component Ratio Method (CRM). In late 2023 the Forest Inventory and Analysis (FIA) program introduced a new National Scale Volume and Biomass Estimators (NSVB) system to replace CRM nationwide and offer more precise and accurate representations of forest AGB and carbon. Given the prevalence of model-based AGB studies relying on FIA, there is concern about the transferability of methods from CRM to NSVB models, as well as the comparability of existing CRM AGB products (e.g. maps) to new and forthcoming NSVB AGB products. To begin addressing these concerns we compared previously published CRM AGB maps to new maps produced using identical methods with NSVB AGB reference data. Our results suggest that models relying on passive satellite imagery (e.g. Landsat) provide acceptable estimates of point-in-time NSVB AGB and carbon stocks, but fail to accurately quantify growth in mature closed-canopy forests. We highlight that existing estimates, models, and maps based on FIA reference data are no longer compatible with NSVB, and recommend new methods as well as updated models and maps for accommodating this step-change. Our collective ability to adopt NSVB in our modeling and mapping workflows will help us provide the most accurate spatial forest carbon data possible in order to better inform local management and decision making.