Processing math: 100%

Computer Science

2025-04-28 | | Total: 397

#1 Generalization Capability for Imitation Learning [PDF2] [Copy] [Kimi2] [REL]

Author: Yixiao Wang

Imitation learning holds the promise of equipping robots with versatile skills by learning from expert demonstrations. However, policies trained on finite datasets often struggle to generalize beyond the training distribution. In this work, we present a unified perspective on the generalization capability of imitation learning, grounded in both information theorey and data distribution property. We first show that the generalization gap can be upper bounded by (i) the conditional information bottleneck on intermediate representations and (ii) the mutual information between the model parameters and the training dataset. This characterization provides theoretical guidance for designing effective training strategies in imitation learning, particularly in determining whether to freeze, fine-tune, or train large pretrained encoders (e.g., vision-language models or vision foundation models) from scratch to achieve better generalization. Furthermore, we demonstrate that high conditional entropy from input to output induces a flatter likelihood landscape, thereby reducing the upper bound on the generalization gap. In addition, it shortens the stochastic gradient descent (SGD) escape time from sharp local minima, which may increase the likelihood of reaching global optima under fixed optimization budgets. These insights explain why imitation learning often exhibits limited generalization and underscore the importance of not only scaling the diversity of input data but also enriching the variability of output labels conditioned on the same input.

Subjects: Machine Learning , Artificial Intelligence , Robotics

Publish: 2025-04-25 17:59:59 UTC


#2 Adapting Probabilistic Risk Assessment for AI [PDF2] [Copy] [Kimi8] [REL]

Authors: Anna Katariina Wisakanto, Joe Rogero, Avyay M. Casheekar, Richard Mallah

Modern general-purpose artificial intelligence (AI) systems present an urgent risk management challenge, as their rapidly evolving capabilities and potential for catastrophic harm outpace our ability to reliably assess their risks. Current methods often rely on selective testing and undocumented assumptions about risk priorities, frequently failing to make a serious attempt at assessing the set of pathways through which Al systems pose direct or indirect risks to society and the biosphere. This paper introduces the probabilistic risk assessment (PRA) for AI framework, adapting established PRA techniques from high-reliability industries (e.g., nuclear power, aerospace) for the new challenges of advanced AI. The framework guides assessors in identifying potential risks, estimating likelihood and severity, and explicitly documenting evidence, underlying assumptions, and analyses at appropriate granularities. The framework's implementation tool synthesizes the results into a risk report card with aggregated risk estimates from all assessed risks. This systematic approach integrates three advances: (1) Aspect-oriented hazard analysis provides systematic hazard coverage guided by a first-principles taxonomy of AI system aspects (e.g. capabilities, domain knowledge, affordances); (2) Risk pathway modeling analyzes causal chains from system aspects to societal impacts using bidirectional analysis and incorporating prospective techniques; and (3) Uncertainty management employs scenario decomposition, reference scales, and explicit tracing protocols to structure credible projections with novelty or limited data. Additionally, the framework harmonizes diverse assessment methods by integrating evidence into comparable, quantified absolute risk estimates for critical decisions. We have implemented this as a workbook tool for AI developers, evaluators, and regulators, available on the project website.

Subjects: Artificial Intelligence , Computers and Society , Machine Learning , Systems and Control , Applications

Publish: 2025-04-25 17:59:14 UTC


#3 TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation [PDF5] [Copy] [Kimi12] [REL]

Authors: Gwen Yidou Weng, Benjie Wang, Guy Van den Broeck

As large language models (LMs) advance, there is an increasing need to control their outputs to align with human values (e.g., detoxification) or desired attributes (e.g., personalization, topic). However, autoregressive models focus on next-token predictions and struggle with global properties that require looking ahead. Existing solutions either tune or post-train LMs for each new attribute - expensive and inflexible - or approximate the Expected Attribute Probability (EAP) of future sequences by sampling or training, which is slow and unreliable for rare attributes. We introduce TRACE (Tractable Probabilistic Reasoning for Adaptable Controllable gEneration), a novel framework that efficiently computes EAP and adapts to new attributes through tractable probabilistic reasoning and lightweight control. TRACE distills a Hidden Markov Model (HMM) from an LM and pairs it with a small classifier to estimate attribute probabilities, enabling exact EAP computation over the HMM's predicted futures. This EAP is then used to reweigh the LM's next-token probabilities for globally compliant continuations. Empirically, TRACE achieves state-of-the-art results in detoxification with only 10% decoding overhead, adapts to 76 low-resource personalized LLMs within seconds, and seamlessly extends to composite attributes.

Subjects: Computation and Language , Machine Learning

Publish: 2025-04-25 17:59:13 UTC


#4 Scaling Laws For Scalable Oversight [PDF2] [Copy] [Kimi7] [REL]

Authors: Joshua Engels, David D. Baek, Subhash Kantamneni, Max Tegmark

Scalable oversight, the process by which weaker AI systems supervise stronger ones, has been proposed as a key strategy to control future superintelligent systems. However, it is still unclear how scalable oversight itself scales. To address this gap, we propose a framework that quantifies the probability of successful oversight as a function of the capabilities of the overseer and the system being overseen. Specifically, our framework models oversight as a game between capability-mismatched players; the players have oversight-specific and deception-specific Elo scores that are a piecewise-linear function of their general intelligence, with two plateaus corresponding to task incompetence and task saturation. We validate our framework with a modified version of the game Nim and then apply it to four oversight games: "Mafia", "Debate", "Backdoor Code" and "Wargames". For each game, we find scaling laws that approximate how domain performance depends on general AI system capability (using Chatbot Arena Elo as a proxy for general capability). We then build on our findings in a theoretical study of Nested Scalable Oversight (NSO), a process in which trusted models oversee untrusted stronger models, which then become the trusted models in the next step. We identify conditions under which NSO succeeds and derive numerically (and in some cases analytically) the optimal number of oversight levels to maximize the probability of oversight success. In our numerical examples, the NSO success rate is below 52% when overseeing systems that are 400 Elo points stronger than the baseline overseer, and it declines further for overseeing even stronger systems.

Subjects: Artificial Intelligence , Computers and Society , Machine Learning

Publish: 2025-04-25 17:54:27 UTC


#5 Practical Type-Based Taint Checking and Inference (Extended Version) [PDF] [Copy] [Kimi] [REL]

Authors: Nima Karimipour, Kanak Das, Manu Sridharan, Behnaz Hassanshahi

Many important security properties can be formulated in terms of flows of tainted data, and improved taint analysis tools to prevent such flows are of critical need. Most existing taint analyses use whole-program static analysis, leading to scalability challenges. Type-based checking is a promising alternative, as it enables modular and incremental checking for fast performance. However, type-based approaches have not been widely adopted in practice, due to challenges with false positives and annotating existing codebases. In this paper, we present a new approach to type-based checking of taint properties that addresses these challenges, based on two key techniques. First, we present a new type-based tainting checker with significantly reduced false positives, via more practical handling of third-party libraries and other language constructs. Second, we present a novel technique to automatically infer tainting type qualifiers for existing code. Our technique supports inference of generic type argument annotations, crucial for tainting properties. We implemented our techniques in a tool TaintTyper and evaluated it on real-world benchmarks. TaintTyper exceeds the recall of a state-of-the-art whole-program taint analyzer, with comparable precision, and 2.93X-22.9X faster checking time. Further, TaintTyper infers annotations comparable to those written by hand, suitable for insertion into source code. TaintTyper is a promising new approach to efficient and practical taint checking.

Subject: Programming Languages

Publish: 2025-04-25 17:53:52 UTC


#6 Robust semi-implicit multilevel SDC methods for conservation laws [PDF] [Copy] [Kimi] [REL]

Authors: Erik Pfister, Jörg Stiller

Semi-implicit multilevel spectral deferred correction (SI-MLSDC) methods provide a promising approach for high-order time integration for nonlinear evolution equations including conservation laws. However, existing methods lack robustness and often do not achieve the expected advantage over single-level SDC. This work adopts the novel SI time integrators from [44] for enhanced stability and extends the single-level SI-SDC method with a multilevel approach to increase computational efficiency. The favourable properties of the resulting SI-MLSDC method are shown by linear temporal stability analysis for a convection-diffusion problem. The robustness and efficiency of the fully discrete method involving a high-order discontinuous Galerkin SEM discretization are demonstrated through numerical experiments for the convection-diffusion, Burgers, Euler and Navier-Stokes equations. The method is shown to yield substantial reductions in fine-grid iterations compared to single-level SI-SDC across a broad range of test cases. Finally, current limitations of the SI-MLSDC framework are identified and discussed, providing guidance for future improvements.

Subject: Numerical Analysis

Publish: 2025-04-25 17:50:36 UTC


#7 Augmenting Perceptual Super-Resolution via Image Quality Predictors [PDF10] [Copy] [Kimi5] [REL]

Authors: Fengjia Zhang, Samrudhdhi B. Rangrej, Tristan Aumentado-Armstrong, Afsaneh Fazly, Alex Levinshtein

Super-resolution (SR), a classical inverse problem in computer vision, is inherently ill-posed, inducing a distribution of plausible solutions for every input. However, the desired result is not simply the expectation of this distribution, which is the blurry image obtained by minimizing pixelwise error, but rather the sample with the highest image quality. A variety of techniques, from perceptual metrics to adversarial losses, are employed to this end. In this work, we explore an alternative: utilizing powerful non-reference image quality assessment (NR-IQA) models in the SR context. We begin with a comprehensive analysis of NR-IQA metrics on human-derived SR data, identifying both the accuracy (human alignment) and complementarity of different metrics. Then, we explore two methods of applying NR-IQA models to SR learning: (i) altering data sampling, by building on an existing multi-ground-truth SR framework, and (ii) directly optimizing a differentiable quality score. Our results demonstrate a more human-centric perception-distortion tradeoff, focusing less on non-perceptual pixel-wise distortion, instead improving the balance between perceptual fidelity and human-tuned NR-IQA measures.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-04-25 17:47:38 UTC


#8 E-VLC: A Real-World Dataset for Event-based Visible Light Communication And Localization [PDF4] [Copy] [Kimi1] [REL]

Authors: Shintaro Shiba, Quan Kong, Norimasa Kobori

Optical communication using modulated LEDs (e.g., visible light communication) is an emerging application for event cameras, thanks to their high spatio-temporal resolutions. Event cameras can be used simply to decode the LED signals and also to localize the camera relative to the LED marker positions. However, there is no public dataset to benchmark the decoding and localization in various real-world settings. We present, to the best of our knowledge, the first public dataset that consists of an event camera, a frame camera, and ground-truth poses that are precisely synchronized with hardware triggers. It provides various camera motions with various sensitivities in different scene brightness settings, both indoor and outdoor. Furthermore, we propose a novel method of localization that leverages the Contrast Maximization framework for motion estimation and compensation. The detailed analysis and experimental results demonstrate the advantages of LED-based localization with events over the conventional AR-marker--based one with frames, as well as the efficacy of the proposed method in localization. We hope that the proposed dataset serves as a future benchmark for both motion-related classical computer vision tasks and LED marker decoding tasks simultaneously, paving the way to broadening applications of event cameras on mobile devices. https://woven-visionai.github.io/evlc-dataset

Subjects: Computer Vision and Pattern Recognition , Robotics , Signal Processing

Publish: 2025-04-25 17:43:20 UTC


#9 Intelligent Attacks and Defense Methods in Federated Learning-enabled Energy-Efficient Wireless Networks [PDF] [Copy] [Kimi] [REL]

Authors: Han Zhang, Hao Zhou, Medhat Elsayed, Majid Bavand, Raimundas Gaigalas, Yigit Ozcan, Melike Erol-Kantarci

Federated learning (FL) is a promising technique for learning-based functions in wireless networks, thanks to its distributed implementation capability. On the other hand, distributed learning may increase the risk of exposure to malicious attacks where attacks on a local model may spread to other models by parameter exchange. Meanwhile, such attacks can be hard to detect due to the dynamic wireless environment, especially considering local models can be heterogeneous with non-independent and identically distributed (non-IID) data. Therefore, it is critical to evaluate the effect of malicious attacks and develop advanced defense techniques for FL-enabled wireless networks. In this work, we introduce a federated deep reinforcement learning-based cell sleep control scenario that enhances the energy efficiency of the network. We propose multiple intelligent attacks targeting the learning-based approach and we propose defense methods to mitigate such attacks. In particular, we have designed two attack models, generative adversarial network (GAN)-enhanced model poisoning attack and regularization-based model poisoning attack. As a counteraction, we have proposed two defense schemes, autoencoder-based defense, and knowledge distillation (KD)-enabled defense. The autoencoder-based defense method leverages an autoencoder to identify the malicious participants and only aggregate the parameters of benign local models during the global aggregation, while KD-based defense protects the model from attacks by controlling the knowledge transferred between the global model and local models.

Subject: Machine Learning

Publish: 2025-04-25 17:40:35 UTC


#10 PODNO: Proper Orthogonal Decomposition Neural Operators [PDF] [Copy] [Kimi1] [REL]

Authors: Zilan Cheng, Zhongjian Wang, Li-Lian Wang, Mejdi Azaiez

In this paper, we introduce Proper Orthogonal Decomposition Neural Operators (PODNO) for solving partial differential equations (PDEs) dominated by high-frequency components. Building on the structure of Fourier Neural Operators (FNO), PODNO replaces the Fourier transform with (inverse) orthonormal transforms derived from the Proper Orthogonal Decomposition (POD) method to construct the integral kernel. Due to the optimality of POD basis, the PODNO has potential to outperform FNO in both accuracy and computational efficiency for high-frequency problems. From analysis point of view, we established the universality of a generalization of PODNO, termed as Generalized Spectral Operator (GSO). In addition, we evaluate PODNO's performance numerically on dispersive equations such as the Nonlinear Schrodinger (NLS) equation and the Kadomtsev-Petviashvili (KP) equation.

Subjects: Numerical Analysis , Machine Learning , Computational Physics

Publish: 2025-04-25 17:30:44 UTC


#11 Co-Change Graph Entropy: A New Process Metric for Defect Prediction [PDF] [Copy] [Kimi] [REL]

Authors: Ethari Hrishikesh, Amit Kumar, Meher Bhardwaj, Sonali Agarwal

Process metrics, valued for their language independence and ease of collection, have been shown to outperform product metrics in defect prediction. Among these, change entropy (Hassan, 2009) is widely used at the file level and has proven highly effective. Additionally, past research suggests that co-change patterns provide valuable insights into software quality. Building on these findings, we introduce Co-Change Graph Entropy, a novel metric that models co-changes as a graph to quantify co-change scattering. Experiments on eight Apache projects reveal a significant correlation between co-change entropy and defect counts at the file level, with a Pearson correlation coefficient of up to 0.54. In filelevel defect classification, replacing change entropy with co-change entropy improves AUROC in 72.5% of cases and MCC in 62.5% across 40 experimental settings (five machine learning classifiers and eight projects), though these improvements are not statistically significant. However, when co-change entropy is combined with change entropy, AUROC improves in 82.5% of cases and MCC in 65%, with statistically significant gains confirmed via the Friedman test followed by the post-hoc Nemenyi test. These results indicate that co-change entropy complements change entropy, significantly enhancing defect classification performance and underscoring its practical importance in defect prediction.

Subject: Software Engineering

Publish: 2025-04-25 17:27:18 UTC


#12 Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models [PDF6] [Copy] [Kimi] [REL]

Authors: Patrick Müller, Alexander Braun, Margret Keuper

Deep neural networks (DNNs) have proven to be successful in various computer vision applications such that models even infer in safety-critical situations. Therefore, vision models have to behave in a robust way to disturbances such as noise or blur. While seminal benchmarks exist to evaluate model robustness to diverse corruptions, blur is often approximated in an overly simplistic way to model defocus, while ignoring the different blur kernel shapes that result from optical systems. To study model robustness against realistic optical blur effects, this paper proposes two datasets of blur corruptions, which we denote OpticsBench and LensCorruptions. OpticsBench examines primary aberrations such as coma, defocus, and astigmatism, i.e. aberrations that can be represented by varying a single parameter of Zernike polynomials. To go beyond the principled but synthetic setting of primary aberrations, LensCorruptions samples linear combinations in the vector space spanned by Zernike polynomials, corresponding to 100 real lenses. Evaluations for image classification and object detection on ImageNet and MSCOCO show that for a variety of different pre-trained models, the performance on OpticsBench and LensCorruptions varies significantly, indicating the need to consider realistic image corruptions to evaluate a model's robustness against blur.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-04-25 17:23:47 UTC


#13 Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation [PDF7] [Copy] [Kimi] [REL]

Authors: Shivam Duggal, Yushi Hu, Oscar Michel, Aniruddha Kembhavi, William T. Freeman, Noah A. Smith, Ranjay Krishna, Antonio Torralba, Ali Farhadi, Wei-Chiu Ma

Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistency of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-04-25 17:22:05 UTC


#14 Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional [PDF] [Copy] [Kimi] [REL]

Authors: Sanjeev Raja, Martin Šípka, Michael Psenka, Tobias Kreiman, Michal Pavelka, Aditi S. Krishnapriyan

Transition path sampling (TPS), which involves finding probable paths connecting two points on an energy landscape, remains a challenge due to the complexity of real-world atomistic systems. Current machine learning approaches use expensive, task-specific, and data-free training procedures, limiting their ability to benefit from recent advances in atomistic machine learning, such as high-quality datasets and large-scale pre-trained models. In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of pre-trained generative models, specifically denoising diffusion and flow matching. Under these dynamics, finding high-likelihood transition paths becomes equivalent to minimizing the Onsager-Machlup (OM) action functional. This enables us to repurpose pre-trained generative models for TPS in a zero-shot manner, in contrast with bespoke, task-specific TPS models trained in previous work. We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways and generalizing beyond the pre-trained model's original training dataset. Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale and improve with increased data availability.

Subjects: Machine Learning , Materials Science , Chemical Physics , Biomolecules

Publish: 2025-04-25 17:17:17 UTC


#15 Information Freshness in Dynamic Gossip Networks [PDF] [Copy] [Kimi] [REL]

Authors: Arunabh Srivastava, Thomas Jacob Maranzatto, Sennur Ulukus

We consider a source that shares updates with a network of n gossiping nodes. The network's topology switches between two arbitrary topologies, with switching governed by a two-state continuous time Markov chain (CTMC) process. Information freshness is well-understood for static networks. This work evaluates the impact of time-varying connections on information freshness. In order to quantify the freshness of information, we use the version age of information metric. If the two networks have static long-term average version ages of f1(n) and f2(n) with f1(n)f2(n), then the version age of the varying-topologies network is related to f1(n), f2(n), and the transition rates in the CTMC. If the transition rates in the CTMC are faster than f1(n), the average version age of the varying-topologies network is f1(n). Further, we observe that the behavior of a vanishingly small fraction of nodes can severely impact the long-term average version age of a network in a negative way. This motivates the definition of a typical set of nodes in the network. We evaluate the impact of fast and slow CTMC transition rates on the typical set of nodes.

Subjects: Information Theory , Networking and Internet Architecture , Social and Information Networks , Signal Processing

Publish: 2025-04-25 17:16:36 UTC


#16 Online Distributed Queue Length Estimation [PDF] [Copy] [Kimi] [REL]

Authors: Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala

Queue length monitoring is a commonly arising problem in numerous applications such as queue management systems, scheduling, and traffic monitoring. Motivated by such applications, we formulate a queue monitoring problem, where there is a FIFO queue with arbitrary arrivals and departures, and a server needs to monitor the length of a queue by using decentralized pings from packets in the queue. Packets can send pings informing the server about the number of packets ahead of them in the queue. Via novel online policies and lower bounds, we tightly characterize the trade-off between the number of pings sent and the accuracy of the server's real time estimates. Our work studies the trade-off under various arrival and departure processes, including constant-rate, Poisson, and adversarial processes.

Subject: Data Structures and Algorithms

Publish: 2025-04-25 17:15:20 UTC


#17 Boxi: Design Decisions in the Context of Algorithmic Performance for Robotics [PDF] [Copy] [Kimi] [REL]

Authors: Jonas Frey, Turcan Tuna, Lanke Frank Tarimo Fu, Cedric Weibel, Katharine Patterson, Benjamin Krummenacher, Matthias Müller, Julian Nubert, Maurice Fallon, Cesar Cadena, Marco Hutter

Achieving robust autonomy in mobile robots operating in complex and unstructured environments requires a multimodal sensor suite capable of capturing diverse and complementary information. However, designing such a sensor suite involves multiple critical design decisions, such as sensor selection, component placement, thermal and power limitations, compute requirements, networking, synchronization, and calibration. While the importance of these key aspects is widely recognized, they are often overlooked in academia or retained as proprietary knowledge within large corporations. To improve this situation, we present Boxi, a tightly integrated sensor payload that enables robust autonomy of robots in the wild. This paper discusses the impact of payload design decisions made to optimize algorithmic performance for downstream tasks, specifically focusing on state estimation and mapping. Boxi is equipped with a variety of sensors: two LiDARs, 10 RGB cameras including high-dynamic range, global shutter, and rolling shutter models, an RGB-D camera, 7 inertial measurement units (IMUs) of varying precision, and a dual antenna RTK GNSS system. Our analysis shows that time synchronization, calibration, and sensor modality have a crucial impact on the state estimation performance. We frame this analysis in the context of cost considerations and environment-specific challenges. We also present a mobile sensor suite `cookbook` to serve as a comprehensive guideline, highlighting generalizable key design considerations and lessons learned during the development of Boxi. Finally, we demonstrate the versatility of Boxi being used in a variety of applications in real-world scenarios, contributing to robust autonomy. More details and code: https://github.com/leggedrobotics/grand_tour_box

Subjects: Robotics , Systems and Control

Publish: 2025-04-25 17:11:48 UTC


#18 DeSIA: Attribute Inference Attacks Against Limited Fixed Aggregate Statistics [PDF1] [Copy] [Kimi] [REL]

Authors: Yifeng Mao, Bozhidar Stevanoski, Yves-Alexandre de Montjoye

Empirical inference attacks are a popular approach for evaluating the privacy risk of data release mechanisms in practice. While an active attack literature exists to evaluate machine learning models or synthetic data release, we currently lack comparable methods for fixed aggregate statistics, in particular when only a limited number of statistics are released. We here propose an inference attack framework against fixed aggregate statistics and an attribute inference attack called DeSIA. We instantiate DeSIA against the U.S. Census PPMF dataset and show it to strongly outperform reconstruction-based attacks. In particular, we show DeSIA to be highly effective at identifying vulnerable users, achieving a true positive rate of 0.14 at a false positive rate of 103. We then show DeSIA to perform well against users whose attributes cannot be verified and when varying the number of aggregate statistics and level of noise addition. We also perform an extensive ablation study of DeSIA and show how DeSIA can be successfully adapted to the membership inference task. Overall, our results show that aggregation alone is not sufficient to protect privacy, even when a relatively small number of aggregates are being released, and emphasize the need for formal privacy mechanisms and testing before aggregate statistics are released.

Subjects: Cryptography and Security , Artificial Intelligence

Publish: 2025-04-25 17:10:33 UTC


#19 Facets, Taxonomies, and Syntheses: Navigating Structured Representations in LLM-Assisted Literature Review [PDF] [Copy] [Kimi] [REL]

Authors: Raymond Fok, Joseph Chee Chang, Marissa Radensky, Pao Siangliulue, Jonathan Bragg, Amy X. Zhang, Daniel S. Weld

Comprehensive literature review requires synthesizing vast amounts of research -- a labor intensive and cognitively demanding process. Most prior work focuses either on helping researchers deeply understand a few papers (e.g., for triaging or reading), or retrieving from and visualizing a vast corpus. Deep analysis and synthesis of large paper collections (e.g., to produce a survey paper) is largely conducted manually with little support. We present DimInd, an interactive system that scaffolds literature review across large paper collections through LLM-generated structured representations. DimInd scaffolds literature understanding with multiple levels of compression, from papers, to faceted literature comparison tables with information extracted from individual papers, to taxonomies of concepts, to narrative syntheses. Users are guided through these successive information transformations while maintaining provenance to source text. In an evaluation with 23 researchers, DimInd supported participants in extracting information and conceptually organizing papers with less effort compared to a ChatGPT-assisted baseline workflow.

Subject: Human-Computer Interaction

Publish: 2025-04-25 17:09:29 UTC


#20 An Improved ResNet50 Model for Predicting Pavement Condition Index (PCI) Directly from Pavement Images [PDF2] [Copy] [Kimi1] [REL]

Authors: Andrews Danyo, Anthony Dontoh, Armstrong Aboah

Accurately predicting the Pavement Condition Index (PCI), a measure of roadway conditions, from pavement images is crucial for infrastructure maintenance. This study proposes an enhanced version of the Residual Network (ResNet50) architecture, integrated with a Convolutional Block Attention Module (CBAM), to predict PCI directly from pavement images without additional annotations. By incorporating CBAM, the model autonomously prioritizes critical features within the images, improving prediction accuracy. Compared to the original baseline ResNet50 and DenseNet161 architectures, the enhanced ResNet50-CBAM model achieved a significantly lower mean absolute percentage error (MAPE) of 58.16%, compared to the baseline models that achieved 70.76% and 65.48% respectively. These results highlight the potential of using attention mechanisms to refine feature extraction, ultimately enabling more accurate and efficient assessments of pavement conditions. This study emphasizes the importance of targeted feature refinement in advancing automated pavement analysis through attention mechanisms.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-04-25 17:00:50 UTC


#21 Tight Lower Bound for Multicolor Discrepancy [PDF] [Copy] [Kimi] [REL]

Authors: Pasin Manurangsi, Raghu Meka

We prove the following asymptotically tight lower bound for k-color discrepancy: For any k2, there exists a hypergraph with n vertices such that its k-color discrepancy is at least Ω(n). This improves on the previously known lower bound of Ω(n/logk) due to Caragiannis et al. (arXiv:2502.10516). As an application, we show that our result implies improved lower bounds for group fair division.

Subjects: Discrete Mathematics , Computer Science and Game Theory

Publish: 2025-04-25 16:58:59 UTC


#22 Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues [PDF] [Copy] [Kimi] [REL]

Authors: Leandra Fichtel, Maximilian Spliethöver, Eyke Hüllermeier, Patricia Jimenez, Nils Klowait, Stefan Kopp, Axel-Cyrille Ngonga Ngomo, Amelie Robrecht, Ingrid Scharlau, Lutz Terfloth, Anna-Lisa Vollmer, Henning Wachsmuth

The ability to generate explanations that are understood by explainees is the quintessence of explainable artificial intelligence. Since understanding depends on the explainee's background and needs, recent research has focused on co-constructive explanation dialogues, where the explainer continuously monitors the explainee's understanding and adapts explanations dynamically. We investigate the ability of large language models (LLMs) to engage as explainers in co-constructive explanation dialogues. In particular, we present a user study in which explainees interact with LLMs, of which some have been instructed to explain a predefined topic co-constructively. We evaluate the explainees' understanding before and after the dialogue, as well as their perception of the LLMs' co-constructive behavior. Our results indicate that current LLMs show some co-constructive behaviors, such as asking verification questions, that foster the explainees' engagement and can improve understanding of a topic. However, their ability to effectively monitor the current understanding and scaffold the explanations accordingly remains limited.

Subject: Computation and Language

Publish: 2025-04-25 16:47:44 UTC


#23 Instrumentation for Better Demonstrations: A Case Study [PDF] [Copy] [Kimi] [REL]

Authors: Remko Proesmans, Thomas Lips, Francis wyffels

Learning from demonstrations is a powerful paradigm for robot manipulation, but its effectiveness hinges on both the quantity and quality of the collected data. In this work, we present a case study of how instrumentation, i.e. integration of sensors, can improve the quality of demonstrations and automate data collection. We instrument a squeeze bottle with a pressure sensor to learn a liquid dispensing task, enabling automated data collection via a PI controller. Transformer-based policies trained on automated demonstrations outperform those trained on human data in 78% of cases. Our findings indicate that instrumentation not only facilitates scalable data collection but also leads to better-performing policies, highlighting its potential in the pursuit of generalist robotic agents.

Subject: Robotics

Publish: 2025-04-25 16:43:20 UTC


#24 Boundaried Kernelization [PDF] [Copy] [Kimi] [REL]

Authors: Leonid Antipov, Stefan Kratsch

The notion of a (polynomial) kernelization from parameterized complexity is a well-studied model for efficient preprocessing for hard computational problems. By now, it is quite well understood which parameterized problems do or (conditionally) do not admit a polynomial kernelization. Unfortunately, polynomial kernelizations seem to require strong restrictions on the global structure of inputs. To avoid this restriction, we propose a model for efficient local preprocessing that is aimed at local structure in inputs. Our notion, dubbed boundaried kernelization, is inspired by protrusions and protrusion replacement, which are tools in meta-kernelization [Bodlaender et al. J'ACM 2016]. Unlike previous work, we study the preprocessing of suitable boundaried graphs in their own right, in significantly more general settings, and aiming for polynomial rather than exponential bounds. We establish polynomial boundaried kernelizations for a number of problems, while unconditionally ruling out such results for others. We also show that boundaried kernelization can be a tool for regular kernelization by using it to obtain an improved kernelization for Vertex Cover parameterized by the vertex-deletion distance to a graph of bounded treedepth.

Subject: Data Structures and Algorithms

Publish: 2025-04-25 16:32:21 UTC


#25 Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions [PDF] [Copy] [Kimi] [REL]

Authors: James D. Finch, Yasasvi Josyula, Jinho D. Choi

In task-oriented dialogue (TOD) systems, Slot Schema Induction (SSI) is essential for automatically identifying key information slots from dialogue data without manual intervention. This paper presents a novel state-of-the-art (SoTA) approach that formulates SSI as a text generation task, where a language model incrementally constructs and refines a slot schema over a stream of dialogue data. To develop this approach, we present a fully automatic LLM-based TOD simulation method that creates data with high-quality state labels for novel task domains. Furthermore, we identify issues in SSI evaluation due to data leakage and poor metric alignment with human judgment. We resolve these by creating new evaluation data using our simulation method with human guidance and correction, as well as designing improved evaluation metrics. These contributions establish a foundation for future SSI research and advance the SoTA in dialogue understanding and system development.

Subject: Computation and Language

Publish: 2025-04-25 16:29:45 UTC