Artificial Intelligence

Date: Fri, 19 Jul 2024 | Total: 154

#1 Scaling Granite Code Models to 128K Context

Authors: Matt Stallone ; Vaibhav Saxena ; Leonid Karlinsky ; Bridget McGinn ; Tim Bula ; Mayank Mishra ; Adriana Meza Soria ; Gaoyuan Zhang ; Aditya Prasad ; Yikang Shen ; Saptha Surendran ; Shanmukha Guttula ; Hima Patel ; Parameswaran Selvam ; Xuan-Hong Dang ; Yan Koyfman ; Atin Sood ; Rogerio Feris ; Nirmit Desai ; David D. Cox ; Ruchir Puri ; Rameswar Panda

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, combined with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.

Subjects: Artificial Intelligence ; Computation and Language ; Software Engineering

Publish: 2024-07-18 17:46:02 UTC
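
To make the "RoPE base frequency" idea in entry #1 concrete, here is a minimal sketch of how the base controls the rotation frequencies of rotary position embeddings. The head dimension and the sequence of base values are hypothetical choices for illustration; the abstract does not state the schedule the authors used.

```python
import numpy as np

def rope_inv_freq(head_dim: int, base: float) -> np.ndarray:
    """Per-pair rotation frequencies theta_i = base ** (-2i / head_dim)."""
    exponents = np.arange(0, head_dim, 2) / head_dim
    return base ** (-exponents)

def longest_wavelength(head_dim: int, base: float) -> float:
    """Number of positions before the slowest-rotating pair completes one turn."""
    return float(2 * np.pi / rope_inv_freq(head_dim, base)[-1])

head_dim = 128  # hypothetical head dimension
# Hypothetical increasing bases; the paper's actual schedule is not given here.
for base in (10_000.0, 100_000.0, 1_000_000.0, 10_000_000.0):
    print(f"base={base:>12,.0f}  longest wavelength ~ {longest_wavelength(head_dim, base):,.0f} positions")
```

A larger base slows the low-frequency rotations, so distant positions remain distinguishable over a longer window. This is the usual motivation for raising the base when extending context length.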

#2 Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

Authors: Audrey Huang ; Wenhao Zhan ; Tengyang Xie ; Jason D. Lee ; Wen Sun ; Akshay Krishnamurthy ; Dylan J. Foster

Language model alignment methods, such as reinforcement learning from human feedback (RLHF), have led to impressive advances in language model capabilities, but existing techniques are limited by a widely observed phenomenon known as overoptimization, where the quality of the language model plateaus or degrades over the course of the alignment process. Overoptimization is often attributed to overfitting to an inaccurate reward model, and while it can be mitigated through online data collection, this is infeasible in many settings. This raises a fundamental question: Do existing offline alignment algorithms make the most of the data they have, or can their sample-efficiency be improved further? We address this question with a new algorithm for offline alignment, $\chi^2$-Preference Optimization ($\chi$PO). $\chi$PO is a one-line change to Direct Preference Optimization (DPO; Rafailov et al., 2023), which only involves modifying the logarithmic link function in the DPO objective. Despite this minimal change, $\chi$PO implicitly implements the principle of pessimism in the face of uncertainty via regularization with the $\chi^2$-divergence -- which quantifies uncertainty more effectively than KL-regularization -- and provably alleviates overoptimization, achieving sample-complexity guarantees based on single-policy concentrability -- the gold standard in offline reinforcement learning. $\chi$PO's simplicity and strong guarantees make it the first practical and general-purpose offline alignment algorithm that is provably robust to overoptimization.

Subjects: Artificial Intelligence ; Computation and Language ; Machine Learning

Publish: 2024-07-18 11:08:40 UTC
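
The "one-line change" described in entry #2 can be sketched as swapping the link function applied to the policy/reference density ratio. The abstract does not spell out the new link. The form z + log z used below, and the optional clipping of the margin, are assumptions based on a recollection of the paper and should be checked against it. All function names are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)

def dpo_link(log_ratio):
    """DPO: logarithmic link, phi(z) = log z, with z = pi(y|x) / pi_ref(y|x)."""
    return log_ratio

def chipo_link(log_ratio):
    """Assumed chi-squared-style link, phi(z) = z + log z (not stated in the abstract)."""
    return np.exp(log_ratio) + log_ratio

def preference_loss(logr_chosen, logr_rejected, beta, link, clip=None):
    """Negative log-likelihood of preferring the chosen response under the given link."""
    margin = beta * (link(logr_chosen) - link(logr_rejected))
    if clip is not None:  # optional clipping of the margin (assumption)
        margin = np.clip(margin, -clip, clip)
    return float(-log_sigmoid(margin).mean())

# Toy log density ratios log(pi / pi_ref) for chosen and rejected responses.
logr_chosen = np.array([np.log(2.0)])
logr_rejected = np.array([0.0])
print(preference_loss(logr_chosen, logr_rejected, beta=0.1, link=dpo_link))
print(preference_loss(logr_chosen, logr_rejected, beta=0.1, link=chipo_link))
```

Under this assumed link, the extra linear term in z penalizes large density ratios much more strongly than the logarithm alone, which is one way to read the "pessimism" the abstract refers to.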

#3 Sortability of Time Series Data

Authors: Christopher Lohse ; Jonas Wahl

Evaluating the performance of causal discovery algorithms that aim to find causal relationships between time-dependent processes remains a challenging topic. In this paper, we show that certain characteristics of datasets, such as varsortability (Reisach et al. 2021) and $R^2$-sortability (Reisach et al. 2023), also occur in datasets for autocorrelated stationary time series. We illustrate this empirically using four types of data: simulated data based on SVAR models and Erdős-Rényi graphs, the data used in the 2019 causality-for-climate challenge (Runge et al. 2019), real-world river stream datasets, and real-world data generated by the Causal Chamber (Gamella et al. 2024). To do this, we adapt var- and $R^2$-sortability to time series data. We also investigate the extent to which the performance of score-based causal discovery methods goes hand in hand with high sortability. Arguably, our most surprising finding is that the investigated real-world datasets exhibit high varsortability and low $R^2$-sortability, indicating that scales may carry a significant amount of causal information.

Subject: Artificial Intelligence

Publish: 2024-07-18 09:15:39 UTC
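
As background for entry #3, the sketch below computes a simplified, static version of varsortability: the fraction of cause-effect pairs connected by a directed path for which the marginal variance increases from cause to effect, with ties counted as one half. It counts each ancestor-descendant pair once, which is a simplification of the original definition, and it does not reproduce the time series adaptation proposed in the paper. The toy chain graph and all names are hypothetical.

```python
import numpy as np

def reachability(adj: np.ndarray) -> np.ndarray:
    """Transitive closure of a DAG (adj[i, j] != 0 means i -> j), via Warshall's algorithm."""
    reach = adj != 0
    for k in range(adj.shape[0]):
        reach = reach | (reach[:, [k]] & reach[[k], :])
    return reach

def varsortability(X: np.ndarray, adj: np.ndarray) -> float:
    """Share of ancestor -> descendant pairs whose variance increases (ties count 0.5)."""
    var = X.var(axis=0)
    causes, effects = np.nonzero(reachability(adj))
    if len(causes) == 0:
        return float("nan")
    tie = np.isclose(var[causes], var[effects])
    increasing = (var[causes] < var[effects]) & ~tie
    return float((increasing.sum() + 0.5 * tie.sum()) / len(causes))

# Toy chain X0 -> X1 -> X2 with unit-variance additive noise.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])

score_raw = varsortability(X, adj)
score_std = varsortability((X - X.mean(axis=0)) / X.std(axis=0), adj)
print(score_raw, score_std)
```

On the raw data the variances accumulate along the chain, so the causal order can be read off the scales. After standardization that signal disappears, which is why the abstract treats scale as a possible carrier of causal information.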

#4 Mixture of Experts based Multi-task Supervise Learning from Crowds

Authors: Tao Han ; Huaixuan Shi ; Xinyi Ding ; Xiao Ma ; Huamao Gu ; Yili Fang

Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise characterizations and negatively impacting the quality of truth inference. This paper proposes a new paradigm of multi-task supervised learning from crowds, which eliminates the need to model items' ground truth in worker behavior models. Within this paradigm, we propose a worker behavior model at the item feature level called Mixture of Experts based Multi-task Supervised Learning from Crowds (MMLC). Two truth inference strategies are proposed within MMLC. The first strategy, named MMLC-owf, utilizes clustering methods in the worker spectral space to identify the projection vector of the oracle worker. Subsequently, the labels generated based on this vector are considered as the inferred truth. The second strategy, called MMLC-df, employs the MMLC model to fill the crowdsourced data, which can enhance the effectiveness of existing truth inference methods. Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and MMLC-df enhances the quality of existing truth inference methods.

Subjects: Artificial Intelligence ; Machine Learning

Publish: 2024-07-18 08:21:31 UTC

#5 LLM-Empowered State Representation for Reinforcement Learning

Authors: Boyuan Wang ; Yun Qu ; Yuhang Jiang ; Jianzhun Shao ; Chang Liu ; Wenming Yang ; Xiangyang Ji

Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards. Traditional methods typically depend on extensive sample learning to enrich state representations with task-specific information, which leads to low sample efficiency and high time costs. Recently, increasingly knowledgeable large language models (LLMs) have offered a promising way to inject prior knowledge with minimal human intervention. Motivated by this, we propose LLM-Empowered State Representation (LESR), a novel approach that uses an LLM to autonomously generate task-related state representation codes which help to enhance the continuity of network mappings and facilitate efficient training. Experimental results demonstrate that LESR exhibits high sample efficiency and outperforms state-of-the-art baselines by an average of 29% in accumulated reward in MuJoCo tasks and 30% in success rates in Gym-Robotics tasks.

Subject: Artificial Intelligence

Publish: 2024-07-18 07:47:51 UTC

#6 SciCode: A Research Coding Benchmark Curated by Scientists

Authors: Minyang Tian ; Luyu Gao ; Shizhuo Dylan Zhang ; Xinan Chen ; Cunwei Fan ; Xuefei Guo ; Roland Haas ; Pan Ji ; Kittithat Krongchon ; Yao Li ; Shengyan Liu ; Di Luo ; Yutao Ma ; Hao Tong ; Kha Trinh ; Chenyu Tian ; Zihan Wang ; Bohao Wu ; Yanyu Xiong ; Shengzhu Yin ; Minhui Zhu ; Kilian Lieret ; Yanxin Lu ; Genglin Liu ; Yufeng Du ; Tianhua Tao ; Ofir Press ; Jamie Callan ; Eliu Huerta ; Hao Peng

Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields, including mathematics, physics, chemistry, biology, and materials science, we created a scientist-curated coding benchmark, SciCode. The problems in SciCode naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems. It offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only 4.6% of the problems in the most realistic setting. We believe that SciCode both demonstrates contemporary LMs' progress towards becoming helpful scientific assistants and sheds light on the development and evaluation of scientific AI in the future.

Subjects: Artificial Intelligence ; Computation and Language

Publish: 2024-07-18 05:15:24 UTC

#7 Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

Authors: Rixin Wu ; Ran Wang ; Jie Hao ; Qiang Wu ; Ping Wang ; Dusit Niyato

This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW), aiming to use a single deep reinforcement learning (DRL) model to solve the entire multiobjective optimization problem. The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes produced by the WADRL, thereby mitigating the limitations of both approaches. Firstly, we design an MOVRPTW model to balance the minimization of travel cost and the maximization of customer satisfaction. Subsequently, we present a novel DRL framework that incorporates a transformer-based policy network. This network is composed of an encoder module, a weight embedding module where the weights of the objective functions are incorporated, and a decoder module. NSGA-II is then utilized to optimize the solutions generated by WADRL. Finally, extensive experimental results demonstrate that our method outperforms the existing and traditional methods. Due to the numerous constraints in VRPTW, generating initial solutions of the NSGA-II algorithm can be time-consuming. However, using solutions generated by the WADRL as initial solutions for NSGA-II significantly reduces the time required for generating initial solutions. Meanwhile, the NSGA-II algorithm can enhance the quality of solutions generated by WADRL, resulting in solutions with better scalability. Notably, the weight-aware strategy significantly reduces the training time of DRL while achieving better results, enabling a single DRL model to solve the entire multiobjective optimization problem.

Subject: Artificial Intelligence

Publish: 2024-07-18 02:46:06 UTC
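
Entry #7 relies on NSGA-II, whose core building block is Pareto dominance between candidate solutions. The sketch below extracts only the first non-dominated front for the two objectives named in the abstract (minimize travel cost, maximize customer satisfaction). It is not a full NSGA-II implementation and not the authors' code, and the candidate values are made up for illustration.

```python
from typing import List, Tuple

Solution = Tuple[float, float]  # (travel_cost, customer_satisfaction)

def as_minimization(sol: Solution) -> Tuple[float, float]:
    """Negate satisfaction so that both objectives are minimized."""
    cost, satisfaction = sol
    return (cost, -satisfaction)

def dominates(a: Solution, b: Solution) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    fa, fb = as_minimization(a), as_minimization(b)
    return all(x <= y for x, y in zip(fa, fb)) and any(x < y for x, y in zip(fa, fb))

def first_front(solutions: List[Solution]) -> List[Solution]:
    """Solutions not dominated by any other solution (the first Pareto front)."""
    return [
        s for i, s in enumerate(solutions)
        if not any(dominates(t, s) for j, t in enumerate(solutions) if j != i)
    ]

# Hypothetical candidate routes, e.g. produced by a learned policy under different weights.
candidates = [(10.0, 0.90), (12.0, 0.95), (11.0, 0.80), (15.0, 0.95), (9.0, 0.50)]
print(first_front(candidates))  # → [(10.0, 0.9), (12.0, 0.95), (9.0, 0.5)]
```

Seeding an evolutionary search with solutions that already lie near this front is the kind of warm start the abstract attributes to the WADRL-generated initial solutions.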

#8 On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems

Authors: Siyu Wang ; Xiaocong Chen ; Lina Yao

In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces, making it challenging to discern which aspects of the state are truly influential in driving the decision-making process. This issue is exacerbated by the evolving nature of user preferences and behaviors, requiring the recommender system to adaptively focus on the most relevant information for decision-making while preserving generalizability. To tackle this problem, we introduce an innovative causal approach for decomposing the state and extracting Causal-InDispensable State Representations (CIDS) in RLRS. Our method concentrates on identifying the Directly Action-Influenced State Variables (DAIS) and Action-Influence Ancestors (AIA), which are essential for making effective recommendations. By leveraging conditional mutual information, we develop a framework that not only discerns the causal relationships within the generative process but also isolates critical state variables from the typically dense and high-dimensional state representations. We provide theoretical evidence for the identifiability of these variables. Then, by making use of the identified causal relationship, we construct causal-indispensable state representations, enabling the training of policies over a more advantageous subset of the agent's state space. We demonstrate the efficacy of our approach through extensive experiments, showing that our method outperforms state-of-the-art methods.

Subjects: Artificial Intelligence ; Information Retrieval

Publish: 2024-07-18 01:41:05 UTC

#9 MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking

Authors: Ting-Chih Chen ; Chia-Wei Tang ; Chris Thomas

Fact-checking real-world claims often requires reviewing multiple multimodal documents to assess a claim's truthfulness, which is a highly laborious and time-consuming task. In this paper, we present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal, multi-document datasets. The model takes inputs in the form of documents, images, and a claim, with the objective of assisting in fact-checking tasks. We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths. To train our model, we leverage a novel reinforcement learning-based entailment objective to generate summaries that provide evidence distinguishing between different truthfulness labels. To assess the efficacy of our approach, we conduct experiments on both an existing benchmark and a new dataset of multi-document claims that we contribute. Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset and demonstrates strong performance on our new Multi-News-Fact-Checking dataset.

Subjects: Artificial Intelligence ; Computation and Language

Publish: 2024-07-18 01:33:20 UTC

#10 Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

Authors: Wenjin Niu ; Zijun Gao ; Liyan Song ; Lingbo Li

Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented, with inconsistent methodologies and a lack of comprehensive evaluations. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal discovery methods for numerical data, aiming to provide a clearer and more structured understanding of the field. Our research began with a comprehensive literature review spanning over a decade, revealing that existing surveys fall short in covering the vast array of causal discovery advancements. We meticulously analyzed over 200 scholarly articles to identify 24 distinct algorithms. This extensive analysis led to the development of a novel taxonomy tailored to the complexities of causal discovery, categorizing methods into six main types. Addressing the lack of comprehensive evaluations, our study conducts an extensive empirical assessment of more than 20 causal discovery algorithms on synthetic and real-world datasets. We categorize synthetic datasets based on size, linearity, and noise distribution, employ 5 evaluation metrics, and summarize the top-3 algorithm recommendations for different data scenarios. The recommendations have been validated on 2 real-world datasets. Our results highlight the significant impact of dataset characteristics on algorithm performance. Moreover, a metadata extraction strategy was developed to assist users in algorithm selection on unknown datasets. The accuracy of estimating metadata is higher than 80%. Based on these insights, we offer professional and practical recommendations to help users choose the most suitable causal discovery methods for their specific dataset needs.

Subject: Artificial Intelligence

Publish: 2024-07-17 23:47:05 UTC

#11 Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems

Authors: Tamer Abuelsaad ; Deepak Akkil ; Prasenjit Dey ; Ashish Jagmohan ; Aditya Vempaty ; Ravi Kokku

AI Agents are changing the way work gets done, both in consumer and enterprise domains. However, the design patterns and architectures to build highly capable agents or multi-agent systems are still developing, and the understanding of the implication of various design choices and algorithms is still evolving. In this paper, we present our work on building a novel web agent, Agent-E, whose code we make available. Agent-E introduces numerous architectural improvements over prior state-of-the-art web agents, such as a hierarchical architecture, a flexible DOM distillation and denoising method, and the concept of change observation to guide the agent towards more accurate performance. We first present the results of an evaluation of Agent-E on the WebVoyager benchmark dataset and show that Agent-E beats other SOTA text and multi-modal web agents on this benchmark in most categories by 10-30%. We then synthesize our learnings from the development of Agent-E into general design principles for developing agentic systems. These include the use of domain-specific primitive skills, the importance of distillation and de-noising of environmental observations, the advantages of a hierarchical architecture, and the role of agentic self-improvement to enhance agent efficiency and efficacy as the agent gathers experience.

Subject: Artificial Intelligence

Publish: 2024-07-17 21:44:28 UTC

#12 A Three-Stage Algorithm for the Closest String Problem on Artificial and Real Gene Sequences

Authors: Alireza Abdi ; Marko Djukanovic ; Hesam Tahmasebi Boldaji ; Hadis Salehi ; Aleksandar Kartelj

The Closest String Problem is an NP-hard problem that aims to find a string that has the minimum distance from all sequences that belong to the given set of strings. Its applications can be found in coding theory, computational biology, and designing degenerated primers, among others. There are efficient exact algorithms that have reached high-quality solutions for binary sequences. However, there is still room for improvement concerning the quality of solutions over DNA and protein sequences. In this paper, we introduce a three-stage algorithm that comprises the following process: first, we apply a novel alphabet pruning method to reduce the search space for effectively finding promising search regions. Second, a variant of beam search to find a heuristic solution is employed. This method utilizes a newly developed guiding function based on an expected distance heuristic score of partial solutions. Last, we introduce a local search to improve the quality of the solution obtained from the beam search. Furthermore, due to the lack of real-world benchmarks, two real-world datasets are introduced to verify the robustness of the method. The extensive experimental results show that the proposed method outperforms the previous approaches from the literature.

Subject: Artificial Intelligence

Publish: 2024-07-17 21:26:27 UTC
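
For readers unfamiliar with the problem in entry #12, the sketch below states the Closest String objective (minimize the maximum Hamming distance to all input strings) and evaluates a simple column-wise majority baseline. The baseline is a common heuristic chosen here for illustration. It is not the paper's three-stage algorithm, and the toy DNA strings are made up.

```python
from collections import Counter
from typing import List

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def csp_objective(candidate: str, strings: List[str]) -> int:
    """Closest String objective: the largest distance from the candidate to any input."""
    return max(hamming(candidate, s) for s in strings)

def majority_string(strings: List[str]) -> str:
    """Baseline heuristic: pick the most frequent character in every column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*strings))

strings = ["ACGT", "ACGA", "TCGT"]  # toy DNA sequences
candidate = majority_string(strings)
print(candidate, csp_objective(candidate, strings))  # → ACGT 1
```

The majority string minimizes the sum of distances, not the maximum, so it can be far from optimal for the min-max objective. That gap is what search methods such as beam search and local search try to close.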

#13 Beyond the Veil of Similarity: Quantifying Semantic Continuity in Explainable AI

Authors: Qi Huang ; Emanuele Mezzi ; Osman Mutlu ; Miltiadis Kofinas ; Vidya Prasad ; Shadnan Azwad Khan ; Elena Ranguelova ; Niki van Stein

We introduce a novel metric for measuring semantic continuity in Explainable AI methods and machine learning models. We posit that for models to be truly interpretable and trustworthy, similar inputs should yield similar explanations, reflecting a consistent semantic understanding. By leveraging XAI techniques, we assess semantic continuity in the task of image recognition. We conduct experiments to observe how incremental changes in input affect the explanations provided by different XAI methods. Through this approach, we aim to evaluate the models' capability to generalize and abstract semantic concepts accurately and to evaluate different XAI methods in correctly capturing the model behaviour. This paper contributes to the broader discourse on AI interpretability by proposing a quantitative measure for semantic continuity for XAI methods, offering insights into the models' and explainers' internal reasoning processes, and promoting more reliable and transparent AI systems.

Subjects: Artificial Intelligence ; Computer Vision and Pattern Recognition ; Machine Learning

Publish: 2024-07-17 18:32:41 UTC

#14 Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

Author: Charles Jin

As language models (LMs) deliver increasing performance on a range of NLP tasks, probing classifiers have become an indispensable technique in the effort to better understand their inner workings. A typical setup involves (1) defining an auxiliary task consisting of a dataset of text annotated with labels, then (2) supervising small classifiers to predict the labels from the representations of a pretrained LM as it processed the dataset. A high probing accuracy is interpreted as evidence that the LM has learned to perform the auxiliary task as an unsupervised byproduct of its original pretraining objective. Despite the widespread usage of probes, however, the robust design and analysis of probing experiments remains a challenge. We develop a formal perspective on probing using structural causal models (SCM). Specifically, given an SCM which explains the distribution of tokens observed during training, we frame the central hypothesis as whether the LM has learned to represent the latent variables of the SCM. Empirically, we extend a recent study of LMs in the context of a synthetic grid-world navigation task, where having an exact model of the underlying causal structure allows us to draw strong inferences from the result of probing experiments. Our techniques provide robust empirical evidence for the ability of LMs to learn the latent causal concepts underlying text.

Subjects: Computation and Language ; Artificial Intelligence

Publish: 2024-07-18 17:59:27 UTC

#15 Neural Network Tire Force Modeling for Automated Drifting

Authors: Nicholas Drake Broadbent ; Trey Weber ; Daiki Mori ; J. Christian Gerdes

Automated drifting presents a challenge problem for vehicle control, requiring models and control algorithms that can precisely handle nonlinear, coupled tire forces at the friction limits. We present a neural network architecture for predicting front tire lateral force as a drop-in replacement for physics-based approaches. With a full-scale automated vehicle purpose-built for the drifting application, we deploy these models in a nonlinear model predictive controller tuned for tracking a reference drifting trajectory, for direct comparisons of model performance. The neural network tire model exhibits significantly improved path tracking performance over the brush tire model in cases where front-axle braking force is applied, suggesting the neural network's ability to express previously unmodeled, latent dynamics in the drifting condition.

Subjects: Systems and Control ; Artificial Intelligence

Publish: 2024-07-18 17:58:01 UTC

#16 Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

Authors: Zhuo Chen ; Jiawei Liu ; Haotan Liu ; Qikai Cheng ; Fan Zhang ; Wei Lu ; Xiaozhong Liu

Retrieval-Augmented Generation (RAG) is applied to address the hallucination problems and real-time constraints of large language models, but it also introduces vulnerabilities to retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of RAG models when faced with black-box attacks for opinion manipulation. We explore the impact of such attacks on user cognition and decision-making, providing new insight to enhance the reliability and security of RAG models. We manipulate the ranking results of the retrieval model in RAG with instructions and use these results as data to train a surrogate model. By applying adversarial retrieval attack methods to the surrogate model, black-box transfer attacks on RAG are further realized. Experiments conducted on opinion datasets across multiple topics show that the proposed attack strategy can significantly alter the opinion polarity of the content generated by RAG. This demonstrates the model's vulnerability and, more importantly, reveals the potential negative impact on user cognition and decision-making, making it easier to mislead users into accepting incorrect or biased information.

Subjects: Computation and Language ; Artificial Intelligence ; Cryptography and Security

Publish: 2024-07-18 17:55:55 UTC

#17 Temporal Representation Learning for Stock Similarities and Its Applications in Investment Management

Authors: Yoontae Hwang ; Stefan Zohren ; Yongjae Lee

In the era of rapid globalization and digitalization, accurate identification of similar stocks has become increasingly challenging due to the non-stationary nature of financial markets and the ambiguity in conventional regional and sector classifications. To address these challenges, we examine SimStock, a novel temporal self-supervised learning framework that combines techniques from self-supervised learning (SSL) and temporal domain generalization to learn robust and informative representations of financial time series data. The primary focus of our study is to understand the similarities between stocks from a broader perspective, considering the complex dynamics of the global financial landscape. We conduct extensive experiments on four real-world datasets with thousands of stocks and demonstrate the effectiveness of SimStock in finding similar stocks, outperforming existing methods. The practical utility of SimStock is showcased through its application to various investment strategies, such as pairs trading, index tracking, and portfolio optimization, where it leads to superior performance compared to conventional methods. Our findings empirically examine the potential of data-driven approaches to enhance investment decision-making and risk management practices by leveraging the power of temporal self-supervised learning in the face of the ever-changing global financial landscape.

Subjects: Computational Finance ; Artificial Intelligence

Publish: 2024-07-18 17:54:13 UTC

#18 LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation

Author: David Schlangen

Natural Language Processing has moved rather quickly from modelling specific tasks to taking more general pre-trained models and fine-tuning them for specific tasks, to a point where we now have what appear to be inherently generalist models. This paper argues that the resultant loss of clarity on what these models model leads to metaphors like "artificial general intelligences" that are not helpful for evaluating their strengths and weaknesses. The proposal is to see their generality, and their potential value, in their ability to approximate specialist function, based on a natural language specification. This framing brings to the fore questions of the quality of the approximation, but beyond that, also questions of discoverability, stability, and protectability of these functions. As the paper will show, this framing hence brings together in one conceptual framework various aspects of evaluation, both from a practical and a theoretical perspective, as well as questions often relegated to a secondary status (such as "prompt injection" and "jailbreaking").

Subjects: Computation and Language ; Artificial Intelligence

Publish: 2024-07-18 17:49:56 UTC

#19 CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications

Authors: Mirza Masfiqur Rahman ; Imtiaz Karim ; Elisa Bertino

In recent years, there has been a growing focus on scrutinizing the security of cellular networks, often attributing security vulnerabilities to issues in the underlying protocol design descriptions. These protocol design specifications, typically extensive documents that are thousands of pages long, can harbor inaccuracies, underspecifications, implicit assumptions, and internal inconsistencies. In light of the evolving landscape, we introduce CellularLint--a semi-automatic framework for inconsistency detection within the standards of 4G and 5G, capitalizing on a suite of natural language processing techniques. Our proposed method uses a revamped few-shot learning mechanism on domain-adapted large language models. Pre-trained on a vast corpus of cellular network protocols, this method enables CellularLint to simultaneously detect inconsistencies at various levels of semantics and practical use cases. In doing so, CellularLint significantly advances the automated analysis of protocol specifications in a scalable fashion. In our investigation, we focused on the Non-Access Stratum (NAS) and the security specifications of 4G and 5G networks, ultimately uncovering 157 inconsistencies with 82.67% accuracy. After verification of these inconsistencies on open-source implementations and 17 commercial devices, we confirm that they indeed have a substantial impact on design decisions, potentially leading to concerns related to privacy, integrity, availability, and interoperability.

Subjects: Cryptography and Security ; Artificial Intelligence ; Information Retrieval

Publish: 2024-07-18 17:48:46 UTC

#20 Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review

Authors: Masatoshi Uehara ; Yulai Zhao ; Tommaso Biancalani ; Sergey Levine

This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions. While diffusion models are widely known to provide excellent generative modeling capability, practical applications in domains such as biology require generating samples that maximize some desired metric (e.g., translation efficiency in RNA, docking score in molecules, stability in protein). In these cases, the diffusion model can be optimized not only to generate realistic samples but also to explicitly maximize the measure of interest. Such methods are based on concepts from reinforcement learning (RL). We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning, tailored specifically for fine-tuning diffusion models. We aim to explore fundamental aspects such as the strengths and limitations of different RL-based fine-tuning algorithms across various scenarios, the benefits of RL-based fine-tuning compared to non-RL-based approaches, and the formal objectives of RL-based fine-tuning (target distributions). Additionally, we aim to examine their connections with related topics such as classifier guidance, Gflownets, flow-based diffusion models, path integral control theory, and sampling from unnormalized distributions such as MCMC. The code of this tutorial is available at

Subjects: Machine Learning ; Artificial Intelligence ; Quantitative Methods ; Machine Learning

Publish: 2024-07-18 17:35:32 UTC
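
As an illustration of the reward-weighted MLE idea named in the abstract above, here is a minimal sketch. It assumes a common KL-regularized formulation in which the target distribution is proportional to the pretrained density times exp(r(x) / alpha); the abstract does not state this form. The temperature `alpha`, the function names, and the toy numbers are all hypothetical, and `log_probs` stands in for whatever per-sample training objective a diffusion model actually exposes.

```python
import math

def reward_weights(rewards, alpha=1.0):
    """Self-normalized weights w_i proportional to exp(r_i / alpha)."""
    # Subtract the max reward before exponentiating for numerical stability.
    m = max(rewards)
    unnorm = [math.exp((r - m) / alpha) for r in rewards]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def reward_weighted_nll(log_probs, rewards, alpha=1.0):
    """Reward-weighted negative log-likelihood over a batch of samples."""
    weights = reward_weights(rewards, alpha)
    return -sum(w * lp for w, lp in zip(weights, log_probs))

# Toy batch: three samples from a hypothetical pretrained model.
rewards = [0.0, 1.0, 2.0]
log_probs = [-1.0, -2.0, -3.0]  # placeholder per-sample log-likelihoods

weights = reward_weights(rewards, alpha=1.0)
loss = reward_weighted_nll(log_probs, rewards, alpha=1.0)
```

Higher-reward samples receive larger weights, so minimizing this loss pulls the model toward them; a smaller `alpha` sharpens the weighting and a larger one flattens it toward plain MLE.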

#21 CoDefeater: Using LLMs To Find Defeaters in Assurance Cases

Authors: Usman Gohar ; Michael C. Hunter ; Robyn R. Lutz ; Myra B. Cohen

Constructing assurance cases is a widely used, and sometimes required, process toward demonstrating that safety-critical systems will operate safely in their planned environment. To mitigate the risk of errors and missing edge cases, the concept of defeaters - arguments or evidence that challenge claims in an assurance case - has been introduced. Defeaters can provide timely detection of weaknesses in the arguments, prompting further investigation and timely mitigations. However, capturing defeaters relies on expert judgment, experience, and creativity and must be done iteratively due to evolving requirements and regulations. This paper proposes CoDefeater, an automated process to leverage large language models (LLMs) for finding defeaters. Initial results on two systems show that LLMs can efficiently find known and unforeseen feasible defeaters to support safety analysts in enhancing the completeness and confidence of assurance cases.

Subjects: Software Engineering ; Artificial Intelligence

Publish: 2024-07-18 17:16:35 UTC

#22 FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning

Authors: Tristan Cinquin ; Marvin Pförtner ; Vincent Fortuin ; Philipp Hennig ; Robert Bamler

Laplace approximations are popular techniques for endowing deep networks with epistemic uncertainty estimates as they can be applied without altering the predictions of the neural network, and they scale to large models and datasets. While the choice of prior strongly affects the resulting posterior distribution, computational tractability and lack of interpretability of weight space typically limit the Laplace approximation to isotropic Gaussian priors, which are known to cause pathological behavior as depth increases. As a remedy, we directly place a prior on function space. More precisely, since Lebesgue densities do not exist on infinite-dimensional function spaces, we have to recast training as finding the so-called weak mode of the posterior measure under a Gaussian process (GP) prior restricted to the space of functions representable by the neural network. Through the GP prior, one can express structured and interpretable inductive biases, such as regularity or periodicity, directly in function space, while still exploiting the implicit inductive biases that allow deep networks to generalize. After model linearization, the training objective induces a negative log-posterior density to which we apply a Laplace approximation, leveraging highly scalable methods from matrix-free linear algebra. Our method provides improved results where prior knowledge is abundant, e.g., in many scientific inference tasks. At the same time, it stays competitive for black-box regression and classification tasks where neural networks typically excel.

Subjects: Machine Learning ; Artificial Intelligence

Publish: 2024-07-18 17:08:58 UTC
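
For readers unfamiliar with the Laplace approximation referred to in the abstract above, the following toy sketch shows the generic idea in one dimension: locate the posterior mode, then fit a Gaussian whose variance is the inverse of the negative second derivative of the log-posterior at that mode. This is a plain parameter-space example, not the function-space method of the paper; the Gamma-shaped log-density and the names `a`, `b`, and `laplace_approx_1d` are assumptions chosen so the answer can be checked by hand.

```python
def laplace_approx_1d(grad, hess, theta0, steps=50):
    """Return (mode, variance) of a Gaussian fitted at the posterior mode.

    grad and hess are the first and second derivatives of the
    log-posterior; the mode is found with Newton's method.
    """
    theta = theta0
    for _ in range(steps):
        theta = theta - grad(theta) / hess(theta)
    # Gaussian variance = inverse of the negative curvature at the mode.
    variance = -1.0 / hess(theta)
    return theta, variance

# Toy unnormalized log-posterior: (a - 1) * log(theta) - b * theta, theta > 0.
a, b = 5.0, 2.0

def grad(theta):
    return (a - 1.0) / theta - b

def hess(theta):
    return -(a - 1.0) / theta ** 2

mode, variance = laplace_approx_1d(grad, hess, theta0=1.0)
```

For this density the mode is (a - 1) / b and the Laplace variance is (a - 1) / b², which gives a quick analytic check on the numerical result.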

#23 OxonFair: A Flexible Toolkit for Algorithmic Fairness

Authors: Eoin Delaney ; Zihao Fu ; Sandra Wachter ; Brent Mittelstadt ; Chris Russell

We present OxonFair, a new open source toolkit for enforcing fairness in binary classification. Compared to existing toolkits: (i) We support NLP and Computer Vision classification as well as standard tabular problems. (ii) We support enforcing fairness on validation data, making us robust to a wide range of overfitting challenges. (iii) Our approach can optimize any measure based on True Positives, False Positives, False Negatives, and True Negatives. This makes it easily extendable and much more expressive than existing toolkits. It supports 9/9 and 10/10 of the decision-based group metrics of two popular review papers. (iv) We jointly optimize a performance objective. This not only minimizes degradation while enforcing fairness, but can improve the performance of otherwise inadequately tuned unfair baselines. OxonFair is compatible with standard ML toolkits including sklearn, Autogluon, and PyTorch and is available online at

Subjects: Computers and Society ; Artificial Intelligence ; Machine Learning

Publish: 2024-06-30 16:41:28 UTC
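
To make concrete what a measure "based on True Positives, False Positives, False Negatives, and True Negatives" can look like, here is a small sketch that derives two widely used group fairness gaps from per-group confusion counts. It does not use or imitate the OxonFair API; the function names and the toy labels are hypothetical, and the two metrics shown (demographic parity and equal opportunity differences) are only examples of decision-based group metrics.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate and true positive rate."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tp, fp, fn, tn = confusion_counts(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        n = tp + fp + fn + tn
        rates[g] = {
            "selection_rate": (tp + fp) / n,
            "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        }
    return rates

def gap(rates, key):
    """Largest difference in a rate between any two groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)

# Toy data: eight examples split across two groups.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
dp_gap = gap(rates, "selection_rate")  # demographic parity difference
eo_gap = gap(rates, "tpr")             # equal opportunity difference
```

In this toy data both groups are selected at the same rate, yet their true positive rates differ, which shows why a toolkit may need to support several count-based metrics rather than one.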

#24 Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention Shift

Authors: Qingyuan Zeng ; Yunpeng Gong ; Min Jiang

Studying adversarial attacks on artificial intelligence (AI) systems helps discover model shortcomings, enabling the construction of a more robust system. Most existing adversarial attack methods only concentrate on single-task single-model or single-task cross-model scenarios, overlooking the multi-task characteristic of artificial intelligence systems. As a result, most of the existing attacks do not pose a practical threat to a comprehensive and collaborative AI system. However, implementing cross-task attacks is highly demanding and challenging due to the difficulty in obtaining the real labels of different tasks for the same picture and harmonizing the loss functions across different tasks. To address this issue, we propose a self-supervised Cross-Task Attack framework (CTA), which utilizes co-attention and anti-attention maps to generate cross-task adversarial perturbation. Specifically, the co-attention map reflects the area to which different visual task models pay attention, while the anti-attention map reflects the area that different visual task models neglect. CTA generates cross-task perturbations by shifting the attention area of samples away from the co-attention map and closer to the anti-attention map. We conduct extensive experiments on multiple vision tasks and the experimental results confirm the effectiveness of the proposed design for adversarial attacks.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence

Publish: 2024-07-18 17:01:10 UTC

#25 A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice

Authors: Shaina Raza ; Mizanur Rahman ; Safiullah Kamawal ; Armin Toroghi ; Ananya Raval ; Farshad Navah ; Amirmohammad Kazemeini

Recommender Systems (RS) play an integral role in enhancing user experiences by providing personalized item suggestions. This survey reviews the progress in RS inclusively from 2017 to 2024, effectively connecting theoretical advances with practical applications. We explore the development from traditional RS techniques like content-based and collaborative filtering to advanced methods involving deep learning, graph-based models, reinforcement learning, and large language models. We also discuss specialized systems such as context-aware, review-based, and fairness-aware RS. The primary goal of this survey is to bridge theory with practice. It addresses challenges across various sectors, including e-commerce, healthcare, and finance, emphasizing the need for scalable, real-time, and trustworthy solutions. Through this survey, we promote stronger partnerships between academic research and industry practices. The insights offered by this survey aim to guide industry professionals in optimizing RS deployment and to inspire future research directions, especially in addressing emerging technological and societal trends.

Subjects: Information Retrieval ; Artificial Intelligence

Publish: 2024-07-18 17:00:53 UTC