cs.AI, cs.CL, cs.CV, cs.LG | Cool Papers - Immersive Paper Discovery

#1 XQSV: A Structurally Variable Network to Imitate Human Play in Xiangqi

In this paper, we introduce an innovative deep learning architecture, termed Xiangqi Structurally Variable (XQSV), designed to emulate the behavioral patterns of human players in Xiangqi, or Chinese Chess. The unique attribute of XQSV is its capacity to alter its structural configuration dynamically, optimizing performance for the task based on the particular subset of data on which it is trained. We have incorporated several design improvements to significantly enhance the network's predictive accuracy, including a local illegal move filter, an Elo range partitioning, a sequential one-dimensional input, and a simulation of imperfect memory capacity. Empirical evaluations reveal that XQSV attains a predictive accuracy of approximately 40%, with its performance peaking within the trained Elo range. This indicates the model's success in mimicking the play behavior of individuals within that specific range. A three-terminal Turing Test was employed to demonstrate that the XQSV model imitates human behavior more accurately than conventional Xiangqi engines, rendering it indistinguishable from actual human opponents. Given the inherent nondeterminism in human gameplay, we propose two supplementary relaxed evaluation metrics. To our knowledge, XQSV represents the first model to mimic Xiangqi players.

Subject: Artificial Intelligence

Publish: 2024-07-05 17:43:05 UTC

#2 Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

Authors: Nathan Herr ; Fernando Acero ; Roberta Raileanu ; María Pérez-Ortiz ; Zhibin Li

Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for assessing the decision-making abilities of LLMs in interactions with other agents. Although prior studies have shown that LLMs can solve these tasks with carefully curated prompts, they fail when the problem setting or prompt changes. In this work, we investigate LLMs' behaviour in two strategic games, Stag Hunt and Prisoner's Dilemma, analyzing performance variations under different settings and prompts. Our results show that the tested state-of-the-art LLMs exhibit at least one of the following systematic biases: (1) positional bias, (2) payoff bias, or (3) behavioural bias. Subsequently, we observed that the LLMs' performance drops when the game configuration is misaligned with the affecting biases. Performance is assessed based on the selection of the correct action, one which agrees with the prompted preferred behaviours of both players. Alignment refers to whether the LLM's bias aligns with the correct action. For example, GPT-4o's average performance drops by 34% when misaligned. Additionally, the current trend of "bigger and newer is better" does not hold for the above, where GPT-4o (the current best-performing LLM) suffers the most substantial performance drop. Lastly, we note that while chain-of-thought prompting does reduce the effect of the biases on most models, it is far from solving the problem at the fundamental level.

Subjects: Artificial Intelligence ; Computation and Language ; Computer Science and Game Theory

Publish: 2024-07-05 12:30:02 UTC
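
As a reminder of how the two games in entry #2 differ, here is a minimal Python sketch that enumerates their pure-strategy Nash equilibria. The payoff numbers are common textbook values chosen for illustration and are an assumption; they are not taken from the paper, and the helper name `pure_nash_equilibria` is hypothetical.

```python
from itertools import product

# Payoffs are (row player, column player). Illustrative textbook values only;
# the paper's actual payoff matrices are not given in the abstract.
STAG_HUNT = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}
PRISONERS_DILEMMA = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def pure_nash_equilibria(game):
    """Return action pairs from which neither player gains by deviating alone."""
    rows = sorted({r for r, _ in game})
    cols = sorted({c for _, c in game})
    equilibria = []
    for r, c in product(rows, cols):
        row_payoff, col_payoff = game[(r, c)]
        # The row player cannot do better by switching rows in this column.
        row_ok = all(game[(r2, c)][0] <= row_payoff for r2 in rows)
        # The column player cannot do better by switching columns in this row.
        col_ok = all(game[(r, c2)][1] <= col_payoff for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria(STAG_HUNT))          # [('hare', 'hare'), ('stag', 'stag')]
print(pure_nash_equilibria(PRISONERS_DILEMMA))  # [('defect', 'defect')]
```

With these values, Stag Hunt has two pure equilibria while Prisoner's Dilemma has one, which is one reason the "correct" action in such studies has to be tied to the behaviours stated in the prompt rather than to the payoff table alone.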

#3 The Complexity of Symmetry Breaking Beyond Lex-Leader

Authors: Markus Anders ; Sofia Brenner ; Gaurav Rattan

Symmetry breaking is a widely popular approach to enhance solvers in constraint programming, such as those for SAT or MIP. Symmetry breaking predicates (SBPs) typically impose an order on variables and single out the lexicographic leader (lex-leader) in each orbit of assignments. Although it is NP-hard to find complete lex-leader SBPs, incomplete lex-leader SBPs are widely used in practice. In this paper, we investigate the complexity of computing complete SBPs, lex-leader or otherwise, for SAT. Our main result proves a natural barrier for efficiently computing SBPs: efficient certification of graph non-isomorphism. Our results explain the difficulty of obtaining short SBPs for important CP problems, such as matrix-models with row-column symmetries and graph generation problems. Our results hold even when SBPs are allowed to introduce additional variables. We show polynomial upper bounds for breaking certain symmetry groups, namely automorphism groups of trees and wreath products of groups with efficient SBPs.

Subject: Artificial Intelligence

Publish: 2024-07-05 11:09:55 UTC
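
To make the lex-leader idea in entry #3 concrete, the following brute-force Python sketch groups Boolean assignments into orbits under a set of variable permutations and keeps one representative per orbit. It assumes the lexicographically smallest assignment is the leader (the opposite convention is equally common), and it enumerates all assignments, so it is only usable for tiny instances; it says nothing about how SBPs are encoded in practice. All function names are hypothetical.

```python
from itertools import product


def apply_permutation(perm, assignment):
    """Move the value of variable i to position perm[i]."""
    result = [None] * len(assignment)
    for i, j in enumerate(perm):
        result[j] = assignment[i]
    return tuple(result)


def orbit(assignment, generators):
    """All assignments reachable by repeatedly applying the generators."""
    seen = {assignment}
    frontier = [assignment]
    while frontier:
        current = frontier.pop()
        for g in generators:
            image = apply_permutation(g, current)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen


def lex_leaders(num_vars, generators):
    """One representative (the lexicographic minimum) per orbit."""
    leaders = set()
    for assignment in product((0, 1), repeat=num_vars):
        leaders.add(min(orbit(assignment, generators)))
    return sorted(leaders)


# Full symmetry on three variables, generated by a swap and a 3-cycle.
generators = [(1, 0, 2), (1, 2, 0)]
print(lex_leaders(3, generators))
# [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
```

A complete SBP for this toy group would accept exactly these four assignments out of the eight possible ones.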

#4 A systematic review on expert systems for improving energy efficiency in the manufacturing industry

Authors: Borys Ioshchikhes ; Michael Frank ; Matthias Weigold

Against the backdrop of the European Union's commitment to achieve climate neutrality by 2050, efforts to improve energy efficiency are being intensified. The manufacturing industry is a key focal point of these endeavors due to its high final electrical energy demand, while simultaneously facing a growing shortage of skilled workers crucial for meeting established goals. Expert systems (ESs) offer the chance to overcome this challenge by automatically identifying potential energy efficiency improvements and thereby playing a significant role in reducing electricity consumption. This paper systematically reviews state-of-the-art approaches of ESs aimed at improving energy efficiency in industry, with a focus on manufacturing. The literature search yields 1692 results, of which 54 articles published between 1987 and 2023 are analyzed in depth. These publications are classified according to the system boundary, manufacturing type, application perspective, application purpose, ES type, and industry. Furthermore, we examine the structure, implementation, utilization, and development of ESs in this context. Through this analysis, the review reveals research gaps, pointing toward promising topics for future research.

Subject: Artificial Intelligence

Publish: 2024-07-05 09:28:31 UTC

#5 AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents

Authors: Petr Anokhin ; Nikita Semenov ; Artyom Sorokin ; Dmitry Evseev ; Mikhail Burtsev ; Evgeny Burnaev

Advancements in generative AI have broadened the potential applications of Large Language Models (LLMs) in the development of autonomous agents. Achieving true autonomy requires accumulating and updating knowledge gained from interactions with the environment and effectively utilizing it. Current LLM-based approaches leverage past experiences using a full history of observations, summarization or retrieval augmentation. However, these unstructured memory representations do not facilitate the reasoning and planning essential for complex decision-making. In our study, we introduce AriGraph, a novel method wherein the agent constructs a memory graph that integrates semantic and episodic memories while exploring the environment. This graph structure facilitates efficient associative retrieval of interconnected concepts, relevant to the agent's current state and goals, thus serving as an effective environmental model that enhances the agent's exploratory and planning capabilities. We demonstrate that our Ariadne LLM agent, equipped with this proposed memory architecture augmented with planning and decision-making, effectively handles complex tasks on a zero-shot basis in the TextWorld environment. Our approach markedly outperforms established methods such as full-history, summarization, and Retrieval-Augmented Generation in various tasks, including the cooking challenge from the First TextWorld Problems competition and novel tasks like house cleaning and puzzle Treasure Hunting.

Subject: Artificial Intelligence

Publish: 2024-07-05 09:06:47 UTC

#6 Dance of the ADS: Orchestrating Failures through Historically-Informed Scenario Fuzzing

Authors: Tong Wang ; Taotao Gu ; Huan Deng ; Hu Li ; Xiaohui Kuang ; Gang Zhao

As autonomous driving systems (ADS) advance towards higher levels of autonomy, orchestrating their safety verification becomes increasingly intricate. This paper unveils ScenarioFuzz, a pioneering scenario-based fuzz testing methodology. Designed like a choreographer who understands the past performances, it uncovers vulnerabilities in ADS without the crutch of predefined scenarios. Leveraging map road networks, such as OPENDRIVE, we extract essential data to form a foundational scenario seed corpus. This corpus, enriched with pertinent information, provides the necessary boundaries for fuzz testing in the absence of starting scenarios. Our approach integrates specialized mutators and mutation techniques, combined with a graph neural network model, to predict and filter out high-risk scenario seeds, optimizing the fuzzing process using historical test data. Compared to other methods, our approach reduces the time cost by an average of 60.3%, while the number of error scenarios discovered per unit of time increases by 103%. Furthermore, we propose a self-supervised collision trajectory clustering method, which aids in identifying and summarizing 54 high-risk scenario categories prone to inducing ADS faults. Our experiments have successfully uncovered 58 bugs across six tested systems, emphasizing the critical safety concerns of ADS.

Subjects: Artificial Intelligence ; Neural and Evolutionary Computing ; Software Engineering

Publish: 2024-07-05 08:58:09 UTC

#7 Knowledge-based Drug Samples' Comparison

Authors: Sébastien Guillemin ; Ana Roxin ; Laurence Dujourdy ; Ludovic Journaux

Drug sample comparison is a process used by the French National police to identify drug distribution networks. The current approach is based on manual comparison done by forensic experts. In this article, we present our approach to acquire, formalise, and specify expert knowledge to improve the current process. For modelling the underlying knowledge, we use an ontology coupled with logical rules. The different steps of our approach are designed to be reused in other application domains. The results obtained are explainable, making them usable by experts in different fields.

Subject: Artificial Intelligence

Publish: 2024-07-05 07:40:25 UTC

#8 Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents

Authors: Sam Earle ; Julian Togelius

We introduce Autoverse, an evolvable, domain-specific language for single-player 2D grid-based games, and demonstrate its use as a scalable training ground for Open-Ended Learning (OEL) algorithms. Autoverse uses cellular-automaton-like rewrite rules to describe game mechanics, allowing it to express various game environments (e.g. mazes, dungeons, sokoban puzzles) that are popular testbeds for Reinforcement Learning (RL) agents. Each rewrite rule can be expressed as a series of simple convolutions, allowing for environments to be parallelized on the GPU, thereby drastically accelerating RL training. Using Autoverse, we propose jump-starting open-ended learning by imitation learning from search. In such an approach, we first evolve Autoverse environments (their rules and initial map topology) to maximize the number of iterations required by greedy tree search to discover a new best solution, producing a curriculum of increasingly complex environments and playtraces. We then distill these expert playtraces into a neural-network-based policy using imitation learning. Finally, we use the learned policy as a starting point for open-ended RL, where new training environments are continually evolved to maximize the RL player agent's value function error (a proxy for its regret, or the learnability of generated environments), finding that this approach improves the performance and generality of resultant player agents.

Subject: Artificial Intelligence

Publish: 2024-07-05 02:18:02 UTC
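
The abstract of entry #8 says that each rewrite rule can be expressed as a series of simple convolutions. The pure-Python sketch below illustrates that idea on a character grid: a sliding-window match count plays the role of the convolution, and cells are rewritten wherever the count equals the pattern size. This is an assumption-laden illustration, not Autoverse's implementation; the grid symbols, the rule, and the function names are hypothetical, and a GPU version would operate on one-hot tile channels instead of strings.

```python
def match_scores(grid, pattern):
    """Slide the pattern over the grid and count agreeing cells per position."""
    ph, pw = len(pattern), len(pattern[0])
    gh, gw = len(grid), len(grid[0])
    scores = []
    for y in range(gh - ph + 1):
        row = []
        for x in range(gw - pw + 1):
            row.append(sum(
                grid[y + dy][x + dx] == pattern[dy][dx]
                for dy in range(ph)
                for dx in range(pw)
            ))
        scores.append(row)
    return scores


def apply_rule(grid, pattern, replacement):
    """Rewrite every full match of `pattern` (found on the old grid) with `replacement`."""
    ph, pw = len(pattern), len(pattern[0])
    scores = match_scores(grid, pattern)
    new_grid = [list(row) for row in grid]
    for y, row in enumerate(scores):
        for x, score in enumerate(row):
            if score == ph * pw:  # every cell of the pattern agrees
                for dy in range(ph):
                    for dx in range(pw):
                        new_grid[y + dy][x + dx] = replacement[dy][dx]
    return ["".join(row) for row in new_grid]


# Hypothetical rule: the player 'P' steps right into an empty cell '.'.
grid = ["#####", "#P..#", "#####"]
step1 = apply_rule(grid, ["P."], [".P"])
step2 = apply_rule(step1, ["P."], [".P"])
print(step1[1], step2[1])  # #.P.# #..P#
```

Because matches are computed on the grid before any cell is rewritten, a single application moves the player by exactly one cell, and the rule stops firing once the player is next to the wall.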

#9 Smart Vision-Language Reasoners

Authors: Denisa Roberts ; Lucas Roberts

In this article, we investigate vision-language models (VLMs) as reasoners. The ability to form abstractions underlies mathematical reasoning, problem-solving, and other Math AI tasks. Several formalisms have been given to these underlying abstractions and skills utilized by humans and intelligent systems for reasoning. Furthermore, human reasoning is inherently multimodal, and as such, we focus our investigations on multimodal AI. In this article, we employ the abstractions given in the SMART task (Simple Multimodal Algorithmic Reasoning Task) introduced in Cherian et al. (2022) as meta-reasoning and problem-solving skills along eight axes: math, counting, path, measure, logic, spatial, and pattern. We investigate the ability of vision-language models to reason along these axes and seek avenues of improvement. Including composite representations with vision-language cross-attention enabled learning multimodal representations adaptively from fused frozen pretrained backbones for better visual grounding. Furthermore, proper hyperparameter and other training choices led to strong improvements (up to 48% gain in accuracy) on the SMART task, further underscoring the power of deep multimodal learning. The smartest VLM, which includes a novel QF multimodal layer, improves upon the best previous baselines in every one of the eight fundamental reasoning skills. End-to-end code is available at https://github.com/smarter-vlm/smarter.

Subject: Artificial Intelligence

Publish: 2024-07-05 01:47:21 UTC

#10 Orchestrating LLMs with Different Personalizations

Authors: Jin Peng Zhou ; Katie Z Luo ; Jingwen Gu ; Jason Yuan ; Kilian Q. Weinberger ; Wen Sun

This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create an LLM without re-training that best adheres to this specification. Starting from specialized expert LLMs, each trained for one such particular preference dimension, we propose a black-box method that merges their outputs on a per-token level. We train a lightweight Preference Control Model (PCM) that dynamically translates the preference description and current context into next-token prediction weights. By combining the expert models' outputs at the token level, our approach dynamically generates text that optimizes the given preference. Empirical tests show that our method matches or surpasses existing preference merging techniques, providing a scalable, efficient alternative to fine-tuning LLMs for individual personalization.

Subjects: Artificial Intelligence ; Computation and Language

Publish: 2024-07-04 22:55:02 UTC
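
The per-token merging described in entry #10 can be pictured with a small Python sketch. It assumes each expert exposes a next-token probability distribution and that the merge is a weighted average of those probabilities; the abstract does not say whether probabilities or logits are combined, so this is one plausible reading. The weights stand in for the output of the Preference Control Model and are fixed by hand here; all names and numbers are hypothetical.

```python
def merge_next_token(expert_distributions, weights):
    """Weighted average of the experts' next-token probabilities."""
    total = sum(weights)
    merged = {}
    for dist, weight in zip(expert_distributions, weights):
        for token, prob in dist.items():
            merged[token] = merged.get(token, 0.0) + (weight / total) * prob
    return merged


# Hypothetical next-token distributions from two single-preference experts.
concise_expert = {"Yes": 0.7, "Certainly": 0.3}
humorous_expert = {"Yes": 0.2, "Certainly": 0.1, "Yup": 0.7}

# In the paper's setting these weights would come from the PCM at every step.
mostly_concise = merge_next_token([concise_expert, humorous_expert], [0.75, 0.25])
mostly_humorous = merge_next_token([concise_expert, humorous_expert], [0.1, 0.9])

print(max(mostly_concise, key=mostly_concise.get))    # Yes
print(max(mostly_humorous, key=mostly_humorous.get))  # Yup
```

Only the experts' output distributions are needed, which matches the black-box framing in the abstract: no expert weights are touched when the preference mix changes.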

#11 ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Authors: Ahmed Masry ; Megh Thakkar ; Aayush Bajaj ; Aaryaman Kartha ; Enamul Hoque ; Shafiq Joty

Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been a growing interest in developing pre-trained foundation models as well as general purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer from crucial drawbacks across two critical axes affecting the performance of chart representation models: they are trained on data generated from underlying data tables of the charts, ignoring the visual trends and patterns in chart images, and use weakly aligned vision-language backbone models for domain-specific training, limiting their generalizability when encountering charts in the wild. We address these important drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking, and our elaborate qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries compared to its contemporaries. We release the code, model checkpoints, dataset, and demos at https://github.com/vis-nlp/ChartGemma.

Subject: Artificial Intelligence

Publish: 2024-07-04 22:16:40 UTC

#12 MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis

Authors: Asma Alkhaldi ; Raneem Alnajim ; Layan Alabdullatef ; Rawan Alyahya ; Jun Chen ; Deyao Zhu ; Ahmed Alsinan ; Mohamed Elhoseiny

Recent advancements in artificial intelligence (AI) have precipitated significant breakthroughs in healthcare, particularly in refining diagnostic procedures. However, previous studies have often been constrained to limited functionalities. This study introduces MiniGPT-Med, a vision-language model derived from large-scale language models and tailored for medical applications. MiniGPT-Med demonstrates remarkable versatility across various imaging modalities, including X-rays, CT scans, and MRIs, enhancing its utility. The model is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. Its integrated processing of both image and textual clinical data markedly improves diagnostic accuracy. Our empirical assessments confirm MiniGPT-Med's superior performance in disease grounding, medical report generation, and VQA benchmarks, representing a significant step towards reducing the gap in assisting radiology practice. Furthermore, it achieves state-of-the-art performance on medical report generation, higher than the previous best model by 19% accuracy. MiniGPT-Med promises to become a general interface for radiology diagnoses, enhancing diagnostic efficiency across a wide range of medical imaging applications.

Subjects: Artificial Intelligence ; Computation and Language ; Computer Vision and Pattern Recognition

Publish: 2024-07-04 18:21:10 UTC

#13 Craftium: An Extensible Framework for Creating Reinforcement Learning Environments

Authors: Mikel Malagón ; Josu Ceberio ; Jose A. Lozano

Most Reinforcement Learning (RL) environments are created by adapting existing physics simulators or video games. However, they usually lack the flexibility required for analyzing specific characteristics of RL methods often relevant to research. This paper presents Craftium, a novel framework for exploring and creating rich 3D visual RL environments that builds upon the Minetest game engine and the popular Gymnasium API. Minetest is built to be extended and can be used to easily create voxel-based 3D environments (often similar to Minecraft), while Gymnasium offers a simple and common interface for RL research. Craftium provides a platform that allows practitioners to create fully customized environments to suit their specific research requirements, ranging from simple visual tasks to infinite and procedurally generated worlds. We also provide five ready-to-use environments for benchmarking and as examples of how to develop new ones. The code and documentation are available at https://github.com/mikelma/craftium/.

Subject: Artificial Intelligence

Publish: 2024-07-04 14:38:02 UTC

#14 Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data

Authors: Zihui Gu ; Xingwu Sun ; Fengzong Lian ; Zhanhui Kang ; Cheng-Zhong Xu ; Ju Fan

Instruction-following is particularly crucial for large language models (LLMs) to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their capabilities on instruction following remains a challenge due to the complexity and diversity of real-world user instructions. Although existing evaluation methods focus on general skills, they suffer from two main shortcomings: a lack of fine-grained task-level evaluation and reliance on a single instruction expression. To address these problems, this paper introduces DINGO, a fine-grained and diverse instruction-following evaluation dataset that has two main advantages: (1) DINGO is based on a manually annotated, fine-grained and multi-level category tree with 130 nodes derived from real-world user requests; (2) DINGO includes diverse instructions, generated by both GPT-4 and human experts. Through extensive experiments, we demonstrate that DINGO can not only provide more challenging and comprehensive evaluation for LLMs, but also provide task-level fine-grained directions to further improve LLMs.

Subject: Artificial Intelligence

Publish: 2024-07-04 13:54:41 UTC

#15 Dancing to the State of the Art? How Candidate Lists Influence LKH for Solving the Traveling Salesperson Problem

Authors: Jonathan Heins ; Lennart Schäpermeier ; Pascal Kerschke ; Darrell Whitley

Solving the Traveling Salesperson Problem (TSP) remains a persistent challenge, despite its fundamental role in numerous generalized applications in modern contexts. Heuristic solvers address the demand for finding high-quality solutions efficiently. Among these solvers, the Lin-Kernighan-Helsgaun (LKH) heuristic stands out, as it complements the performance of genetic algorithms across a diverse range of problem instances. However, frequent timeouts on challenging instances hinder the practical applicability of the solver. Within this work, we investigate a previously overlooked factor contributing to many timeouts: The use of a fixed candidate set based on a tree structure. Our investigations reveal that candidate sets based on Hamiltonian circuits contain more optimal edges. We thus propose to integrate this promising initialization strategy, in the form of POPMUSIC, within an efficient restart version of LKH. As confirmed by our experimental studies, this refined TSP heuristic is much more efficient - causing fewer timeouts and improving the performance (in terms of penalized average runtime) by an order of magnitude - and thereby challenges the state of the art in TSP solving.

Subjects: Artificial Intelligence ; Neural and Evolutionary Computing

Publish: 2024-07-04 13:38:19 UTC
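
Entry #15 reports results in terms of penalized average runtime. A minimal Python sketch of that metric follows, assuming the common convention in which a timed-out run is counted as the cutoff time multiplied by a penalty factor (a factor of 10 gives the variant often called PAR10). The abstract does not state the cutoff or the factor, so both are parameters here, and the sample runtimes are made up.

```python
def penalized_average_runtime(runtimes, cutoff, penalty_factor=10):
    """Average runtime where timeouts (None or >= cutoff) cost penalty_factor * cutoff."""
    penalized = [
        t if t is not None and t < cutoff else penalty_factor * cutoff
        for t in runtimes
    ]
    return sum(penalized) / len(penalized)


# Hypothetical runtimes in seconds; None marks a run that hit the time limit.
runs = [2.0, 5.0, None, 3.0]
print(penalized_average_runtime(runs, cutoff=10))  # 27.5
```

A single timeout dominates the score in this example, which is why reducing the number of timeouts can shift this metric by a large factor even when typical runtimes barely change.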

#16 MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices

Authors: Jiayi Zhang ; Chuang Zhao ; Yihan Zhao ; Zhaoyang Yu ; Ming He ; Jianping Fan

The attainment of autonomous operations in mobile computing devices has consistently been a goal of human pursuit. With the development of Large Language Models (LLMs) and Visual Language Models (VLMs), this aspiration is progressively turning into reality. While contemporary research has explored automation of simple tasks on mobile devices via VLMs, there remains significant room for improvement in handling complex tasks and reducing high reasoning costs. In this paper, we introduce MobileExperts, which for the first time introduces tool formulation and multi-agent collaboration to address the aforementioned challenges. More specifically, MobileExperts dynamically assembles teams based on the alignment of agent portraits with the human requirements. Following this, each agent embarks on an independent exploration phase, formulating its tools to evolve into an expert. Lastly, we develop a dual-layer planning mechanism to establish coordinated collaboration among experts. To validate the effectiveness of our approach, we design a new benchmark of hierarchical intelligence levels, offering insights into an algorithm's capability to address tasks across a spectrum of complexity. Experimental results demonstrate that MobileExperts performs better on all intelligence levels and achieves an approximately 22% reduction in reasoning costs, thus verifying the superiority of our design.

Subject: Artificial Intelligence

Publish: 2024-07-04 13:12:19 UTC

#17 From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI

Authors: Stefanie Krause ; Frieder Stolzenburg

Commonsense reasoning is a difficult task for a computer, but a critical skill for an artificial intelligence (AI). It can enhance the explainability of AI models by enabling them to provide intuitive and human-like explanations for their decisions. This is necessary in many areas, especially in question answering (QA), which is one of the most important tasks of natural language processing (NLP). Over time, a multitude of methods have emerged for solving commonsense reasoning problems, such as knowledge-based approaches using formal logic or linguistic analysis. In this paper, we investigate the effectiveness of large language models (LLMs) on different QA tasks with a focus on their abilities in reasoning and explainability. We study three LLMs: GPT-3.5, Gemma and Llama 3. We further evaluate the LLM results by means of a questionnaire. We demonstrate the ability of LLMs to reason with commonsense as the models outperform humans on different datasets. While GPT-3.5's accuracy ranges from 56% to 93% on various QA benchmarks, Llama 3 achieved a mean accuracy of 90% on all eleven datasets. Llama 3 thereby outperforms humans on all datasets, with an average of 21% higher accuracy over ten datasets. Furthermore, we find that, in the sense of explainable artificial intelligence (XAI), GPT-3.5 provides good explanations for its decisions. Our questionnaire revealed that 66% of participants rated GPT-3.5's explanations as either "good" or "excellent". Taken together, these findings enrich our understanding of current LLMs and pave the way for future investigations of reasoning and explainability.

Subject: Artificial Intelligence

Publish: 2024-07-04 09:38:49 UTC

#18 Convolutional vs Large Language Models for Software Log Classification in Edge-Deployable Cellular Network Testing

Authors: Achintha Ihalage ; Sayed M. Taheri ; Faris Muhammad ; Hamed Al-Raweshidy

Software logs generated by sophisticated network emulators in the telecommunications industry, such as VIAVI TM500, are extremely complex, often comprising tens of thousands of text lines with minimal resemblance to natural language. Only specialised expert engineers can decipher such logs and troubleshoot defects in test runs. While AI offers a promising solution for automating defect triage, potentially leading to massive revenue savings for companies, state-of-the-art large language models (LLMs) suffer from significant drawbacks in this specialised domain. These include a constrained context window, limited applicability to text beyond natural language, and high inference costs. To address these limitations, we propose a compact convolutional neural network (CNN) architecture that offers a context window spanning up to 200,000 characters and achieves over 96% accuracy (F1>0.9) in classifying multifaceted software logs into various layers in the telecommunications protocol stack. Specifically, the proposed model is capable of identifying defects in test runs and triaging them to the relevant department, formerly a manual engineering process that required expert knowledge. We evaluate several LLMs (LLaMA2-7B, Mixtral 8x7B, Flan-T5, BERT, and BigBird) and experimentally demonstrate their shortcomings in our specialized application. Despite being lightweight, our CNN significantly outperforms LLM-based approaches in telecommunications log classification while minimizing the cost of production. Our defect triaging AI model is deployable on edge devices without dedicated hardware and widely applicable across software logs in various industries.

Subjects: Artificial Intelligence ; Machine Learning ; Networking and Internet Architecture

Publish: 2024-07-04 09:12:08 UTC

#19 Neural Probabilistic Logic Learning for Knowledge Graph Reasoning

Authors: Fengsong Sun ; Jinyu Wang ; Zhiqing Wei ; Xianchao Zhang

Knowledge graph (KG) reasoning is a task that aims to predict unknown facts based on known factual samples. Reasoning methods can be divided into two categories: rule-based methods and KG-embedding-based methods. The former possesses precise reasoning capabilities but finds it challenging to reason efficiently over large-scale knowledge graphs. While gaining the ability to reason over large-scale knowledge graphs, the latter sacrifices reasoning accuracy. This paper aims to design a reasoning framework called Neural Probabilistic Logic Learning (NPLL) that achieves accurate reasoning on knowledge graphs. Our approach introduces a scoring module that effectively enhances the expressive power of embedding networks, striking a balance between model simplicity and reasoning capabilities. We improve the interpretability of the model by incorporating a Markov Logic Network based on variational inference. We empirically evaluate our approach on several benchmark datasets, and the experimental results validate that our method substantially enhances the accuracy and quality of the reasoning results.

Subjects: Artificial Intelligence ; Machine Learning

Publish: 2024-07-04 07:45:46 UTC

#20 Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence

Authors: Teo Susnjak ; Timothy R. McIntosh ; Andre L. C. Barczak ; Napoleon H. Reyes ; Tong Liu ; Paul Watters ; Malka N. Halgamuge

In this study, we explored the progression trajectories of artificial intelligence (AI) systems through the lens of complexity theory. We challenged the conventional linear and exponential projections of AI advancement toward Artificial General Intelligence (AGI) underpinned by transformer-based architectures, and posited the existence of critical points, akin to phase transitions in complex systems, where AI performance might plateau or regress into instability upon exceeding a critical complexity threshold. We employed agent-based modelling (ABM) to simulate hypothetical scenarios of AI systems' evolution under specific assumptions, using benchmark performance as a proxy for capability and complexity. Our simulations demonstrated how increasing the complexity of the AI system could exceed an upper criticality threshold, leading to unpredictable performance behaviours. Additionally, we developed a practical methodology for detecting these critical thresholds using simulation data and stochastic gradient descent to fine-tune detection thresholds. This research offers a novel perspective on AI advancement that has a particular relevance to Large Language Models (LLMs), emphasising the need for a tempered approach to extrapolating AI's growth potential and underscoring the importance of developing more robust and comprehensive AI performance benchmarks.

Subjects: Artificial Intelligence ; Computational Complexity

Publish: 2024-07-04 05:46:39 UTC

#21 AgentInstruct: Toward Generative Teaching with Agentic Flows

Authors: Arindam Mitra ; Luciano Del Corro ; Guoqing Zheng ; Shweti Mahajan ; Dany Rouhana ; Andres Codas ; Yadong Lu ; Wei-ge Chen ; Olga Vrousgos ; Corby Rosset ; Fillipe Silva ; Hamed Khanpour ; Yash Lara ; Ahmed Awadallah

Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers have also raised concerns around model collapse and drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually requires significant human effort in curating the data. We focus on using synthetic data for post-training, specifically creating data by powerful models to teach a new skill or behavior to another model; we refer to this setting as Generative Teaching. We introduce AgentInstruct, an extensible agentic framework for automatically creating large amounts of diverse and high-quality synthetic data. AgentInstruct can create both the prompts and responses, using only raw data sources like text documents and code files as seeds. We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25M pairs to teach language models different skills, such as text editing, creative writing, tool usage, coding, and reading comprehension. The dataset can be used for instruction tuning of any base model. We post-train Mistral-7b with the data. When comparing the resulting model Orca-3 to Mistral-7b-Instruct (which uses the same base model), we observe significant improvements across many benchmarks. For example, 40% improvement on AGIEval, 19% improvement on MMLU, 54% improvement on GSM8K, 38% improvement on BBH and 45% improvement on AlpacaEval. Additionally, it consistently outperforms other models such as LLAMA-8B-instruct and GPT-3.5-turbo.

Subjects: Artificial Intelligence ; Machine Learning

Publish: 2024-07-03 21:01:12 UTC

#22 An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges [PDF] [Copy] [Kimi]

Authors: Laifa Tao ; Shangyu Li ; Haifei Liu ; Qixuan Huang ; Liang Ma ; Guoao Ning ; Yiling Chen ; Yunlong Wu ; Bin Li ; Weiwei Zhang ; Zhengduo Zhao ; Wenchao Zhan ; Wenyan Cao ; Chao Wang ; Hongmei Liu ; Jian Ma ; Mingliang Suo ; Yujie Cheng ; Yu Ding ; Dengwei Song ; Chen Lu

Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, and other sectors. However, PHM's development is constrained by bottlenecks such as limited generalization, interpretation, and verification abilities. Presently, generative artificial intelligence (AI), represented by the Large Model, heralds a technological revolution with the potential to fundamentally reshape traditional technological fields and human production methods. Its capabilities, including strong generalization, reasoning, and generative attributes, present opportunities to address PHM's bottlenecks. To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of the Large Model, we propose a novel concept and three progressive paradigms of the Prognosis and Health Management Large Model (PHM-LM) through the integration of the Large Model with PHM. Subsequently, we provide feasible technical approaches for PHM-LM to bolster PHM's core capabilities within the framework of the three paradigms. Moreover, to address core issues confronting PHM, we discuss a series of technical challenges of PHM-LM throughout the entire process of construction and application. This comprehensive effort offers a holistic PHM-LM technical framework and provides avenues for new PHM technologies, methodologies, tools, platforms, and applications, and may also reshape the design, research & development, verification, and application modes of PHM. Furthermore, a new generation of AI-enabled PHM can be realized, moving from custom to generalized, from discriminative to generative, and from theoretical conditions to practical applications.

Subjects: Artificial Intelligence ; Software Engineering ; Systems and Control ; Signal Processing

Publish: 2024-07-01 09:37:00 UTC

#23 ML Updates for OpenStreetMap: Analysis of Research Gaps and Future Directions [PDF] [Copy] [Kimi]

Authors: Lasith Niroshan ; James D. Carswell

Maintaining accurate, up-to-date maps is important in any dynamic urban landscape, supporting various aspects of modern society, such as urban planning, navigation, and emergency response. However, traditional (i.e. largely manual) map production and crowdsourced mapping methods still struggle to keep pace with rapid changes in the built environment. Such manual mapping workflows are time-consuming and prone to human errors, leading to early obsolescence and/or the need for extensive auditing. The current map updating process in OpenStreetMap provides an example of this limitation, relying on numerous manual steps in its online map updating workflow. To address this, there is a need to explore automating the entire end-to-end map updating process. Tech giants such as Google and Microsoft have already started investigating Machine Learning (ML) techniques to tackle this contemporary mapping problem. This paper offers an analysis of these ML approaches, focusing on their application to updating OpenStreetMap in particular. By analysing the current state-of-the-art in this field, this study identifies some key research gaps and introduces DeepMapper as a practical solution for advancing the automatic online map updating process in the future.

Subjects: Artificial Intelligence ; Computers and Society ; Machine Learning

Publish: 2024-06-28 23:51:04 UTC

#24 A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation [PDF¹] [Copy] [Kimi]

Authors: Iveta Bečková ; Štefan Pócoš ; Giulia Belgiovine ; Marco Matarese ; Alessandra Sciutti ; Carlo Mazzola

Addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition in multi-party conversation scenarios. Specifically, in the field of human-robot interaction, it becomes even more crucial to enable social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, restricting the robot's capability to estimate whether it was addressed and limiting its interactive skills. For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. Explainable artificial intelligence thus plays a significant role in current machine learning applications and models, providing explanations for their decisions in addition to excellent performance. In our work, we a) present an addressee estimation model with improved performance in comparison with the previous SOTA; b) further modify this model to include inherently explainable attention-based segments; c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; d) propose several ways to incorporate explainability and transparency in the aforementioned architecture; and e) perform a pilot user study to analyze the effect of various explanations on how human participants perceive the robot.

Subjects: Artificial Intelligence ; Computation and Language ; Human-Computer Interaction ; Machine Learning ; Robotics ; Image and Video Processing

Publish: 2024-05-20 13:09:32 UTC

#25 LaRa: Efficient Large-Baseline Radiance Fields [PDF¹⁴] [Copy] [Kimi⁹]

Authors: Anpei Chen ; Haofei Xu ; Stefano Esposito ; Siyu Tang ; Andreas Geiger

Radiance field methods have achieved photorealistic novel view synthesis and geometry reconstruction, but they are mostly applied in per-scene optimization or small-baseline settings. While several recent works investigate feed-forward reconstruction with large baselines by utilizing transformers, they all operate with a standard global attention mechanism and hence ignore the local nature of 3D reconstruction. We propose a method that unifies local and global reasoning in transformer layers, resulting in improved quality and faster convergence. Our model represents scenes as Gaussian Volumes and combines this with an image encoder and Group Attention Layers for efficient feed-forward reconstruction. Experimental results show that our model, trained for two days on four GPUs, achieves high fidelity in reconstructing 360° radiance fields and is robust to zero-shot and out-of-domain testing.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence

Publish: 2024-07-05 17:59:58 UTC