Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, the diversity in cortical parcellation and fMRI patterns across individuals has prompted the development of deep learning models tailored to each subject. This personalization limits the broader applicability of brain visual decoding in real-world scenarios. To address this issue, we introduce Wills Aligner, a novel approach designed to achieve multi-subject collaborative brain visual decoding. Wills Aligner begins by aligning the fMRI data from different subjects at the anatomical level. It then employs mixture-of-brain-expert adapters and a meta-learning strategy to account for individual fMRI pattern differences. Additionally, Wills Aligner leverages the semantic relation of visual stimuli to guide the learning of inter-subject commonality, enabling visual decoding for each subject to draw insights from other subjects' data. We rigorously evaluate Wills Aligner across various visual decoding tasks, including classification, cross-modal retrieval, and image reconstruction. The experimental results demonstrate that Wills Aligner achieves promising performance.
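A minimal sketch of the mixture-of-brain-expert adapter idea described above, assuming anatomically aligned fMRI features and a subject-conditioned gate over a small pool of expert MLPs; the module name, dimensions, and gating scheme are illustrative assumptions, not Wills Aligner's exact design.

```python
import torch
import torch.nn as nn

class BrainExpertAdapter(nn.Module):
    """Mixture-of-experts adapter over anatomically aligned fMRI features.
    A learned subject embedding gates a small pool of expert MLPs; this is
    a sketch of the idea, not Wills Aligner's exact module."""

    def __init__(self, d_in, d_hidden=256, n_experts=4, n_subjects=8):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_in, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_in))
            for _ in range(n_experts)
        ])
        self.subject_emb = nn.Embedding(n_subjects, 32)
        self.gate = nn.Linear(32, n_experts)

    def forward(self, x, subject_id):                    # x: (batch, d_in)
        w = torch.softmax(self.gate(self.subject_emb(subject_id)), dim=-1)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        return x + (w.unsqueeze(-1) * expert_out).sum(dim=1)   # residual mixture

# toy usage: a batch of 16 aligned fMRI vectors from subject 3
adapter = BrainExpertAdapter(d_in=1024)
out = adapter(torch.randn(16, 1024), torch.full((16,), 3))
```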
Individual human decision-makers may benefit from different forms of support to improve decision outcomes, but when will each form of support yield better outcomes? In this work, we posit that personalizing access to decision support tools can be an effective mechanism for instantiating the appropriate use of AI assistance. Specifically, we propose the general problem of learning a decision support policy that, for a given input, chooses which form of support to provide to decision-makers for whom we initially have no prior information. We develop Modiste, an interactive tool to learn personalized decision support policies. Modiste leverages stochastic contextual bandit techniques to personalize a decision support policy for each decision-maker. In our computational experiments, we characterize the expertise profiles of decision-makers for whom personalized policies will outperform offline policies, including population-wide baselines. Our experiments include realistic forms of support (e.g., expert consensus and predictions from a large language model) on vision and language tasks. Our human subject experiments add nuance to and bolster our computational experiments, demonstrating the practical utility of personalized policies when real users benefit from accessing support across tasks.
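A minimal sketch of the stochastic contextual bandit view described above, assuming a LinUCB-style learner; the arm set, feature dimension, and reward signal below are illustrative assumptions, not Modiste's exact formulation.

```python
import numpy as np

class SupportPolicyBandit:
    """LinUCB-style contextual bandit over forms of decision support.
    Arms could be 0 = no support, 1 = expert consensus, 2 = LLM prediction;
    the reward is assumed to trade off decision accuracy against the cost
    of providing support (a hypothetical signal, not Modiste's objective)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward accumulators

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # optimistic score: predicted reward plus exploration bonus
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# toy usage: 3 support forms, 5-dimensional input features
bandit = SupportPolicyBandit(n_arms=3, dim=5)
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=5)
    arm = bandit.choose(x)
    reward = rng.binomial(1, 0.6) - 0.1 * arm            # hypothetical reward
    bandit.update(arm, x, reward)
```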
Multiphysics simulation aims to predict and understand interactions between multiple physical phenomena, aiding in comprehending natural processes and guiding engineering design. The system of Partial Differential Equations (PDEs) is crucial for representing these physical fields, and solving these PDEs is fundamental to such simulations. However, current methods primarily yield numerical outputs, limiting interpretability and generalizability. We introduce T-NNGP, a hybrid genetic programming algorithm that integrates traditional numerical methods with deep learning to derive approximate symbolic expressions for multiple unknown functions within a system of PDEs. T-NNGP initially obtains numerical solutions using traditional methods, then generates candidate symbolic expressions via deep reinforcement learning, and finally optimizes these expressions using genetic programming. Furthermore, a universal decoupling strategy guides the search direction and addresses coupling problems, thereby accelerating the search process. Experimental results on three types of PDEs demonstrate that our method can reliably obtain human-understandable symbolic expressions that fit both the PDEs and the numerical solutions from traditional methods. This work advances multiphysics simulation by enhancing our ability to derive approximate symbolic solutions for PDEs, thereby improving our understanding of complex physical phenomena.
Deep learning models have recently shown great success in classifying epileptic patients using EEG recordings. Unfortunately, classification-based methods lack a sound mechanism to detect the onset of seizure events. In this work, we propose a two-stage framework, SODor, that explicitly models seizure onset through a novel task formulation of subsequence clustering. Given an EEG sequence, the framework first learns a set of second-level embeddings with label supervision. It then employs model-based clustering to explicitly capture long-term temporal dependencies in EEG sequences and identify meaningful subsequences. Epochs within a subsequence share a common cluster assignment (normal or seizure), with cluster or state transitions representing successful onset detections. Extensive experiments on three datasets demonstrate that our method can correct misclassifications, achieving 5%-11% classification improvements over other baselines and accurately detecting seizure onsets.
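The subsequence-clustering step can be illustrated with a small sketch: cluster per-epoch embeddings into two states, smooth the assignments so isolated misclassified epochs do not split a subsequence, and report state transitions as candidate onsets. SODor uses model-based clustering with label supervision; k-means and majority-vote smoothing here are simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def detect_onsets(epoch_embeddings, smooth_win=5):
    """Toy stand-in for the second stage: cluster per-epoch embeddings into
    two states (normal vs. seizure) and report indices where the state flips.
    The paper's model-based clustering is replaced by k-means + smoothing."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(epoch_embeddings)
    # majority-vote smoothing enforces locally consistent cluster assignments
    smoothed = np.array([
        np.bincount(labels[max(0, t - smooth_win): t + smooth_win + 1]).argmax()
        for t in range(len(labels))
    ])
    onsets = [t for t in range(1, len(smoothed)) if smoothed[t] != smoothed[t - 1]]
    return smoothed, onsets

# usage on random embeddings (600 one-second epochs, 128-dimensional)
states, onsets = detect_onsets(np.random.randn(600, 128))
```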
Ensuring annotator quality in training and evaluation data is a key component of machine learning in NLP. Tasks such as sentiment analysis and offensive speech detection are intrinsically subjective, creating a challenging scenario for traditional quality assessment approaches because it is hard to distinguish disagreement due to poor work from disagreement due to differences of opinion among sincere annotators. With the goal of increasing diverse perspectives in annotation while ensuring consistency, we propose ARTICLE, an in-context learning (ICL) framework to estimate annotation quality through self-consistency. We evaluate this framework on two offensive speech datasets using multiple LLMs and compare its performance with traditional methods. Our findings indicate that ARTICLE can be used as a robust method for identifying reliable annotators, hence improving data quality.
Human-AI collaboration has the potential to transform various domains by leveraging the complementary strengths of human experts and Artificial Intelligence (AI) systems. However, unobserved confounding can undermine the effectiveness of this collaboration, leading to biased and unreliable outcomes. In this paper, we propose a novel solution to address unobserved confounding in human-AI collaboration by employing the marginal sensitivity model (MSM). Our approach combines domain expertise with AI-driven statistical modeling to account for potential confounders that may otherwise remain hidden. We present a deferral collaboration framework for incorporating the MSM into policy learning from observational data, enabling the system to control for the influence of unobserved confounding factors. In addition, we propose a personalized deferral collaboration system to leverage the diverse expertise of different human decision-makers. By adjusting for potential biases, our proposed solution enhances the robustness and reliability of collaborative outcomes. The empirical and theoretical analyses demonstrate the efficacy of our approach in mitigating unobserved confounding and improving the overall performance of human-AI collaborations.
Decoding natural visual scenes from brain activity has flourished, with extensive research on single-subject tasks but considerably less on cross-subject tasks. Reconstructing high-quality images in cross-subject tasks is a challenging problem due to profound individual differences between subjects and the scarcity of data annotation. In this work, we propose MindTuner for cross-subject visual decoding, which achieves high-quality and rich semantic reconstructions using only 1 hour of fMRI training data, benefiting from the phenomenon of visual fingerprints in the human visual system and a novel fMRI-to-text alignment paradigm. Firstly, we pre-train a multi-subject model on 7 subjects and fine-tune it with scarce data on new subjects, where LoRAs with Skip-LoRAs are utilized to learn the visual fingerprint. Then, we take the image modality as the intermediate pivot modality to achieve fMRI-to-text alignment, which achieves impressive fMRI-to-text retrieval performance and corrects fMRI-to-image reconstruction with fine-tuned semantics. The results of both qualitative and quantitative analyses demonstrate that MindTuner surpasses state-of-the-art cross-subject visual decoding models on the Natural Scenes Dataset (NSD), whether using 1 hour or 40 hours of training data.
Existing learning-from-crowds methods aim to design proper aggregation strategies to infer the unknown true labels from noisy labels provided by crowdsourcing. They treat the ground truth as hidden variables and use statistical or deep learning based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground-truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise characterizations and negatively impacting the quality of learning-from-crowds. This paper proposes a new paradigm of multi-task supervised learning-from-crowds, which eliminates the need to model items' ground truth in worker behavior models. Within this paradigm, we propose a worker behavior model at the item feature level called Mixture of Experts based Multi-task Supervised Learning-from-Crowds (MMLC), and then two aggregation strategies are proposed within MMLC. The first strategy, named MMLC-owf, utilizes clustering methods in the worker spectral space to identify the projection vector of the oracle worker. Subsequently, the labels generated based on this vector are regarded as the items' ground truth. The second strategy, called MMLC-df, employs the MMLC model to fill in the crowdsourced data, which can enhance the effectiveness of existing aggregation strategies. Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and MMLC-df enhances the quality of existing learning-from-crowds methods.
Knowledge tracing (KT) involves using the historical records of student-learning interactions to anticipate their performance on forthcoming questions. Central to this process is the modeling of human cognition to gain deeper insights into how knowledge is acquired and retained. Human cognition is characterized by two key features: long-term cognitive trends, reflecting the gradual accumulation and stabilization of knowledge over time, and short-term cognitive fluctuations, which arise from transient factors such as forgetting or momentary lapses in attention. Although existing attention-based KT models effectively capture long-term cognitive trends, they often fail to adequately address short-term cognitive fluctuations. These limitations lead to overly smoothed cognitive features and reduced model performance, especially when the test data length exceeds the training data length. To address these problems, we propose FlucKT, a novel short-term cognitive fluctuations enhanced attention network for KT tasks. FlucKT improves the attention mechanism in two ways: First, by using a decomposition-based layer with causal convolution to separate and dynamically reweight long-term and short-term cognitive features. Second, by introducing a kernelized bias attention score penalty to enhance focus on short-term fluctuations, improving length generalization capabilities. Our contributions are validated through extensive experiments on three real-world datasets, demonstrating significant improvements in length generalization and prediction performance.
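A minimal sketch of the decomposition idea described above, assuming a causal depthwise convolution extracts the long-term trend of per-step cognitive features, the residual serves as the short-term fluctuation, and a learned gate reweights the two; layer names and sizes are illustrative, not FlucKT's exact design.

```python
import torch
import torch.nn as nn

class CausalDecompose(nn.Module):
    """Split a sequence of cognitive features into a long-term trend and a
    short-term fluctuation with a causal depthwise convolution, then reweight
    the two parts with a learnable gate. A sketch of the idea only."""

    def __init__(self, d_model, kernel_size=7):
        super().__init__()
        self.kernel_size = kernel_size
        self.trend_conv = nn.Conv1d(d_model, d_model, kernel_size,
                                    groups=d_model, bias=False)
        self.gate = nn.Linear(2 * d_model, 2)

    def forward(self, x):                                # x: (batch, seq, d_model)
        # left-pad so the convolution never attends to future interactions
        xp = nn.functional.pad(x.transpose(1, 2), (self.kernel_size - 1, 0))
        trend = self.trend_conv(xp).transpose(1, 2)      # long-term component
        fluct = x - trend                                # short-term residual
        w = torch.softmax(self.gate(torch.cat([trend, fluct], dim=-1)), dim=-1)
        return w[..., :1] * trend + w[..., 1:] * fluct   # dynamic reweighting

# usage: a batch of 4 interaction sequences, length 50, 64-dimensional features
out = CausalDecompose(64)(torch.randn(4, 50, 64))
```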
Deep learning models have demonstrated exceptional performance in a variety of real-world applications. These successes are often attributed to strong base models that can generalize to novel tasks with limited supporting data while keeping prior knowledge intact. However, these impressive results rely on the availability of a large amount of high-quality data, which is often lacking in specialized biomedical applications. In such fields, models are often developed with limited data that arrive incrementally with novel categories. This requires the model to adapt to new information while preserving existing knowledge. Few-Shot Class-Incremental Learning (FSCIL) methods offer a promising approach to addressing these challenges, but they also depend on strong base models that face the same aforementioned limitations. To overcome these constraints, we propose AnchorInv, which follows a straightforward and efficient buffer-replay strategy. Instead of selecting and storing raw data, AnchorInv generates synthetic samples guided by anchor points in the feature space. This approach protects privacy and regularizes the model for adaptation. When evaluated on three public physiological time series datasets, AnchorInv effectively prevents the forgetting of prior knowledge and improves adaptation to novel classes, surpassing state-of-the-art baselines.
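A generic feature-inversion sketch of replaying from anchors rather than raw data: store class anchors (e.g., feature-space prototypes) from earlier sessions and synthesize inputs whose encoder features match them. How AnchorInv actually selects anchors and generates samples is not reproduced here; the encoder and shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def invert_anchor(encoder, anchor, input_shape, steps=200, lr=0.05):
    """Synthesize an input whose features match a stored anchor point, so it
    can be replayed instead of raw (privacy-sensitive) data. A generic
    inversion sketch; AnchorInv's exact procedure and regularizers differ."""
    x = torch.randn(1, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(encoder(x), anchor.unsqueeze(0))
        loss.backward()
        opt.step()
    return x.detach()

# toy usage: a 1D-conv encoder for physiological time series (3 channels, 256 steps)
encoder = nn.Sequential(nn.Conv1d(3, 16, 7, stride=4), nn.ReLU(),
                        nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 32))
anchor = torch.randn(32)   # e.g., a class prototype kept from the base session
synthetic = invert_anchor(encoder, anchor, input_shape=(3, 256))
```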
Quality control is a crucial issue in label data collection by crowdsourcing. Typically, aggregation methods over redundant crowd labels are proposed to estimate high-quality labels from noisy crowd labels. Most existing works concentrate on label aggregation for Single Crowd Tasks (SCTs), which have a single object set with homogeneous question types. However, it is useful for a requester to combine multiple relevant but different crowd tasks into a Composite Crowd Task (CCT), which has heterogeneous question types and (or) multiple object sets for diverse purposes. Instead of aggregating labels on each crowd task separately, label aggregation methods that bridge multiple SCTs within a CCT can potentially improve the label quality of all tasks. In this paper, we propose a general label aggregation approach for such CCTs by worker ability constraint satisfaction and relaxed optimization. We collected real crowd datasets of CCTs with diverse task settings based on heterogeneous question types, including categorization, pairwise preference comparisons, and pairwise similarity comparisons. The results demonstrate that our approach can effectively bridge the worker information of CCTs to improve the quality of aggregated labels and outperforms the baselines proposed for SCTs.
WiFi-based human activity recognition (HAR) holds significant application potential across various fields. To handle dynamic environments where new activities are continuously introduced, WiFi-based HAR systems must adapt by learning new concepts without forgetting previously learned ones. Furthermore, retaining knowledge from old activities by storing historical exemplars is impractical for WiFi-based HAR due to privacy concerns and the limited storage capacity of edge devices. In this work, we propose ConSense, a lightweight and fast-adapting exemplar-free class incremental learning framework for WiFi-based HAR. The framework leverages the transformer architecture and involves dynamic model expansion and selective retraining to preserve previously learned knowledge while integrating new information. Specifically, during incremental sessions, small-scale trainable parameters that are trained specifically on the data of each task are added in the multi-head self-attention layer. In addition, a selective retraining strategy that dynamically adjusts the weights in the multilayer perceptron based on the performance stability of neurons across tasks is used. Rather than training the entire model, the proposed strategies of dynamic model expansion and selective retraining reduce the overall computational load while balancing stability on previous tasks and plasticity on new tasks. Evaluation results on three public WiFi datasets demonstrate that ConSense not only outperforms several competitive approaches but also requires fewer parameters, highlighting its practical utility in class-incremental scenarios for HAR.
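A minimal sketch of adding small per-session trainable parameters to the self-attention layer while freezing the base weights; the adapter shape, placement, and bottleneck size below are assumptions for illustration, not ConSense's exact expansion scheme.

```python
import torch
import torch.nn as nn

class SessionAdaptedAttention(nn.Module):
    """Frozen multi-head self-attention plus a small trainable adapter per
    incremental session. Only the current session's adapter is trained; the
    exact placement and size of ConSense's added parameters may differ."""

    def __init__(self, d_model, n_heads, n_sessions, bottleneck=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        for p in self.attn.parameters():
            p.requires_grad = False                      # preserve old knowledge
        self.adapters = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, bottleneck), nn.ReLU(),
                          nn.Linear(bottleneck, d_model))
            for _ in range(n_sessions)
        ])

    def forward(self, x, session_id):
        out, _ = self.attn(x, x, x)
        return out + self.adapters[session_id](x)        # session-specific residual

# usage: WiFi CSI feature sequences (batch 8, length 30, dim 64), session 1
layer = SessionAdaptedAttention(d_model=64, n_heads=4, n_sessions=3)
y = layer(torch.randn(8, 30, 64), session_id=1)
```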
Dependency-aware spatial crowdsourcing (DASC) addresses the unique challenges posed by subtask dependencies in spatial task assignment. This paper investigates the task assignment problem in DASC and proposes a two-stage Recommend and Match Optimization (RMO) framework, leveraging multi-agent reinforcement learning for subtask recommendation and a multi-dimensional utility function for subtask matching. The RMO framework primarily addresses two key challenges: credit assignment for subtasks with interdependencies and maintaining overall coherence between subtask recommendation and matching. Specifically, we employ meta-gradients to construct auxiliary policies and establish a gradient connection between two stages, which can effectively address credit assignment and joint optimization of subtask recommendation and matching, while concurrently accelerating network training. We further establish a unified gradient descent process through gradient synchronization across recommendation networks, auxiliary policies, and the matching utility evaluation function. Experiments on two real-world datasets validate the effectiveness and feasibility of our proposed approach.
This paper considers the challenging problem of 3D Human Pose Estimation (HPE) from a sparse set of Inertial Measurement Units (IMUs). Existing efforts typically reconstruct a pose sequence by either directly tackling whole-body motions or focusing on distinctive spatio-temporal features of local body parts. Unfortunately, these methods ignore the interdependent motor synergies among body parts, which may lead to pose estimates with ambiguous local parts. This observation motivates us to propose a hierarchical learning-based approach, HiPoser, which utilizes a hierarchical shared structure with Mamba blocks as the backbone to address the following estimation tasks: 1) torso pose, 2) lower-limb pose, 3) upper-limb pose, and finally 4) global translation. These tasks selectively incorporate body motion states and are carried out sequentially to reconstruct part-based poses, which are then amalgamated into the final full-body pose with a global translation that satisfies inter-part consistency. Our hierarchical structure gives HiPoser the flexibility to prioritize different aspects of pose estimation, emphasizing either detail or stability. Empirical evaluations over three benchmark datasets demonstrate the superiority of HiPoser over existing state-of-the-art models, suggesting that analyzing the synergistic movement of body parts is indeed important for advancing IMU-based 3D HPE.
Artificial intelligence (AI) models for computer vision trained with supervised machine learning are assumed to solve classification tasks by imitating human behavior learned from training labels. Most efforts in recent vision research focus on measuring model task performance using standardized benchmarks such as accuracy. However, limited work has sought to understand the perceptual differences between humans and machines. To fill this gap, this study first analyzes the statistical distributions of mistakes from the two sources, and then explores how task difficulty level affects these distributions. We find that even when AI learns an excellent model from the training data, one that outperforms humans in overall accuracy, these AI models have significant and consistent differences from human perception. We demonstrate the importance of studying these differences with a simple human-AI teaming algorithm that outperforms humans alone, AI alone, and AI-AI teaming.
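One natural instance of such a teaming rule, shown only as an illustration (the paper's algorithm is not reproduced here): use the AI prediction when its confidence is high and fall back to the human answer otherwise, which exploits the fact that the two sources make differently distributed mistakes.

```python
import numpy as np

def team_predict(ai_probs, human_labels, threshold=0.9):
    """Confidence-based human-AI teaming for binary classification: trust the
    AI when its predicted probability is extreme, otherwise defer to the human.
    The threshold and the rule itself are illustrative assumptions."""
    ai_labels = (ai_probs >= 0.5).astype(int)
    confident = np.maximum(ai_probs, 1 - ai_probs) >= threshold
    return np.where(confident, ai_labels, human_labels)

# toy usage on five items
preds = team_predict(np.array([0.97, 0.55, 0.08, 0.62, 0.99]),
                     np.array([1, 0, 0, 1, 1]))
```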
As AI chatbots increasingly incorporate empathy, understanding user-centered perceptions of chatbot empathy and its impact on conversation quality remains essential yet under-explored. This study examines how chatbot identity and perceived empathy influence users' overall conversation experience. Analyzing 155 conversations from two datasets, we found that while GPT-based chatbots were rated significantly higher in conversational quality, they were consistently perceived as less empathetic than human conversational partners. Empathy ratings from GPT-4o annotations aligned with user ratings, reinforcing the perception of lower empathy in chatbots compared to humans. Our findings underscore the critical role of perceived empathy in shaping conversation quality, revealing that achieving high-quality human-AI interactions requires more than simply embedding empathetic language; it necessitates addressing the nuanced ways users interpret and experience empathy in conversations with chatbots.
Even though data annotation is extremely important for interpretability, research, and development of artificial intelligence solutions, annotating data remains costly. Research efforts such as active learning or few-shot learning alleviate the cost by increasing sample efficiency, yet the problem of annotating data more quickly has received comparatively little attention. Leveraging a predictor has been shown to reduce annotation cost in practice but has not been theoretically considered. We ask the following question: to annotate a binary classification dataset with N samples, can the annotator answer fewer than N yes/no questions? Framing this question-and-answer (Q&A) game as an optimal encoding problem, we find a positive answer given by the Huffman encoding of the possible labelings. Unfortunately, the algorithm is computationally intractable even for small dataset sizes. As a practical method, we propose to minimize a cost function a few steps ahead, similarly to lookahead minimization in optimal control. This solution is analyzed, compared with the optimal one, and evaluated using several synthetic and real-world datasets. The method allows a significant improvement (23-86%) in the annotation efficiency of real-world datasets.
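A worked sketch of the optimal-encoding view: if a predictor induces a prior over the 2^N possible labelings, the Huffman code over labelings is a yes/no question tree whose expected depth (number of questions) can be well below N. The prior below (independent labels, each positive with probability 0.9) is made up purely for illustration.

```python
import heapq
from itertools import product

def expected_questions(labeling_probs):
    """Expected number of yes/no questions under the Huffman code built over
    all possible labelings; each internal node of the code tree is one question,
    so the expected depth equals the sum of internal-node probabilities."""
    heap = [(p, i, 0.0) for i, p in enumerate(labeling_probs)]  # (prob, tiebreak, cost)
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # merging two subtrees adds one question above every leaf inside them
        heapq.heappush(heap, (p1 + p2, tiebreak, c1 + c2 + p1 + p2))
        tiebreak += 1
    return heap[0][2]

# N = 4 samples; a predictor says each label is positive with probability 0.9
N, q = 4, 0.9
labelings = list(product([0, 1], repeat=N))
probs = [q ** sum(y) * (1 - q) ** (N - sum(y)) for y in labelings]
print(expected_questions(probs))   # ~1.97 questions on average, fewer than N = 4
```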
Procedural Content Generation via Machine Learning (PCGML) has enhanced game content creation, yet challenges in controllability and limited training data persist. This study addresses these issues by distilling a constructive PCG algorithm into a controllable PCGML model. We first generate a large amount of content with a constructive algorithm and label it using a Large Language Model (LLM). We use these synthetic labels to condition two PCGML models for content-specific generation, a diffusion model and the five-dollar model. This neural network distillation process ensures that the generation aligns with the original algorithm while introducing controllability through plain text. We define this text-conditioned PCGML as a Text-to-game-Map (T2M) task, offering an alternative to prevalent text-to-image multi-modal tasks. We compare our distilled models with the baseline constructive algorithm. Our analysis of the variety, accuracy, and quality of our generation demonstrates the efficacy of distilling constructive methods into controllable text-conditioned PCGML models.
The early diagnosis of Parkinson’s disease (PD) is crucial for potential patients to receive timely treatment and prevent disease progression. Recent studies have shown that PD is closely linked to impairments in facial muscle control, resulting in characteristic “masked face” symptoms. This discovery offers a novel perspective for PD diagnosis by leveraging facial expression recognition and analysis techniques to capture and quantify these features, thereby distinguishing between PD patients and non-PD individuals based on their facial expressions. However, concerns about data privacy and legal restrictions have led to significant “data silos”, posing challenges to data sharing and limiting the accuracy and generalization of existing diagnostic models due to small, localized datasets. To address this issue, we propose an innovative adaptive federated learning approach that aims to jointly analyze facial expression data from multiple medical institutions while preserving data privacy. Our proposed approach comprehensively evaluates each client's contributions in terms of gradient, data, and learning efficiency, overcoming the non-IID issues caused by varying data sizes or heterogeneity across clients. To demonstrate the real-world impact of our approach, we collected a new facial expression dataset of PD patients in collaboration with a hospital. Extensive experiments validate the effectiveness of our proposed method for PD diagnosis and facial expression recognition, offering a promising avenue for rapid, non-invasive initial screening and advancing healthcare intelligence.
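A minimal sketch of contribution-weighted federated aggregation in the spirit described above: each client's update is weighted by a combination of gradient, data, and learning-efficiency scores. How those scores are computed, and how they are combined, are placeholder assumptions here, not the paper's exact criteria.

```python
import numpy as np

def adaptive_fedavg(client_params, grad_scores, data_scores, eff_scores):
    """Aggregate client parameters with weights that combine per-client
    gradient, data, and learning-efficiency contributions. The multiplicative
    combination and the scores themselves are illustrative placeholders."""
    scores = np.array(grad_scores) * np.array(data_scores) * np.array(eff_scores)
    weights = scores / scores.sum()
    return sum(w * p for w, p in zip(weights, client_params))

# toy usage: three hospitals, each holding a flattened parameter vector
params = [np.random.randn(10) for _ in range(3)]
global_params = adaptive_fedavg(params,
                                grad_scores=[0.9, 0.7, 0.5],
                                data_scores=[0.3, 0.5, 0.2],
                                eff_scores=[1.0, 0.8, 0.9])
```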
In this paper, we offer a learning framework in which the agent's knowledge gaps are overcome through corrective feedback from a teacher whenever the agent explains its (incorrect) predictions. We test it in a low-resource visual processing scenario, in which the agent must learn to recognize distinct types of toy truck. The agent starts the learning process with no ontology about what types of truck exist nor which parts they have, and a deficient model for recognizing those parts from visual input. The teacher's feedback to the agent's explanations addresses its lack of relevant knowledge in the ontology via a generic rule (e.g., "dump trucks have dumpers"), whereas an inaccurate part recognition is corrected by a deictic statement (e.g., "this is not a dumper"). The learner utilizes this feedback not only to improve its estimate of the hypothesis space of possible domain ontologies and probability distributions over them but also to use those estimates to update its visual interpretation of the scene. Our experiments demonstrate that teacher-learner pairs utilizing explanations and corrections are more data-efficient than those without such a faculty.
The gold standard in human-AI collaboration is complementarity: when combined performance exceeds both the human and algorithm alone. We investigate this challenge in binary classification settings where the goal is to maximize 0-1 accuracy. Given two or more agents who can make calibrated probabilistic predictions, we show a "No Free Lunch"-style result. Any deterministic collaboration strategy (a function mapping calibrated probabilities into binary classifications) that does not essentially always defer to the same agent will sometimes perform worse than the least accurate agent. In other words, complementarity cannot be achieved "for free." The result does suggest one model of collaboration with guarantees, where one agent identifies "obvious" errors of the other agent. We also use the result to understand the necessary conditions enabling the success of other collaboration techniques, providing guidance to human-AI collaboration.
Social media platforms like X (Twitter) and Reddit are vital to global communication. However, advancements in Large Language Model (LLM) technology give rise to social media bots with unprecedented intelligence. These bots adeptly simulate human profiles, conversations, and interactions, disseminating large amounts of false information and posing significant challenges to platform regulation. To better understand and counter these threats, we innovatively design BotSim, a malicious social botnet simulation powered by LLM. BotSim mimics the information dissemination patterns of real-world social networks, creating a virtual environment composed of intelligent agent bots and real human users. In the temporal simulation constructed by BotSim, these advanced agent bots autonomously engage in social interactions such as posting and commenting, effectively modeling scenarios of information flow and user interaction. Building on the BotSim framework, we construct a highly human-like, LLM-driven bot dataset called BotSim-24 and benchmark multiple bot detection strategies against it. The experimental results indicate that detection methods effective on traditional bot datasets perform worse on BotSim-24, highlighting the urgent need for new detection strategies to address the cybersecurity threats posed by these advanced bots.
Diffusion models have achieved remarkable success in sequential decision-making by leveraging the highly expressive model capabilities in policy learning. A central problem for learning diffusion policies is to align the policy output with human intents in various tasks. To achieve this, previous methods conduct return-conditioned policy generation or Reinforcement Learning (RL)-based policy optimization, while they both rely on pre-defined reward functions. In this work, we propose a novel framework, Forward KL regularized Preference optimization for aligning Diffusion policies, to align the diffusion policy with preferences directly. We first train a diffusion policy from the offline dataset without considering the preference, and then align the policy to the preference data via direct preference optimization. During the alignment phase, we formulate direct preference learning in a diffusion policy, where the forward KL regularization is employed in preference optimization to avoid generating out-of-distribution actions. We conduct extensive experiments for MetaWorld manipulation and D4RL tasks. The results show our method exhibits superior alignment with preferences and outperforms previous state-of-the-art algorithms.
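Because the full diffusion-policy formulation is involved, here is only a generic sketch of the alignment objective: a DPO-style preference term plus a forward KL penalty toward the pre-trained reference policy, written at the level of per-action log-probabilities. The function name, the use of log-probabilities as stand-ins for diffusion likelihoods, and the coefficients are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def fkpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, logp_ref_actions,
              beta=0.1, alpha=1.0):
    """DPO-style preference loss plus a forward-KL penalty.
    logp_w / logp_l: current-policy log-probs of preferred / dispreferred actions.
    ref_logp_w / ref_logp_l: the frozen reference policy's log-probs of the same.
    logp_ref_actions: current-policy log-probs of actions sampled from the
    reference policy; minimizing their negative mean approximates the
    theta-dependent part of KL(pi_ref || pi_theta), discouraging
    out-of-distribution actions. A sketch only, not the paper's exact loss."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    pref_loss = -F.logsigmoid(margin).mean()
    forward_kl = -logp_ref_actions.mean()        # forward KL, up to a constant
    return pref_loss + alpha * forward_kl

# toy usage with random log-probabilities for a batch of 8 preference pairs
loss = fkpo_loss(torch.randn(8), torch.randn(8),
                 torch.randn(8), torch.randn(8), torch.randn(8))
```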
Neural decoding, which transforms neural signals into motor commands, plays a key role in brain-computer interfaces (BCIs). Existing neural decoding approaches mainly rely on the assumption of independent noise, and could perform poorly when the assumption is invalid. However, correlated noise has been commonly observed in neural signals. Specifically, noise in different neural channels can be similar or highly related, which could degrade the performance of those neural decoders. To tackle this problem, we propose DeCorrNet, which explicitly removes noise correlation in neural decoding. DeCorrNet can incorporate diverse neural decoders as an ensemble module to enhance neural decoding performance. Experiments on benchmark BCI datasets demonstrate the superiority of DeCorrNet, achieving state-of-the-art results.
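A generic decorrelation sketch in the spirit described above: whiten channel-wise features with a ZCA transform estimated from the batch covariance before passing them to any downstream decoder. This is an illustrative stand-in, not DeCorrNet's actual module.

```python
import torch
import torch.nn as nn

class DecorrelationLayer(nn.Module):
    """Whitens correlated channel noise with a ZCA transform computed from
    the batch covariance. A generic sketch, not DeCorrNet's exact design."""

    def __init__(self, eps=1e-4):
        super().__init__()
        self.eps = eps

    def forward(self, x):                        # x: (batch, channels)
        xc = x - x.mean(dim=0, keepdim=True)
        cov = xc.T @ xc / (x.shape[0] - 1)
        evals, evecs = torch.linalg.eigh(cov)
        zca = evecs @ torch.diag((evals + self.eps).rsqrt()) @ evecs.T
        return xc @ zca                          # decorrelated channel features

# usage: 256 trials of 96-channel neural features, fed to any downstream decoder
whitened = DecorrelationLayer()(torch.randn(256, 96))
```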
Explainable AI is increasingly employing argumentation methods to facilitate interactive explanations between AI agents and human users. While existing approaches typically rely on predetermined human user models, there remains a critical gap in dynamically learning and updating these models during interactions. In this paper, we present a framework that enables AI agents to adapt their understanding of human users through argumentation-based dialogues. Our approach, called Persona, draws on prospect theory and integrates a probability weighting function with a Bayesian belief update mechanism that refines a probability distribution over possible human models based on exchanged arguments. Through empirical evaluations with human users in an applied argumentation setting, we demonstrate that Persona effectively captures evolving human beliefs, facilitates personalized interactions, and outperforms state-of-the-art methods.
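A small sketch of combining a prospect-theory probability weighting function with a Bayesian update over candidate human models: each model's likelihood of the newly exchanged argument is passed through the weighting function before the posterior is renormalized. The weighting form (Tversky-Kahneman), the gamma value, and the toy numbers are illustrative assumptions, not Persona's exact update.

```python
import numpy as np

def weight(p, gamma=0.61):
    """Prospect-theory probability weighting (Tversky-Kahneman form)."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def update_beliefs(prior, likelihoods, gamma=0.61):
    """Bayesian update over candidate human models, with each model's
    likelihood of the observed argument distorted by the weighting function.
    A sketch of the idea; Persona's exact mechanism may differ."""
    posterior = prior * weight(likelihoods, gamma)
    return posterior / posterior.sum()

# toy usage: three candidate human models, one newly observed argument
prior = np.array([1 / 3, 1 / 3, 1 / 3])
lik = np.array([0.8, 0.3, 0.1])   # P(argument | model), hypothetical numbers
posterior = update_beliefs(prior, lik)
```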