IJCAI.2024 - Machine Learning

Total: 234

#1 Fine-tuning Pre-trained Models for Robustness under Noisy Labels

Authors: Sumyeong Ahn ; Sihyeon Kim ; Jongwoo Ko ; Se-Young Yun

The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models. In response to this issue, researchers have focused on identifying clean samples and reducing the influence of noisy labels. Recent works in this field have achieved notable success in terms of generalizability, albeit at the expense of extensive computing resources. Therefore, reducing computational cost remains a crucial challenge. Concurrently, other research areas have focused on developing fine-tuning techniques that efficiently achieve high generalization performance. Despite their proven ability to generalize well at low cost, these techniques have seen limited exploration from a label-noise perspective. In this research, we aim to find an effective approach to fine-tune pre-trained models on datasets with noisy labels. To achieve this goal, we empirically investigate the characteristics of pre-trained models under label noise and propose an algorithm, named TURN. We present the results of extensive testing and demonstrate both efficient and improved denoising performance on various benchmarks, surpassing previous methods.

#2 Contract Scheduling with Distributional and Multiple Advice

Authors: Spyros Angelopoulos ; Marcin Bienkowski ; Christoph Dürr ; Bertrand Simon

Contract scheduling is a widely studied framework for designing real-time systems with interruptible capabilities. Previous work has shown that a prediction of the interruption time can improve the performance of contract-based systems; however, it relied on a single prediction provided by a deterministic oracle. In this work, we introduce and study more general and realistic learning-augmented settings in which the prediction is in the form of a probability distribution, or is given as a set of multiple possible interruption times. For both prediction settings, we design and analyze schedules that perform optimally if the prediction is accurate, while simultaneously guaranteeing the best worst-case performance if the prediction is adversarial. We also provide evidence that the resulting system is robust to prediction errors in the distributional setting. Lastly, we present an experimental evaluation that confirms the theoretical findings and illustrates the performance improvements that can be attained in practice.
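
For background, the sketch below shows the classical doubling schedule that this line of work builds on: contracts of lengths 1, 2, 4, ... are run back to back, and on interruption the system's utility is the length of the longest completed contract, giving an acceleration ratio of at most 4. This is a toy illustration of the unaugmented baseline, not the paper's learning-augmented schedules.

```python
# Minimal sketch of classical contract scheduling (background, not the paper's
# algorithm). Utility at interruption time t = longest contract finished by t.

def doubling_schedule(n):
    """The classical schedule: the i-th contract has length 2**i."""
    return [2.0 ** i for i in range(n)]

def longest_completed(schedule, t):
    """Length of the longest contract finished by interruption time t."""
    finished, elapsed = 0.0, 0.0
    for length in schedule:
        if elapsed + length > t:
            break  # this contract is interrupted mid-run
        finished = max(finished, length)
        elapsed += length
    return finished

sched = doubling_schedule(12)
for t in [10.0, 100.0, 1000.0]:
    done = longest_completed(sched, t)
    print(t, done, t / done)  # the ratio t/done never exceeds 4
```

An accurate prediction of t would let one shift the schedule so that a contract completes just before the interruption; the paper generalizes this idea to distributional and multiple predictions.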

#3 Contrastive Learning Is Not Optimal for Quasiperiodic Time Series

Authors: Adrian Atienza ; Jakob Bardram ; Sadasivan Puthusserypady

Despite recent advancements in Self-Supervised Learning (SSL) for time series analysis, a noticeable gap persists between the anticipated achievements and actual performance. While these methods have demonstrated formidable generalization capabilities with minimal labels in various domains, their effectiveness in distinguishing between different classes based on a limited number of annotated records is notably lacking. Our hypothesis attributes this bottleneck to the prevalent use of contrastive learning, a shared training objective in previous state-of-the-art (SOTA) methods. By mandating distinctiveness between representations for negative pairs drawn from separate records, this approach compels the model to encode unique record-based patterns but simultaneously neglects changes occurring across the entire record. To overcome this challenge, we introduce Distilled Embedding for Almost-Periodic Time Series (DEAPS), a non-contrastive method tailored for quasiperiodic time series such as electrocardiogram (ECG) data. By avoiding the use of negative pairs, we not only mitigate the model's blindness to temporal changes but also enable the integration of a "Gradual Loss (L_gra)" function, which guides the model to effectively capture dynamic patterns evolving throughout the record. The outcomes are promising: DEAPS demonstrates a notable improvement of +10% over existing SOTA methods when only a few annotated records are available to fit a Machine Learning (ML) model on the learned representations.
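
To make the criticized objective concrete, here is a standard InfoNCE contrastive loss in which negatives are drawn from separate records — the mechanism the abstract argues forces record-specific encoding. This is generic contrastive learning in PyTorch, not DEAPS or its Gradual Loss; shapes and the temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE: pull anchor/positive together, push the anchor away
    from negatives drawn from other records (the behaviour DEAPS avoids)."""
    anchor = F.normalize(anchor, dim=-1)        # (B, D)
    positive = F.normalize(positive, dim=-1)    # (B, D)
    negatives = F.normalize(negatives, dim=-1)  # (B, K, D)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature       # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives) / temperature  # (B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, labels)

B, K, D = 8, 16, 64
loss = info_nce(torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
print(loss.item())
```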

#4 Cutting the Black Box: Conceptual Interpretation of a Deep Neural Net with Multi-Modal Embeddings and Multi-Criteria Decision Aid

Authors: Nicolas Atienza ; Roman Bresson ; Cyriaque Rousselot ; Philippe Caillou ; Johanne Cohen ; Christophe Labreuche ; Michele Sebag

This paper tackles the concept-based explanation of neural models in computer vision, building upon the state of the art in Multi-Criteria Decision Aid (MCDA). The novelty of the approach is to leverage multi-modal embeddings from CLIP to bridge the gap between pixel-based and concept-based representations. The proposed Cut the Black Box (CB2) approach disentangles the latent representation of a trained pixel-based neural net, referred to as the teacher model, along a 3-step process. First, the pixel-based representation of the samples is mapped onto a conceptual representation using multi-modal embeddings. Second, an interpretable-by-design MCDA student model is trained by distillation from the teacher model, using the conceptual sample representation. Third, the alignment of the teacher and student latent representations spells out the concepts relevant to explaining the teacher model. Empirical validation of the approach with ResNet, VGG, and Vision Transformer models on CIFAR-10, CIFAR-100, Tiny ImageNet, and Fashion-MNIST showcases the effectiveness of the interpretations provided for the teacher models. The analysis reveals that decision-making predominantly relies on a few concepts, thereby exposing potential bias in the teacher's decisions.
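
The first step — scoring an image against a concept vocabulary via CLIP's shared embedding space — can be sketched with the OpenAI `clip` package as below. The concept list, prompt template, and `sample.png` are all hypothetical placeholders; the paper's concept vocabulary and the MCDA student model are not shown.

```python
# Hedged sketch: map a pixel-based sample onto a conceptual representation
# by cosine similarity between CLIP image and text embeddings.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["fur", "wheels", "wings", "water", "metallic surface"]  # made-up example
text = clip.tokenize([f"a photo containing {c}" for c in concepts]).to(device)

image = preprocess(Image.open("sample.png")).unsqueeze(0).to(device)  # placeholder path
with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    concept_scores = (img_emb @ txt_emb.T).squeeze(0)  # one score per concept

for c, s in zip(concepts, concept_scores.tolist()):
    print(f"{c}: {s:.3f}")
```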

#5 On the Computation of Example-Based Abductive Explanations for Random Forests

Authors: Gilles Audemard ; Jean-Marie Lagniez ; Pierre Marquis ; Nicolas Szczepanski

We show how to define and compute example-based abductive explanations. Such explanations are guaranteed to be 100% correct, fairly general, and persuasive, since they cover sufficiently many reference instances furnished by the explainee. We prove that the latter coverage condition yields a complexity shift to the second level of the polynomial hierarchy. We present a CEGAR-based algorithm to derive such explanations and show how to modify it to derive most-anchored example-based abductive explanations, i.e., example-based abductive explanations that cover as many reference instances as possible. We also explain how to reduce example-based abductive explanations to obtain subset-minimal explanations. Experiments on random forest classifiers show that our CEGAR-based algorithm is quite efficient in practice.

#6 Deriving Provably Correct Explanations for Decision Trees: The Impact of Domain Theories

Authors: Gilles Audemard ; Jean-Marie Lagniez ; Pierre Marquis ; Nicolas Szczepanski

We are interested in identifying the complexity of computing local explanations of various types given a decision tree, when the Boolean conditions used in the tree are not independent. This is usually the case when decision trees are learned from instances described using numerical or categorical attributes. In such a case, considering the domain theory indicating how the Boolean conditions occurring in the tree are logically connected is paramount to derive provably correct explanations. However, the nature of the domain theory may have a strong impact on the complexity of generating explanations. In this paper, we identify the complexity of deriving local explanations (abductive or contrastive) given a decision tree in the general case, and under several natural restrictions about the domain theory.

#7 Online Learning with Off-Policy Feedback in Adversarial MDPs

Authors: Francesco Bacchiocchi ; Francesco Emanuele Stradi ; Matteo Papini ; Alberto Maria Metelli ; Nicola Gatti

In this paper, we face the challenge of online learning in adversarial Markov decision processes with off-policy feedback. In this setting, the learner chooses a policy, but, differently from the traditional on-policy setting, the environment is explored by means of a different, fixed, and possibly unknown policy (named the colleague's policy). The off-policy feedback presents an additional issue that is not present in traditional settings: the learner is charged with the regret of its chosen policy but observes only the rewards gained by the colleague's policy. First, we present a lower bound for the proposed setting, showing that the optimal sublinear regret must depend on the dissimilarity between the optimal policy in hindsight and the colleague's policy. Then, we propose novel algorithms that, by employing pessimistic estimators---commonly adopted in the offline reinforcement learning literature---ensure sublinear regret bounds depending on the desired dissimilarity, even when the colleague's policy is unknown.
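
The pessimism principle invoked here can be illustrated with a lower-confidence-bound reward estimate: empirical means are shrunk by a count-based confidence width, so rarely observed state-action pairs are treated conservatively. The Hoeffding-style bonus below is a standard textbook choice assumed for illustration, not the paper's exact estimator.

```python
import numpy as np

def pessimistic_estimate(sums, counts, delta=0.05):
    """Lower confidence bound on per-(s,a) rewards from off-policy data."""
    counts = np.maximum(counts, 1)                      # guard unvisited pairs
    mean = sums / counts
    bonus = np.sqrt(np.log(2 / delta) / (2 * counts))   # Hoeffding-style width
    return mean - bonus                                 # pessimistic estimate

sums = np.array([8.0, 1.0, 0.5])   # rewards accumulated under the colleague's policy
counts = np.array([10, 2, 1])      # visit counts per state-action pair
print(pessimistic_estimate(sums, counts))
```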

#8 Ansatz-Agnostic Exponential Resource Saving in Variational Quantum Algorithms Using Shallow Shadows

Authors: Afrad Basheer ; Yuan Feng ; Christopher Ferrie ; Sanjiang Li

Variational Quantum Algorithms (VQA) have been identified as a promising candidate for the demonstration of near-term quantum advantage in solving optimization tasks in chemical simulation, quantum information, and machine learning. The standard model of training requires a significant amount of quantum resources, which led researchers to use classical shadows to devise an alternative that consumes exponentially fewer quantum resources. However, the approach only works when the observables are local and the ansatz is the shallow Alternating Layered Ansatz (ALA), thus severely limiting its potential in solving problems such as quantum state preparation, where the ideal state might not be approximable with an ALA. In this work, we present a protocol based on shallow shadows that achieves similar levels of savings for almost any shallow ansatz studied in the literature, when combined with observables of low Frobenius norm. We show that two important applications in quantum information for which VQAs can be a powerful option, namely variational quantum state preparation and variational quantum circuit synthesis, are compatible with our protocol. We also experimentally demonstrate orders of magnitude improvement in comparison to the standard VQA model.
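
As background on the classical-shadows machinery the paper extends, the snippet below simulates the textbook single-qubit shadow protocol: measure in a random Pauli basis, form the inverted-channel snapshot 3·U†|b⟩⟨b|U − I, and average Tr(Oρ̂) over snapshots. This is the standard protocol, not the paper's shallow-shadow variant.

```python
import numpy as np

rng = np.random.default_rng(0)
I = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

# Unitaries rotating each Pauli measurement basis into the computational basis.
basis_rotations = {"X": H, "Y": H @ S.conj().T, "Z": I}

rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|, so <Z> = 1

def snapshot(rho, rng):
    """One classical-shadow snapshot from a random Pauli-basis measurement."""
    U = basis_rotations[rng.choice(["X", "Y", "Z"])]
    probs = np.real(np.diag(U @ rho @ U.conj().T))
    b = rng.choice(2, p=probs / probs.sum())
    ket = np.zeros((2, 1), dtype=complex); ket[b] = 1.0
    return 3 * (U.conj().T @ ket @ ket.conj().T @ U) - I  # inverted channel

est = np.mean([np.real(np.trace(Z @ snapshot(rho, rng))) for _ in range(5000)])
print(est)  # close to 1.0, the true <Z> on |0><0|
```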

#9 Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification

Authors: Matteo Bianchi ; Antonio De Santis ; Andrea Tocchetti ; Marco Brambilla

Transparency and explainability in image classification are essential for establishing trust in machine learning models and detecting biases and errors. State-of-the-art explainability methods generate saliency maps to show where a specific class is identified, without providing a detailed explanation of the model's decision process. To address this need, we introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network. These explanations include a layer-wise representation of the features the model extracts from the input. Such features are represented as saliency maps generated by clustering and merging similar feature maps, to which we associate a weight derived by generalizing Grad-CAM for the proposed methodology. To further enhance these explanations, we include a set of textual labels collected through a gamified crowdsourcing activity and processed using NLP techniques and Sentence-BERT. Finally, we show an approach to generate global explanations by aggregating labels across multiple images.
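
A toy version of the core step — clustering similar feature maps, merging each cluster into a saliency map, and attaching a Grad-CAM-style weight — might look as follows. The real method generalizes Grad-CAM layer-wise; here the cluster count and the hook-captured activations/gradients are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_saliency(feature_maps, gradients, n_clusters=4):
    """feature_maps, gradients: (C, H, W) activations and their gradients w.r.t.
    a class score, e.g. captured with PyTorch forward/backward hooks (not shown)."""
    C, H, W = feature_maps.shape
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feature_maps.reshape(C, -1))
    alphas = gradients.mean(axis=(1, 2))             # Grad-CAM channel weights
    maps, weights = [], []
    for k in range(n_clusters):
        idx = np.where(labels == k)[0]
        merged = feature_maps[idx].mean(axis=0)      # merge similar channels
        maps.append(np.maximum(merged, 0))           # ReLU, as in Grad-CAM
        weights.append(alphas[idx].mean())           # cluster-level weight
    return maps, weights

maps, weights = clustered_saliency(np.random.rand(64, 7, 7), np.random.randn(64, 7, 7))
print([f"{w:.3f}" for w in weights])
```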

#10 Contrastive General Graph Matching with Adaptive Augmentation Sampling

Authors: Jianyuan Bo ; Yuan Fang

Graph matching has important applications in pattern recognition and beyond. Current approaches predominantly adopt supervised learning, demanding extensive labeled data that can be limited or costly. Meanwhile, self-supervised learning methods for graph matching often require additional side information, such as extra categorical information and input features, limiting their applicability in the general case. Moreover, designing optimal graph augmentations for self-supervised graph matching presents another challenge for ensuring robustness and efficacy. To address these issues, we introduce a novel Graph-centric Contrastive framework for Graph Matching (GCGM) that capitalizes on a vast pool of graph augmentations for contrastive learning, yet needs no side information. Given the variety of augmentation choices, we further introduce a Boosting-inspired Adaptive Augmentation Sampler (BiAS), which adaptively selects more challenging augmentations tailored for graph matching. In extensive experiments, GCGM surpasses state-of-the-art self-supervised methods across various datasets, marking a significant step toward more effective, efficient, and general graph matching.
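
One simple way a boosting-style augmentation sampler could work is with multiplicative weights: augmentations that recently produced higher contrastive losses (i.e., were "harder") get sampled more often. BiAS is only described at a high level in the abstract, so the update rule and augmentation names below are illustrative assumptions.

```python
import numpy as np

class AdaptiveAugmentationSampler:
    """Boosting-flavoured sampler: up-weight augmentations with above-baseline loss."""
    def __init__(self, augmentations, lr=0.5):
        self.augs = augmentations
        self.log_w = np.zeros(len(augmentations))
        self.lr = lr

    def sample(self, rng):
        p = np.exp(self.log_w - self.log_w.max())
        p /= p.sum()
        i = rng.choice(len(self.augs), p=p)
        return i, self.augs[i]

    def update(self, i, loss, baseline):
        self.log_w[i] += self.lr * (loss - baseline)  # harder => more likely

rng = np.random.default_rng(0)
augs = ["edge_drop", "node_drop", "feature_mask"]  # hypothetical graph augmentations
sampler = AdaptiveAugmentationSampler(augs)
for step in range(5):
    i, aug = sampler.sample(rng)
    loss = rng.random() * 2                 # stand-in for the contrastive loss
    sampler.update(i, loss, baseline=1.0)
    print(step, aug, np.round(np.exp(sampler.log_w - sampler.log_w.max()), 3))
```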

#11 Towards Exact Computation of Inductive Bias

Authors: Akhilan Boopathy ; William Yue ; Jaedong Hwang ; Abhiram Iyer ; Ila Fiete

Much research in machine learning involves finding appropriate inductive biases (e.g. convolutional neural networks, momentum-based optimizers, transformers) to promote generalization on tasks. However, quantification of the amount of inductive bias associated with these architectures and hyperparameters has been limited. We propose a novel method for efficiently computing the inductive bias required for generalization on a task with a fixed training data budget; formally, this corresponds to the amount of information required to specify well-generalizing models within a specific hypothesis space of models. Our approach involves modeling the loss distribution of random hypotheses drawn from a hypothesis space to estimate the required inductive bias for a task relative to these hypotheses. Unlike prior work, our method provides a direct estimate of inductive bias without using bounds and is applicable to diverse hypothesis spaces. Moreover, we derive approximation error bounds for our estimation approach in terms of the number of sampled hypotheses. Consistent with prior results, our empirical results demonstrate that higher dimensional tasks require greater inductive bias. We show that relative to other expressive model classes, neural networks as a model class encode large amounts of inductive bias. Furthermore, our measure quantifies the relative difference in inductive bias between different neural network architectures. Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.
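The sampling idea can be illustrated in miniature: draw random hypotheses, look at their loss distribution, and read off the information (in bits) needed to single out a well-generalizing one, roughly -log2 of the probability that a random hypothesis generalizes well. The hypothesis space (random halfspaces) and the error threshold below are toy assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
X = rng.normal(size=(2000, d))
y = (X[:, 0] > 0).astype(float)             # ground truth: a halfspace

def random_hypothesis():
    w = rng.normal(size=d)                   # a random direction = one hypothesis
    return (X @ w > 0).astype(float)

losses = np.array([np.mean(random_hypothesis() != y) for _ in range(10000)])
threshold = 0.25                             # assumed "well-generalizing" error level
p_good = (losses <= threshold).mean()
bits = -np.log2(p_good) if p_good > 0 else float("inf")
print(f"P(loss <= {threshold}) = {p_good:.4f} -> ~{bits:.1f} bits of inductive bias needed")
```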

#12 Best Arm Identification with Retroactively Increased Sampling Budget for More Resource-Efficient HPO

Authors: Jasmin Brandt ; Marcel Wever ; Viktor Bengs ; Eyke Hüllermeier

Hyperparameter optimization (HPO) is indispensable for achieving optimal performance in machine learning tasks. A popular class of methods in this regard is based on Successive Halving (SHA), which casts HPO into a pure-exploration multi-armed bandit problem under a finite sampling budget. This is accomplished by treating hyperparameter configurations as arms and negative validation losses as rewards. While enjoying theoretical guarantees and working well in practice, SHA comes with several hyperparameters itself, one of which is the maximum budget that can be allocated to evaluate a single arm (hyperparameter configuration). Although there are already solutions to this meta-hyperparameter-optimization problem, such as the doubling trick or asynchronous extensions of SHA, these are either practically inefficient or lack theoretical guarantees. In this paper, we propose incremental SHA (iSHA), a synchronous extension of SHA that allows the maximum budget to be increased a posteriori while still enjoying theoretical guarantees. Our empirical analysis of HPO problems corroborates our theoretical findings and shows that iSHA is more resource-efficient than existing SHA-based approaches.
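
For reference, here is a minimal sketch of plain SHA, the algorithm iSHA extends: evaluate all surviving configurations on the current budget, keep the best 1/eta fraction, and multiply the budget by eta. The `evaluate` function is a hypothetical stand-in for training a configuration for a given budget and returning its validation loss.

```python
import math
import random

def successive_halving(configs, min_budget, evaluate, eta=2):
    """Plain SHA: halve the pool and grow the budget until one arm remains."""
    survivors, budget = list(configs), min_budget
    while len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = scored[: max(1, len(scored) // eta)]  # keep the best 1/eta
        budget *= eta                                     # survivors get more budget
    return survivors[0]

def evaluate(config, budget):
    # Toy stand-in: loss shrinks with budget, optimum at lr = 1e-2, plus noise.
    return (math.log10(config["lr"]) + 2) ** 2 + 1.0 / budget + random.gauss(0, 0.01)

random.seed(0)
configs = [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(16)]
print(successive_halving(configs, min_budget=1, evaluate=evaluate))  # lr near 1e-2
```

iSHA's contribution, per the abstract, is letting this maximum budget grow after the fact without restarting the race or forfeiting guarantees.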

#13 With a Little Help from Language: Semantic Enhanced Visual Prototype Framework for Few-Shot Learning

Authors: Hecheng Cai ; Yang Liu ; Shudong Huang ; Jiancheng Lv

Few-shot learning (FSL) aims to recognize new categories given limited training samples. The core challenge is to avoid overfitting to the minimal data while ensuring good generalization to novel classes. One mainstream approach employs prototypes from visual feature extractors as classifier weights, so performance depends on the quality of the prototypes. Since different categories may have similar visual features, purely visual prototypes have limitations: existing methods only learn a simple visual feature extractor during the pre-training stage and neglect the importance of a well-developed feature space for the prototype. We introduce the Semantic Enhanced Visual Prototype framework (SEVpro) to address this issue. SEVpro refines prototype learning from the pre-training stage onward and serves as a versatile plug-and-play framework for all prototype-based FSL methods. Specifically, we enhance prototype discriminability by transforming semantic embeddings into the visual space, aiding the separation of categories with similar visual features. For novel class learning, we leverage knowledge from base classes and incorporate semantic information to further elevate prototype quality. Extensive experiments on FSL benchmarks and ablation studies demonstrate the superiority of the proposed SEVpro for FSL.
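
A minimal sketch of the underlying setup: class prototypes are the means of support features, queries are classified by cosine similarity, and each prototype is mixed with a class semantic embedding projected into visual space. The linear projection and the fusion weight `alpha` are illustrative assumptions, not SEVpro itself.

```python
import torch
import torch.nn.functional as F

def prototypes(support_feats, support_labels, n_classes):
    """Per-class mean of support features (prototypical-network style)."""
    return torch.stack([support_feats[support_labels == c].mean(0) for c in range(n_classes)])

def classify(query_feats, protos):
    """Cosine similarity of queries against prototypes used as classifier weights."""
    return F.normalize(query_feats, dim=-1) @ F.normalize(protos, dim=-1).T

n_classes, shots, d_vis, d_sem = 5, 5, 64, 32
support = torch.randn(n_classes * shots, d_vis)
labels = torch.arange(n_classes).repeat_interleave(shots)
protos = prototypes(support, labels, n_classes)

proj = torch.nn.Linear(d_sem, d_vis)          # semantic -> visual space (assumed)
semantic = torch.randn(n_classes, d_sem)      # e.g. class word embeddings
alpha = 0.3                                   # assumed fusion weight
protos = (1 - alpha) * protos + alpha * proj(semantic)

print(classify(torch.randn(10, d_vis), protos).argmax(dim=1))
```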

#14 LG-FGAD: An Effective Federated Graph Anomaly Detection Framework

Authors: Jinyu Cai ; Yunhe Zhang ; Jicong Fan ; See-Kiong Ng

Graph anomaly detection (GAD), which aims to identify graphs that are significantly different from the others, has attracted growing attention in many real-world scenarios. However, existing GAD methods are generally designed for centralized training, while in real-world collaboration, graph data is typically distributed across various clients and exhibits significant non-IID characteristics. To tackle this challenge, we propose a federated graph anomaly detection framework with local-global anomaly awareness (LG-FGAD). We first introduce a self-adversarial generation module and train a discriminator to distinguish the generated anomalous graphs from normal ones. To enhance the anomaly awareness of the model, we propose to maximize/minimize the mutual information from local and global perspectives. Importantly, to alleviate the impact of the non-IID problem in collaborative learning, we propose a dual knowledge distillation module. The knowledge distillation is conducted over both logits and embedding distributions, and only the student model engages in collaboration, preserving the personalization of each client. Empirical results on various types of real-world datasets prove the superiority of our method.
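
The dual distillation over logits and embeddings can be sketched as the sum of a temperature-scaled KL term and an embedding-matching term. The temperature, the MSE choice for embeddings, and the 0.5 weighting are illustrative assumptions about one common way to combine the two.

```python
import torch
import torch.nn.functional as F

def dual_kd_loss(student_logits, teacher_logits, student_emb, teacher_emb,
                 T=2.0, lam=0.5):
    """Distill over both logits (KL with temperature) and embeddings (MSE)."""
    kd_logits = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # standard temperature scaling
    kd_emb = F.mse_loss(student_emb, teacher_emb)   # align embedding distributions
    return kd_logits + lam * kd_emb

loss = dual_kd_loss(torch.randn(8, 2), torch.randn(8, 2),
                    torch.randn(8, 64), torch.randn(8, 64))
print(loss.item())
```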

#15 Dual Contrastive Graph-Level Clustering with Multiple Cluster Perspectives Alignment

Authors: Jinyu Cai ; Yunhe Zhang ; Jicong Fan ; Yali Du ; Wenzhong Guo

Graph-level clustering, which is essential in medical, biomedical, and social network data analysis, aims to group a set of graphs into various clusters. However, existing methods generally rely on a single clustering criterion, e.g., $k$-means, which limits their ability to fully exploit the complex Euclidean and structural information inherent in graphs. To bridge this gap, we propose a dual contrastive graph-level clustering (DCGLC) method in this paper. DCGLC leverages graph contrastive learning and introduces Euclidean-based and subspace-based cluster heads to capture cluster information from different cluster perspectives. To overcome inconsistent estimates across the cluster heads and fuse their cluster information, we propose a contrastive mechanism to align the cluster information derived from them. This cluster-perspective contrast facilitates the capture of more comprehensive cluster information. Importantly, DCGLC is an end-to-end framework in which graph contrastive learning and cluster-perspective contrast are mutually improved. We demonstrate the superiority of DCGLC over state-of-the-art baselines on numerous graph benchmarks.
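
One common recipe for aligning two cluster heads contrastively treats cluster c under one head and cluster c under the other as a positive pair, and all other cross-head cluster pairs as negatives. The sketch below follows that generic cluster-level contrastive pattern; DCGLC's actual heads and alignment loss may differ.

```python
import torch
import torch.nn.functional as F

def cluster_alignment_loss(p1, p2, temperature=0.5):
    """p1, p2: (N, K) soft assignments over the same batch from two cluster heads."""
    c1 = F.normalize(p1.t(), dim=1)        # (K, N): one vector per cluster
    c2 = F.normalize(p2.t(), dim=1)
    logits = c1 @ c2.t() / temperature     # cross-head cluster similarities
    labels = torch.arange(c1.size(0))      # cluster c matches cluster c
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

p1 = torch.softmax(torch.randn(64, 10), dim=1)   # e.g. Euclidean-style head
p2 = torch.softmax(torch.randn(64, 10), dim=1)   # e.g. subspace-style head
print(cluster_alignment_loss(p1, p2).item())
```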

#16 Learning Low-Rank Tensor Cores with Probabilistic ℓ0-Regularized Rank Selection for Model Compression

Authors: Tianxiao Cao ; Lu Sun ; Canh Hao Nguyen ; Hiroshi Mamitsuka

Compressing deep neural networks is of great importance for real-world applications on resource-constrained devices. Tensor decomposition is one promising answer that retains the functionality and most of the expressive power of the original deep models by replacing the weights with their decomposed cores. Decomposition with optimal ranks can achieve a good compression-accuracy trade-off, but it is expensive to optimize due to its discrete and combinatorial nature. A common practice is to set all ranks equal and tune one hyperparameter, but it may significantly harm the flexibility and generalization. In this paper, we propose a novel automatic rank selection method for deep model compression that allows learning model weights and decomposition ranks simultaneously. We propose to penalize the ℓ0 (quasi-)norm of the slices of decomposed tensor cores during model training. To avoid combinatorial optimization, we develop a probabilistic formulation and apply an approximate Bernoulli gate to each of the slices of tensor cores, which can be implemented in an end-to-end and scalable framework via gradient descent. It enables the automatic rank selection to be incorporated with arbitrary tensor decompositions and neural network layers such as linear layers, convolutional layers, and embedding layers. Comprehensive experiments on various tasks, including image classification, text sentiment classification, and neural machine translation, demonstrate the superior effectiveness of the proposed method over baselines.
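The gating mechanism described here — a relaxed Bernoulli gate on each slice of a tensor core, trained by gradient descent with a penalty on the expected number of open gates — can be sketched with the common hard-concrete parameterization (constants from Louizos et al.'s ℓ0 regularization, which we assume here; the paper's exact formulation may differ).

```python
import torch

class SliceGates(torch.nn.Module):
    """Hard-concrete (relaxed Bernoulli) gates over the rank-slices of a tensor core."""
    def __init__(self, n_slices, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = torch.nn.Parameter(torch.zeros(n_slices))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta)
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0, 1)  # gate in [0, 1]

    def expected_l0(self):
        # P(gate > 0): differentiable surrogate for the l0 penalty.
        ratio = torch.log(torch.tensor(-self.gamma / self.zeta))
        return torch.sigmoid(self.log_alpha - self.beta * ratio).sum()

gates = SliceGates(n_slices=8)
core = torch.randn(8, 4, 4)              # a tensor core with 8 rank-slices
pruned = gates().view(-1, 1, 1) * core   # closed gates zero out slices, dropping rank
loss = pruned.pow(2).mean() + 1e-2 * gates.expected_l0()
loss.backward()
print(gates.log_alpha.grad.shape)        # gates are trained end-to-end by SGD
```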

#17 Breaking Barriers of System Heterogeneity: Straggler-Tolerant Multimodal Federated Learning via Knowledge Distillation

Authors: Jinqian Chen ; Haoyu Tang ; Junhao Cheng ; Ming Yan ; Ji Zhang ; Mingzhu Xu ; Yupeng Hu ; Liqiang Nie

Internet of Things (IoT) devices possess valuable yet private multimodal data, calling for a decentralized machine learning scheme. Though several multimodal federated learning (MFL) methods have been proposed, most of them simply overlook the system heterogeneity across IoT devices, making them ill-suited to real-world applications. To address this, we conduct theoretical analysis and exploratory experiments on straggler impacts and uncover that stragglers caused by system heterogeneity are fatal to MFL, resulting in catastrophic time overhead. Motivated by this, we propose a novel Multimodal Federated Learning with Accelerated Knowledge Distillation (MFL-AKD) framework, the first attempt to integrate knowledge distillation to combat stragglers in complex multimodal federated scenarios. Concretely, given pretrained large-scale vision-language models deployed in the central server, we apply a fast knowledge transfer mechanism to conduct early training of local models with part of the local data. The early-trained model is then enhanced through distillation from the pretrained large model and further trained on the remaining data. Extensive experiments on two datasets for video moment retrieval and two datasets for image-text retrieval demonstrate that our method achieves superior results with high straggler robustness.

#18 Off-Agent Trust Region Policy Optimization

Authors: Ruiqing Chen ; Xiaoyuan Zhang ; Yali Du ; Yifan Zhong ; Zheng Tian ; Fanglei Sun ; Yaodong Yang

Leveraging the experiences of other agents offers a powerful mechanism to enhance policy optimization in multi-agent reinforcement learning (MARL). However, contemporary MARL algorithms often neglect experience-sharing possibilities or adopt a simple approach via direct parameter sharing. Our work explores a refined off-agent learning framework that allows selective integration of experience from other agents to improve policy learning. Our investigation begins with a thorough assessment of current mechanisms for reusing experiences among heterogeneous agents, revealing that direct experience transfer may have negative consequences. Moreover, even the experience of homogeneous agents requires modification before reuse. Our approach introduces off-agent adaptations to multi-agent policy optimization methods, enabling effective and purposeful leverage of cross-agent experiences beyond conventional parameter sharing. Accompanying this, we provide a theoretical guarantee of approximate monotonic improvement. Experiments conducted on the StarCraft II Multi-Agent Challenge (SMAC) and Google Research Football (GRF) demonstrate that our algorithms outperform state-of-the-art (SOTA) methods and achieve faster convergence, suggesting the viability of our approach for efficient experience reuse in MARL.

#19 EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Authors: Wenxi Chen ; Yuzhe Liang ; Ziyang Ma ; Zhisheng Zheng ; Xie Chen

Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress. However, the extensive computational demands during pre-training pose a significant barrier to the potential application and optimization of audio SSL models. In this paper, inspired by the success of data2vec 2.0 in the image modality and Audio-MAE in the audio modality, we introduce the Efficient Audio Transformer (EAT) to further improve the effectiveness and efficiency of audio SSL. The proposed EAT adapts the bootstrap self-supervised training paradigm to the audio domain. A novel Utterance-Frame Objective (UFO) is designed to enhance the modeling capability of acoustic events. Furthermore, we reveal that the masking strategy is critical in audio SSL pre-training, and that superior audio representations can be obtained with large inverse block masks. Experimental results demonstrate that EAT achieves state-of-the-art (SOTA) performance on a range of audio-related tasks, including AudioSet (AS-2M, AS-20K), ESC-50, and SPC-2, along with a significant pre-training speedup of up to ~15x compared to existing audio SSL models.
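
Inverse block masking can be illustrated on a patch grid: instead of masking a few blocks, a few contiguous blocks are *kept* visible and everything else is masked, yielding the large mask ratios the abstract refers to. Block count and size below are arbitrary toy assumptions, not EAT's settings.

```python
import numpy as np

def inverse_block_mask(h, w, n_blocks=3, block=4, rng=None):
    """Keep a few contiguous blocks visible; mask the rest of the patch grid."""
    rng = rng or np.random.default_rng()
    mask = np.ones((h, w), dtype=bool)                    # True = masked
    for _ in range(n_blocks):
        top = rng.integers(0, h - block + 1)
        left = rng.integers(0, w - block + 1)
        mask[top:top + block, left:left + block] = False  # this block stays visible
    return mask

m = inverse_block_mask(16, 16, rng=np.random.default_rng(0))
print(f"mask ratio: {m.mean():.2f}")  # ~0.8 or higher with these settings
```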

#20 Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator

Authors: Xuyang Chen ; Jingliang Duan ; Lin Zhao

Actor-critic methods have achieved state-of-the-art performance in various challenging tasks. However, theoretical understanding of their performance remains elusive. Existing studies mostly focus on practically uncommon variants such as double-loop or two-timescale stepsize actor-critic algorithms for simplicity, and these results certify local convergence on finite state or action spaces only. We push the boundary by investigating the classic single-sample, single-timescale actor-critic on a continuous (infinite) state-action space, employing the canonical linear quadratic regulator (LQR) problem as a case study. We show that the popular single-timescale actor-critic can attain an $\epsilon$-optimal solution with a sample complexity of order $\epsilon^{-2}$ for solving LQR on the demanding continuous state-action space. Our work provides new insights into the performance of single-timescale actor-critic, further bridging the gap between theory and practice.

#21 Boosting Single Positive Multi-label Classification with Generalized Robust Loss

Authors: Yanxi Chen ; Chunxiao Li ; Xinyang Dai ; Jinhuan Li ; Weiyu Sun ; Yiming Wang ; Renyuan Zhang ; Tinghe Zhang ; Bo Wang

Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to obtain in full, often resulting in missing-label scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods focus only on designing losses using mechanisms such as hard pseudo-labeling and robust losses, mostly leading to unacceptable false negatives. To address this issue, we first propose a generalized loss framework based on expected risk minimization that provides soft pseudo labels, and we show that existing losses can be seamlessly converted into our framework. In particular, we design a novel robust loss based on our framework, which enjoys flexible coordination between false positives and false negatives and can additionally deal with the imbalance between positive and negative samples. Extensive experiments show that our approach can significantly improve SPML performance and outperform the vast majority of state-of-the-art methods on all four benchmarks. Our code is available at https://github.com/yan4xi1/GRLoss.
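
A minimal sketch of the soft-pseudo-label idea in SPML: the single observed positive gets a standard BCE term, while each unannotated label is treated as negative with a soft target rather than a hard 0. The specific soft target below is an illustrative assumption, not the paper's generalized robust loss (see the repository above for the real one).

```python
import torch
import torch.nn.functional as F

def spml_soft_loss(logits, pos_index, soft_target=0.05):
    """logits: (L,) scores over L labels; pos_index: the single observed positive."""
    p = torch.sigmoid(logits)
    loss_pos = F.binary_cross_entropy(p[pos_index], torch.tensor(1.0))
    mask = torch.ones_like(p, dtype=torch.bool)
    mask[pos_index] = False
    targets = torch.full_like(p[mask], soft_target)   # soft, not hard, negatives
    loss_unobserved = F.binary_cross_entropy(p[mask], targets)
    return loss_pos + loss_unobserved

print(spml_soft_loss(torch.randn(10), pos_index=3).item())
```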

#22 Disentangling Domain and General Representations for Time Series Classification

Authors: Youmin Chen ; Xinyu Yan ; Yang Yang ; Jianfeng Zhang ; Jing Zhang ; Lujia Pan ; Juren Li

Modeling time series data has become a very attractive research topic due to its wide applications, such as human activity recognition, financial forecasting, and sensor-based automatic system monitoring. Recently, deep learning models have shown great advances in modeling time series data, but they depend heavily on a large amount of labeled data. To avoid costly labeling, this paper explores domain adaptation from a labeled source domain to an unlabeled target domain on time series data. To achieve this goal, we propose a disentangled representation learning framework named CADT to disentangle domain-invariant features from domain-specific ones. In particular, CADT is equipped with a novel class-wise hypersphere loss to improve the generalization of the classifier from the source domain to the target domain. Intuitively, it restricts the source data of the same class within the same hypersphere and minimizes its radius, which in turn enlarges the margin between different classes and makes the decision boundary of both domains easier to learn. We further devise several kinds of domain-preserving data augmentation methods to better capture the domain-specific patterns. Extensive experiments on two public datasets and two real-world applications demonstrate the effectiveness of the proposed model against several state-of-the-art baselines.
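
In the spirit of the description above, a class-wise hypersphere loss can be sketched as: penalize features that fall outside a learnable sphere around their class center, plus a penalty on the radii themselves. The exact form in CADT may differ; this is a hedged illustration.

```python
import torch

def hypersphere_loss(feats, labels, centers, radii, lam=0.1):
    """Pull same-class features inside their class hypersphere; shrink the radii."""
    d = (feats - centers[labels]).norm(dim=1)     # distance to own class center
    violation = torch.relu(d - radii[labels])     # penalize only points outside
    return violation.pow(2).mean() + lam * radii.pow(2).sum()

n_classes, d_feat = 3, 16
centers = torch.randn(n_classes, d_feat, requires_grad=True)
radii = torch.ones(n_classes, requires_grad=True)
feats = torch.randn(32, d_feat)
labels = torch.randint(0, n_classes, (32,))
loss = hypersphere_loss(feats, labels, centers, radii)
loss.backward()
print(loss.item())
```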

#23 Automated CPU Design by Learning from Input-Output Examples

Authors: Shuyao Cheng ; Pengwei Jin ; Qi Guo ; Zidong Du ; Rui Zhang ; Xing Hu ; Yongwei Zhao ; Yifan Hao ; Xiangtao Guan ; Husheng Han ; Zhengyue Zhao ; Ximing Liu ; Xishan Zhang ; Yuejie Chu ; Weilong Mao ; Tianshi Chen ; Yunji Chen

Designing a central processing unit (CPU) requires intensive manual work by talented experts to implement the circuit logic from design specifications. Although considerable progress has been made in electronic design automation (EDA) to relieve human effort, all existing EDA tools require hand-crafted formal program code (e.g., Verilog, Chisel, or C) as input. To automate CPU design without human programming, we are motivated to learn the CPU design from input-output (IO) examples only. The key challenge is that the learned CPU design must have almost zero tolerance for inaccuracy, which makes well-known approximate algorithms such as neural networks ineffective. We propose a new AI approach to generate the CPU design in the form of a large-scale Boolean function, from external IO examples alone instead of formal program code. This approach employs a novel graph structure called Binary Speculative Diagram (BSD) to accurately approximate the CPU-scale Boolean function. We propose an efficient BSD expansion method based on Boolean Distance, a new metric that quantitatively measures the structural similarity between Boolean functions, gradually increasing the design accuracy up to 100%. Our approach generates an industrial-scale RISC-V CPU design within 5 hours, reducing the design cycle by about 1000x without human involvement. The taped-out chip, Enlightenment-1, the world's first CPU designed by AI, successfully runs the Linux operating system and performs comparably to the human-designed Intel 80486SX CPU. Our approach even autonomously discovers human knowledge of the von Neumann architecture.

#24 Deep Embedding Clustering Driven by Sample Stability

Authors: Zhanwen Cheng ; Feijiang Li ; Jieting Wang ; Yuhua Qian

Deep clustering methods improve the performance of clustering tasks by jointly optimizing deep representation learning and clustering. While numerous deep clustering algorithms have been proposed, most of them rely on artificially constructed pseudo targets for clustering. This construction process requires prior knowledge, and it is challenging to determine a suitable pseudo target for clustering. To address this issue, we propose a deep embedding clustering algorithm driven by sample stability (DECS), which eliminates the requirement for pseudo targets. Specifically, we start by constructing the initial feature space with an autoencoder and then learn a cluster-oriented embedding constrained by sample stability. Sample stability aims to explore the deterministic relationship between samples and all cluster centroids, pulling samples toward their respective clusters and keeping them away from other clusters with high determinacy. We theoretically analyze the convergence of the loss via Lipschitz continuity, which verifies the validity of the model. Experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.

#25 Diversification of Adaptive Policy for Effective Offline Reinforcement Learning

Authors: Yunseon Choi ; Li Zhao ; Chuheng Zhang ; Lei Song ; Jiang Bian ; Kee-Eung Kim

Offline Reinforcement Learning (RL) aims to learn policies from pre-collected datasets that capture only a subset of the environment's dynamics. The predominant approach has been to solve a constrained optimization formulation, which ensures that the policy visits state-action pairs within the support of the offline dataset. However, this approach limits the ability to make decisions when the agent faces unknown parts of the environment at deployment time. To address the challenge of decision-making in out-of-support regions, model-based Bayes-adaptive approaches have been proposed that consider all dynamics models that could potentially be the true environment. Since it is generally infeasible to compute the posterior over all dynamics models from the offline dataset, these approaches usually approximate the posterior with a finite ensemble of highly probable dynamics models; the diversity of these models is therefore key to obtaining good policies. In this work, we propose MoDAP (Model-based Diverse Adaptive Policy Learning), an algorithm that enables the adaptive policy to make informed decisions in previously unexplored states. MoDAP adopts an iterative strategy that simultaneously trains the policy and the dynamics models. The policy optimization seeks to maximize expected returns across dynamics models, while the dynamics models are trained to promote policy diversification through the proposed information-theoretic objective. We evaluate MoDAP through experiments on the D4RL and NeoRL benchmarks, showcasing its performance superiority over state-of-the-art algorithms.