| Total: 33

Optimal transport (OT) plays an essential role in various areas like machine learning and deep learning. However, computing discrete optimal transport plan for large scale problems with adequate accuracy and efficiency is still highly challenging. Recently, methods based on the Sinkhorn algorithm add an entropy regularizer to the prime problem and get a trade off between efficiency and accuracy. In this paper, we propose a novel algorithm to further improve the efficiency and accuracy based on Nesterov's smoothing technique. Basically, the non-smooth c-transform of the Kantorovich potential is approximated by the smooth Log-Sum-Exp function, which finally smooths the original non-smooth Kantorovich dual functional. The smooth Kantorovich functional can be optimized by the fast proximal gradient algorithm (FISTA) efficiently. Theoretically, the computational complexity of the proposed method is lower than current estimation of the Sinkhorn algorithm in terms of the precision. Empirically, compared with the Sinkhorn algorithm, our experimental results demonstrate that the proposed method achieves faster convergence and better accuracy with the same parameter.

We study the convergence rate of gradient-based local search methods for solving low-rank matrix recovery problems with general objectives in both symmetric and asymmetric cases, under the assumption of the restricted isometry property. First, we develop a new technique to verify the Polyak-Lojasiewicz inequality in a neighborhood of the global minimizers, which leads to a local linear convergence region for the gradient descent method. Second, based on the local convergence result and a sharp strict saddle property proven in this paper, we present two new conditions that guarantee the global linear convergence of the perturbed gradient descent method. The developed local and global convergence results provide much stronger theoretical guarantees than the existing results. As a by-product, this work significantly improves the existing bounds on the RIP constant required to guarantee the non-existence of spurious solutions.

We present a new algorithm called A*+BFHS for solving problems with unit-cost operators where A* and IDA* fail due to memory limitations and/or the existence of many distinct paths between the same pair of nodes. A*+BFHS is based on A* and breadth-first heuristic search (BFHS). A*+BFHS combines advantages from both algorithms, namely A*'s node ordering, BFHS's memory savings, and both algorithms' duplicate detection. On easy problems, A*+BFHS behaves the same as A*. On hard problems, it is slower than A* but saves a large amount of memory. Compared to BFIDA*, A*+BFHS reduces the search time and/or memory requirement by several times on a variety of planning domains.

The maximum k-club problem (MkCP) is an important clique relaxation problem with wide applications. Previous MkCP algorithms only work on small-scale instances and are not applicable for large-scale instances. For solving instances with different scales, this paper develops an efficient local search algorithm named NukCP for the MkCP which mainly includes two novel ideas. First, we propose a dynamic reduction strategy, which makes a good balance between the time efficiency and the precision effectiveness of the upper bound calculation. Second, a stratified threshold configuration checking strategy is designed by giving different priorities for the neighborhood in the different levels. Experiments on a broad range of different scale instances show that NukCP significantly outperforms the state-of-the-art MkCP algorithms on most instances.

Optimization of real-world black-box functions defined over purely categorical variables is an active area of research. In particular, optimization and design of biological sequences with specific functional or structural properties have a profound impact in medicine, materials science, and biotechnology. Standalone search algorithms, such as simulated annealing (SA) and Monte Carlo tree search (MCTS), are typically used for such optimization problems. In order to improve the performance and sample efficiency of such algorithms, we propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables. To this end, we present two different representations, a group-theoretic Fourier expansion and an abridged one-hot encoded Boolean Fourier expansion. To learn such representations, we consider two different settings to update our surrogate model. First, we utilize an adversarial online regression setting where Fourier characters of each representation are considered as experts and their respective coefficients are updated via an exponential weight update rule each time the black box is evaluated. Second, we consider a Bayesian setting where queries are selected via Thompson sampling and the posterior is updated via a sparse Bayesian regression model (over our proposed representation) with a regularized horseshoe prior. Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods, which achieve competitive or superior performance compared to state-of-the-art counterparts, while improving the computation cost and/or sample efficiency, substantially.

In bounded-suboptimal heuristic search, one attempts to find a solution that costs no more than a prespecified factor of optimal as quickly as possible. This is an important setting, as it admits faster-than-optimal solving while retaining some control over solution cost. In this paper, we investigate several new algorithms for bounded-suboptimal search, including novel variants of EES and DPS, the two most prominent previous proposals, and methods inspired by recent work in bounded-cost search that leverages uncertainty estimates of the heuristic. We perform what is, to our knowledge, the most comprehensive empirical comparison of bounded-suboptimal search algorithms to date, including both search and planning benchmarks, and we find that one of the new algorithms, a simple alternating queue scheme, significantly outperforms previous work.

The Maximum k-Defective Clique Problem (MDCP), as a clique relaxation model, has been used to solve various problems. Because it is a hard computational task, previous works can hardly solve the MDCP for massive sparse graphs derived from real-world applications. In this work, we propose a novel branch-and-bound algorithm to solve the MDCP based on several new techniques. First, we propose two new upper bounds of the MDCP as well as corresponding reduction rules to remove redundant vertices and edges. The proposed reduction rules are particularly useful for massive graphs. Second, we present another new upper bound by counting missing edges between fixed vertices and an unfixed vertex for cutting branches. We perform extensive computational experiments to evaluate our algorithm. Experimental results show that our reduction rules are very effective for removing redundant vertices and edges so that graphs are reduced greatly. Also, our algorithm can solve benchmark instances efficiently, and it has significantly better performance than state-of-the-art algorithms.

Learning from one's mistakes is an effective human learning technique where the learners focus more on the topics where mistakes were made, so as to deepen their understanding. In this paper, we investigate if this human learning strategy can be applied in machine learning. We propose a novel machine learning method called Learning From Mistakes (LFM), wherein the learner improves its ability to learn by focusing more on the mistakes during revision. We formulate LFM as a three-stage optimization problem: 1) learner learns; 2) learner re-learns focusing on the mistakes, and; 3) learner validates its learning. We develop an efficient algorithm to solve the LFM problem. We apply the LFM framework to neural architecture search on CIFAR-10, CIFAR-100, and Imagenet. Experimental results strongly demonstrate the effectiveness of our model.

Temporal graphs naturally model graphs whose underlying topology changes over time. Recently, the problems Temporal Vertex Cover (or TVC) and Sliding-Window Temporal Vertex Cover (or Delta-TVC for time-windows of a fixed-length Delta) have been established as natural extensions of the classic Vertex Cover problem on static graphs with connections to areas such as surveillance in sensor networks. In this paper we initiate a systematic study of the complexity of TVC and Delta-TVC on sparse graphs. Our main result shows that for every Delta geq 2, Delta-TVC is NP-hard even when the underlying topology is described by a path or a cycle. This resolves an open problem from literature and shows a surprising contrast between Delta-TVC and TVC for which we provide a polynomial-time algorithm in the same setting. To circumvent this hardness, we present a number of exact and approximation algorithms for temporal graphs whose underlying topologies are given by a path, that have bounded vertex degree in every time step, or that admit a small-sized temporal vertex cover.

The efficient detection of outbreaks and other cascading phenomena is a fundamental problem in a number of domains, including disease spread, social networks, and infrastructure networks. In such settings, monitoring and testing a small group of pre-selected nodes from the susceptible population (i.e., a sensor set) is often the preferred testing regime. We study the problem of selecting a sensor set that minimizes the delay in detection---we refer to this as the MinDelSS problem. Prior methods for minimizing the detection time rely on greedy algorithms using submodularity. We show that this approach can sometimes lead to a worse approximation for minimizing the detection time than desired. We also show that MinDelSS is hard to approximate within an O(n^(1-1/g))-factor for any constant g greater than or equal to 2 for a graph with n nodes. This instead motivates seeking a bicriteria approximations. We present the algorithm RoundSensor, which gives a rigorous worst case O(log(n))-factor for the detection time, while violating the budget by a factor of O(log^2(n)). Our algorithm is based on the sample average approximation technique from stochastic optimization, combined with linear programming and rounding. We evaluate our algorithm on several networks, including hospital contact networks, which validates its effectiveness in real settings.

We present a multi-objective meta-search procedure that constructs candidate algorithms for state-space search puzzles like Rubik's cube. The candidate algorithms take the form of macro databases, i.e., rule tables that specify sequences of actions to perform in different states. Rules are repeatedly applied until the puzzle is solved. The objectives favor candidates that are god-like (solving the puzzle in fewer steps) and folk-like (having fewer rules in the macro database). We build each candidate with a non-deterministic rule table construction, and then optimize over the non-deterministic choice points to find candidates near the Pareto-optimal trades-offs between godliness and folksiness. We prove that the rule table construction is correct: it always terminates and solves every state at termination. This is verified empirically on the full 2x2x2 "pocket" cube, where correct (but unoptimized) constructions take under one hour and the total number of rules is less than 10% the number of possible states. We also empirically assess the multi-objective optimization on restricted variants of the cube with up to 29K possible states, showing relative improvements in the objectives between 14-20%. Avenues for scaling up the method in future work are discussed.

Mixed-integer programming (MIP) technology offers a generic way of formulating and solving combinatorial optimization problems. While generally reliable, state-of-the-art MIP solvers base many crucial decisions on hand-crafted heuristics, largely ignoring common patterns within a given instance distribution of the problem of interest. Here, we propose MIP-GNN, a general framework for enhancing such solvers with data-driven insights. By encoding the variable-constraint interactions of a given mixed-integer linear program (MILP) as a bipartite graph, we leverage state-of-the-art graph neural network architectures to predict variable biases, i.e., component-wise averages of (near) optimal solutions, indicating how likely a variable will be set to 0 or 1 in (near) optimal solutions of binary MILPs. In turn, the predicted biases stemming from a single, once-trained model are used to guide the solver, replacing heuristic components. We integrate MIP-GNN into a state-of-the-art MIP solver, applying it to tasks such as node selection and warm-starting, showing significant improvements compared to the default setting of the solver on two classes of challenging binary MILPs. Our code and appendix are publicly available at https://github.com/lyeskhalil/mipGNN.

Optimizing a machine learning (ML) pipeline has been an important topic of AI and ML. Despite recent progress, pipeline optimization remains a challenging problem, due to potentially many combinations to consider as well as slow training and validation. We present the BLDS algorithm for optimized algorithm selection (ML operations) in a fixed ML pipeline structure. BLDS performs multi-fidelity optimization for selecting ML algorithms trained with smaller computational overhead, while controlling its pipeline search based on multi-armed bandit and limited discrepancy search. Our experiments on well-known classification benchmarks show that BLDS is superior to competing algorithms. We also combine BLDS with hyperparameter optimization, empirically showing the advantage of BLDS.

With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which includes focusing or targeting certain data points, while avoiding others. Examples of such problems include: i)targeted learning, where the goal is to find subsets with rare classes or rare attributes on which the model is under performing, and ii)guided summarization, where data (e.g., image collection, text, document or video) is summarized for quicker human consumption with specific additional user intent. Motivated by such applications, we present PRISM, a rich class of PaRameterIzed Submodular information Measures. Through novel functions and their parameterizations, PRISM offers a variety of modeling capabilities that enable a trade-off between desired qualities of a subset like diversity or representation and similarity/dissimilarity with a set of data points. We demonstrate how PRISM can be applied to the two real-world problems mentioned above, which require guided subset selection. In doing so, we show that PRISM interestingly generalizes some past work, therein reinforcing its broad utility. Through extensive experiments on diverse datasets, we demonstrate the superiority of PRISM over the state-of-the-art in targeted learning and in guided image-collection summarization. PRISM is available as a part of the SUBMODLIB (https://github.com/decile-team/submodlib) and TRUST (https://github.com/decile-team/trust) toolkits.

In many games, moves consist of several decisions made by the player. These decisions can be viewed as separate moves, which is already a common practice in multi-action games for efficiency reasons. Such division of a player move into a sequence of simpler / lower level moves is called splitting. So far, split moves have been applied only in forementioned straightforward cases, and furthermore, there was almost no study revealing its impact on agents' playing strength. Taking the knowledge-free perspective, we aim to answer how to effectively use split moves within Monte-Carlo Tree Search (MCTS) and what is the practical impact of split design on agents' strength. This paper proposes a generalization of MCTS that works with arbitrarily split moves. We design several variations of the algorithm and try to measure the impact of split moves separately on efficiency, quality of MCTS, simulations, and action-based heuristics. The tests are carried out on a set of board games and performed using the Regular Boardgames General Game Playing formalism, where split strategies of different granularity can be automatically derived based on an abstract description of the game. The results give an overview of the behavior of agents using split design in different ways. We conclude that split design can be greatly beneficial for single- as well as multi-action games.

Multi-Agent Path Finding (MAPF) is the problem of planning collision-free paths for multiple agents in a shared environment. In this paper, we propose a novel algorithm MAPF-LNS2 based on large neighborhood search for solving MAPF efficiently. Starting from a set of paths that contain collisions, MAPF-LNS2 repeatedly selects a subset of colliding agents and replans their paths to reduce the number of collisions until the paths become collision-free. We compare MAPF-LNS2 against a variety of state-of-the-art MAPF algorithms, including Prioritized Planning with random restarts, EECBS, and PPS, and show that MAPF-LNS2 runs significantly faster than them while still providing near-optimal solutions in most cases. MAPF-LNS2 solves 80% of the random-scenario instances with the largest number of agents from the MAPF benchmark suite with a runtime limit of just 5 minutes, which, to our knowledge, has not been achieved by any existing algorithms.

Tensor optimization is crucial to massive machine learning and signal processing tasks. In this paper, we consider tensor optimization with a convex and well-conditioned objective function and reformulate it into a nonconvex optimization using the Burer-Monteiro type parameterization. We analyze the local convergence of applying vanilla gradient descent to the factored formulation and establish a local regularity condition under mild assumptions. We also provide a linear convergence analysis of the gradient descent algorithm started in a neighborhood of the true tensor factors. Complementary to the local analysis, this work also characterizes the global geometry of the best rank-one tensor approximation problem and demonstrates that for orthogonally decomposable tensors the problem has no spurious local minima and all saddle points are strict except for the one at zero which is a third-order saddle point.

Cross-modal hashing has attracted considerable attention for large-scale multimodal data. Recent supervised cross-modal hashing methods using multi-label networks utilize the semantics of multi-labels to enhance retrieval accuracy, where label hash codes are learned independently. However, all these methods assume that label annotations reliably reflect the relevance between their corresponding instances, which is not true in real applications. In this paper, we propose a novel framework called Bidirectional Reinforcement Guided Hashing for Effective Cross-Modal Retrieval (Bi-CMR), which exploits a bidirectional learning to relieve the negative impact of this assumption. Specifically, in the forward learning procedure, we highlight the representative labels and learn the reinforced multi-label hash codes by intra-modal semantic information, and further adjust similarity matrix. In the backward learning procedure, the reinforced multi-label hash codes and adjusted similarity matrix are used to guide the matching of instances. We construct two datasets with explicit relevance labels that reflect the semantic relevance of instance pairs based on two benchmark datasets. The Bi-CMR is evaluated by conducting extensive experiments over these two datasets. Experimental results prove the superiority of Bi-CMR over four state-of-the-art methods in terms of effectiveness.

Configuration checking (CC) has been confirmed to alleviate the cycling problem in local search for combinatorial optimization problems (COPs). When using CC heuristics in local search for graph problems, a critical concept is the configuration of the vertices. All existing CC variants employ either 1- or 2-level neighborhoods of a vertex as its configuration. Inspired by the idea that neighborhoods with different levels should have different contributions to solving COPs, we propose the probabilistic configuration (PC), which introduces probabilities for neighborhoods at different levels to consider the impact of neighborhoods of different levels on the CC strategy. Based on the concept of PC, we first propose probabilistic configuration checking (PCC), which can be developed in an automated and lightweight favor. We then apply PCC to two classic COPs which have been shown to achieve good results by using CC, and our preliminary results confirm that PCC improves the existing algorithms because PCC alleviates the cycling problem.

It is well-known that the search algorithms A* and Iterative Deepening A* (IDA*) can fail to solve state-space tasks optimally due to time and memory limits. The former typically fails in memory-restricted scenarios and the latter in time-restricted scenarios. Therefore, several algorithms were proposed to solve state-space tasks optimally using less memory than A* and less time than IDA*, such as A*+IDA*, a hybrid memory-restricted algorithm that combines A* and IDA*. In this paper, we present a hybrid memory-restricted algorithm that combines Partial Expansion A* (PEA*) and IDA*. This new algorithm has two phases, the same structure as the A*+IDA* algorithm. The first phase of PEA*+IDA* runs PEA* until it reaches a memory limit, and the second phase runs IDA* without duplicate detection on each node of PEA*'s Open. First, we present a model that shows how PEA*+IDA* can perform better than A*+IDA* although pure PEA* usually makes more expansions than pure A*. Later, we perform an experimental evaluation using three memory limits and show that, compared to A*+IDA* on classical planning domains, PEA*+IDA* has higher coverage and expands fewer nodes. Finally, we experimentally analyze both algorithms and show that having higher F-limits and better priority-queue composition given by PEA* have a considerable impact on the performance of the algorithms.

We consider an application of combinatorial search to the optimization of topologies in series-parallel networks. We propose a recursive search over the space of decomposition trees, in which partial solutions are obtained by exploring k-way partitionings of expandable nodes. We present two complementary pruning techniques that bound the value of intermediate solutions from above and below, applying monotonic operations to the contents of unresolved leaves. We also develop a means to exploit the convexity of our objective function, so as to prevent the redundant recomputation of subcircuit configurations. Finally, we evaluate our approach on a parameterized benchmark suite of electrical circuits, demonstrating over an order of magnitude improvement in performance as compared to a baseline implementation.

We report a new neural backdoor attack, named Hibernated Backdoor, which is stealthy, aggressive and devastating. The backdoor is planted in a hibernated mode to avoid being detected. Once deployed and fine-tuned on end-devices, the hibernated backdoor turns into the active state that can be exploited by the attacker. To the best of our knowledge, this is the first hibernated neural backdoor attack. It is achieved by maximizing the mutual information (MI) between the gradients of regular and malicious data on the model. We introduce a practical algorithm to achieve MI maximization to effectively plant the hibernated backdoor. To evade adaptive defenses, we further develop a targeted hibernated backdoor, which can only be activated by specific data samples and thus achieves a higher degree of stealthiness. We show the hibernated backdoor is robust and cannot be removed by existing backdoor removal schemes. It has been fully tested on four datasets with two neural network architectures, compared to five existing backdoor attacks, and evaluated using seven backdoor detection schemes. The experiments demonstrate the effectiveness of the hibernated backdoor attack under various settings.

Combinatorial optimization problems are ubiquitous for decision making in planning social infrastructures. In real-world scenarios, a decision-maker needs to solve his/her problem iteratively until he/she satisfies solutions, but such an iterative process remains challenging. This paper studies a new explainable framework, particularly for finding meeting points, which is a key optimization problem for designing facility locations. Our framework automatically fills the gap between its input instance and instances from which a user could obtain the desired outcome, where computed solutions are judged by the user. The framework also provides users with explanations, representing the difference of instances for deeply understanding the process and its inside. Explanations are clues for users to understand their situation and implement suggested results in practice (e.g., designing a coupon for free travel). We experimentally demonstrate that our search-based framework is promising to solve instances with generating explanations in a sequential decision-making process.

The Latin square completion (LSC) problem is an important NP-complete problem with numerous applications. Given its theoretical and practical importance, several algorithms are designed for solving the LSC problem. In this work, to further improve the performance, a fast local search algorithm is developed based on three main ideas. Firstly, a reduction reasoning technique is used to reduce the scale of search space. Secondly, we propose a novel conflict value selection heuristic, which considers the history conflicting information of vertices as a selection criterion when more than one vertex have equal values on the primary scoring function. Thirdly, during the search phase, we record previous history search information and then make use of these information to restart the candidate solution. Experimental results show that our proposed algorithm significantly outperforms the state-of-the-art heuristic algorithms on almost all instances in terms of success rate and run time.

Submodular functions are at the core of many machine learning and data mining tasks. The underlying submodular functions for many of these tasks are decomposable, i.e., they are sum of several simple submodular functions. In many data intensive applications, however, the number of underlying submodular functions in the original function is so large that we need prohibitively large amount of time to process it and/or it does not even fit in the main memory. To overcome this issue, we introduce the notion of sparsification for decomposable submodular functions whose objective is to obtain an accurate approximation of the original function that is a (weighted) sum of only a few submodular functions. Our main result is a polynomial-time randomized sparsification algorithm such that the expected number of functions used in the output is independent of the number of underlying submodular functions in the original function. We also study the effectiveness of our algorithm under various constraints such as matroid and cardinality constraints. We complement our theoretical analysis with an empirical study of the performance of our algorithm.