Total: 25

Integer Linear Programming (ILP) and its mixed variant (MILP) are archetypical examples of NP-complete optimization problems which have a wide range of applications in various areas of artificial intelligence. However, we still lack a thorough understanding of which structural restrictions make these problems tractable. Here we focus on structure captured via so-called decompositional parameters, which have been highly successful in fields such as boolean satisfiability and constraint satisfaction but have not yet reached their full potential in the ILP setting. In particular, primal treewidth (an established decompositional parameter) can only be algorithmically exploited to solve ILP under restricted circumstances. Our main contribution is the introduction and algorithmic exploitation of two new decompositional parameters for ILP and MILP. The first, torso-width, is specifically tailored to the linear programming setting and is the first decompositional parameter which can also be used for MILP. The latter, incidence treewidth, is a concept which originates from boolean satisfiability but has not yet been used in the ILP setting; here we obtain a full complexity landscape mapping the precise conditions under which incidence treewidth can be used to obtain efficient algorithms. Both of these parameters overcome previous shortcomings of primal treewidth for ILP in unique ways, and consequently push the frontiers of tractability for these important problems.

A common strategy for improving optimization algorithms is to restart the algorithm when it is believed to be trapped in an inferior part of the search space. However, while specific restart strategies have been developed for specific problems (and specific algorithms), restarts are typically not regarded as a general tool to speed up an optimization algorithm. In fact, many optimization algorithms do not employ restarts at all. Recently, "bet-and-run" was introduced in the context of mixed-integer programming, where first a number of short runs with randomized initial conditions is made, and then the most promising run of these is continued. In this article, we consider two classical NP-complete combinatorial optimization problems, traveling salesperson and minimum vertex cover, and study the effectiveness of different bet-and-run strategies. In particular, our restart strategies do not take any problem knowledge into account, nor are tailored to the optimization algorithm. Therefore, they can be used off-the-shelf. We observe that state-of-the-art solvers for these problems can benefit significantly from restarts on standard benchmark instances.

The Schatten-p norm (0 < p < 1) has been widely used to replace the nuclear norm for better approximating the rank function. However, existing methods are either 1) not scalable for large scale problems due to relying on singular value decomposition (SVD) in every iteration, or 2) specific to some p values, e.g., 1/2, and 2/3. In this paper, we show that for any p, p1, and p2 > 0 satisfying 1/p = 1/p1 + 1/p2, there is an equivalence between the Schatten-p norm of one matrix and the Schatten-p1 and the Schatten-p2 norms of its two factor matrices. We further extend the equivalence to multiple factor matrices and show that all the factor norms can be convex and smooth for any p > 0. In contrast, the original Schatten-p norm for 0 < p < 1 is non-convex and non-smooth. As an example we conduct experiments on matrix completion. To utilize the convexity of the factor matrix norms, we adopt the accelerated proximal alternating linearized minimization algorithm and establish its sequence convergence. Experiments on both synthetic and real datasets exhibit its superior performance over the state-of-the-art methods. Its speed is also highly competitive.

Automatically searching for optimal hyperparameter configurations is of crucial importance for applying deep learning algorithms in practice. Recently, Bayesian optimization has been proposed for optimizing hyperparameters of various machine learning algorithms. Those methods adopt probabilistic surrogate models like Gaussian processes to approximate and minimize the validation error function of hyperparameter values. However, probabilistic surrogates require accurate estimates of sufficient statistics (e.g., covariance) of the error distribution and thus need many function evaluations with a sizeable number of hyperparameters. This makes them inefficient for optimizing hyperparameters of deep learning algorithms, which are highly expensive to evaluate. In this work, we propose a new deterministic and efficient hyperparameter optimization method that employs radial basis functions as error surrogates. The proposed mixed integer algorithm, called HORD, searches the surrogate for the most promising hyperparameter values through dynamic coordinate search and requires many fewer function evaluations. HORD does well in low dimensions but it is exceptionally better in higher dimensions. Extensive evaluations on MNIST and CIFAR-10 for four deep neural networks demonstrate HORD significantly outperforms the well-established Bayesian optimization methods such as GP, SMAC, and TPE. For instance, on average, HORD is more than 6 times faster than GP-EI in obtaining the best configuration of 19 hyperparameters.

We describe an exact branch-and-bound algorithm for the maximum weight clique problem (MWC), called WLMC, that is especially suited for large vertex-weighted graphs. WLMC incorporates two original contributions: a preprocessing to derive an initial vertex ordering and to reduce the size of the graph, and incremental vertex-weight splitting to reduce the number of branches in the search space. Experiments on representative large graphs from real-world applications show that WLMC greatly outperforms relevant exact and heuristic MWC algorithms, and refute the prevailing hypothesis that exact MWC algorithms are less adequate for large graphs than heuristic algorithms.

Although distance metric learning has been successfully applied to many real-world applications, learning a distance metric from large-scale and high-dimensional data remains a challenging problem. Due to the PSD constraint, the computational complexity of previous algorithms per iteration is at least O(d2) where d is the dimensionality of the data.In this paper, we develop an efficient stochastic algorithm for a class of distance metric learning problems with nuclear norm regularization, referred to as low-rank DML. By utilizing the low-rank structure of the intermediate solutions and stochastic gradients, the complexity of our algorithm has a linear dependence on the dimensionality d. The key idea is to maintain all the iterates in factorized representations and construct stochastic gradients that are low-rank. In this way, the projection onto the PSD cone can be implemented efficiently by incremental SVD. Experimental results on several data sets validate the effectiveness and efficiency of our method.

Random embedding has been applied with empirical success to large-scale black-box optimization problems with low effective dimensions. This paper proposes the EmbeddedHunter algorithm, which incorporates the technique in a hierarchical stochastic bandit setting, following the optimism in the face of uncertainty principle and breaking away from the multiple-run framework in which random embedding has been conventionally applied similar to stochastic black-box optimization solvers. Our proposition is motivated by the bounded mean variation in the objective value for a low-dimensional point projected randomly into the decision space of Lipschitz-continuous problems. In essence, the EmbeddedHunter algorithm expands optimistically a partitioning tree over a low-dimensional — equal to the effective dimension of the problem —search space based on a bounded number of random embeddings of sampled points from the low-dimensional space. In contrast to the probabilistic theoretical guarantees of multiple-run random-embedding algorithms, the finite-time analysis of the proposed algorithm presents a theoretical upper bound on the regret as a function of the algorithm's number of iterations. Furthermore, numerical experiments were conducted to validate its performance. The results show a clear performance gain over recently proposed random embedding methods for large-scale problems, provided the intrinsic dimensionality is low.

Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the crucial demands for modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems. Analysis on nonconvex problems is lacking. For the Asynchronous Stochastic Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an asymptotic O(\frac{1}{\epsilon^2}) rate (convergence to the stationary points) on nonconvex problems. In this paper, we study Stochastic Variance Reduced Gradient (SVRG) in the asynchronous setting. We propose the Asynchronous Stochastic Variance Reduced Gradient (ASVRG) algorithm for nonconvex finite-sum problems. We develop two schemes for ASVRG, depending on whether the parameters are updated as an atom or not. We prove that both of the two schemes can achieve linear speed up (a non-asymptotic O(\frac{n^\frac{2}{3}}{\epsilon}) rate to the stationary points) for nonconvex problems when the delay parameter \tau\leq n^{\frac{1}{3}}, where n is the number of training samples. We also establish a non-asymptotic O(\frac{n^\frac{2}{3}\tau^\frac{1}{3}}{\epsilon}) rate (convergence to the stationary points) for our algorithm without assumptions on \tau. This further demonstrates that even with asynchronous updating, SVRG has less number of Incremental First-order Oracles (IFOs) compared with Stochastic Gradient Descent and Gradient Descent. We also experiment on a shared memory multi-core system to demonstrate the efficiency of our algorithm.

Submodular function maximization has numerous applications in machine learning and artificial intelligence. Many real applications require multiple submodular objective func-tions to be maximized, and which function is regarded as important by a user is not known in advance. In such cases, it is desirable to have a small family of representative solutions that would satisfy any user’s preference. A traditional approach for solving such a problem is to enumerate the Pareto optimal solutions. However, owing to the massive number of Pareto optimal solutions (possibly exponentially many), it is difficult for a user to select a solution. In this paper, we propose two efficient methods for finding a small family of representative solutions, based on the notion of regret ratio. The first method outputs a family of fixed size with a nontrivial regret ratio. The second method enables us to choose the size of the output family, and in the biobjective case, it has a provable trade-off between the size and the regret ratio. Using real and synthetic data, we empirically demonstrate that our methods achieve a small regret ratio.

We consider non-monotone DR-submodular function maximization, where DR-submodularity (diminishing return submodularity) is an extension of submodularity for functions over the integer lattice based on the concept of the diminishing return property. Maximizing non-monotone DR-submodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a 1/(2+ε)-approximation algorithm with a running time of roughly O(n/ε log2B), where n is the size of the ground set, B is the maximum value of a coordinate, and ε > 0 is a parameter. The approximation ratio is almost tight and the dependency of running time on B is exponentially smaller than the naive greedy algorithm. Experiments on synthetic and real-world datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.

Logic-based Benders decomposition (LBBD) is a powerful hybrid optimisation technique that can combine the strong dual bounds of mixed integer programming (MIP) with the combinatorial search strengths of constraint programming (CP). A major drawback of LBBD is that it is a far more involved process to implement an LBBD solution to a problem than the "model-and-run" approach provided by both CP and MIP. We propose an automated approach that accepts an arbitrary MiniZinc model and solves it using LBBD with no additional intervention on the part of the modeller. The design of this approach also reveals an interesting duality between LBBD and large neighborhood search (LNS). We compare our implementation of this approach to CP and MIP solvers on 4 different problem classes where LBBD has been applied before.

A clique model is one of the most important techniques on the cohesive subgraph detection; however, its applications are rather limited due to restrictive conditions of the model. Hence much research resorts to k-plex — a graph in which any vertex is adjacent to all but at most k vertices — which is a relaxation model of the clique. In this paper, we study the maximum k-plex problem and propose a fast algorithm to compute maximum k-plexes by exploiting structural properties of the problem. In an n-vertex graph, the algorithm computes optimal solutions in cnnO(1) time for a constant c < 2 depending only on k. To the best of our knowledge, this is the first algorithm that breaks the trivial theoretical bound of 2n for each k ≥ 3. We also provide experimental results over multiple real-world social network instances in support.

A grid maze is a binary matrix where fields containing a 0 are accessible while fields containing a 1 are blocked. A movement sequence consists of relative movements up, down, left, right – moving to a blocked field results in non-movement. The simultaneous maze solving problem asks for the shortest movement sequence starting in the upper left corner and visiting the lower right corner for all mazes of size n × m (for which a path from the upper left to the lower right corner exists at all). We present a theoretical problem analysis, including hardness results and a cubic upper bound on the sequence length. In addition, we describe several approaches to practically compute solving sequences and lower bounds despite the high combinatorial complexity of the problem.

To achieve peak performance, it is often necessary to adjust the parameters of a given algorithm to the class of problem instances to be solved; this is known to be the case for popular solvers for a broad range of AI problems, including AI planning, propositional satisfiability (SAT) and answer set programming (ASP). To avoid tedious and often highly sub-optimal manual tuning of such parameters by means of ad-hoc methods, general-purpose algorithm configuration procedures can be used to automatically find performance-optimizing parameter settings. While impressive performance gains are often achieved in this manner, additional, potentially costly parameter importance analysis is required to gain insights into what parameter changes are most responsible for those improvements. Here, we show how the running time cost of ablation analysis, a well-known general-purpose approach for assessing parameter importance, can be reduced substantially by using regression models of algorithm performance constructed from data collected during the configuration process. In our experiments, we demonstrate speed-up factors between 33 and 14 727 for ablation analysis on various configuration scenarios from AI planning, SAT, ASP and mixed integer programming (MIP).

In the recent years, there has been significant work on the difficulty of heuristic search problems, identifying different problem instance characteristics that can have a significant impact on search effort. Phase transitions in the solubility of random problem instances have proved useful in the study of problem difficulty for other classes of computational problems, notably SAT and CSP, and it has been shown that the hardest problems typically occur during this rapid transition. In this work, we perform the first empirical investigation of the phase transition phenomena for heuristic search. We establish the existence of a rapid transition in the solubility of an abstract model of heuristic search problems and show that, for greedy best first search, the hardest instances are associated with the phase transition region. We then perform a novel investigation of the behavior of heuristics of different strength across the solubility spectrum. Finally, we demonstrate that the behavior of our abstract model carries over to commonly used benchmark problems including the Pancake Problem, Grid Navigation, TopSpin, and the Towers of Hanoi. An interesting deviation is observed and explained in the Sliding Puzzle.

We investigate the potential of exhaustively exploring larger neighborhoods in local search algorithms for Minimum Vertex Cover. More precisely, we study whether, for moderate values of k, it is feasible and worthwhile to determine, given a graph G with vertex cover C, if there is a k-swap S such that (C ∖ S) ∪ (S ∖ C) is a smaller vertex cover of G. First, we describe an algorithm running in ∆O(k) ⋅ n time for searching the k-swap neighborhood on n-vertex graphs with maximum degree ∆. Then, we demonstrate that, by devising additional pruning rules that decrease the size of the search space, this algorithm can be implemented so that it solves the problem quickly for k ≈ 20. Finally, we show that it is worthwhile to consider moderately-sized k-swap neighborhoods. For our benchmark data set, we show that when combining our algorithm with a hill-climbing approach, the solution quality improves quickly with the radius k of the local search neighborhood and that in most cases optimal solutions can be found by setting k=21.

Metaheuristics have been developed to provide general purpose approaches for solving hard combinatorial problems. While these frameworks often serve as the starting point for the development of problem-specific search procedures, they very rarely work efficiently in their default state. We combine the ideas of reactive search, which adjusts key parameters during search, and algorithm configuration, which fine-tunes algorithm parameters for a given set of problem instances, for the automatic compilation of a portfolio of highly reactive dialectic search heuristics for MaxSAT. Even though the dialectic search metaheuristic knows nothing more about MaxSAT than how to evaluate the cost of a truth assignment, our automatically generated solver defines a new state of the art for random weighted partial MaxSAT instances. Moreover, when combined with an industrial MaxSAT solver, the self-assembled reactive portfolio was able to win four out of nine gold medals at the recent 2016 MaxSAT Evaluation on random, crafted, and industrial partial and weighted-partial MaxSAT instances.

The Minimum Sum Coloring Problem (MSCP) is an NP-Hard problem derived from the graph coloring problem (GCP) and has practical applications in different domains such as VLSI design, distributed resource allocation, and scheduling. There exist few exact solutions for MSCP, probably due to its search space much more elusive than that of GCP. On the contrary, much effort is spent in the literature to develop upper and lower bounds for MSCP. In this paper, we borrow a notion called motif, that was used in a recent work for upper bounding the minimum number of colors in an optimal solution of MSCP, to develop a new algebraic lower bound called for MSCP. Experiments on standard benchmarks for MSCP and GCP show that this new lower bound is substantially better than the existing lower bounds for several families of graphs.

Exact cover is the problem of finding subfamilies, S*, of a family of sets, S, over universe U, where S* forms a partition of U. It is a popular NP-hard problem appearing in a wide range of computer science studies. Knuth's algorithm DLX, a backtracking-based depth-first search implemented with the data structure called dancing links, is known as state-of-the-art for finding all exact covers. We propose a method to accelerate DLX. Our method constructs a Zero-suppressed Binary Decision Diagram (ZDD) that represents the set of solutions while running depth-first search in DLX. Constructing ZDDs enables the efficient use of memo cache to speed up the search. Moreover, our method has a virtue that it outputs ZDDs; we can perform several useful operations with them. Experiments confirm that the proposed method is up to several orders of magnitude faster than DLX.

Black-box domains where the successor states generated by applying an action are generated by a completely opaque simulator pose a challenge for domain-independent planning. The main computational bottleneck in search-based planning for such domains is the number of calls to the black-box simulation. We propose a method for significantly reducing the number of calls to the simulator by the search algorithm by detecting and pruning sequences of actions which are dominated by others. We apply our pruning method to Iterated Width and breadth-first search in domain-independent black-box planning for Atari 2600 games in the Arcade Learning Environment (ALE), adding our pruning method significantly improves upon the baseline algorithms.

Multi-objective (MO) optimization problems require simultaneously optimizing two or more objective functions. An MO algorithm needs to find solutions that reach different optimal balances of the objective functions, i.e., optimal Pareto front, therefore, high dimensionality of the solution space can hurt MO optimization much severer than single-objective optimization, which was little addressed in previous studies. This paper proposes a general, theoretically-grounded yet simple approach ReMO, which can scale current derivative-free MO algorithms to the high-dimensional non-convex MO functions with low effective dimensions, using random embedding. We prove the conditions under which an MO function has a low effective dimension, and for such functions, we prove that ReMO possesses the desirable properties of optimal Pareto front preservation, time complexity reduction, and rotation perturbation invariance. Experimental results indicate that ReMO is effective for optimizing the high-dimensional MO functions with low effective dimensions, and is even effective for the high-dimensional MO functions where all dimensions are effective but most only have a small and bounded effect on the function value.

One common pattern database compression technique is to merge adjacent database entries and store the minimum of merged entries to maintain heuristic admissibility. In this paper we propose a compression technique that preserves every entry, but reduces the number of bits used to store each entry, therefore limiting the values that can be represented. Even when this technique throws away low values in the heuristic, it can still have better performance than the traditional approach. We develop a theoretical basis for selecting which values to keep and show improved performance in both unidirectional and bidirectional search.

Grid pathfinding, an old AI problem, is central for the development of navigation systems for autonomous agents. A surprising fact about the vast literature on this problem is that very limited neighborhoods have been studied. Indeed, only the 4- and 8-neighborhoods are usually considered, and rarely the 16-neighborhood. This paper describes three contributions that enable the construction of effective grid path planners for extended 2k-neighborhoods. First, we provide a simple recursive definition of the 2k-neighborhood in terms of the 2k–1-neighborhood. Second, we derive distance functions, for any k >1, which allow us to propose admissible heurisitics which are perfect for obstacle-free grids. Third, we describe a canonical ordering which allows us to implement a version of A* whose performance scales well when increasing k. Our empirical evaluation shows that the heuristics we propose are superior to the Euclidean distance (ED) when regular A* is used. For grids beyond 64 the overhead of computing the heuristic yields decreased time performance compared to the ED. We found also that a configuration of our A*-based implementation, without canonical orders, is competitive with the "any-angle" path planner Theta$^*$ both in terms of solution quality and runtime.

Bounding the partition function is a key inference task in many graphical models. In this paper, we develop an anytime anyspace search algorithm taking advantage of AND/OR tree structure and optimized variational heuristics to tighten deterministic bounds on the partition function. We study how our priority-driven best-first search scheme can improve on state-of-the-art variational bounds in an anytime way within limited memory resources, as well as the effect of the AND/OR framework to exploit conditional independence structure within the search process within the context of summation. We compare our resulting bounds to a number of existing methods, and show that our approach offers a number of advantages on real-world problem instances taken from recent UAI competitions.

In recent years there has been rising interest in the use of programming-by-example techniques to assist users in data manipulation tasks. Such techniques rely on an explicit input-output examples specification from the user to automatically synthesize programs. However, in a wide range of data extraction tasks it is easy for a human observer to predict the desired extraction by just observing the input data itself. Such predictive intelligence has not yet been explored in program synthesis research, and is what we address in this work. We describe a predictive program synthesis algorithm that infers programs in a general form of extraction DSLs (domain specific languages) given input-only examples. We describe concrete instantiations of such DSLs and the synthesis algorithm in the two practical application domains of text extraction and web extraction, and present an evaluation of our technique on a range of extraction tasks encountered in practice.