Total: 34

The Traveling Salesperson Problem (TSP) is one of the best-known problems in computer science. The Euclidean TSP is a special case in which each node is identified by its coordinates on the plane and the Euclidean distance is used as the cost function. Many works in the Constraint Programming (CP) literature have addressed the TSP and use Euclidean instances as benchmarks; however, the usual approach is to build a distance matrix from the points' coordinates and then address the problem as a TSP, disregarding the information carried by the coordinates for constraint propagation. In this work, we propose to use the geometric information present in Euclidean TSP instances to improve the filtering power. In order to have a declarative approach, we implemented the filtering algorithms in Constraint Logic Programming on Finite Domains (CLP(FD)).
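A minimal sketch of the "usual approach" described above, written in Python rather than CLP(FD), and not the filtering algorithms the paper proposes. The function name `distance_matrix` and the sample points are hypothetical.

```python
import math

def distance_matrix(points):
    """Return the symmetric matrix of Euclidean distances between 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

# Hypothetical instance: three points forming a 3-4-5 right triangle.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
matrix = distance_matrix(points)
```

Once the matrix is built, a solver that sees only `matrix` no longer has access to the coordinates, which is the loss of geometric information the abstract refers to.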

The goal of constraint acquisition is to learn exactly a constraint network given access to an oracle that answers truthfully certain types of queries. In this paper we focus on partial membership queries and initiate a systematic investigation of the learning complexity of constraint languages. First, we use the notion of chain length to show that a wide class of languages can be learned with as few as O(n log(n)) queries. Then, we combine this result with generic lower bounds to derive a dichotomy in the learning complexity of binary languages. Finally, we identify a class of ternary languages that eludes our framework and hints at new research directions.

The efficiency of Conflict Driven Clause Learning (CDCL) SAT solving depends crucially on finding conflicts at a fast rate. State-of-the-art CDCL branching heuristics such as VSIDS, CHB and LRB conform to this goal. We take a closer look at the way in which conflicts are generated over the course of a CDCL SAT search. Our study of the VSIDS branching heuristic shows that conflicts are typically generated in short bursts, followed by what we call a conflict depression phase in which the search fails to generate any conflicts in a span of decisions. The lack of conflict indicates that the variables that are currently ranked highest by the branching heuristic fail to generate conflicts. Based on this analysis, we propose an exploration strategy, called expSAT, which randomly samples variable selection sequences in order to learn an updated heuristic from the generated conflicts. The goal is to escape from conflict depressions expeditiously. The branching heuristic deployed in expSAT combines these updates with the standard VSIDS activity scores. An extensive empirical evaluation with four state-of-the-art CDCL SAT solvers demonstrates good-to-strong performance gains with the expSAT approach.
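For orientation, activity-based branching in the style of VSIDS can be sketched as follows. This is a simplified illustration and not expSAT or any particular solver's implementation; the class name `ActivityScores`, the decay value, and the unit bump are assumptions.

```python
class ActivityScores:
    """Simplified activity scores: decay all variables, bump those in a conflict."""

    def __init__(self, variables, decay=0.5):
        self.activity = {v: 0.0 for v in variables}
        self.decay = decay  # assumed value, chosen for readability

    def on_conflict(self, conflict_vars):
        # Older conflicts count less: shrink every score first.
        for v in self.activity:
            self.activity[v] *= self.decay
        # Then reward the variables involved in the new conflict.
        for v in conflict_vars:
            self.activity[v] += 1.0

    def pick(self, unassigned):
        # Branch on the unassigned variable with the highest activity.
        return max(unassigned, key=lambda v: self.activity[v])

scores = ActivityScores([1, 2, 3])
scores.on_conflict([1, 2])
scores.on_conflict([2])
```

During a span of decisions without conflicts, `on_conflict` is never called, so the ranking stays frozen. That is the situation the exploration strategy in the abstract is meant to address.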

Bi-objective optimisation aims to optimise two generally competing objective functions. Typically, it consists in computing the set of nondominated solutions, called the Pareto front. This raises two issues: 1) time complexity, as the Pareto front in general can be infinite for continuous problems and exponentially large for discrete problems, and 2) lack of decisiveness. This paper focusses on the computation of a small, “relevant” subset of the Pareto front called the representative set, which provides meaningful trade-offs between the two objectives. We introduce a procedure which, given a pre-computed Pareto front, computes a representative set in polynomial time, and then we show how to adapt it to the case where the Pareto front is not provided. This has three important consequences for computing the representative set: the computation 1) does not require the whole Pareto front to be provided explicitly, 2) can be done in polynomial time for bi-objective mixed-integer linear programs, and 3) only requires a polynomial number of solver calls for bi-objective problems, as opposed to the case where a higher number of objectives is involved. We implement our algorithm and empirically illustrate its efficiency on two families of benchmarks.
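To make the notion of nondominated solutions concrete, here is a minimal sketch that filters a finite set of bi-objective points down to its Pareto front, assuming both objectives are minimised. It does not compute the representative set introduced in the paper; the function name `pareto_front` and the sample points are hypothetical.

```python
def pareto_front(points):
    """Return the nondominated points of a bi-objective minimisation problem.

    After sorting by the first objective (ties broken by the second), a point
    is nondominated exactly when it strictly improves the best second
    objective seen so far.
    """
    front = []
    best_second = float("inf")
    for f1, f2 in sorted(set(points)):
        if f2 < best_second:
            front.append((f1, f2))
            best_second = f2
    return front

# Hypothetical objective vectors.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
front = pareto_front(candidates)
```

Here `(3, 4)` is dominated by `(2, 3)` and `(2, 6)` is dominated by `(2, 3)`, so only three points remain on the front.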

We study the predict+optimise problem, where machine learning and combinatorial optimisation must interact to achieve a common goal. These problems are important when optimisation needs to be performed on input parameters that are not fully observed but must instead be estimated using machine learning. We provide a novel learning technique for predict+optimise to directly reason about the underlying combinatorial optimisation problem, offering a meaningful integration of machine learning and optimisation. This is done by representing the combinatorial problem as a piecewise linear function parameterised by the coefficients of the learning model and then iteratively performing coordinate descent on the learning coefficients. Our approach is applicable to linear learning functions and any optimisation problem solvable by dynamic programming. We illustrate the effectiveness of our approach on benchmarks from the literature.

Mixed Integer Programming (MIP) is one of the most widely used modeling techniques for combinatorial optimization problems. In many applications, a similar MIP model is solved on a regular basis, maintaining remarkable similarities in model structures and solution appearances but differing in formulation coefficients. This offers the opportunity for machine learning methods to explore the correlations between model structures and the resulting solution values. To address this issue, we propose to represent a MIP instance using a tripartite graph, based on which a Graph Convolutional Network (GCN) is constructed to predict solution values for binary variables. The predicted solutions are used to generate a local branching type cut which can be either treated as a global (invalid) inequality in the formulation resulting in a heuristic approach to solve the MIP, or as a root branching rule resulting in an exact approach. Computational evaluations on 8 distinct types of MIP problems show that the proposed framework improves the primal solution finding performance significantly on a state-of-the-art open-source MIP solver.

Submodular optimization plays a key role in many real-world problems. In many real-world scenarios, it is also necessary to handle uncertainty, and potentially disruptive events that violate constraints in stochastic settings need to be avoided. In this paper, we investigate submodular optimization problems with chance constraints. We provide a first analysis on the approximation behavior of popular greedy algorithms for submodular problems with chance constraints. Our results show that these algorithms are highly effective when using surrogate functions that estimate constraint violations based on Chernoff bounds. Furthermore, we investigate the behavior of the algorithms on popular social network problems and show that high quality solutions can still be obtained even if there are strong restrictions imposed by the chance constraint.

We present an algorithm to compute exact literal-weighted model counts of Boolean formulas in Conjunctive Normal Form. Our algorithm employs dynamic programming and uses Algebraic Decision Diagrams as the main data structure. We implement this technique in ADDMC, a new model counter. We empirically evaluate various heuristics that can be used with ADDMC. We then compare ADDMC to four state-of-the-art weighted model counters (Cachet, c2d, d4, and miniC2D) on 1914 standard model counting benchmarks and show that ADDMC significantly improves the virtual best solver.

Many optimisation problems are of an online—also called dynamic—nature, where new information is expected to arrive and the problem must be resolved in an ongoing fashion to (a) improve or revise previous decisions and (b) take new ones. Typically, building an online decision-making system requires substantial ad-hoc coding to ensure the offline version of the optimisation problem is continually adjusted and resolved. This paper defines a general framework for automatically solving online optimisation problems. This is achieved by extending a model of the offline optimisation problem, from which an online version is automatically constructed, thus requiring no further modelling effort. In doing so, it formalises many of the aspects that arise in online optimisation problems. The same framework can be applied for automatically creating sliding-window solving approaches for problems that have a large time horizon. Experiments show we can automatically create efficient online and sliding-window solutions to optimisation problems.

Constraint programming solvers support rich global constraints and propagators, which make them both powerful and hard to debug. In the Boolean satisfiability community, proof-logging is the standard solution for generating trustworthy outputs, and this has become key to the social acceptability of computer-generated proofs. However, reusing this technology for constraint programming requires either much weaker propagation, or an impractical blowup in proof length. This paper demonstrates that simple, clean, and efficient proof logging is still possible for the all-different constraint, through pseudo-Boolean reasoning. We explain how such proofs can be expressed and verified mechanistically, describe an implementation, and discuss the broader implications for proof logging in constraint programming.

Pseudo-Boolean solvers hold out the theoretical potential of exponential improvements over conflict-driven clause learning (CDCL) SAT solvers, but in practice perform very poorly if the input is given in the standard conjunctive normal form (CNF) format. We present a technique to remedy this problem by recovering cardinality constraints from CNF on the fly during search. This is done by collecting potential building blocks of cardinality constraints during propagation and combining these blocks during conflict analysis. Our implementation has a non-negligible but manageable overhead when detection is not successful, and yields significant gains for some SAT competition and crafted benchmarks for which pseudo-Boolean reasoning is stronger than CDCL. It also boosts performance for some native pseudo-Boolean formulas where this approach helps to improve learned constraints.

Machine learning components commonly appear in larger decision-making pipelines; however, the model training process typically focuses only on a loss that measures average accuracy between predicted values and ground truth values. Decision-focused learning explicitly integrates the downstream decision problem when training the predictive model, in order to optimize the quality of decisions induced by the predictions. It has been successfully applied to several limited combinatorial problem classes, such as those that can be expressed as linear programs (LP), and submodular optimization. However, these previous applications have uniformly focused on problems with simple constraints. Here, we enable decision-focused learning for the broad class of problems that can be encoded as a mixed integer linear program (MIP), hence supporting arbitrary linear constraints over discrete and continuous variables. We show how to differentiate through a MIP by employing a cutting planes solution approach, an algorithm that iteratively tightens the continuous relaxation by adding constraints removing fractional solutions. We evaluate our new end-to-end approach on several real world domains and show that it outperforms the standard two phase approaches that treat prediction and optimization separately, as well as a baseline approach of simply applying decision-focused learning to the LP relaxation of the MIP. Lastly, we demonstrate generalization performance in several transfer learning tasks.

In this paper, we consider the Parallel Machine Scheduling Problem with Additional Unit Resources, which consists in scheduling a set of n jobs on m parallel unrelated machines and subject to exactly one of r unit resources. This problem arises from the download of acquisitions from satellites to ground stations. We first introduce two baseline constraint models for this problem. Then, we build on an approximation algorithm for this problem, and we discuss the efficiency of designing an improved constraint model based on these approximation results. In particular, we introduce new constraints that restrict search to executions of the approximation algorithm. Finally, we report experimental data demonstrating that this model significantly outperforms the two reference models.

Second-order optimization methods have desirable convergence properties. However, the exact Newton method requires expensive computation for the Hessian and its inverse. In this paper, we propose SPAN, a novel approximate and fast Newton method. SPAN computes the inverse of the Hessian matrix via low-rank approximation and stochastic Hessian-vector products. Our experiments on multiple benchmark datasets demonstrate that SPAN outperforms existing first-order and second-order optimization methods in terms of the convergence wall-clock time. Furthermore, we provide a theoretical analysis of the per-iteration complexity, the approximation error, and the convergence rate. Both the theoretical analysis and experimental results show that our proposed method achieves a better trade-off between the convergence rate and the per-iteration efficiency.

For many combinatorial problems, finding a single solution is not enough. This is clearly the case for multi-objective optimization problems, as they have no single “best solution” and, thus, it is useful to find a representation of the non-dominated solutions (the Pareto frontier). However, it also applies to single objective optimization problems, where one may be interested in finding several (close to) optimal solutions that illustrate some form of diversity. The same applies to satisfaction problems. This is because models usually idealize the problem in some way, and a diverse pool of solutions may provide a better choice with respect to considerations that are omitted or simplified in the model. This paper describes a general framework for finding k diverse solutions to a combinatorial problem (be it satisfaction, single-objective or multi-objective), various approaches to solve problems in the framework, their implementations, and an experimental evaluation of their practicality.

This paper introduces incremental symmetry breaking constraints for graph search problems which are complete and compact. We show that these constraints can be computed incrementally: A symmetry breaking constraint for order n graphs can be extended to one for order n + 1 graphs. Moreover, these constraints induce a special property on their canonical solutions: An order n canonical graph contains a canonical subgraph on the first k vertices for every 1 ≤ k ≤ n. This facilitates a “generate and extend” paradigm for parallel graph search problem solving: To solve a graph search problem φ on order n graphs, first generate the canonical graphs of some order k < n. Then, compute canonical solutions for φ by extending, in parallel, each canonical order k graph together with suitable symmetry breaking constraints. The contribution is that the proposed symmetry breaking constraints enable us to extend the order k canonical graphs to order n canonical solutions. We demonstrate our approach through its application on two hard graph search problems.

The reconstruction of the evolutionary tree of a set of species based on qualitative attributes is a central problem in phylogenetics. In the NP-hard perfect phylogeny problem the input is a set of taxa (species) and characters (attributes) on them, and the task is to find an evolutionary tree that describes the evolution of the taxa so that each character state evolves only once. However, in practical situations a perfect phylogeny rarely exists, motivating the maximum compatibility problem of finding the largest subset of characters admitting a perfect phylogeny. Various declarative approaches, based on applying integer programming (IP), answer set programming (ASP) and pseudo-Boolean optimization (PBO) solvers, have been proposed for maximum compatibility. In this work we develop a new hybrid approach to solving maximum compatibility for multi-state characters, making use of both declarative optimization techniques (specifically maximum satisfiability, MaxSAT) and an adaptation of the Bouchitté-Todinca approach to triangulation-based graph optimization problems. Empirically our approach outperforms in scalability the earlier proposed approaches w.r.t. various parameters underlying the problem.

The Boolean SATisfiability problem (SAT) is of central importance in computer science. Although SAT is known to be NP-complete, progress on the engineering side—especially that of Conflict-Driven Clause Learning (CDCL) and Local Search SAT solvers—has been remarkable. Yet, while SAT solvers, aimed at solving industrial-scale benchmarks in Conjunctive Normal Form (CNF), have become quite mature, SAT solvers that are effective on other types of constraints (e.g., cardinality constraints and XORs) are less well-studied; a general approach to handling non-CNF constraints is still lacking. In addition, previous work indicated that for specific classes of benchmarks, the running time of extant SAT solvers depends heavily on properties of the formula and details of encoding, instead of the scale of the benchmarks, which adds uncertainty to expectations of running time. To address the issues above, we design FourierSAT, an incomplete SAT solver based on Fourier analysis of Boolean functions, a technique to represent Boolean functions by multilinear polynomials. By such a reduction to continuous optimization, we propose an algebraic framework for solving systems consisting of different types of constraints. The idea is to leverage gradient information to guide the search process in the direction of local improvements. Empirical results demonstrate that FourierSAT is more robust than other solvers on certain classes of benchmarks.

The refutation power of SAT and MaxSAT resolution is challenged by problems like the soft and hard Pigeon Hole Problem (PHP), for which short refutations do not exist. In this paper we augment the MaxSAT resolution proof system with an extension rule. The new proof system MaxResE is sound and complete, and more powerful than plain MaxSAT resolution, since it can refute the soft and hard PHP in polynomial time. We show that MaxResE refutations actually subtract lower bounds from the objective function encoded by the formulas. The resulting formula is the residual after the lower bound extraction. We experimentally show that the residual of the soft PHP (once its necessary cost of 1 has been efficiently subtracted with MaxResE) is a concise, easy to solve, satisfiable problem.

The Set Covering Problem (SCP) and Dominating Set Problem (DSP) are NP-hard and have many real world applications. SCP and DSP can be encoded into Maximum Satisfiability (MaxSAT) naturally and the resulting instances share a special structure. In this paper, we develop an efficient local search solver for MaxSAT instances of this kind. Our algorithm contains three phases: construction, local search and recovery. In the construction phase, we simplify the instance by three reduction rules and construct an initial solution by a greedy heuristic. The initial solution is improved during the local search phase, which exploits the features of such instances in the scoring function and the variable selection heuristic. Finally, the corresponding solution of the original instance is recovered in the recovery phase. Experimental results on a broad range of large scale instances of SCP and DSP show that our algorithm significantly outperforms state of the art solvers for SCP, DSP and MaxSAT.

Making good decisions at the top of a search tree is important for finding good solutions early in constraint optimization. In this paper, we propose a method employing frequent pattern mining (FPM), a classic data mining technique, to find good subtrees for solving constraint optimization problems. We demonstrate that applying FPM to a small number of random high-quality feasible solutions enables us to identify subtrees containing optimal solutions in more than 55% of problem instances for four real world benchmark problems. The method works as a plugin that can be combined with any search strategy for branch-and-bound search. By exploring the identified subtrees first, the method brings substantial improvements for four efficient search strategies in both total runtime and runtime of finding optimal solutions.

We propose a hard thresholding method based on stochastically controlled stochastic gradients (SCSG-HT) to solve a family of sparsity-constrained empirical risk minimization problems. To reduce the variance in stochastic gradients, SCSG-HT uses batch gradients rather than full gradients, with the batch size pre-determined by the desired precision tolerance. It also employs the geometric distribution to determine the number of loops per epoch. We prove that, similar to the latest methods based on stochastic gradient descent or stochastic variance reduction methods, SCSG-HT enjoys a linear convergence rate. However, SCSG-HT now has a strong guarantee to recover the optimal sparse estimator. The computational complexity of SCSG-HT is independent of sample size n when n is larger than 1/ε, which enhances the scalability to massive-scale problems. Empirical results demonstrate that SCSG-HT outperforms several competitors and decreases the objective value the most with the same computational costs.
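The hard thresholding step that gives this family of methods its name can be sketched as follows. This shows only the standard projection onto k-sparse vectors, not the SCSG-HT gradient updates; the function name `hard_threshold` and the sample vector are hypothetical.

```python
def hard_threshold(vector, k):
    """Keep the k entries of largest magnitude and set all others to zero."""
    if k <= 0:
        return [0.0] * len(vector)
    # Indices ordered by decreasing absolute value; ties keep the earlier index.
    order = sorted(range(len(vector)), key=lambda i: -abs(vector[i]))
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(vector)]

# Hypothetical iterate projected onto 2-sparse vectors.
sparse = hard_threshold([0.5, -3.0, 2.0, 0.1], 2)
```

In an iterative method, an operator of this kind is typically applied after each gradient step so that every iterate satisfies the sparsity constraint.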

In this paper, we introduce a new optimization approach to Entity Resolution. Traditional approaches tackle entity resolution with hierarchical clustering, which does not benefit from a formal optimization formulation. In contrast, we model entity resolution as correlation-clustering, which we treat as a weighted set-packing problem and write as an integer linear program (ILP). In this case, sources in the input data correspond to elements and entities in output data correspond to sets/clusters. We tackle optimization of weighted set packing by relaxing integrality in our ILP formulation. The set of potential sets/clusters cannot be explicitly enumerated, thus motivating optimization via column generation. In addition to the novel formulation, we also introduce new dual optimal inequalities (DOI), which we call flexible dual optimal inequalities (F-DOI), that tightly lower-bound dual variables during optimization and accelerate column generation. We apply our formulation to entity resolution (also called de-duplication of records), and achieve state-of-the-art accuracy on two popular benchmark datasets. Our F-DOI can be extended to other weighted set-packing problems.

Combinatorial optimization assumes that all parameters of the optimization problem, e.g. the weights in the objective function, are fixed. Often, these weights are mere estimates, and increasingly machine learning techniques are used for their estimation. Recently, Smart Predict and Optimize (SPO) has been proposed for problems with a linear objective function over the predictions, more specifically linear programming problems. It takes the regret of the predictions on the linear problem into account, by repeatedly solving it during learning. We investigate the use of SPO to solve more realistic discrete optimization problems. The main challenge is the repeated solving of the optimization problem. To this end, we investigate ways to relax the problem as well as warm-starting the learning and the solving. Our results show that even for discrete problems it often suffices to train by solving the relaxation in the SPO loss. Furthermore, this approach outperforms the state-of-the-art approach of Wilder, Dilkina, and Tambe. We experiment with weighted knapsack problems as well as complex scheduling problems, and show for the first time that a predict-and-optimize approach can successfully be used on large-scale combinatorial optimization problems.
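The regret mentioned above can be illustrated on a tiny 0/1 knapsack. This is a minimal sketch under stated assumptions, not the SPO loss or the training procedure from the paper: regret is taken to be the true value of the optimal solution minus the true value of the solution chosen using the predicted values, the knapsack is solved by brute force, and all names and numbers are hypothetical.

```python
from itertools import combinations

def best_subset(values, weights, capacity):
    """Brute-force 0/1 knapsack: return a feasible index tuple of maximum value."""
    best, best_value = (), 0
    for r in range(1, len(values) + 1):
        for subset in combinations(range(len(values)), r):
            if sum(weights[i] for i in subset) <= capacity:
                value = sum(values[i] for i in subset)
                if value > best_value:
                    best, best_value = subset, value
    return best

def regret(true_values, predicted_values, weights, capacity):
    """True value lost by optimising over predictions instead of true values."""
    chosen = best_subset(predicted_values, weights, capacity)
    optimal = best_subset(true_values, weights, capacity)
    return sum(true_values[i] for i in optimal) - sum(true_values[i] for i in chosen)

# Hypothetical instance: the prediction overrates item 0 and underrates item 1.
loss = regret([10, 6, 5], [12, 4, 5], weights=[2, 1, 1], capacity=2)
```

With the true values, items 1 and 2 together are worth 11, but the predictions make item 0 look best, so the chosen solution is worth only 10 and the regret is 1. Evaluating this quantity requires solving the optimization problem for each training example, which is the repeated-solving cost the abstract highlights.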

Programming-by-example (PBE) is a synthesis paradigm that allows users to generate functions by simply providing input-output examples. While a promising interaction paradigm, synthesis is still too slow for realtime interaction and more widespread adoption. Existing approaches to PBE synthesis have used either automated reasoning tools, such as SMT solvers, or machine learning techniques. At its core, the automated reasoning approach relies on highly domain specific knowledge of programming languages. On the other hand, the machine learning approaches utilize the fact that when working with program code, it is possible to generate arbitrarily large training datasets. In this work, we propose a system for using machine learning in tandem with automated reasoning techniques to solve Syntax Guided Synthesis (SyGuS) style PBE problems. By preprocessing SyGuS PBE problems with a neural network, we can use a data driven approach to reduce the size of the search space, then allow automated reasoning-based solvers to more quickly find a solution analytically. Our system is able to run atop existing SyGuS PBE synthesis tools, decreasing the runtime of the winner of the 2019 SyGuS Competition for the PBE Strings track by 47.65% to outperform all of the competing tools.