Total: 22

Using a regular language as a pattern for string matching is nowadays a common -and sometimes unsafe- operation, provided as a built-in feature by most programming languages. A proper constraint solver over string variables should support most of the operations over regular expressions and related constructs. However, state-of-the-art string solvers natively support only the membership relation of a string variable to a regular language. Here we take a step forward by defining a specialised propagator for the match operation, returning the leftmost position where a pattern can match a given string. Empirical evidences show the effectiveness of our approach, implemented within the constraint programming framework, and tested against state-of-the-art string solvers.

Constraint acquisition is the task of learning a constraint network from examples of solutions and non-solutions. Existing constraint acquisition systems typically require advance knowledge of the target network's constraint language, which significantly narrows their scope of applicability. In this paper we propose a constraint acquisition method that computes a suitable constraint language as part of the learning process, eliminating the need for any advance knowledge. We report preliminary experiments on various acquisition benchmarks.

We propose a concise function representation based on deterministic finite state automata for exact most probable explanation and constrained optimization tasks in graphical models. We then exploit our concise representation within Bucket Elimination (BE). We denote our version of BE as FABE. FABE significantly improves the performance of BE in terms of runtime and memory requirements by minimizing redundancy. Indeed, results on most probable explanation and weighted constraint satisfaction benchmarks show that FABE often outperforms the state of the art, leading to significant runtime improvements (up to 2 orders of magnitude in our tests).

This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and utilize multivalued decision diagrams (MDD), a well-known data structure to deal with constraints. In our context, one key strength of MDD is to compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French where the agreement and conjugation rules are known to be more complex. Finally, with the help of GPT-2, we get hundreds of bona-fide candidate sentences. When compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this brings a major breakthrough in the field of standardized sentence generation. Also, as it can be easily adapted for other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDD as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, but also for many other prospects.

Combinatorial optimisation has numerous practical applications, such as planning, logistics, or circuit design. Problems such as these can be solved by approaches such as Boolean Satisfiability (SAT) or Constraint Programming (CP). Solver performance is affected significantly by the model chosen to represent a given problem, which has led to the study of model reformulation. One such method is tabulation: rewriting the expression of some of the model constraints in terms of a single “table” constraint. Successfully applying this process means identifying expressions amenable to trans- formation, which has typically been done manually. Recent work introduced an automatic tabulation using a set of hand-designed heuristics to identify constraints to tabulate. However, the performance of these heuristics varies across problem classes and solvers. Recent work has shown learning techniques to be increasingly useful in the context of automatic model reformulation. The goal of this study is to understand whether it is possible to improve the performance of such heuristics, by learning a model to predict whether or not to activate them for a given instance. Experimental results suggest that a random forest classifier is the most robust choice, improving the performance of four different SAT and CP solvers.

Partially ordered models of time occur naturally in applications where agents/processes cannot perfectly communicate with each other, and can be traced back to the seminal work of Lamport. In this paper we consider the problem of deciding if a (likely incomplete) description of a system of events is consistent, the network consistency problem for the point algebra of partially ordered time (POT). While the classical complexity of this problem has been fully settled, comparably little is known of the fine-grained complexity of POT except that it can be solved in O*((0.368n)^n) time by enumerating ordered partitions. We construct a much faster algorithm with a run-time bounded by O*((0.26n)^n), which, e.g., is roughly 1000 times faster than the naive enumeration algorithm in a problem with 20 events. This is achieved by a sophisticated enumeration of structures similar to total orders, which are then greedily expanded toward a solution. While similar ideas have been explored earlier for related problems it turns out that the analysis for POT is non-trivial and requires significant new ideas.

Allen's interval algebra is one of the most well-known calculi in qualitative temporal reasoning with numerous applications in artificial intelligence. Very recently, there has been a surge of improvements in the fine-grained complexity of NP-hard reasoning tasks in this algebra, which has improved the running time from the naive 2^O(n^2) to O*((1.0615n)^n), and even faster algorithms are known for unit intervals and the case when we a bounded number of overlapping intervals. Despite these improvements the best known lower bound is still only 2^o(n) under the exponential-time hypothesis and major improvements in either direction seemingly require fundamental advances in computational complexity. In this paper we propose a novel framework for solving NP-hard qualitative reasoning problems which we refer to as dynamic programming with sublinear partitioning. Using this technique we obtain a major improvement of O*((cn/log(n))^n) for Allen's interval algebra. To demonstrate that the technique is applicable to further problem domains we apply it to a problem in qualitative spatial reasoning, the cardinal direction calculus, and solve it in O*((cn/log(n))^(2n/3)) time. Hence, not only do we significantly advance the state-of-the-art for NP-hard qualitative reasoning problems, but obtain a novel algorithmic technique that is likely applicable to many problems where 2^O(n) time algorithms are unlikely.

This paper addresses the weighted vertex coloring problem (WVCP) which is an NP-hard variant of the graph coloring problem with various applications. Given a vertex-weighted graph, the problem consists of partitioning vertices in independent sets (colors) so as to minimize the sum of the maximum weights of the colors. We first present an iterative procedure to reduce the size of WVCP instances and prove new upper bounds on the objective value and the number of colors. Alternative constraint programming models are then introduced which rely on primal and dual encodings of the problem and use symmetry breaking constraints. A large number of experiments are conducted on benchmark instances. We analyze the impact of using specific bounds to reduce the search space and speed up the exact resolution of instances. New optimality proofs are reported for some benchmark instances.

Two of the most central algorithmic paradigms implemented in practical solvers for maximum satisfiability (MaxSAT) and other related declarative paradigms for NP-hard combinatorial optimization are the core-guided (CG) and implicit hitting set (IHS) approaches. We develop a general unifying algorithmic framework, based on the recent notion of abstract cores, that captures both CG and IHS computations. The framework offers a unified way of establishing the correctness of variants of the approaches, and can be instantiated in novel ways giving rise to new algorithmic variants of the core-guided and IHS approaches. We illustrate the latter aspect by developing a prototype implementation of an algorithm variant for MaxSAT based on the framework.

We present a new SAT-based method for generating all graphs up to isomorphism that satisfy a given co-NP property. Our method extends the SAT Modulo Symmetry (SMS) framework with a technique that we call co-certificate learning. If SMS generates a candidate graph that violates the given co-NP property, we obtain a certificate for this violation, i.e., `co-certificate' for the co-NP property. The co-certificate gives rise to a clause that the SAT solver, serving as SMS's backend, learns as part of its CDCL procedure. We demonstrate that SMS plus co-certificate learning is a powerful method that allows us to improve the best-known lower bound on the size of Kochen-Specker vector systems, a problem that is central to the foundations of quantum mechanics and has been studied for over half a century. Our approach is orders of magnitude faster and scales significantly better than a recently proposed SAT-based method.

Model selection is a strategy aimed at creating accurate and robust models by identifying the optimal model for classifying any particular input sample. This paper proposes a novel framework for differentiable selection of groups of models by integrating machine learning and combinatorial optimization. The framework is tailored for ensemble learning with a strategy that learns to combine the predictions of appropriately selected pre-trained ensemble models. It does so by modeling the ensemble learning task as a differentiable selection program trained end-to-end over a pretrained ensemble to optimize task performance. The proposed framework demonstrates its versatility and effectiveness, outperforming conventional and advanced consensus rules across a variety of classification tasks.

The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks. A central challenge in this setting is backpropagation through the solution of an optimization problem, which typically lacks a closed form. One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver. While flexible and general, unrolling can encounter accuracy and efficiency issues in practice. These issues can be avoided by analytical differentiation of the optimization, but current frameworks impose rigid requirements on the optimization problem's form. This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation. Additionally, it proposes a unifying view of unrolling and analytical differentiation through optimization mappings. Experiments over various model-based learning tasks demonstrate the advantages of the approach both computationally and in terms of enhanced expressiveness.

An important problem in network science is finding an optimal placement of sensors in nodes in order to uniquely detect failures in the network. This problem can be modelled as an identifying code set (ICS) problem, introduced by Karpovsky et al. in 1998. The ICS problem aims to find a cover of a set S, such that the elements in the cover define a unique signature for each of the elements of S, and to minimise the cover’s cardinality. In this work, we study a generalised identifying code set (GICS) problem, where a unique signature must be found for each subset of S that has a cardinality of at most k (instead of just each element of S). The concept of an independent support of a Boolean formula was introduced by Chakraborty et al. in 2014 to speed up propositional model counting, by identifying a subset of variables whose truth assignments uniquely define those of the other variables. In this work, we introduce an extended version of independent support, grouped independent support (GIS), and show how to reduce the GICS problem to the GIS problem. We then propose a new solving method for finding a GICS, based on finding a GIS. We show that the prior state-of-the-art approaches yield integer-linear programming (ILP) models whose sizes grow exponentially with the problem size and k, while our GIS encoding only grows polynomially with the problem size and k. While the ILP approach can solve the GICS problem on networks of at most 494 nodes, the GIS-based method can handle networks of up to 21 363 nodes; a ∼40× improvement. The GIS-based method shows up to a 520× improvement on the ILP-based method in terms of median solving time. For the majority of the instances that can be encoded and solved by both methods, the cardinality of the solution returned by the GIS-based method is less than 10% larger than the cardinality of the solution found by the ILP method.

Bounded Variable Elimination (BVE) is an important Boolean formula simplification technique in which the variable ordering is crucial. We define a new variable ordering based on variable activity, called ESA (variable Elimination Scheduled by Activity), for in-processing BVE in Conflict-Driven Clause Learning (CDCL) SAT solvers, and incorporate it into several state-of-the-art CDCL SAT solvers. Experimental results show that the new ESA ordering consistently makes these solvers solve more instances on the benchmark set including all the 5675 instances used in the Crafted, Application and Main tracks of all SAT Competitions up to 2022. In particular, one of these solvers with ESA, Kissat_MAB_ESA, won the Anniversary track of the SAT Competition 2022. The behaviour of ESA and the reason of its effectiveness are also analyzed.

The generalized arc consistency (GAC) algorithm is the prevailing solution for alldifferent constraint problems. The core part of GAC for alldifferent constraints is excavating and enumerating all the strongly connected components (SCCs) of the graph model. This causes a large amount of complex data structures to maintain the node information, leading to a large overhead both in time and memory space. More critically, the complexity of the data structures further precludes the coordination of different optimization schemes for GAC. To solve this problem, the key observation of this paper is that the GAC algorithm only cares whether a node of the graph model is in an SCC or not, rather than which SCCs it belongs to. Based on this observation, we propose AllDiffbit, which employs bitwise data structures and operations to efficiently determine if a node is in an SCC. This greatly reduces the corresponding overhead, and enhances the ability to incorporate existing optimizations to work in a synergistic way. Our experiments show that AllDiffbit outperforms the state-of-the-art GAC algorithms over 60%.

A distributed constraint optimization problem (DCOP) is a framework to model multi-agent coordination problems. Asynchronous distributed optimization (ADOPT) is a well-known complete DCOP algorithm, and owing to its superior characteristics, many variants have been proposed over the last decade. It is considered proven that ADOPT-based algorithms have the key properties of termination and optimality, which guarantee that the algorithms terminate in a finite time and obtain an optimal solution, respectively. In this paper, we present counterexamples to the termination and optimality of ADOPT-based algorithms. The flaws are classified into three types, at least one of which exists in each of ADOPT and seven of its variants that we analyzed. In other words, the algorithms may potentially not terminate or terminate with a suboptimal solution. We also propose an amended version of ADOPT that avoids the flaws in existing algorithms and prove that it has the properties of termination and optimality.

We present fast algorithms for the general CNF satisfiability problem (SAT) with running-time bound O*({c_d}^n), where c_d is a function of the maximum occurrence d of variables (d can also be the average occurrence when each variable appears at least twice), and n is the number of variables in the input formula. Similar to SAT with bounded clause lengths, SAT with bounded occurrences of variables has also been extensively studied in the literature. Especially, the running-time bounds for small values of d, such as d=3 and d=4, have become bottlenecks for algorithms evaluated by the formula length L and other algorithms. In this paper, we show that SAT can be solved in time O*(1.1238^n) for d=3 and O*(1.2628^n) for d=4, improving the previous results O*(1.1279^n) and O*(1.2721^n) obtained by Wahlström (SAT 2005) nearly 20 years ago. For d>=5, we obtain a running time bound of O*(1.0641^{dn}), implying a bound of O*(1.0641^L) with respect to the formula length L, which is also a slight improvement over the previous bound.

The graph width-measure twin-width recently attracted great attention because of its solving power and generality. Many prominent NP-hard problems are tractable on graphs of bounded twin-width if a certificate for the twin-width bound is provided as an input. Bounded twin-width subsumes other prominent structural restrictions such as bounded treewidth and bounded rank-width. Computing such a certificate is NP-hard itself, already for twin-width 4, and the only known implemented algorithm for twin-width computation is based on a SAT encoding. In this paper, we propose two new algorithmic approaches for computing twin-width that significantly improve the state of the art. Firstly, we develop a SAT encoding that is far more compact than the known encoding and consequently scales to larger graphs. Secondly, we propose a new Branch & Bound algorithm for twin-width that, on many graphs, is significantly faster than the SAT encoding. It utilizes a sophisticated caching system for partial solutions. Both algorithmic approaches are based on new conceptual insights into twin-width computation, including the reordering of contractions.

Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints, however these approaches lack interpretability. Recently, decision trees have been used to produce inherently interpretable clustering solutions, however existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.

Model counting is a fundamental problem with many practical applications, including query evaluation in probabilistic databases and failure-probability estimation of networks. In this work, we focus on a variant of this problem where the underlying formula is expressed in Disjunctive Normal Form (DNF), also known as #DNF. This problem has been shown to be #P-complete, making it intractable to solve exactly. Much research has therefore been focused on obtaining approximate solutions, particularly in the form of (epsilon, delta) approximations. The primary contribution of this paper is a new approach, called pepin, to approximate #DNF counting that achieves (nearly) optimal time complexity and outperforms existing FPRAS. Our approach is based on the recent breakthrough in the context of union of sets in streaming. We demonstrate the effectiveness of our approach through extensive experiments and show that it provides an affirmative answer to the challenge of efficiently computing #DNF.

Determining the satisfiability of Boolean constraint-satisfaction problems with different types of constraints, that is hybrid constraints, is a well-studied problem with important applications. We study a new application of hybrid Boolean constraints, which arises in quantum computing. The problem relates to constrained perfect matching in edge-colored graphs. While general-purpose hybrid constraint solvers can be powerful, we show that direct encodings of the constrained-matching problem as hybrid constraints scale poorly and special techniques are still needed. We propose a novel encoding based on Tutte's Theorem in graph theory as well as optimization techniques. Empirical results demonstrate that our encoding, in suitable languages with advanced SAT solvers, scales significantly better than a number of competing approaches on constrained-matching benchmarks. Our study identifies the necessity of designing problem-specific encodings when applying powerful general-purpose constraint solvers.

AllDifferent constraint is widely used in Constraint Programming to model real world problems. Existing Generalized Arc Consistency (GAC) algorithms map an AllDifferent constraint onto a bipartite graph and utilize the structure of Strongly Connected Components (SCCs) in the graph to filter values. Calculating SCCs is time-consuming in the existing algorithms, so we propose a novel GAC algorithm for AllDifferent constraint in this paper, which eliminates the computation of SCCs. We prove that all redundant edges in the bipartite graph point to some alternating cycles. Our algorithm exploits this property and uses a more efficient method to filter values, which is based on breadth-first search. Experimental results on the XCSP3 benchmark suite show that our algorithm considerably outperforms the state-of-the-art GAC algorithms.