AAAI.2018 - Heuristic Search and Optimization

| Total: 19

#1 Efficiently Monitoring Small Data Modification Effect for Large-Scale Learning in Changing Environment [PDF] [Copy] [Kimi] [REL]

Authors: Hiroyuki Hanada, Atsushi Shibagaki, Jun Sakuma, Ichiro Takeuchi

We study large-scale machine learning problems in changing environments where a small part of the dataset is modified, and the effect of the data modification must be monitored in order to know how much the modification changes the optimal model. When the entire dataset is large, even if the amount of the data modification is fairly small, the computational cost of re-training the model would be prohibitively large. In this paper, we propose a novel method, called optimal solution bounding (OSB), for monitoring the effect of such a data modification on the optimal model by efficiently evaluating it without actually re-training. The proposed method provides bounds on the unknown optimal model with a cost proportional only to the size of the data modification.


#2 On the Time and Space Complexity of Genetic Programming for Evolving Boolean Conjunctions [PDF] [Copy] [Kimi] [REL]

Authors: Andrei Lissovoi, Pietro Oliveto

Genetic Programming (GP) is a general-purpose bio-inspired meta-heuristic for the evolution of computer programs. Despite several successful applications, there is little understanding of the working principles behind GP. In this paper we present a performance analysis that sheds light on the behaviour of simple GP systems for evolving conjunctions of n variables (AND_n). The analysis of a random local search GP system with minimal terminal and function sets reveals the relationship between the number of iterations and the expected error of the evolved program on the complete training set. Afterwards we consider a more realistic GP system equipped with a global mutation operator and prove that it can efficiently solve AND_n by producing programs of linear size that fit a training set to optimality and with high probability generalise well. Additionally, we consider more general problems which extend the terminal set with undesired variables or negated variables. In the presence of undesired variables, we prove that, if non-strict selection is used, then the algorithm fits the complete training set efficiently, while the strict selection algorithm may fail with high probability unless the substitution operator is switched off. In the presence of negations, we show that while the algorithms fail to fit the complete training set, the constructed solutions generalise well. Finally, from a problem hardness perspective, we reveal the existence of small training sets that allow the evolution of exact conjunctions even in the presence of negations or of undesired variables.
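
As a rough illustration of the setting analysed here, the following is a minimal sketch (not the paper's tree-based GP system with its substitution operator) of a random local search that represents a candidate program simply as the set of variables in its conjunction and measures error on the complete training set of all 2^n inputs:

```python
import itertools
import random

def training_error(program_vars, n):
    """Error of the conjunction of program_vars against the target AND_n,
    measured on the complete training set (all 2^n inputs)."""
    err = 0
    for bits in itertools.product([0, 1], repeat=n):
        target = all(bits)
        output = all(bits[i] for i in program_vars)
        err += int(target != output)
    return err

def rls_gp_and(n=6, iterations=300, seed=0):
    """Random local search: insert or delete one variable per step and accept
    the change if the training error does not increase."""
    random.seed(seed)
    program = set()
    for _ in range(iterations):
        candidate = set(program)
        candidate.symmetric_difference_update({random.randrange(n)})
        if training_error(candidate, n) <= training_error(program, n):
            program = candidate
    return sorted(program)

if __name__ == "__main__":
    print(rls_gp_and())  # with enough iterations this evolves the full conjunction {0, ..., n-1}
```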


#3 Locality Preserving Projection Based on F-norm [PDF] [Copy] [Kimi] [REL]

Authors: Xiangjie Hu, Yanfeng Sun, Junbin Gao, Yongli Hu, Baocai Yin

Locality preserving projection (LPP) is a well-known method for dimensionality reduction in which the neighborhood graph structure of the data is preserved. Traditional LPP employs the squared F-norm for distance measurement, which may exaggerate distance errors and make the model sensitive to outliers. To deal with this issue, we propose two novel F-norm-based models, termed F-LPP and F-2DLPP, which are developed for vector-based and matrix-based data, respectively. In F-LPP and F-2DLPP, the distance of data projected into a low-dimensional space is measured by the F-norm, so both methods are expected to reduce the influence of outliers. To solve the F-norm-based models, we propose an iterative optimization algorithm and give a convergence analysis of the algorithm. Experimental results on three public databases demonstrate the effectiveness of our proposed methods.
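
To make the modelling change concrete, here is one plausible reading (the notation is assumed for illustration, not taken from the paper) of how the squared-F-norm LPP objective is replaced by an F-norm objective in F-LPP:

```latex
% Traditional LPP (squared F-norm) vs. the proposed F-LPP (F-norm), with
% projection matrix A, data points x_i, and neighborhood graph weights W_{ij}.
\min_{A} \sum_{i,j} W_{ij} \, \bigl\| A^{\top} x_i - A^{\top} x_j \bigr\|_F^{2}
\qquad\longrightarrow\qquad
\min_{A} \sum_{i,j} W_{ij} \, \bigl\| A^{\top} x_i - A^{\top} x_j \bigr\|_F
```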


#4 Exact Clustering via Integer Programming and Maximum Satisfiability [PDF] [Copy] [Kimi] [REL]

Authors: Atsushi Miyauchi, Tomohiro Sonobe, Noriyoshi Sukegawa

We consider the following general graph clustering problem: given a complete undirected graph G=(V,E,c) with an edge weight function c:E->Q, we are asked to find a partition C of V that maximizes the sum of edge weights within the clusters in C. Owing to its high generality, this problem has a wide variety of real-world applications, including correlation clustering, group technology, and community detection. In this study, we investigate the design of mathematical programming formulations and constraint satisfaction formulations for the problem. First, we present a novel integer linear programming (ILP) formulation that has far fewer constraints than the standard ILP formulation by Groetschel and Wakabayashi (1989). Second, we propose an ILP-based exact algorithm that solves an ILP problem obtained by modifying our above ILP formulation and then performs simple post-processing to produce an optimal solution to the original problem. Third, we present maximum satisfiability (MaxSAT) counterparts of both our ILP formulation and ILP-based exact algorithm. Computational experiments using well-known real-world datasets demonstrate that our ILP-based approaches and their MaxSAT counterparts are highly effective in terms of both memory efficiency and computation time.
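
For concreteness, the sketch below shows the standard clique-partitioning ILP of Groetschel and Wakabayashi, with its full set of triangle constraints; this is the formulation the paper improves upon, not the authors' reduced formulation or their MaxSAT encoding. It assumes the PuLP package and its bundled CBC solver are available.

```python
from itertools import combinations
import pulp  # assumes the PuLP package (with its bundled CBC solver)

def cluster_ilp(weights, n):
    """Standard clique-partitioning ILP: x[i, j] = 1 iff i and j share a cluster.
    `weights` maps each pair (i, j) with i < j to an edge weight."""
    prob = pulp.LpProblem("clique_partitioning", pulp.LpMaximize)
    x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
         for i, j in combinations(range(n), 2)}
    # Objective: total weight of edges placed inside clusters.
    prob += pulp.lpSum(weights[i, j] * x[i, j] for i, j in x)
    # Transitivity (triangle) constraints; the paper's formulation needs far fewer.
    for i, j, k in combinations(range(n), 3):
        prob += x[i, j] + x[j, k] - x[i, k] <= 1
        prob += x[i, j] - x[j, k] + x[i, k] <= 1
        prob += -x[i, j] + x[j, k] + x[i, k] <= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {e: int(v.value()) for e, v in x.items()}

if __name__ == "__main__":
    w = {(0, 1): 2.0, (0, 2): -1.0, (1, 2): 3.0,
         (0, 3): -2.0, (1, 3): -1.0, (2, 3): 1.5}
    print(cluster_ilp(w, 4))  # clusters {0, 1, 2} and {3} are optimal here
```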


#5 Submodular Function Maximization Over Graphs via Zero-Suppressed Binary Decision Diagrams [PDF] [Copy] [Kimi] [REL]

Authors: Shinsaku Sakaue, Masaaki Nishino, Norihito Yasuda

Submodular function maximization (SFM) has attracted much attention thanks to its applicability to various practical problems. Although most studies have considered SFM with size or budget constraints, more complex constraints often appear in practice. In this paper, we consider a very general class of SFM with such complex constraints (e.g., an s-t path constraint on a given graph). We propose a novel algorithm that takes advantage of zero-suppressed binary decision diagrams, which store all feasible solutions compactly, thus enabling us to circumvent the difficulty of determining feasibility. Theoretically, our algorithm is guaranteed to achieve (1-c)-approximations, where c is the curvature of a submodular function. Experiments show that our algorithm runs much faster than exact algorithms and finds better solutions than those obtained by an existing approximation algorithm in many instances. Notably, our algorithm achieves better than a 90%-approximation in all instances for which optimal values are available.


#6 Accelerated Best-First Search With Upper-Bound Computation for Submodular Function Maximization [PDF] [Copy] [Kimi] [REL]

Authors: Shinsaku Sakaue, Masakazu Ishihata

Submodular maximization continues to be an attractive subject of study thanks to its applicability to many real-world problems. Although greedy-based methods are guaranteed to find (1-1/e)-approximate solutions for monotone submodular maximization, many applications require solutions with better approximation guarantees; moreover, it is desirable to be able to control the trade-off between computation time and the approximation guarantee. Given this background, best-first search (BFS) has recently been studied as a promising approach. However, existing BFS-based methods for submodular maximization sometimes suffer from excessive computation costs since their heuristic functions are not well designed. In this paper, we propose an accelerated BFS for monotone submodular maximization with a knapsack constraint. The acceleration is attained by introducing a new termination condition and developing a novel method for computing an upper bound on the optimal value for submodular maximization, which enables us to use a better heuristic function. Experiments show that our accelerated BFS is far more efficient in terms of both time and space than existing methods.
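
As background for the greedy baseline mentioned above, the following is a minimal sketch (not the paper's accelerated BFS) of the standard cost-benefit greedy for a monotone submodular objective under a knapsack constraint; the coverage objective and data are illustrative assumptions:

```python
def coverage(sets, selected):
    """Illustrative monotone submodular objective: number of covered elements."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def greedy_knapsack(sets, costs, budget):
    """Cost-benefit greedy: repeatedly add the affordable set with the best
    marginal-gain-per-cost ratio until nothing affordable improves the value."""
    selected, spent = [], 0.0
    while True:
        best, best_ratio = None, 0.0
        base = coverage(sets, selected)
        for i in range(len(sets)):
            if i in selected or spent + costs[i] > budget:
                continue
            ratio = (coverage(sets, selected + [i]) - base) / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return selected
        selected.append(best)
        spent += costs[best]

if __name__ == "__main__":
    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
    costs = [2.0, 1.0, 3.0, 1.5]
    print(greedy_knapsack(sets, costs, budget=4.0))
```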


#7 Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning [PDF] [Copy] [Kimi] [REL]

Authors: Di Wang, Jinhui Xu

In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization. Our algorithms combine (accelerated) mini-batch SGD with a new method called two-step preconditioning to achieve an approximate solution with a time complexity lower than that of the state-of-the-art techniques for the low precision case. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity. Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases.
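
The sketch below is a generic illustration of sketching-based preconditioning for unconstrained least squares: a small Gaussian sketch of A yields an R factor whose inverse preconditions plain gradient descent. It is only a sketch under these assumptions, not the paper's two-step preconditioning combined with (accelerated) mini-batch SGD, and the constrained case would additionally require a projection step.

```python
import numpy as np

def sketch_preconditioned_lsq(A, b, sketch_rows=None, steps=300, lr=0.5):
    """Solve min_x ||Ax - b|| by gradient descent on the preconditioned system
    A R^{-1} z ~ b, where R comes from a QR factorization of a small Gaussian
    sketch S A; the preconditioned problem is well conditioned, so plain
    gradient descent converges quickly."""
    n, d = A.shape
    m = sketch_rows or 8 * d
    S = np.random.randn(m, n) / np.sqrt(m)   # step 1: sketch the tall matrix
    _, R = np.linalg.qr(S @ A)               # preconditioner from the sketch
    z = np.zeros(d)
    for _ in range(steps):
        residual = A @ np.linalg.solve(R, z) - b
        grad = np.linalg.solve(R.T, A.T @ residual)  # gradient w.r.t. z
        z -= lr * grad
        # A projection step would go here in the constrained case.
    return np.linalg.solve(R, z)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(2000, 10))
    x_true = rng.normal(size=10)
    b = A @ x_true + 0.01 * rng.normal(size=2000)
    print(np.linalg.norm(sketch_preconditioned_lsq(A, b) - x_true))
```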


#8 Proximal Alternating Direction Network: A Globally Converged Deep Unrolling Framework [PDF] [Copy] [Kimi] [REL]

Authors: Risheng Liu, Xin Fan, Shichao Cheng, Xiangyu Wang, Zhongxuan Luo

Deep learning models have gained great success in many real-world applications. However, most existing networks are typically designed in heuristic manners and thus lack rigorous mathematical principles and derivations. Several recent studies build deep structures by unrolling a particular optimization model that involves task information. Unfortunately, due to the dynamic nature of network parameters, the resulting deep propagation networks do not possess the nice convergence properties of the original optimization scheme. This paper provides a novel proximal unrolling framework to establish deep models by integrating experimentally verified network architectures and rich cues of the tasks. More importantly, we prove in theory that 1) the propagation generated by our unrolled deep model globally converges to a critical point of a given variational energy, and 2) the proposed framework is still able to learn priors from training data to generate a convergent propagation even when task information is only partially available. Indeed, these theoretical results are the best we can ask for unless stronger assumptions are enforced. Extensive experiments on various real-world applications verify the theoretical convergence and demonstrate the effectiveness of the designed deep models.


#9 On Multiset Selection With Size Constraints [PDF] [Copy] [Kimi] [REL]

Authors: Chao Qian, Yibo Zhang, Ke Tang, Xin Yao

This paper considers the multiset selection problem with size constraints, which arises in many real-world applications such as budget allocation. Previous studies required the objective function f to be submodular, while we relax this assumption by introducing the notion of submodularity ratios (denoted by α_f and β_f). We propose an anytime randomized iterative approach POMS, which maximizes the given objective f and minimizes the multiset size simultaneously. We prove that, within a reasonable running time, POMS achieves an approximation guarantee of max{1-1/e^(β_f), (α_f/2)(1-1/e^(α_f))}. In particular, when f is submodular, this bound is at least as good as that of the previous greedy-style algorithms. In addition, we give lower bounds on the submodularity ratio for the objectives of budget allocation. Experimental results on budget allocation as well as a more complex application, namely generalized influence maximization, exhibit the superior performance of the proposed approach.
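
To illustrate the anytime, bi-objective flavor of this kind of Pareto optimization, here is a simplified sketch that maximizes an illustrative objective while minimizing multiset size and keeps only non-dominated solutions; the objective, mutation scheme, and size cap below are assumptions for illustration, not the paper's exact POMS algorithm:

```python
import random

WEIGHTS = [3.0, 2.0, 1.0, 4.0]  # illustrative item weights

def f(counts):
    """Illustrative monotone objective with diminishing returns per extra copy."""
    return sum((1 - 0.5 ** c) * w for c, w in zip(counts, WEIGHTS))

def size(counts):
    return sum(counts)

def dominates(a, b):
    """(f, size) pair a dominates b: no worse on both objectives, better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def poms_like(n_items=4, size_budget=6, iterations=2000, seed=0):
    random.seed(seed)
    population = [tuple([0] * n_items)]
    for _ in range(iterations):
        parent = list(random.choice(population))
        # Mutate: add or remove one copy of each item with probability 1/n.
        for i in range(n_items):
            if random.random() < 1.0 / n_items:
                parent[i] = max(0, parent[i] + random.choice([-1, 1]))
        child = tuple(parent)
        if child in population or size(child) > 2 * size_budget:
            continue
        score = {s: (f(s), size(s)) for s in population + [child]}
        if any(dominates(score[s], score[child]) for s in population):
            continue
        population = [s for s in population
                      if not dominates(score[child], score[s])] + [child]
    feasible = [s for s in population if size(s) <= size_budget]
    return max(feasible, key=f)

if __name__ == "__main__":
    print(poms_like())
```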


#10 A Recursive Scenario Decomposition Algorithm for Combinatorial Multistage Stochastic Optimisation Problems [PDF] [Copy] [Kimi] [REL]

Authors: David Hemmi, Guido Tack, Mark Wallace

Stochastic programming is concerned with decision making under uncertainty, seeking an optimal policy with respect to a set of possible future scenarios. This paper looks at multistage decision problems where the uncertainty is revealed over time. First, decisions are made with respect to all possible future scenarios. Second, after the random variables are observed, a set of scenario-specific decisions is taken. Our goal is to develop algorithms that can be used as a back-end solver for high-level modeling languages. In this paper we propose a scenario decomposition method to solve multistage stochastic combinatorial decision problems recursively. Our approach is applicable to general problem structures, utilizes standard solving technology and is highly parallelizable. We provide experimental results to show how it efficiently solves benchmarks with hundreds of scenarios.


#11 Revisiting Immediate Duplicate Detection in External Memory Search [PDF] [Copy] [Kimi] [REL]

Authors: Shunji Lin, Alex Fukunaga

External memory search algorithms store the open and closed lists in secondary memory (e.g., hard disks) to augment limited internal memory. To minimize expensive random accesses to hard disks, these algorithms typically employ delayed duplicate detection (DDD), at the expense of processing more nodes than algorithms using immediate duplicate detection (IDD). Given the recent ubiquity of solid state drives (SSDs), we revisit the use of IDD in external memory search. We propose segmented compression, an improved IDD method that significantly reduces the number of false-positive accesses to secondary memory. We show that A*-IDD, an external search variant of A* that uses segmented compression-based IDD, significantly improves upon previous open-addressing-based IDD. We also show that A*-IDD can outperform DDD-based A* on some domains in domain-independent planning.


#12 A Two-Stage MaxSAT Reasoning Approach for the Maximum Weight Clique Problem [PDF] [Copy] [Kimi] [REL]

Authors: Hua Jiang, Chu-Min Li, Yanli Liu, Felip Manyà

MaxSAT reasoning is an effective technology used in modern branch-and-bound (BnB) algorithms for the Maximum Weight Clique problem (MWC) to reduce the search space. However, the current MaxSAT reasoning approach for MWC is carried out in a blind manner and is not guided by any relevant strategy. In this paper, we describe a new BnB algorithm for MWC that incorporates a novel two-stage MaxSAT reasoning approach. In each stage, the MaxSAT reasoning is specialised and guided for different tasks. Experiments on an extensive set of graphs show that the new algorithm implementing this approach significantly outperforms relevant exact and heuristic MWC algorithms in both small/medium and massive real-world graphs.


#13 Counting Linear Extensions in Practice: MCMC Versus Exponential Monte Carlo [PDF] [Copy] [Kimi] [REL]

Authors: Topi Talvitie, Kustaa Kangas, Teppo Niinimäki, Mikko Koivisto

Counting the linear extensions of a given partial order is a #P-complete problem that arises in numerous applications. For polynomial-time approximation, several Markov chain Monte Carlo schemes have been proposed; however, little is known of their efficiency in practice. This work presents an empirical evaluation of the state-of-the-art schemes and investigates a number of ideas to enhance their performance. In addition, we introduce a novel approximation scheme, adaptive relaxation Monte Carlo (ARMC), that leverages exact exponential-time counting algorithms. We show that approximate counting is feasible up to a few hundred elements on various classes of partial orders, and within this range ARMC typically outperforms the other schemes.
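
For reference, exact counting of linear extensions can be done by the well-known exponential-time dynamic program over downsets, memoized on the set of still-unplaced elements; ARMC leverages exact algorithms of this kind. The sketch below is a generic illustration of that DP, not the paper's MCMC schemes or ARMC itself:

```python
from functools import lru_cache

def count_linear_extensions(n, precedes):
    """Exact exponential-time counting over downsets. `precedes` is a set of
    pairs (a, b) meaning a must come before b in every linear extension of
    the poset on elements 0..n-1."""
    preds = {b: {a for (a, c) in precedes if c == b} for b in range(n)}

    @lru_cache(maxsize=None)
    def count(remaining):
        if not remaining:
            return 1
        total = 0
        for x in remaining:
            # x may be placed next if all of its predecessors are already placed.
            if preds[x].isdisjoint(remaining):
                total += count(remaining - frozenset([x]))
        return total

    return count(frozenset(range(n)))

if __name__ == "__main__":
    # A 2x2 grid poset (0<1, 0<2, 1<3, 2<3) has exactly 2 linear extensions.
    print(count_linear_extensions(4, {(0, 1), (0, 2), (1, 3), (2, 3)}))
```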


#14 Streaming Non-Monotone Submodular Maximization: Personalized Video Summarization on the Fly [PDF] [Copy] [Kimi] [REL]

Authors: Baharan Mirzasoleiman, Stefanie Jegelka, Andreas Krause

The need for real-time analysis of rapidly arriving data streams (e.g., video and image streams) has motivated the design of streaming algorithms that can efficiently extract and summarize useful information from massive data "on the fly." Such problems can often be reduced to maximizing a submodular set function subject to various constraints. While efficient streaming methods have been recently developed for monotone submodular maximization, in a wide range of applications, such as video summarization, the underlying utility function is non-monotone, and there are often various constraints imposed on the optimization problem to consider privacy or personalization. We develop the first efficient single-pass streaming algorithm, Streaming Local Search, that for any streaming monotone submodular maximization algorithm with approximation guarantee α under a collection of independence systems I, provides a constant 1/(1+2/√α+1/α+2d(1+√α)) approximation guarantee for maximizing a non-monotone submodular function under the intersection of I and d knapsack constraints. Our experiments show that for video summarization, our method runs more than 1700 times faster than previous work, while maintaining practically the same performance.
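
As a point of reference for the monotone setting this work builds on, here is a minimal single-pass, sieve-style sketch for monotone submodular maximization under a cardinality constraint, given an estimate of the optimal value; the coverage objective and threshold rule are illustrative assumptions, not the paper's Streaming Local Search:

```python
def stream_sieve(stream, k, opt_estimate, objective):
    """Single-pass threshold rule: keep an element when its marginal gain is at
    least the value still needed per remaining slot to reach opt_estimate / 2."""
    selected, current = [], 0.0
    for e in stream:
        if len(selected) >= k:
            break
        gain = objective(selected + [e]) - current
        if gain >= (opt_estimate / 2.0 - current) / (k - len(selected)):
            selected.append(e)
            current += gain
    return selected

def make_coverage(sets):
    """Illustrative monotone submodular objective: number of covered elements."""
    def f(selection):
        covered = set()
        for e in selection:
            covered |= sets[e]
        return float(len(covered))
    return f

if __name__ == "__main__":
    sets = {"a": {1, 2}, "b": {2, 3, 4}, "c": {5}, "d": {1, 5, 6, 7}}
    print(stream_sieve(iter(sets), k=2, opt_estimate=7.0, objective=make_coverage(sets)))
```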


#15 Disjunctive Program Synthesis: A Robust Approach to Programming by Example [PDF] [Copy] [Kimi] [REL]

Authors: Mohammad Raza, Sumit Gulwani

Programming by example (PBE) systems allow end users to easily create programs by providing a few input-output examples to specify their intended task. The system attempts to generate a program in a domain specific language (DSL) that satisfies the given examples. However, a key challenge faced by existing PBE techniques is to ensure the robustness of the programs that are synthesized from a small number of examples, as these programs often fail when applied to new inputs. This is because there can be many possible programs satisfying a small number of examples, and the PBE system has to somehow rank these candidates and choose the correct one without any further information from the user. In this work we present a different approach to PBE in which the system avoids making a ranking decision at the synthesis stage, by instead synthesizing a disjunctive program that includes the many top-ranked programs as possible alternatives and selects between these different choices upon execution on a new input. This delayed choice brings the important benefit of comparing the possible outputs produced by the different disjuncts on a given input at execution time. We present a generic framework for synthesizing such disjunctive programs in arbitrary DSLs, and describe two concrete implementations of disjunctive synthesis in the practical domains of data extraction from plain text and HTML documents. We present an evaluation showing the significant increase in robustness achieved with our disjunctive approach, as illustrated by an increase from 59% to 93% of tasks for which correct programs can be learnt from a single example.
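
A toy illustration of the delayed-choice idea, with simple regex extractors standing in for ranked DSL programs; the extractors and the selection rule (take the first disjunct that produces an output) are assumptions for illustration, not the paper's framework:

```python
import re

def regex_extractor(pattern):
    """A tiny 'program': extract the first match of a regex, or None."""
    def program(text):
        m = re.search(pattern, text)
        return m.group(0) if m else None
    return program

def disjunctive(programs):
    """Delayed choice: keep the top-ranked candidates as alternatives and pick
    among their outputs only when the program is executed on a new input."""
    def run(text):
        for p in programs:
            out = p(text)
            if out is not None:
                return out
        return None
    return run

if __name__ == "__main__":
    # Two candidate date extractors; neither alone handles both new inputs,
    # but their disjunction does.
    p = disjunctive([regex_extractor(r"\d{2}/\d{2}/\d{4}"),
                     regex_extractor(r"[A-Z][a-z]+ \d{1,2}, \d{4}")])
    print(p("Invoice dated 12/05/2017, due soon."))
    print(p("Invoice dated May 12, 2017, due soon."))
```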


#16 Memory-Augmented Monte Carlo Tree Search [PDF] [Copy] [Kimi] [REL]

Authors: Chenjun Xiao, Jincheng Mei, Martin Müller

This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploit generalization in online real-time search. The key idea of M-MCTS is to augment MCTS with a memory structure in which each entry contains information about a particular state. This memory is used to generate an approximate value estimation by combining the estimations of similar states. We show that the memory-based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS outperforms the original MCTS with the same number of simulations.
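
A minimal sketch of the memory-based value idea: the value of a state is estimated as a weighted average of the values of similar states stored in memory. The feature-distance similarity, visit-count weighting, and temperature below are illustrative assumptions, not the paper's exact weighting scheme:

```python
import math

def memory_value(state_feature, memory, tau=0.1):
    """Estimate a state's value from memory entries (feature, value, visits),
    weighting each entry by a softmax over negative feature distance scaled by
    its visit count."""
    total_w, total_wv = 0.0, 0.0
    for feature, value, visits in memory:
        dist = sum((a - b) ** 2 for a, b in zip(state_feature, feature))
        w = visits * math.exp(-dist / tau)
        total_w += w
        total_wv += w * value
    return total_wv / total_w if total_w > 0 else 0.0

if __name__ == "__main__":
    memory = [((0.10, 0.90), 0.6, 120),
              ((0.80, 0.20), 0.3, 40),
              ((0.15, 0.85), 0.7, 60)]
    print(memory_value((0.12, 0.88), memory))
```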


#17 Warmstarting of Model-Based Algorithm Configuration [PDF] [Copy] [Kimi] [REL]

Authors: Marius Lindauer, Frank Hutter

The performance of many hard combinatorial problem solvers depends strongly on their parameter settings, and since manual parameter tuning is both tedious and suboptimal, the AI community has recently developed several algorithm configuration (AC) methods to automatically address this problem. While all existing AC methods start the configuration process of an algorithm A from scratch for each new type of benchmark instances, here we propose to exploit information about A's performance on previous benchmarks in order to warmstart its configuration on new types of benchmarks. We introduce two complementary ways in which we can exploit this information to warmstart AC methods based on a predictive model. Experiments for optimizing a flexible modern SAT solver on twelve different instance sets show that our methods often yield substantial speedups over existing AC methods (up to 165-fold) and can also find substantially better configurations given the same compute budget.


#18 Avoiding Dead Ends in Real-Time Heuristic Search [PDF] [Copy] [Kimi] [REL]

Authors: Bence Cserna, William Doyle, Jordan Ramsdell, Wheeler Ruml

Many systems, such as mobile robots, need to be controlled in real time. Real-time heuristic search is a popular on-line planning paradigm that supports concurrent planning and execution. However, existing methods do not incorporate a notion of safety, and we show that they can perform poorly in domains that contain dead-end states from which a goal cannot be reached. We introduce new real-time heuristic search methods that can guarantee safety if the domain obeys certain properties. We test these new methods on two different simulated domains that contain dead ends, one that obeys the properties and one that does not. We find that empirically the new methods provide good performance. We hope this work encourages further efforts to widen the applicability of real-time planning.


#19 Noisy Derivative-Free Optimization With Value Suppression [PDF] [Copy] [Kimi] [REL]

Authors: Hong Wang, Hong Qian, Yang Yu

Derivative-free optimization has shown advantages in solving sophisticated problems such as policy search when the environment is noise-free. Many real-world environments are noisy, however, and solution evaluations are then inaccurate due to the noise. Noisy evaluation can badly injure derivative-free optimization, as it may make a worse solution look better. Sampling is a straightforward way to reduce noise, while previous studies have shown that delaying the noise handling to the comparison time point (i.e., threshold selection) can be helpful for derivative-free optimization. This work further delays the noise handling and proposes a simple noise handling mechanism, value suppression. With value suppression, we do nothing about the noise until the best-so-far solution has not been improved for a period, and then we suppress the value of the best-so-far solution and continue the optimization. On synthetic problems as well as on reinforcement learning tasks, experiments verify that value suppression can be significantly more effective than previous methods.
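
A minimal sketch of one plausible instantiation of the idea on a noisy synthetic objective: a simple derivative-free random search that leaves the noise untouched until the best-so-far solution has stalled for a while and then suppresses its recorded value (here by re-evaluating it). The objective, search operator, and suppression rule are assumptions for illustration, not the paper's exact mechanism:

```python
import random

def noisy_sphere(x, noise=0.3):
    """Illustrative noisy objective (to be minimized)."""
    return sum(v * v for v in x) + random.gauss(0.0, noise)

def random_search_with_suppression(dim=5, iterations=3000, patience=50, seed=0):
    """Derivative-free random search that ignores the noise until the best-so-far
    solution has not improved for `patience` steps, then suppresses the recorded
    best value so a solution that only looked good due to noise can be displaced."""
    random.seed(seed)
    best_x = [random.uniform(-3.0, 3.0) for _ in range(dim)]
    best_val, stall = noisy_sphere(best_x), 0
    for _ in range(iterations):
        candidate = [v + random.gauss(0.0, 0.5) for v in best_x]
        value = noisy_sphere(candidate)
        if value < best_val:
            best_x, best_val, stall = candidate, value, 0
        else:
            stall += 1
        if stall >= patience:
            best_val = noisy_sphere(best_x)  # value suppression (one possible variant)
            stall = 0
    return best_x, best_val

if __name__ == "__main__":
    print(random_search_with_suppression())
```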