Date: Fri, 9 Aug 2024 | Total: 9

Submodular optimization finds applications in machine learning and data mining. In this paper, we study the problem of maximizing functions of the form $h = f-c$, where $f$ is a monotone, non-negative, weakly submodular set function and $c$ is a modular function. We design a deterministic approximation algorithm that uses $O(\frac{n}{\epsilon}\log \frac{n}{\gamma \epsilon})$ oracle calls to the function $h$ and outputs a set $S$ such that $h(S) \geq \gamma(1-\epsilon)f(OPT)-c(OPT)-\frac{c(OPT)}{\gamma(1-\epsilon)}\log\frac{f(OPT)}{c(OPT)}$, where $\gamma$ is the submodularity ratio of $f$. Existing algorithms for this problem either admit a worse approximation ratio or have quadratic runtime. We also establish an approximation ratio for our algorithm when only an approximate oracle for $f$ is available. We validate our theoretical results through extensive empirical evaluations on real-world applications, including vertex cover and influence diffusion problems for a submodular utility function $f$, and Bayesian A-Optimal design for a weakly submodular $f$. Our experimental results demonstrate that our algorithms efficiently achieve high-quality solutions.
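
To make the objective $h = f - c$ concrete, here is a minimal sketch that uses set coverage as the submodular $f$ and per-set prices as the modular $c$. It runs a plain greedy baseline that keeps adding the element with the largest positive marginal gain in $h$. This is not the algorithm of the abstract and carries none of its guarantees; the names `coverage` and `greedy_f_minus_c` and the toy instance are assumptions made for illustration.

```python
from typing import Dict, Set


def coverage(sets: Dict[str, Set[int]], chosen: Set[str]) -> int:
    """f(S): number of ground items covered by the chosen sets (monotone, submodular)."""
    covered: Set[int] = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


def greedy_f_minus_c(sets: Dict[str, Set[int]], cost: Dict[str, float]) -> Set[str]:
    """Plain greedy baseline for h = f - c (illustrative only, no guarantee claimed)."""
    chosen: Set[str] = set()
    while True:
        base = coverage(sets, chosen)
        best_name, best_gain = None, 0.0
        for name in sorted(sets):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            # Marginal gain in h: gain in coverage minus the modular cost.
            gain = coverage(sets, chosen | {name}) - base - cost[name]
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no element improves h any further
            return chosen
        chosen.add(best_name)


# Hypothetical toy instance.
toy_sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
toy_cost = {"a": 1.0, "b": 0.5, "c": 2.0}
solution = greedy_f_minus_c(toy_sets, toy_cost)
print(sorted(solution))  # ['a', 'b']
```

On this instance the greedy picks `a` (gain 2), then `b` (gain 0.5), and rejects `c` because its cost exceeds its coverage gain.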

In this paper, we first study what we call a Superset-Subset-Disjoint (SSD) set system. Based on properties of SSD set systems, we derive the following results (I) to (IV): (I) For a nonnegative integer $k$ and a graph $G=(V,E)$ with $|V|\ge2$, let $X_1,X_2,\dots,X_q\subsetneq V$ denote all maximal proper subsets of $V$ that induce $k$-edge-connected subgraphs. Then at least one of (a) and (b) holds: (a) $\{X_1,X_2,\dots,X_q\}$ is a partition of $V$; and (b) $V\setminus X_1, V\setminus X_2,\dots,V\setminus X_q$ are pairwise disjoint. (II) For a strongly-connected (i.e., $k=1$) digraph $G$, we show that whether $V$ falls under case (a) and/or (b) can be decided in $O(n+m)$ time and that we can generate all such $X_1,X_2,\dots,X_q$ in $O(n+m+|X_1|+|X_2|+\dots+|X_q|)$ time, where $n=|V|$ and $m=|E|$. (III) For a digraph $G$, we can enumerate with linear delay all vertex subsets of $V$ that induce strongly-connected subgraphs. (IV) A digraph is Hamiltonian if there is a spanning subgraph that is strongly-connected and falls under case (a).

The task of min-plus matrix multiplication often arises in the context of distances in graphs and is known to be fine-grained equivalent to the All-Pairs Shortest Path problem. The non-crossing property of shortest paths in planar graphs gives rise to Monge matrices; the min-plus product of $n\times n$ Monge matrices can be computed in $O(n^2)$ time. Grid graphs arising in sequence alignment problems, such as longest common subsequence or longest increasing subsequence, are even more structured. Tiskin [SODA'10] modeled their behavior using simple unit-Monge matrices and showed that the min-plus product of such matrices can be computed in $O(n\log n)$ time. Russo [SPIRE'11] showed that the min-plus product of arbitrary Monge matrices can be computed in time $O((n+\delta)\log^3 n)$ parameterized by the core size $\delta$, which is $O(n)$ for unit-Monge matrices. In this work, we provide a linear bound on the core size of the product matrix in terms of the core sizes of the input matrices and show how to solve the core-sparse Monge matrix multiplication problem in $O((n+\delta)\log n)$ time, matching the result of Tiskin for simple unit-Monge matrices. Our algorithm also allows $O(\log \delta)$-time witness recovery for any given entry of the output matrix. As an application of this functionality, we show that an array of size $n$ can be preprocessed in $O(n\log^3 n)$ time so that the longest increasing subsequence of any sub-array can be reconstructed in $O(l)$ time, where $l$ is the length of the reported subsequence; in comparison, Karthik C. S. and Rahul [arXiv'24] recently achieved $O(l+n^{1/2}\log^3 n)$-time reporting after $O(n^{3/2}\log^3 n)$-time preprocessing. Our faster core-sparse Monge matrix multiplication also enabled reducing two logarithmic factors in the running times of the recent algorithms for edit distance with integer weights [Gorbachev & Kociumaka, arXiv'24].
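
As a point of reference for the definitions used above, the following sketch computes a min-plus product naively in cubic time and checks the Monge condition on adjacent $2\times 2$ submatrices. It does not implement the core-sparse algorithm of the abstract; the function names and the example matrix are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def min_plus(A: Matrix, B: Matrix) -> Matrix:
    """Naive min-plus product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    inner = len(B)
    return [
        [min(A[i][k] + B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def is_monge(M: Matrix) -> bool:
    """Monge condition, checked on adjacent rows and columns (which suffices):
    M[i][j] + M[i+1][j+1] <= M[i][j+1] + M[i+1][j]."""
    return all(
        M[i][j] + M[i + 1][j + 1] <= M[i][j + 1] + M[i + 1][j]
        for i in range(len(M) - 1)
        for j in range(len(M[0]) - 1)
    )


# Example: M[i][j] = (i - j)^2 is a Monge matrix.
A = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
C = min_plus(A, A)
print(C)            # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(is_monge(C))  # True
```

The product of two Monge matrices is again Monge, which the last line illustrates on this small instance.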

We combine Nishimoto and Tabei's move structure with a wavelet tree to show how, if $T [1..n]$ is over a constant-sized alphabet and its Burrows-Wheeler Transform (BWT) consists of $r$ runs, then we can store $T$ in $O \left( r \log \frac{n}{r} \right)$ bits such that when given a pattern $P [1..m]$, we can find the BWT interval for $P$ in $O (m)$ time.
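
The quantities $r$ and "the BWT interval for $P$" can be illustrated with a naive sketch: build the BWT by sorting rotations, count its runs, and locate a pattern by textbook backward search with linear-time counting. This is not the move structure or the wavelet tree mentioned above and has none of their space or time bounds. The function names and the sentinel `$` (assumed absent from the text and smaller than every text character) are assumptions.

```python
from typing import Optional, Tuple


def bwt(text: str, sentinel: str = "$") -> str:
    """BWT via sorted rotations; assumes the sentinel is absent from text and smallest."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number r of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


def bwt_interval(last: str, pattern: str) -> Optional[Tuple[int, int]]:
    """Half-open interval [lo, hi) of sorted rotations prefixed by pattern, or None."""
    lo, hi = 0, len(last)
    for c in reversed(pattern):
        smaller = sum(1 for x in last if x < c)  # characters smaller than c
        lo = smaller + last[:lo].count(c)        # naive rank query
        hi = smaller + last[:hi].count(c)
        if lo >= hi:
            return None
    return lo, hi


L = bwt("banana")
print(L)                       # annb$aa
print(count_runs(L))           # 5
print(bwt_interval(L, "ana"))  # (2, 4)
```

Each backward-search step here costs time linear in the text length; the structure described in the abstract replaces these naive counts with constant-time steps for constant-sized alphabets.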

$\delta$-Covering, for some covering range $\delta>0$, is a continuous facility location problem on undirected graphs where all edges have unit length. The facilities may be positioned on the vertices as well as on the interior of the edges. The goal is to position as few facilities as possible such that every point on every edge has distance at most $\delta$ to one of these facilities. For large $\delta$, the problem is similar to dominating set, which is hard to approximate, while for small $\delta$, say close to $1$, the problem is similar to vertex cover. In fact, as shown by Hartmann et al. [Math. Program. 22], $\delta$-Covering for all unit-fractions $\delta$ is polynomial time solvable, while for all other values of $\delta$ the problem is NP-hard. We study the approximability of $\delta$-Covering for every covering range $\delta>0$. For $\delta \geq 3/2$, the problem is log-APX-hard, and allows an $\mathcal O(\log n)$ approximation. For every $\delta < 3/2$, there is a constant factor approximation of a minimum $\delta$-cover (and the problem is APX-hard when $\delta$ is not a unit-fraction). We further study the dependency of the approximation ratio on the covering range $\delta < 3/2$. By providing several polynomial time approximation algorithms and lower bounds under the Unique Games Conjecture, we narrow the possible approximation ratio, especially for $\delta$ close to the polynomial time solvable cases.

A factorization $f_1, \ldots, f_m$ of a string $w$ of length $n$ is called a repetition factorization of $w$ if each $f_i$ is a repetition, i.e., $f_i$ is of the form $x^kx'$, where $x$ is a non-empty string, $x'$ is a (possibly empty) proper prefix of $x$, and $k \geq 2$. Dumitran et al. [SPIRE 2015] presented an algorithm that uses $O(n)$ time and space to compute an arbitrary repetition factorization of a given string of length $n$. Their algorithm relies heavily on the Union-Find data structure on trees proposed by Gabow and Tarjan [JCSS 1985], which works in linear time on the word RAM model, and on an interval stabbing data structure of Schmidt [ISAAC 2009]. In this paper, we explore further combinatorial insights into the problem and present a simple algorithm to compute an arbitrary repetition factorization of a given string of length $n$ in $O(n)$ time, without relying on data structures for Union-Find and interval stabbing. Our algorithm follows the approach by Inoue et al. [ToCS 2022] that computes the smallest/largest repetition factorization in $O(n \log n)$ time.
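
The definition of a repetition can be tested directly: a non-empty string has the form $x^kx'$ with $k \geq 2$ exactly when its smallest period $p$ satisfies $2p \leq |f|$. The sketch below computes the smallest period with the standard prefix function and verifies a given factorization. It only checks candidate factorizations and does not compute one; the function names are assumptions made for illustration.

```python
from typing import List


def smallest_period(s: str) -> int:
    """Smallest period of s, computed with the prefix (failure) function."""
    n = len(s)
    if n == 0:
        return 0
    pi = [0] * n
    for i in range(1, n):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return n - pi[-1]


def is_repetition(s: str) -> bool:
    """True iff s = x^k x' with x non-empty, x' a proper prefix of x, and k >= 2."""
    return len(s) > 0 and 2 * smallest_period(s) <= len(s)


def is_repetition_factorization(w: str, factors: List[str]) -> bool:
    """True iff the factors concatenate to w and every factor is a repetition."""
    return "".join(factors) == w and all(is_repetition(f) for f in factors)


print(is_repetition("ababa"))                                 # True
print(is_repetition("aba"))                                   # False
print(is_repetition_factorization("aaabab", ["aa", "abab"]))  # True
```

For `"ababa"` the smallest period is 2, so $x = $ `ab`, $k = 2$, and $x' = $ `a`.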

In combinatorial optimization, matroids provide one of the most elegant structures for algorithm design. This is perhaps best identified by the Edmonds-Rado theorem relating the success of the simple greedy algorithm to the anatomy of the optimal basis of a matroid [Edm71; Rad57]. As a response, much energy has been devoted to understanding a matroid's favorable computational properties. Yet surprisingly, not much is understood where parallel algorithm design is concerned. Specifically, while prior work has investigated the task of finding an arbitrary basis in parallel computing settings [KUW88], the more complex task of finding the optimal basis remains unexplored. We initiate this study by reexamining Bor\r{u}vka's minimum weight spanning tree algorithm in the language of matroid theory, identifying as a result a new characterization of the optimal basis by way of a matroid's cocircuits. Furthermore, we combine such insights with special properties of binary matroids to reduce optimization in a binary matroid to the simpler task of searching for an arbitrary basis, with only logarithmic asymptotic overhead. Consequently, we are able to compose our reduction with a known basis search method of [KUW88] to obtain a novel algorithm for finding the optimal basis of a binary matroid with only sublinearly many adaptive rounds of queries to an independence oracle. To the authors' knowledge, this is the first parallel algorithm for matroid optimization to outperform the greedy algorithm in terms of adaptive complexity, for any class of matroids not represented by a graph.
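
For orientation, the sequential greedy algorithm that the Edmonds-Rado theorem refers to can be sketched against an independence oracle. The example instantiates it for a graphic matroid, where independence means acyclicity, so the result is a minimum weight spanning tree. This is the classical baseline, not the parallel algorithm of the abstract. The names `greedy_min_basis` and `acyclic` and the toy graph are assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Edge = Tuple[str, str, int]  # (endpoint, endpoint, weight)


def greedy_min_basis(
    elements: List[Edge],
    weight: Callable[[Edge], int],
    is_independent: Callable[[List[Edge]], bool],
) -> List[Edge]:
    """Classical greedy: scan by increasing weight, keep what stays independent.
    Makes one oracle call per element, each depending on the previous answers."""
    basis: List[Edge] = []
    for e in sorted(elements, key=weight):
        if is_independent(basis + [e]):
            basis.append(e)
    return basis


def acyclic(edges: List[Edge]) -> bool:
    """Independence oracle of a graphic matroid: the edge set contains no cycle."""
    parent: Dict[Hashable, Hashable] = {}

    def find(v: Hashable) -> Hashable:
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


graph = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4)]
tree = greedy_min_basis(graph, weight=lambda e: e[2], is_independent=acyclic)
print(tree)  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```

The loop issues its oracle queries strictly one after another, which is the linear adaptivity that a parallel method aims to beat.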

The paper considers implementations of some randomized algorithms for obtaining a random $n^2 \times n^2$ Sudoku matrix in the C++ programming language. For this purpose we describe the set $\Pi_n$ of all $(2n) \times n$ matrices, consisting of elements of the set $\mathbb{Z}_n =\{ 1,2,\ldots ,n\}$, such that every row is a permutation. We emphasize the relationship between these matrices and the $n^2 \times n^2$ Sudoku matrices. An algorithm to obtain random $\Pi_n$ matrices is presented in this paper. Several auxiliary algorithms related to the underlying problem are also described. We evaluate all algorithms according to two criteria: probability evaluation, and the time needed to generate random objects and to check membership in a specific set. These evaluations are interesting from both theoretical and practical points of view because they are particularly useful in the analysis of computer programs.
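
The set $\Pi_n$ as defined above is easy to sample from directly: draw $2n$ independent uniform permutations of $\{1,\ldots,n\}$ and stack them as rows. The sketch below does this and checks membership. It is written in Python for brevity rather than the C++ of the paper, it is not claimed to be the paper's generation algorithm, and it does not construct the corresponding Sudoku matrix. The function names are assumptions.

```python
import random
from typing import List, Optional


def random_pi_matrix(n: int, rng: Optional[random.Random] = None) -> List[List[int]]:
    """Return a (2n) x n matrix whose rows are independent uniform permutations of 1..n."""
    rng = rng or random.Random()
    base = list(range(1, n + 1))
    # sample(base, n) draws all n elements without replacement, i.e. a uniform permutation.
    return [rng.sample(base, n) for _ in range(2 * n)]


def is_pi_matrix(matrix: List[List[int]], n: int) -> bool:
    """Membership test for Pi_n: exactly 2n rows, each a permutation of 1..n."""
    expected = list(range(1, n + 1))
    return len(matrix) == 2 * n and all(sorted(row) == expected for row in matrix)


sample = random_pi_matrix(3, random.Random(2024))  # seeded for reproducibility
for row in sample:
    print(row)
print(is_pi_matrix(sample, 3))  # True
```

Since the rows are chosen independently, every one of the $(n!)^{2n}$ matrices in $\Pi_n$ is equally likely under this sampler.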

The study of online algorithms with machine-learned predictions has gained considerable prominence in recent years. One of the common objectives in the design and analysis of such algorithms is to attain (Pareto) optimal tradeoffs between the consistency of the algorithm, i.e., its performance assuming perfect predictions, and its robustness, i.e., the performance of the algorithm under adversarial predictions. In this work, we demonstrate that this optimization criterion can be extremely brittle, in that the performance of Pareto-optimal algorithms may degrade dramatically even in the presence of imperceptible prediction error. To remedy this drawback, we propose a new framework in which the smoothness in the performance of the algorithm is enforced by means of a user-specified profile. This allows us to regulate the performance of the algorithm as a function of the prediction error, while simultaneously maintaining the analytical notion of consistency/robustness tradeoffs, adapted to the profile setting. We apply this new approach to a well-studied online problem, namely the one-way trading problem. For this problem, we further address another limitation of the state-of-the-art Pareto-optimal algorithms, namely the fact that they are tailored to worst-case, extremely pessimistic inputs. We propose a new Pareto-optimal algorithm that leverages any deviation from the worst-case input to its benefit, and introduce a new metric that allows us to compare any two Pareto-optimal algorithms via a dominance relation.