2024-10-29 | Total: 18
We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb N$, every centered subgaussian distribution $\mathcal D$ on $\mathbb R^d$, and every even $p \in \mathbb N$, the $d$-variate polynomial $(Cp)^{p/2} \cdot \|v\|_{2}^p - \mathbb E_{X \sim \mathcal D} \langle v,X\rangle^p$ is a sum of square polynomials. This establishes that every subgaussian distribution is \emph{SoS-certifiably subgaussian} -- a condition that yields efficient learning algorithms for a wide variety of high-dimensional statistical tasks. As a direct corollary, we obtain computationally efficient algorithms with near-optimal guarantees for the following tasks, when given samples from an arbitrary subgaussian distribution: robust mean estimation, list-decodable mean estimation, clustering mean-separated mixture models, robust covariance-aware mean estimation, robust covariance estimation, and robust linear regression. Our proof makes essential use of Talagrand's generic chaining/majorizing measures theorem.
We show that, assuming the availability of a processor with variable precision arithmetic, we can compute matrix-by-matrix multiplications in $O(N^2\log_2 N)$ computational complexity. We replace the standard matrix-by-matrix multiplication algorithm $\begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix}=\begin{bmatrix}A_{11}B_{11}+A_{12}B_{21}&A_{11}B_{12}+A_{12}B_{22}\\A_{21}B_{11}+A_{22}B_{21}&A_{21}B_{12}+A_{22}B_{22}\end{bmatrix}$ by $\begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix}=\Bigl\lfloor\begin{bmatrix} (A_{11}+\epsilon A_{12})(B_{11}+1/{\epsilon}B_{21})&(A_{11}+\epsilon A_{12})(B_{12}+1/{\epsilon}B_{22})\\(A_{21}+\epsilon A_{22})(B_{11}+1/{\epsilon}B_{21})&(A_{21}+\epsilon A_{22})(B_{12}+1/{\epsilon}B_{22})\end{bmatrix}\Bigr\rfloor \mod \frac{1}{\epsilon}$. The resulting computational complexity for $N\times N$ matrices can be estimated from the recursive equation $T(N)=4(N/2)^2$ (multiplication of a matrix by a number) + $4(N/2)^2$ (additions of matrices) + $2N^2$ (floor and modulo) + $4T(N/2)$ (recursive calls) as $O(N^2\log_2 N)$. The novelty of the method lies in the observation, somehow ignored by other matrix-by-matrix multiplication algorithms, that we can multiply matrix entries by non-integer numbers to improve computational complexity. In other words, given a processor that can compute multiplications, additions, modulo and floor operations with variable precision arithmetic in $O(1)$, we can obtain a matrix-by-matrix multiplication algorithm with $O(N^2\log_2 N)$ computational complexity. We also present MATLAB code using the VPA variable precision arithmetic emulator that can multiply matrices of size $N\times N$ using $(4\log_2 N+1)N^2$ variable precision arithmetic operations. This emulator uses $O(N)$ digits to run our algorithm.
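The packing identity above can be checked on a single $2\times 2$ product with scalar entries. The following is a minimal sketch, not the authors' MATLAB code: it uses Python's exact `Fraction` type in place of variable precision arithmetic, and the helper names `packed_entry` and `multiply_2x2` are hypothetical. It assumes non-negative integer entries and $\epsilon = 1/K$ for an integer $K$ larger than every cross term and every output entry, so that the floor discards the $\epsilon$ term and the modulo discards the $1/\epsilon$ term.

```python
from fractions import Fraction
from math import floor


def packed_entry(a1, a2, b1, b2, eps):
    """Recover a1*b1 + a2*b2 from one product of packed operands.

    Assumptions of this sketch: all inputs are non-negative integers,
    eps = 1/K for an integer K, a2*b1 < K, and a1*b1 + a2*b2 < K.
    """
    # (a1 + eps*a2) * (b1 + b2/eps)
    #   = a1*b1 + a2*b2 + eps*a2*b1 + (1/eps)*a1*b2
    product = (a1 + eps * a2) * (b1 + b2 / eps)
    # floor removes eps*a2*b1 (< 1); mod K removes the multiple of K.
    return floor(product) % int(1 / eps)


def multiply_2x2(A, B, eps):
    """Multiply two 2x2 integer matrices using one packed product per entry."""
    return [
        [packed_entry(A[i][0], A[i][1], B[0][j], B[1][j], eps) for j in range(2)]
        for i in range(2)
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
eps = Fraction(1, 1000)
print(multiply_2x2(A, B, eps))  # [[19, 22], [43, 50]]
```

Each output entry costs one multiplication of packed numbers instead of two ordinary multiplications, at the price of operands that carry more digits. This is why the abstract assumes $O(1)$-time variable precision operations.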
We give the first parallel algorithm with optimal $\tilde{O}(m)$ work for the classical problem of computing Single-Source Shortest Paths in general graphs with negative-weight edges. In graphs without negative edges, Dijkstra's algorithm solves the Single-Source Shortest Paths (SSSP) problem with optimal $\tilde O(m)$ work, but is inherently sequential. A recent breakthrough by [Bernstein, Nanongkai, Wulff-Nilsen; FOCS '22] achieves the same for general graphs. Parallel shortest path algorithms are more difficult and have been intensely studied for decades. Only very recently, multiple lines of research culminated in parallel algorithms with optimal work $\tilde O(m)$ for various restricted settings, such as approximate or exact algorithms for directed or undirected graphs without negative edges. For general graphs, the best known algorithm by [Ashvinkumar, Bernstein, Cao, Grunau, Haeupler, Jiang, Nanongkai, Su; ESA '24] still requires $m^{1+o(1)}$ work. This paper presents a randomized parallel algorithm for SSSP in general graphs with near-linear work $\tilde O(m)$ and state-of-the-art span $n^{1/2 + o(1)}$. We follow a novel bottom-up approach leading to a particularly clean and simple algorithm. Our algorithm can be seen as a \emph{near-optimal parallel black-box reduction} from SSSP in general graphs to graphs without negative edges. In contrast to prior works, the reduction in this paper is both parallel and essentially without overhead, only affecting work and span by polylogarithmic factors.
In this paper, we introduce flubbles, a new definition of "bubbles" corresponding to variants in a (pan)genome graph $G$. We then show a characterization for flubbles in terms of equivalence classes regarding cycles in an intermediate data structure we build from a spanning tree of $G$, which leads us to a linear time and space solution for finding all flubbles. Furthermore, we show how a related characterization also allows us to efficiently detect what we define as hairpin inversions: a cycle preceded and followed by the same path in the graph; since the latter is necessarily traversed both ways, this structure corresponds to inversions. Finally, inspired by the concept of Program Structure Tree introduced fifty years ago to represent the hierarchy of the control structure of a program, we define a tree representing the structure of $G$ in terms of flubbles, the flubble tree, which we also find in linear time. The hierarchy of variants introduced by the flubble tree paves the way for new investigations of (pan)genomic structures and their decomposition for practical analyses. We have implemented our methods in a prototype tool named povu, which we tested on human and yeast data. We show that povu can find flubbles and also output the flubble tree while being as fast as (or faster than) well-established tools that find bubbles, such as vg and BubbleGun. Moreover, we show how, within the same time, povu can find hairpin inversions that, to the best of our knowledge, no other tool is able to find. Our tool is freely available at https://github.com/urbanslug/povu/ under the MIT License.
The \textsc{Capacitated $d$-Hitting Set} problem involves a universe $U$ with a capacity function $\mathsf{cap}: U \rightarrow \mathbb{N}$ and a collection $\mathcal{A}$ of subsets of $U$, each of size at most $d$. The goal is to find a minimum subset $S \subseteq U$ and an assignment $\phi : \mathcal{A} \rightarrow S$ such that for every $A \in \mathcal{A}$, $\phi(A) \in A$, and for each $x \in U$, $|\phi^{-1}(x)| \leq \mathsf{cap}(x)$. For $d=2$, this is known as \textsc{Capacitated Vertex Cover}. In the weighted variant, each element of $U$ has a positive integer weight, with the objective of finding a minimum-weight capacitated hitting set. Chuzhoy and Naor [SICOMP 2006] provided a factor-3 approximation for \textsc{Capacitated Vertex Cover} and showed that the weighted case lacks an $o(\log n)$-approximation unless $P=NP$. Kao and Wong [SODA 2017] later independently achieved a $d$-approximation for \textsc{Capacitated $d$-Hitting Set}, with no $d - \epsilon$ improvements possible under the Unique Games Conjecture. Our main result is a parameterized approximation algorithm with runtime $\left(\frac{k}{\epsilon}\right)^k 2^{k^{O(kd)}}(|U|+|\mathcal{A}|)^{O(1)}$ that either concludes no solution of size $\leq k$ exists or finds $S$ of size $\leq 4/3 \cdot k$ and weight at most $2+\epsilon$ times the minimum weight for solutions of size $\leq k$. We further show that no FPT-approximation with factor $c > 1$ exists for unweighted \textsc{Capacitated $d$-Hitting Set} with $d \geq 3$, nor with factor $2 - \epsilon$ for the weighted version, assuming the Exponential Time Hypothesis. These results extend to \textsc{Capacitated Vertex Cover} in multigraphs. Additionally, a variant of multi-dimensional \textsc{Knapsack} is shown hard to FPT-approximate within $2 - \epsilon$.
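The two constraints in the problem definition above (each set is assigned to one of its own elements in $S$, and no element exceeds its capacity) can be made concrete with a small feasibility checker. This is only an illustrative sketch of the definition, not the paper's algorithm; the function name and the dictionary-based encoding of $\phi$ (set index to chosen element) are assumptions.

```python
from collections import Counter


def is_capacitated_hitting_set(sets, cap, S, phi):
    """Check that (S, phi) satisfies both constraints of the definition.

    sets: list of sets A_0, A_1, ... over the universe U
    cap:  dict mapping each element of U to its capacity
    S:    candidate hitting set
    phi:  dict mapping each set index to the element chosen to hit it
    """
    chosen = set(S)
    # Every set A must be assigned to an element of S that lies in A.
    for idx, A in enumerate(sets):
        x = phi.get(idx)
        if x not in chosen or x not in A:
            return False
    # No element may serve more sets than its capacity allows.
    load = Counter(phi[idx] for idx in range(len(sets)))
    return all(load[x] <= cap.get(x, 0) for x in load)


sets = [{1, 2}, {2, 3}, {2, 4}]
cap = {1: 1, 2: 2, 3: 1, 4: 1}
print(is_capacitated_hitting_set(sets, cap, {2, 4}, {0: 2, 1: 2, 2: 4}))
print(is_capacitated_hitting_set(sets, cap, {2}, {0: 2, 1: 2, 2: 2}))
```

In this example element 2 hits all three sets but has capacity 2, so the uncapacitated solution $\{2\}$ is infeasible and a second element is needed.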
The 3SUM problem is one of the cornerstones of fine-grained complexity. Its study has led to countless lower bounds, but as has been sporadically observed before -- and as we will demonstrate again -- insights on 3SUM can also lead to algorithmic applications. The starting point of our work is that we spend a lot of technical effort to develop new algorithms for 3SUM-type problems such as approximate 3SUM-counting, small-doubling 3SUM-counting, and a deterministic subquadratic-time algorithm for the celebrated Balog-Szemerédi-Gowers theorem from additive combinatorics. As consequences of these tools, we derive diverse new results in fine-grained complexity and pattern matching algorithms, answering open questions from many unrelated research areas. Specifically:
- A recent line of research on the "short cycle removal" technique culminated in tight 3SUM-based lower bounds for various graph problems via randomized fine-grained reductions [Abboud, Bringmann, Fischer; STOC '23] [Jin, Xu; STOC '23]. In this paper we derandomize the reduction to the important 4-Cycle Listing problem.
- We establish that \#3SUM and 3SUM are fine-grained equivalent under deterministic reductions.
- We give a deterministic algorithm for the $(1+\epsilon)$-approximate Text-to-Pattern Hamming Distances problem in time $n^{1+o(1)} \cdot \epsilon^{-1}$.
- In the $k$-Mismatch Constellation problem the input consists of two integer sets $A, B \subseteq [N]$, and the goal is to test whether there is a shift $c$ such that $|(c + B) \setminus A| \leq k$ (i.e., whether $B$ shifted by $c$ matches $A$ up to $k$ mismatches). For moderately small $k$ the previously best running time was $\tilde O(|A| \cdot k)$ [Cardoze, Schulman; FOCS '98] [Fischer; SODA '24]. We give a faster $|A| \cdot k^{2/3} \cdot N^{o(1)}$-time algorithm in the regime where $|B| = \Theta(|A|)$.
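To make the $k$-Mismatch Constellation problem statement concrete, here is a brute-force baseline. It is a sketch of the problem definition only, far slower than the algorithms cited above, and the function name is hypothetical. It relies on one observation: if $|B| > k$ and at most $k$ elements of $c + B$ miss $A$, then some $b \in B$ lands on some $a \in A$, so only shifts of the form $c = a - b$ need to be tried.

```python
def has_k_mismatch_shift(A, B, k):
    """Return a shift c with |(c + B) minus A| <= k, or None if none exists."""
    A_set = set(A)
    B_list = sorted(set(B))
    if len(B_list) <= k:
        return 0  # any shift works when all of B is allowed to mismatch
    # With |B| > k, at least one element of c + B must hit A,
    # so every valid shift has the form a - b.
    candidates = sorted({a - b for a in A_set for b in B_list})
    for c in candidates:
        mismatches = sum(1 for b in B_list if c + b not in A_set)
        if mismatches <= k:
            return c
    return None


A = {1, 4, 6, 9}
B = {0, 3, 6}
print(has_k_mismatch_shift(A, B, 0))  # no exact occurrence of B in A
print(has_k_mismatch_shift(A, B, 1))  # a shift with a single mismatch
```

This baseline examines up to $|A| \cdot |B|$ candidate shifts and spends $O(|B|)$ time on each, which is the kind of cost the cited near-linear algorithms avoid.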
A seemingly simple, yet widely applicable subroutine in automated train scheduling is the insertion of a new train path into a timetable in a railway network. We believe it to be the first step towards a new train-rerouting framework in case of large disturbances or maintenance works. Other applications include handling ad-hoc requests and modifying train paths upon request from railway undertakings. We propose a fast and scalable path-insertion algorithm based on dynamic programming that is able to output multiple suitable paths. Our algorithm uses macroscopic data and can run on railway networks with any number of tracks. We apply the algorithm to the line from Göteborg Sävenäs to the Norwegian border at Kornsjö. For a time window of seven hours, we obtain eight suitable paths for a freight train within 0.3 seconds after preprocessing.
We consider algorithms and spectral bounds for sparsest cut and conductance in directed polymatroidal networks. This is motivated by recent work on submodular hypergraphs \cite{Yoshida19,LiM18,ChenOT23,Veldt23} and previous work on multicommodity flows and cuts in polymatroidal networks \cite{ChekuriKRV15}. We obtain three results. First, we obtain an $O(\sqrt{\log n})$-approximation for sparsest cut and point out how this generalizes the result in \cite{ChenOT23}. Second, we consider the symmetric version of conductance and obtain an $O(\sqrt{OPT \log r})$ approximation, where $r$ is the maximum degree, and we point out how this generalizes previous work on vertex expansion in graphs. Third, we prove a non-constructive Cheeger-like inequality that generalizes previous work on hypergraphs. We provide a unified treatment via line-embeddings, which were shown to be effective for submodular cuts in \cite{ChekuriKRV15}.
A reachability preserver is a basic kind of graph sparsifier, which preserves the reachability relation of an $n$-node directed input graph $G$ among a set of given demand pairs $P$ of size $|P|=p$. We give constructions of sparse reachability preservers in the online setting, where $G$ is given on input, the demand pairs $(s, t) \in P$ arrive one at a time, and we must irrevocably add edges to a preserver $H$ to ensure reachability for the pair $(s, t)$ before we can see the next demand pair. Our main results are:
-- There is a construction that guarantees a maximum preserver size of $$|E(H)| \le O\left( n^{0.72}p^{0.56} + n^{0.6}p^{0.7} + n\right).$$ This improves polynomially on the previous online upper bound of $O( \min\{np^{0.5}, n^{0.5}p\}) + n$, implicit in the work of Coppersmith and Elkin [SODA '05].
-- Given a promise that the demand pairs will satisfy $P \subseteq S \times V$ for some vertex set $S$ of size $|S|=:\sigma$, there is a construction that guarantees a maximum preserver size of $$|E(H)| \le O\left( (np\sigma)^{1/2} + n\right).$$ A slightly different construction gives the same result for the setting $P \subseteq V \times S$. This improves polynomially on the previous online upper bound of $O( \sigma n)$ (folklore).
All of these constructions are polynomial time, deterministic, and they do not require knowledge of the values of $p, \sigma$, or $S$. Our techniques also give a small polynomial improvement in the current upper bounds for offline reachability preservers, and they extend to a stronger model in which we must commit to a path for all possible reachable pairs in $G$ before any demand pairs have been received. As an application, we improve the competitive ratio for Online Unweighted Directed Steiner Forest to $O(n^{3/5 + \varepsilon})$.
Asymptotically tight lower bounds are derived for the Input/Output (I/O) complexity of a class of dynamic programming algorithms including matrix chain multiplication, optimal polygon triangulation, and the construction of optimal binary search trees. Assuming no recomputation of intermediate values, we establish an $\Omega\left(\frac{n^3}{\sqrt{M}B}\right)$ I/O lower bound, where $n$ denotes the size of the input and $M$ denotes the size of the available fast memory (cache). When recomputation is allowed, we show the same bound holds for $M < cn$, where $c$ is a positive constant. In the case where $M \ge 2n$, we show an $\Omega\left(n/B\right)$ I/O lower bound. We also discuss algorithms for which the number of executed I/O operations matches asymptotically each of the presented lower bounds, which are thus asymptotically tight. Additionally, we refine our general method to obtain a lower bound for the I/O complexity of the Cocke-Younger-Kasami algorithm, where the size of the grammar impacts the I/O complexity. An upper bound with asymptotically matching performance in many cases is also provided.
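For readers unfamiliar with the class of dynamic programs covered by these bounds, the textbook recurrence for matrix chain multiplication is sketched below. This is the standard cubic-time algorithm, shown only to illustrate the triply nested access pattern whose I/O cost is being bounded. It is not taken from the paper, and the function name is an assumption.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications to compute A_0 A_1 ... A_{n-1},
    where matrix A_i has shape dims[i] x dims[i + 1]."""
    n = len(dims) - 1
    if n <= 0:
        return 0
    # cost[i][j] = cheapest way to multiply the sub-chain A_i ... A_j
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # sub-chain length
        for i in range(n - length + 1):     # sub-chain start
            j = i + length - 1              # sub-chain end
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)        # split point
            )
    return cost[0][n - 1]


print(matrix_chain_cost([10, 30, 5, 60]))  # 4500
```

The three nested loops perform $\Theta(n^3)$ operations over a table of $\Theta(n^2)$ entries. Optimal polygon triangulation and optimal binary search tree construction follow the same pattern with a different cost term.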
We present a simple linear-time algorithm that finds a spanning tree $T$ of a given $2$-edge-connected graph $G$ such that each vertex $v$ of $T$ has degree at most $\lceil \frac{\deg_G(v)}{2}\rceil + 1$.
Online paging is a fundamental problem in the field of online algorithms, in which one maintains a cache of $k$ slots as requests for fetching pages arrive online. In the weighted variant of this problem, each page has its own fetching cost; a substantial line of work on this problem culminated in an (optimal) $O(\log k)$-competitive randomized algorithm, due to Bansal, Buchbinder and Naor (FOCS'07). Existing work for weighted paging assumes that page weights are known in advance, which is not always the case in practice. For example, in multi-level caching architectures, the expected cost of fetching a memory block is a function of its probability of being in a mid-level cache rather than the main memory. This complex property cannot be predicted in advance; over time, however, one may glean information about page weights through sampling their fetching cost multiple times. We present the first algorithm for online weighted paging that does not know page weights in advance, but rather learns from weight samples. In terms of techniques, this requires providing (integral) samples to a fractional solver, which in turn requires a delicate interface between this solver and the randomized rounding scheme; we believe that our work can inspire online algorithms for other problems that involve cost sampling.
Seam carving, a content-aware image resizing technique, has garnered significant attention for its ability to resize images while preserving important content. In this paper, we conduct a comprehensive analysis of four algorithmic design techniques for seam carving: brute-force, greedy, dynamic programming, and GPU-based parallel algorithms. We begin by presenting a theoretical overview of each technique, discussing their underlying principles and computational complexities. Subsequently, we delve into empirical evaluations, comparing the performance of these algorithms in terms of runtime efficiency. Our experimental results provide insights into the theoretical complexities of the design techniques.
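As an illustration of the dynamic programming variant mentioned above, the following sketch finds a minimum-energy vertical seam in a small energy grid. It assumes the energy map is already given as a rectangular list of rows and that a seam moves at most one column left or right per row. The function name is hypothetical and this is not the authors' implementation.

```python
def min_vertical_seam(energy):
    """Return (total_energy, columns) for a minimum-energy vertical seam.

    energy: non-empty rectangular list of rows of numbers.
    columns[r] is the column removed in row r.
    """
    rows, cols = len(energy), len(energy[0])
    # best[r][c] = cheapest seam ending at cell (r, c)
    best = [list(energy[0])]
    for r in range(1, rows):
        prev = best[-1]
        row = []
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols, c + 2)
            row.append(energy[r][c] + min(prev[lo:hi]))
        best.append(row)
    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=lambda j: best[-1][j])
    total = best[-1][c]
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(0, c - 1), min(cols, c + 2)
        c = min(range(lo, hi), key=lambda j: best[r - 1][j])
        seam.append(c)
    seam.reverse()
    return total, seam


energy = [
    [1, 4, 3],
    [5, 2, 6],
    [7, 8, 1],
]
print(min_vertical_seam(energy))  # (4, [0, 1, 2])
```

The table is filled in time proportional to the number of pixels, whereas a brute-force search enumerates exponentially many seams and a greedy choice per row can miss the optimum.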
Trace distance and infidelity (induced by square root fidelity), as basic measures of the closeness of quantum states, are commonly used in quantum state discrimination, certification, and tomography. However, the sample complexity for their estimation still remains open. In this paper, we solve this problem for pure states. We present a quantum algorithm that estimates the trace distance and square root fidelity between pure states to within additive error $\varepsilon$, given sample access to their identical copies. Our algorithm achieves the optimal sample complexity $\Theta(1/\varepsilon^2)$, improving the long-standing folklore $O(1/\varepsilon^4)$. Our algorithm is composed of a samplized phase estimation of the product of two Householder reflections. Notably, an improved (multi-)samplizer for pure states is used as an algorithmic tool in our construction, through which any quantum query algorithm using $Q$ queries to the reflection operator about a pure state $|\psi\rangle$ can be converted to a $\delta$-close (in the diamond norm) quantum sample algorithm using $\Theta(Q^2/\delta)$ samples of $|\psi\rangle$. This samplizer for pure states is shown to be optimal.
Byzantine agreement is a fundamental problem in fault-tolerant distributed networks that has been studied intensively for the last four decades. Most of these works designed protocols for complete networks. A key goal in Byzantine protocols is to tolerate as many Byzantine nodes as possible. The work of Dwork, Peleg, Pippenger, and Upfal [STOC 1986, SICOMP 1988] was the first to address the Byzantine agreement problem in sparse, bounded degree networks and presented a protocol that achieved almost-everywhere agreement among honest nodes. In such networks, all known Byzantine agreement protocols (e.g., Dwork, Peleg, Pippenger, and Upfal, STOC 1986; Upfal, PODC 1992; King, Saia, Sanwalani, and Vee, FOCS 2006) that tolerated a large number of Byzantine nodes had a major drawback that they were not fully-distributed -- in those protocols, nodes are required to have initial knowledge of the entire network topology. This drawback makes such protocols inapplicable to real-world communication networks such as peer-to-peer (P2P) networks, which are typically sparse and bounded degree and where nodes initially have only local knowledge of themselves and their neighbors. Indeed, a fundamental open question raised by the above works is whether one can design Byzantine protocols that tolerate a large number of Byzantine nodes in sparse networks that work with only local knowledge, i.e., fully-distributed protocols. The work of Augustine, Pandurangan, and Robinson [PODC 2013] presented the first fully-distributed Byzantine agreement protocol that works in sparse networks, but it tolerated only up to $O(\sqrt{n}/ \mathrm{polylog}(n))$ Byzantine nodes (where $n$ is the total network size). We answer the earlier open question by presenting fully-distributed Byzantine agreement protocols for sparse, bounded degree networks that tolerate significantly more Byzantine nodes -- up to $O(n/ \mathrm{polylog}(n))$ of them.
The classic greedy coloring (first-fit) algorithm considers the vertices of an input graph $G$ in a given order and assigns the first available color to each vertex $v$ in $G$. In the {\sc Grundy Coloring} problem, the task is to find an ordering of the vertices that will force the greedy algorithm to use as many colors as possible. In the {\sc Partial Grundy Coloring} problem, the task is also to color the graph using as many colors as possible. This time, however, we may select both the ordering in which the vertices are considered and which color to assign the vertex. The only constraint is that the color assigned to a vertex $v$ is a color previously used for another vertex if such a color is available. Whether {\sc Grundy Coloring} and {\sc Partial Grundy Coloring} admit fixed-parameter tractable (FPT) algorithms, i.e., algorithms with running time $f(k)n^{O(1)}$, where $k$ is the number of colors, was posed as an open problem by Zaker and by Effantin et al., respectively. Recently, Aboulker et al. (STACS 2020 and Algorithmica 2022) resolved the question for {\sc Grundy Coloring} in the negative by showing that the problem is W[1]-hard. For {\sc Partial Grundy Coloring}, they obtain an FPT algorithm on graphs that do not contain $K_{i,j}$ as a subgraph (a.k.a. $K_{i,j}$-free graphs). Aboulker et al.~re-iterate the question of whether there exists an FPT algorithm for {\sc Partial Grundy Coloring} on general graphs and also ask whether {\sc Grundy Coloring} admits an FPT algorithm on $K_{i,j}$-free graphs. We give FPT algorithms for {\sc Partial Grundy Coloring} on general graphs and for {\sc Grundy Coloring} on $K_{i,j}$-free graphs, resolving both the questions in the affirmative. We believe that our new structural theorems for partial Grundy coloring and ``representative-family'' like sets for $K_{i,j}$-free graphs that we use in obtaining our results may have wider algorithmic applications.
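The first-fit procedure described at the start of this abstract, and the way the vertex order changes the number of colors it uses, can be sketched as follows. The adjacency-dictionary representation and the function name are assumptions made for illustration. The example graph is the path on four vertices.

```python
def first_fit_coloring(adj, order):
    """Greedy (first-fit) coloring: each vertex, taken in the given order,
    receives the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Path a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

natural = first_fit_coloring(adj, ["a", "b", "c", "d"])
adversarial = first_fit_coloring(adj, ["a", "d", "b", "c"])
print(len(set(natural.values())))      # 2
print(len(set(adversarial.values())))  # 3
```

Coloring both endpoints first forces vertex c to see two different colors among its neighbors, so the greedy algorithm needs a third color. {\sc Grundy Coloring} asks for the ordering that maximizes this count.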
We present a randomized algorithm for solving low-degree polynomial equation systems over finite fields faster than exhaustive search. In order to do so, we follow a line of work by Lokshtanov, Paturi, Tamaki, Williams, and Yu (SODA 2017), Björklund, Kaski, and Williams (ICALP 2019), and Dinur (SODA 2021). Specifically, we generalize Dinur's algorithm for $\mathbb{F}_2$ to all finite fields, in particular the "symbolic interpolation" of Björklund, Kaski, and Williams, and we use an efficient trimmed multipoint evaluation and interpolation procedure for multivariate polynomials over finite fields by Van der Hoeven and Schost (AAECC 2013). The running time of our algorithm matches that of Dinur's algorithm for $\mathbb{F}_2$ and is significantly faster than that of Lokshtanov et al. for $q>2$. We complement our results with tight conditional lower bounds that, surprisingly, we were not able to find in the literature. In particular, under the strong exponential time hypothesis, we prove that it is impossible to solve $n$-variate low-degree polynomial equation systems over $\mathbb{F}_q$ in time $O((q-\varepsilon)^{n})$. As a bonus, we show that under the counting version of the strong exponential time hypothesis, it is impossible to compute the number of roots of a single $n$-variate low-degree polynomial over $\mathbb{F}_q$ in time ${O((q-\varepsilon)^{n})}$; this generalizes a result of Williams (SOSA 2018) from $\mathbb{F}_2$ to all finite fields.
This book provides a comprehensive introduction to the foundational concepts of machine learning (ML) and deep learning (DL). It bridges the gap between theoretical mathematics and practical application, focusing on Python as the primary programming language for implementing key algorithms and data structures. The book covers a wide range of topics, including basic and advanced Python programming, fundamental mathematical operations, matrix operations, linear algebra, and optimization techniques crucial for training ML and DL models. Advanced subjects like neural networks, optimization algorithms, and frequency domain methods are also explored, along with real-world applications of large language models (LLMs) and artificial intelligence (AI) in big data management. Designed for both beginners and advanced learners, the book emphasizes the critical role of mathematical principles in developing scalable AI solutions. Practical examples and Python code are provided throughout, ensuring readers gain hands-on experience in applying theoretical knowledge to solve complex problems in ML, DL, and big data analytics.