https://papers.cool/arxiv/math.OCOptimization and Control2024-11-06T00:00:00+00:00python-feedgenCool Papers - Immersive Paper Discoveryhttps://papers.cool/arxiv/2410.15483Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning2024-11-06T00:00:00+00:00Heshan FernandoHan ShenParikshit RamYi ZhouHorst SamulowitzNathalie BaracaldoTianyi ChenPost-training of pre-trained LLMs, which typically consists of the supervised fine-tuning (SFT) stage and the preference learning (RLHF or DPO) stage, is crucial to effective and safe LLM applications. The widely adopted approach in post-training popular open-source LLMs is to sequentially perform SFT and RLHF/DPO. However, sequential training is sub-optimal in terms of SFT and RLHF/DPO trade-off: the LLM gradually forgets about the first stage's training when undergoing the second stage's training. We theoretically prove the sub-optimality of sequential post-training. Furthermore, we propose a practical joint post-training framework that has theoretical convergence guarantees and empirically outperforms the sequential post-training framework at a similar computational cost. Our code is available at https://github.com/heshandevaka/XRIGHT.https://papers.cool/arxiv/2411.02574A Systematic Study on Solving Aerospace Problems Using Metaheuristics2024-11-06T00:00:00+00:00Carlos Alberto da Silva JuniorMarconi de Arruda PereiraAngelo PassaroComplex engineering problems can be modelled as optimisation problems. For instance, optimising engines, materials, components, structure, aerodynamics, navigation, control, logistics, and planning is essential in aerospace. Metaheuristics are applied to solve these optimisation problems. The present paper presents a systematic study on applying metaheuristics in aerospace based on the literature. Relevant scientific repositories were consulted, and a structured methodology was used to filter the papers. 
Articles published until March 2022 associating metaheuristics and aerospace applications were selected. The most used algorithms and the most relevant hybridizations were identified. This work also analyses the main types of problems addressed in the aerospace context and which classes of algorithms are most used in each problem.https://papers.cool/arxiv/2411.02800Comparison of two mean-field approaches to modeling an epidemic spread2024-11-06T00:00:00+00:00Viktoriya PetrakovaOlga KrivorotkoThe paper describes and compares three approaches to modeling an epidemic spread. The first approach is a well-known system of SIR ordinary differential equations. The second is a mean-field model, in which an isolation strategy for each epidemiological group (Susceptible, Infected, and Removed) is chosen as an optimal control. The third is another mean-field model, in which the isolation strategy is common for the entire population. The considered models have been compared analytically, their sensitivity analysis has been carried out and their predictive capabilities have been estimated using sets of synthetic and real data. For one of the mean-field models, its finite-difference analogue has been devised. The models have also been assessed in terms of their applicability for predicting a viral epidemic spread.https://papers.cool/arxiv/2411.03006Neural Networks and (Virtual) Extended Formulations2024-11-06T00:00:00+00:00Christoph HertrichGeorg LohoNeural networks with piecewise linear activation functions, such as rectified linear units (ReLU) or maxout, are among the most fundamental models in modern machine learning. We make a step towards proving lower bounds on the size of such neural networks by linking their representative capabilities to the notion of the extension complexity $\mathrm{xc}(P)$ of a polytope $P$, a well-studied quantity in combinatorial optimization and polyhedral geometry. 
To this end, we propose the notion of virtual extension complexity $\mathrm{vxc}(P)=\min\{\mathrm{xc}(Q)+\mathrm{xc}(R)\mid P+Q=R\}$. This generalizes $\mathrm{xc}(P)$ and describes the number of inequalities needed to represent the linear optimization problem over $P$ as a difference of two linear programs. We prove that $\mathrm{vxc}(P)$ is a lower bound on the size of a neural network that optimizes over $P$. While it remains open to derive strong lower bounds on virtual extension complexity, we show that powerful results on the ordinary extension complexity can be converted into lower bounds for monotone neural networks, that is, neural networks with only nonnegative weights. Furthermore, we show that one can efficiently optimize over a polytope $P$ using a small virtual extended formulation. We therefore believe that virtual extension complexity deserves to be studied independently from neural networks, just like the ordinary extension complexity. As a first step in this direction, we derive an example showing that extension complexity can go down under Minkowski sum.https://papers.cool/arxiv/2411.03090Adjoint lattice kinetic scheme for topology optimization in fluid problems2024-11-06T00:00:00+00:00Yuta TanabeKentaro YajiKuniharu UshijimaThis paper proposes a topology optimization method for non-thermal and thermal fluid problems using the Lattice Kinetic Scheme (LKS). LKS, which is derived from the Lattice Boltzmann Method (LBM), requires only macroscopic values, such as fluid velocity and pressure, whereas LBM requires velocity distribution functions, thereby reducing memory requirements. The proposed method computes design sensitivities based on the adjoint variable method, and the adjoint equation is solved in the same manner as LKS; thus, we refer to it as the Adjoint Lattice Kinetic Scheme (ALKS). 
A key contribution of this method is the proposed approximate treatment of boundary conditions for the adjoint equation, which is challenging to apply directly due to the characteristics of LKS boundary conditions. We demonstrate numerical examples for steady and unsteady problems involving non-thermal and thermal fluids, and the results are physically meaningful and consistent with previous research, exhibiting similar trends in parameter dependencies, such as the Reynolds number. Furthermore, the proposed method reduces memory usage by up to 75% compared to the conventional LBM in an unsteady thermal fluid problem.https://papers.cool/arxiv/2411.03277Asymptotic stability equals exponential stability -- while you twist your eyes2024-11-06T00:00:00+00:00Wouter JongeneelSuppose that two vector fields on a smooth manifold render some equilibrium point globally asymptotically stable (GAS). We show that there exists a homotopy between the corresponding semiflows such that this point remains GAS along this homotopy.https://papers.cool/arxiv/2411.02549Distributionally Robust Optimization2024-11-06T00:00:00+00:00Daniel KuhnSoroosh ShafieeWolfram WiesemannDistributionally robust optimization (DRO) studies decision problems under uncertainty where the probability distribution governing the uncertain problem parameters is itself uncertain. A key component of any DRO model is its ambiguity set, that is, a family of probability distributions consistent with any available structural or statistical information. DRO seeks decisions that perform best under the worst distribution in the ambiguity set. This worst case criterion is supported by findings in psychology and neuroscience, which indicate that many decision-makers have a low tolerance for distributional ambiguity. DRO is rooted in statistics, operations research and control theory, and recent research has uncovered its deep connections to regularization techniques and adversarial training in machine learning. 
This survey presents the key findings of the field in a unified and self-contained manner.https://papers.cool/arxiv/2411.02573Optimization Algorithm Design via Electric Circuits2024-11-06T00:00:00+00:00Stephen P. BoydTetiana ParshakovaErnest K. RyuJaewook J. SuhWe present a novel methodology for convex optimization algorithm design using ideas from electric RLC circuits. Given an optimization problem, the first stage of the methodology is to design an appropriate electric circuit whose continuous-time dynamics converge to the solution of the optimization problem at hand. Then, the second stage is an automated, computer-assisted discretization of the continuous-time dynamics, yielding a provably convergent discrete-time algorithm. Our methodology recovers many classical (distributed) optimization algorithms and enables users to quickly design and explore a wide range of new algorithms with convergence guarantees.https://papers.cool/arxiv/2411.02665A Trust-Region Algorithm for Noisy Equality Constrained Optimization2024-11-06T00:00:00+00:00Shigeng SunJorge NocedalThis paper introduces a modified Byrd-Omojokun (BO) trust region algorithm to address the challenges posed by noisy function and gradient evaluations. The original BO method was designed to solve equality constrained problems and it forms the backbone of some interior point methods for general large-scale constrained optimization. A key strength of the BO method is its robustness in handling problems with rank-deficient constraint Jacobians. The algorithm proposed in this paper introduces a new criterion for accepting a step and for updating the trust region that makes use of an estimate of the noise in the problem. The analysis presented here gives conditions under which the iterates converge to regions of stationary points of the problem, determined by the level of noise. This analysis is more complex than for line search methods because the trust region carries (noisy) information from previous iterates. 
Numerical tests illustrate the practical performance of the algorithm.https://papers.cool/arxiv/2411.02719Fully Distributed Adaptive Nash Equilibrium Seeking Algorithm for Constrained Noncooperative Games with Prescribed Performance2024-11-06T00:00:00+00:00Sichen QianThis paper investigates a fully distributed adaptive Nash equilibrium (NE) seeking algorithm for constrained noncooperative games with prescribed-time stability. On the one hand, prescribed-time stability for the proposed NE seeking algorithm is obtained by using an adaptive penalty technique, a time-varying control gain and a cosine-related time conversion function, which extends the prior asymptotic stability result. On the other hand, uncoordinated integral adaptive gains are incorporated in order to make the algorithm fully distributed. Finally, the theoretical result is validated through a numerical simulation based on a standard power market scenario.https://papers.cool/arxiv/2411.02721Differentiability and Approximation of Probability Functions under Gaussian Mixture Models: A Bayesian Approach2024-11-06T00:00:00+00:00Gonzalo ContadorPedro Pérez-ArosEmilio VilchesIn this work, we study probability functions associated with Gaussian mixture models. Our primary focus is on extending the use of spherical radial decomposition for multivariate Gaussian random vectors to the context of Gaussian mixture models, which are not inherently spherical but only conditionally so. Specifically, the conditional probability distribution, given a random parameter of the random vector, follows a Gaussian distribution, allowing us to apply Bayesian analysis tools to the probability function. This assumption, together with spherical radial decomposition for Gaussian random vectors, enables us to represent the probability function as an integral over the Euclidean sphere. 
Using this representation, we establish sufficient conditions to ensure the differentiability of the probability function and provide an integral representation of its gradient. Furthermore, leveraging the Bayesian decomposition, we approximate the probability function using random sampling over the parameter space and the Euclidean sphere. Finally, we present numerical examples that illustrate the advantages of this approach over classical approximations based on random vector sampling.https://papers.cool/arxiv/2411.02766Approximate controllability of impulsive semilinear evolution equations in Hilbert spaces2024-11-06T00:00:00+00:00Javad A. AsadzadeNazim I. MahmudovMany dynamical systems in fields such as physics, chemistry, biology, and engineering show impulsive behavior due to abrupt changes at specific times. These behaviors are described by differential systems with impulse effects. This paper examines approximate controllability for certain types of semi-linear impulsive control differential and neutral differential equations in Hilbert spaces, including control within the impulses. By applying semigroup theory and a fixed-point approach, sufficient conditions for approximate controllability of impulsive and neutral differential equations are established. To illustrate the usefulness of the proposed results, three examples are presented, offering improvements over some recent results.https://papers.cool/arxiv/2411.02806Applications of Automatic Differentiation in Image Registration2024-11-06T00:00:00+00:00Warin WatsonCash CherryRachelle LangWe demonstrate that automatic differentiation, which has become commonly available in machine learning frameworks, is an efficient way to explore ideas that lead to algorithmic improvement in multi-scale affine image registration and affine super-resolution problems. 
In our first experiment on multi-scale registration, we implement an ODE predictor-corrector method involving a derivative with respect to the scale parameter and the Hessian of an image registration objective function, both of which would be difficult to compute without AD. Our findings indicate that exact Hessians are necessary for the method to provide any benefits over a traditional multi-scale method; a Gauss-Newton Hessian approximation fails to provide such benefits. In our second experiment, we implement a variable projected Gauss-Newton method for super-resolution and use AD to differentiate through the iteratively computed projection, a method previously unaddressed in the literature. We show that Jacobians obtained without differentiating through the projection are poor approximations to the true Jacobians of the variable projected forward map and explore the performance of some other approximations. By addressing these problems, this work contributes to the application of AD in image registration and sets a precedent for further use of machine learning tools in this field.https://papers.cool/arxiv/2411.02822Polyhedral study of a temporal rural postman problem: application in inspection of railway track without disturbing train schedules2024-11-06T00:00:00+00:00Somnath BuriulyLeena VachhaniSivapragasam RavitharanArpita SinhaSunita ChauhanThe Rural Postman Problem with Temporal Unavailability (RPP-TU) is a variant of the Rural Postman Problem (RPP) specified for multi-agent planning over directed graphs with temporal constraints. These temporal constraints represent the unavailable time intervals for each arc during which agents cannot traverse the arc. Such arc unavailability scenarios occur in routing and scheduling of the instrumented wagons for inspection of railway tracks without disturbing the train schedules, i.e. 
the scheduled trains prohibit access to the signal blocks (sections of railway track separated by signals) for some finite interval of time. A three-index formulation for the RPP-TU is adopted from the literature. The three-index formulation has binary variables for describing the route information of the agents, and continuous non-negative variables to describe the schedules at pre-defined locations. A relaxation of the three-index formulation for RPP-TU, referred to as Cascaded Graph Formulation (CGF), is investigated in this work. The CGF has attributes that simplify the polyhedral study of time-dependent arc routing problems like RPP-TU. A novel branch-and-cut algorithm is proposed to solve the RPP-TU, where branching is performed over the service arcs. A family of facet-defining inequalities, derived from the polyhedral study, is used as cutting planes in the proposed branch-and-cut algorithm to reduce the computation time by up to $48\%$. Finally, an application of this work is showcased using a simulation case study of a railway inspection scheduling problem based on Kurla-Vashi-Thane suburban network in Mumbai, India. An improvement of $93\%$ is observed when compared to a Benders' decomposition based MILP solver from the literature.https://papers.cool/arxiv/2411.03051Fast and robust consensus-based optimization via optimal feedback control2024-11-06T00:00:00+00:00Yuyang HuangMichael HertyDante KaliseNikolas KantasWe propose a variant of consensus-based optimization (CBO) algorithms, controlled-CBO, which introduces a feedback control term to improve convergence towards global minimizers of non-convex functions in multiple dimensions. The feedback law is a gradient of a numerical approximation to the Hamilton-Jacobi-Bellman (HJB) equation, which serves as a proxy of the original objective function. 
Thus, the associated control signal furnishes gradient-like information to facilitate the identification of the global minimum without requiring derivative computation from the objective function itself. The proposed method exhibits significantly improved performance over standard CBO methods in numerical experiments, particularly in scenarios involving a limited number of particles, or where the initial particle ensemble is not well positioned with respect to the global minimum. At the same time, the modification keeps the algorithm amenable to theoretical analysis in the mean-field sense. The superior convergence rates are assessed experimentally.https://papers.cool/arxiv/2411.03103Benign landscape for Burer-Monteiro factorizations of MaxCut-type semidefinite programs2024-11-06T00:00:00+00:00Faniriana Rakoto EndorIrène WaldspurgerWe consider MaxCut-type semidefinite programs (SDP) which admit a low rank solution. To numerically leverage the low rank hypothesis, a standard algorithmic approach is the Burer-Monteiro factorization, which significantly reduces the dimensionality of the problem at the cost of its convexity. We give a sharp condition on the conditioning of the Laplacian matrix associated with the SDP under which any second-order critical point of the non-convex problem is a global minimizer. By applying our theorem, we improve on recent results about the correctness of the Burer-Monteiro approach on $\mathbb{Z}_2$-synchronization problems.https://papers.cool/arxiv/2411.03138Predict-and-Optimize Robust Unit Commitment with Statistical Guarantees via Weight Combination2024-11-06T00:00:00+00:00Rui XieYue ChenPierre PinsonThe growing uncertainty from renewable power and electricity demand brings significant challenges to unit commitment (UC). While various advanced forecasting and optimization methods have been developed to better predict and address this uncertainty, most previous studies treat forecasting and optimization as separate tasks. 
This separation can lead to suboptimal results due to misalignment between the objectives of the two tasks. To overcome this challenge, we propose a robust UC framework that integrates the forecasting and optimization processes while ensuring statistical guarantees. In the forecasting stage, we combine multiple predictions derived from diverse data sources and methodologies for an improved prediction, aiming to optimize the UC performance. In the optimization stage, the combined prediction is used to construct an uncertainty set with statistical guarantees, based on which the robust UC model is formulated. The optimal robust UC solution provides feedback to refine the forecasting process, forming a closed loop. To solve the proposed integrated forecasting-optimization framework efficiently and effectively, we develop a neural network-based surrogate model for acceleration and introduce a reshaping method for the uncertainty set based on the optimization result to reduce conservativeness. Case studies on modified IEEE 30-bus and 118-bus systems demonstrate the advantages of the proposed approach.https://papers.cool/arxiv/2411.03216On the Hardness of the $L_1-L_2$ Regularization Problem2024-11-06T00:00:00+00:00Yuyuan OuyangKyle YatesThe sparse linear reconstruction problem is a core problem in signal processing which aims to recover sparse solutions to linear systems. The original problem regularized by the total number of nonzero components (also known as $L_0$ regularization) is well-known to be NP-hard. The relaxation of the $L_0$ regularization by using the $L_1$ norm offers a convex reformulation, but is only exact under certain conditions (e.g., restricted isometry property) which might be NP-hard to verify. To overcome the computational hardness of the $L_0$ regularization problem while providing tighter results than the $L_1$ relaxation, several alternate optimization problems have been proposed to find sparse solutions. 
One such problem is the $L_1-L_2$ minimization problem, which is to minimize the difference of the $L_1$ and $L_2$ norms subject to linear constraints. This paper proves that solving the $L_1-L_2$ minimization problem is NP-hard. Specifically, we prove that it is NP-hard to minimize the $L_1-L_2$ regularization function subject to linear constraints. Moreover, it is also NP-hard to solve the unconstrained formulation that minimizes the sum of a least squares term and the $L_1-L_2$ regularization function. Furthermore, restricting the feasible set to a smaller one by adding nonnegative constraints does not change the NP-hardness nature of the problems.
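As a minimal illustration of the $L_1-L_2$ regularization function described in the last abstract, the sketch below evaluates the difference of the two norms and an unconstrained objective built from it. The function names, the weight `lam`, and the `0.5` scaling of the least squares term are assumptions made for illustration and are not taken from the paper; the code only evaluates the objective and does not attempt to solve the (NP-hard) minimization problem.

```python
import numpy as np


def l1_minus_l2(x):
    """Evaluate the L1-L2 regularization function ||x||_1 - ||x||_2."""
    x = np.asarray(x, dtype=float)
    return np.linalg.norm(x, 1) - np.linalg.norm(x, 2)


def unconstrained_objective(A, b, x, lam):
    """Least squares term plus a weighted L1-L2 term.

    The 0.5 factor and the weight `lam` are illustrative assumptions.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)
    residual = A @ x - b
    return 0.5 * residual @ residual + lam * l1_minus_l2(x)


# A vector with a single nonzero entry has equal L1 and L2 norms,
# so the regularizer vanishes; denser vectors are penalized.
sparse_value = l1_minus_l2([0.0, 3.0, 0.0])
dense_value = l1_minus_l2([3.0, 4.0])
```

The regularizer is zero exactly on vectors with at most one nonzero entry and positive otherwise, which is why it is used as a sparsity-promoting alternative to the $L_1$ norm.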