holds the Chair of Applied Numerical Analysis at TU Munich.
His research embraces a broad spectrum of problems in mathematical modeling, analysis and numerical analysis. He is particularly interested in the concept of compression as it appears in different forms in data analysis, image and signal processing, and in the adaptive numerical solution of partial differential equations or high-dimensional optimization problems.
In this paper, we consider the signature-to-path reconstruction problem from the control-theoretic perspective. Namely, we design an optimal control problem whose solution leads to the minimal-length path that generates a given signature. In order to do that, we minimize a cost functional consisting of two competing terms, i.e., a weighted final-time cost combined with the $L^2$-norm squared of the controls. Moreover, we can show that, by taking the limit to infinity of the parameter that tunes the final-time cost, the problem $\Gamma$-converges to the problem of finding a sub-Riemannian geodesic connecting two signatures. Finally, we provide an alternative reformulation of the latter problem, which is particularly suitable for the numerical implementation.
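To make the object in the abstract above concrete, the following minimal sketch computes the first two levels of the signature of a piecewise-linear path using Chen's identity. It covers only the forward map (path to signature), not the reconstruction problem studied in the paper; the function name `signature_level2` is hypothetical.

```python
def signature_level2(points):
    """Levels 1 and 2 of the signature of the piecewise-linear path through `points`.

    Returns (s1, s2): s1[i] is the total increment in coordinate i,
    s2[i][j] is the iterated integral of dx_i dx_j along the path.
    """
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(points, points[1:]):
        inc = [qi - pi for pi, qi in zip(p, q)]
        # Chen's identity for appending one linear segment:
        # S2(new) = S2(old) + S1(old) (x) inc + (inc (x) inc) / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


# An L-shaped path in the plane: right by 1, then up by 1.
s1, s2 = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
levy_area = 0.5 * (s2[0][1] - s2[1][0])  # antisymmetric part of level 2
```

The antisymmetric part of the second level is the Lévy area, which is the first piece of information in the signature that distinguishes paths with the same endpoints.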
The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. We study the numerical approximation of Sobolev-smooth functions defined on probability spaces, focusing on the Wasserstein distance function as a relevant example. In contrast to the existing body of literature, which focuses on efficiently approximating pointwise evaluations, we define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional’s Euler-Lagrange equation. We furnish explicit and quantitative bounds on generalization errors for each of these solutions. We leverage the theory of metric Sobolev spaces and combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we use appropriately designed neural networks as basis functions. These networks are trained using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. At equal accuracy, our constructive solutions improve the evaluation speed over state-of-the-art methods by several orders of magnitude. This allows evaluations over large datasets several times faster, including training, than traditional optimal transport algorithms. Our analytically designed deep learning architecture slightly outperforms the test error of state-of-the-art CNN architectures on datasets of images.
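For orientation, here is a minimal sketch of a pointwise evaluation of the Wasserstein distance in the simplest setting: two empirical measures on the real line with the same number of equally weighted samples, where the optimal coupling is obtained by sorting. This is only the one-dimensional special case, not the learning-based approach described above; the function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p distance between two empirical measures on the real line.

    Assumes equally weighted samples and len(xs) == len(ys); in one dimension
    the optimal coupling then matches the sorted samples in order.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch requires equally many samples")
    n = len(xs)
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


# Shifting every sample by 1 moves the measure by exactly 1 in W_2.
distance = wasserstein_1d([0.0, 1.0], [2.0, 1.0])
```

In higher dimensions no such sorting shortcut exists, and each evaluation requires solving an optimal transport problem.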
Consensus-based optimization (CBO) is an agent-based derivative-free method for non-smooth global optimization that was introduced in 2017, leveraging a surprising interplay between stochastic exploration and the Laplace principle. In addition to its versatility and effectiveness in handling high-dimensional, non-convex, and non-smooth optimization problems, this approach lends itself well to theoretical analysis. Indeed, its dynamics is governed by a degenerate nonlinear Fokker–Planck equation, whose large time behavior explains the convergence of the method. Recent results provide guarantees of convergence under the restrictive assumption of a unique global minimizer for the objective function. In this work, we propose a novel and simple variation of CBO to tackle non-convex optimization problems with multiple global minimizers. Despite the simplicity of this new model, its analysis is particularly challenging because of its nonlinearity and nonlocal nature. We prove the existence of solutions of the corresponding nonlinear Fokker–Planck equation and we show exponential concentration in time to the set of minimizers made of multiple smooth, convex, and compact components. Our proofs require combining several ingredients, such as delicate geometrical arguments, new variants of a quantitative Laplace principle, ad hoc regularizations and approximations, and regularity theory for parabolic equations. Ultimately, this result suggests that the corresponding CBO algorithm, formulated as an Euler-Maruyama discretization of the underlying empirical stochastic process, tends to converge to multiple global minimizers.
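As background, the standard single-consensus CBO iteration can be sketched as an Euler-Maruyama discretization in a few lines. This is the textbook scheme with component-wise (anisotropic) noise, not the multi-minimizer variant proposed above; all parameter values and the names `consensus_point` and `cbo_minimize` are illustrative assumptions.

```python
import math
import random


def consensus_point(particles, f, alpha):
    """Gibbs-weighted average of the particles with weights exp(-alpha * f(x))."""
    values = [f(x) for x in particles]
    f_min = min(values)  # shift by the minimum to avoid floating-point underflow
    weights = [math.exp(-alpha * (v - f_min)) for v in values]
    total = sum(weights)
    dim = len(particles[0])
    return [sum(w * x[k] for w, x in zip(weights, particles)) / total
            for k in range(dim)]


def cbo_minimize(f, dim, n_particles=200, steps=400, dt=0.05,
                 lam=1.0, sigma=0.5, alpha=30.0, init_radius=3.0, seed=0):
    """Euler-Maruyama discretization of standard CBO with anisotropic noise."""
    rng = random.Random(seed)
    particles = [[rng.uniform(-init_radius, init_radius) for _ in range(dim)]
                 for _ in range(n_particles)]
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        c = consensus_point(particles, f, alpha)
        for x in particles:
            for k in range(dim):
                diff = x[k] - c[k]
                # Drift towards the consensus point plus noise scaled by the distance to it.
                x[k] += -lam * diff * dt + sigma * diff * sqrt_dt * rng.gauss(0.0, 1.0)
    return consensus_point(particles, f, alpha)


# Example: a quadratic with minimizer (1, -2); only function values are used.
objective = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
estimate = cbo_minimize(objective, dim=2)
```

Because the noise is proportional to the distance from the consensus point, exploration switches itself off as the particles agree, which is why consensus formation and convergence are closely linked.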
In this paper we study consensus-based optimization (CBO), which is a multiagent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. Based on an experimentally supported intuition that, on average, CBO performs a gradient descent of the squared Euclidean distance to the global minimizer, we devise a novel technique for proving the convergence to the global minimizer in mean-field law for a rich class of objective functions. The result unveils internal mechanisms of CBO that are responsible for the success of the method. In particular, we prove that CBO performs a convexification of a large class of optimization problems as the number of optimizing agents goes to infinity. Furthermore, we improve prior analyses by requiring mild assumptions about the initialization of the method and by covering objectives that are merely locally Lipschitz continuous. As a core component of this analysis, we establish a quantitative nonasymptotic Laplace principle, which may be of independent interest. From the result of CBO convergence in mean-field law, it becomes apparent that the hardness of any global optimization problem is necessarily encoded in the rate of the mean-field approximation, for which we provide a novel probabilistic quantitative estimate. The combination of these results allows us to obtain probabilistic global convergence guarantees of the numerical CBO method.
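The Laplace principle mentioned above states that $-\frac{1}{\alpha}\log \int e^{-\alpha f}\,d\rho$ approaches the minimum of $f$ on the support of $\rho$ as $\alpha \to \infty$. The sketch below illustrates the discrete analogue for a finite list of values (a log-sum-exp computation); it shows the asymptotic statement only, not the quantitative version established in the paper, and the name `soft_min` is hypothetical.

```python
import math


def soft_min(values, alpha):
    """Compute -(1/alpha) * log(mean(exp(-alpha * v))) in a numerically stable way."""
    v_min = min(values)
    mean_exp = sum(math.exp(-alpha * (v - v_min)) for v in values) / len(values)
    return v_min - math.log(mean_exp) / alpha


# Samples of f(x) = (x - 0.3)^2 on a uniform grid over [-2, 2].
grid_values = [(k / 100 - 0.3) ** 2 for k in range(-200, 201)]
approximations = [soft_min(grid_values, alpha) for alpha in (1.0, 10.0, 100.0, 1000.0)]
```

For n values the result always lies between the true minimum and the minimum plus log(n)/alpha, so the approximation tightens as alpha grows.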
Consensus-based optimization (CBO) is a versatile multi-particle optimization method for performing nonconvex and nonsmooth global optimizations in high dimensions. Proofs of global convergence in probability have been achieved for a broad class of objective functions in unconstrained optimizations. In this work we adapt the algorithm for solving constrained optimizations on compact and unbounded domains with boundary by leveraging emerging reflective boundary conditions. In particular, we close a relevant gap in the literature by providing a global convergence proof for the many-particle regime, including convergence rates. On the one hand, for the sake of minimizing running cost, it is desirable to keep the number of particles small. On the other hand, reducing the number of particles diminishes the exploration capability of the algorithm. Hence numerical heuristics are needed to ensure convergence of CBO in the few-particle regime. In this work, we also significantly improve the convergence and complexity of CBO by utilizing an adaptive region control mechanism and by choosing geometry-specific random noise. In particular, by combining a hierarchical noise structure with a multigrid finite element method, we are able to compute global minimizers for a constrained p-Allen-Cahn problem with obstacles, a very challenging variational problem.
In this paper, we address the adversarial training of neural ODEs from a robust control perspective. This is an alternative to the classical training via empirical risk minimization, and it is widely used to enforce reliable outcomes for input perturbations. Neural ODEs allow the interpretation of deep neural networks as discretizations of control systems, unlocking powerful tools from control theory for the development and the understanding of machine learning. In this specific case, we formulate the adversarial training with perturbed data as a minimax optimal control problem, for which we derive first order optimality conditions in the form of Pontryagin’s Maximum Principle. We provide a novel interpretation of robust training leading to an alternative weighted technique, which we test on a low-dimensional classification task.
In this paper, we consider ensembles of control-affine systems in $\mathbb{R}^d$, and we study simultaneous optimal control problems related to the worst-case minimization. After proving that such problems admit solutions, denoting with $(\Theta_N)_N$ a sequence of compact sets that parametrize the ensembles of systems, we first show that the corresponding minimax optimal control problems are $\Gamma$-convergent whenever $(\Theta_N)_N$ has a limit with respect to the Hausdorff distance. Besides its independent interest, the previous result plays a crucial role for establishing the Pontryagin Maximum Principle (PMP) when the ensemble is parametrized by a set $\Theta$ consisting of infinitely many points. Namely, we first approximate $\Theta$ by finite and increasing-in-size sets $(\Theta_N)_N$ for which the PMP is known, and then we derive the PMP for the $\Gamma$-limiting problem. The same strategy can be pursued in applications, where we can reduce infinite ensembles to finite ones to compute the minimizers numerically. We bring as a numerical example the Schrödinger equation for a qubit with uncertain resonance frequency.
We introduce CBXPy and ConsensusBasedX.jl, Python and Julia implementations of consensus-based interacting particle systems (CBX), which generalise consensus-based optimization methods (CBO) for global, derivative-free optimisation. The raison d'être of our libraries is twofold: on the one hand, to offer high-performance implementations of CBX methods that the community can use directly, while on the other, providing a general interface that can accommodate and be extended to further variations of the CBX family. Python and Julia were selected as the leading high-level languages in terms of usage and performance, as well as for their popularity among the scientific computing community. Both libraries have been developed with a common ethos, ensuring a similar API and core functionality, while leveraging the strengths of each language and writing idiomatic code.
The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks, which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modelling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularisation, resulting in potentially non-convex cost landscapes. While the global results obtained for high Tikhonov regularisation may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.
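The ResNet and NeurODE connection referred to above can be illustrated with a minimal sketch: a residual network with constant width is the explicit Euler discretization of the controlled ODE $\dot{x} = \tanh(W(t)x + b(t))$. The activation, the step size `h`, and the function name `resnet_forward` are illustrative assumptions; this is the classical constant-width setting, not the AutoencODE model.

```python
import math


def resnet_forward(x, layers, h):
    """Forward pass x_{k+1} = x_k + h * tanh(W_k x_k + b_k).

    `layers` is a list of (W, b) pairs with square W, so every layer keeps
    the same width; this is one explicit Euler step per layer.
    """
    for W, b in layers:
        pre = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
        x = [xi + h * math.tanh(p) for xi, p in zip(x, pre)]
    return x


# Four layers with step h = 0.25 approximate the flow of the ODE on [0, 1].
layers = [([[0.0, 1.0], [-1.0, 0.0]], [0.0, 0.0])] * 4
output = resnet_forward([1.0, 0.0], layers, h=0.25)
```

Since the state is updated in place by adding an increment of the same shape, the width cannot change from layer to layer, which is the limitation the abstract addresses.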
The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, unlike the case of linear Partial Differential Equations (PDEs), we show how the NTK perspective falls short in the nonlinear scenario. Specifically, we establish that the NTK yields a random matrix at initialization that is not constant during training, contrary to conventional belief. Another significant difference from the linear regime is that, even in the idealistic infinite-width limit, the Hessian does not vanish and hence it cannot be disregarded during training. This motivates the adoption of second-order optimization methods. We explore the convergence guarantees of such methods in both linear and nonlinear cases, addressing challenges such as spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we highlight the benefits of second-order methods in benchmark test cases.
We investigate the preconditions of an operationalization of ethics using the example of the algorithmization, i.e. the mathematical implementation, of the concepts of fairness and diversity in AI. From a non-technical point of view in ethics, this implementation entails two major drawbacks: (1) it narrows down big concepts to a single model that is deemed manageable, and (2) it hides unsolved problems of humanity in a system that could be mistaken for the ‘solution’ to these problems. We encourage extra caution when dealing with such issues and advocate human oversight.
The term ‘Normalizing Flows’ is related to the task of constructing invertible transport maps between probability measures by means of deep neural networks. In this paper, we consider the problem of recovering the $W_2$-optimal transport map $T$ between absolutely continuous measures $\mu,\nu\in\mathcal{P}(\mathbb{R}^n)$ as the flow of a linear-control neural ODE, where the control depends only on the time variable and takes values in a finite-dimensional space. We first show that, under suitable assumptions on $\mu,\nu$ and on the controlled vector fields, the optimal transport map is contained in the $C^0_c$-closure of the flows generated by the system. Assuming that discrete approximations $\mu_N,\nu_N$ of the original measures $\mu,\nu$ are available, we use a discrete optimal coupling $\gamma_N$ to define an optimal control problem. With a $\Gamma$-convergence argument, we prove that its solutions correspond to flows that approximate the optimal transport map $T$. Finally, taking advantage of the Pontryagin Maximum Principle, we propose an iterative numerical scheme for the resolution of the optimal control problem, resulting in an algorithm for the practical computation of the approximated optimal transport map.
In this paper, we study consensus-based optimisation (CBO), a versatile, flexible and customisable optimisation method suitable for performing nonconvex and nonsmooth global optimisations in high dimensions. CBO is a multi-particle metaheuristic, which is effective in various applications and at the same time amenable to theoretical analysis thanks to its minimalistic design. The underlying dynamics, however, is flexible enough to incorporate different mechanisms widely used in evolutionary computation and machine learning, as we show by analysing a variant of CBO which makes use of memory effects and gradient information. We rigorously prove that this dynamics converges to a global minimiser of the objective function in mean-field law for a vast class of functions under minimal assumptions on the initialisation of the method. The proof in particular reveals how to leverage further, in some applications advantageous, forces in the dynamics without losing provable global convergence. To demonstrate the benefit of the herein investigated memory effects and gradient information in certain applications, we present numerical evidence for the superiority of this CBO variant in applications such as machine learning and compressed sensing, which en passant widen the scope of applications of CBO.
Consensus-based optimization (CBO) is a versatile multi-particle metaheuristic optimization method suitable for performing nonconvex and nonsmooth global optimizations in high dimensions. It has proven effective in various applications while at the same time being amenable to a theoretical convergence analysis. In this paper, we explore a variant of CBO, which incorporates truncated noise in order to enhance the well-behavedness of the statistics of the law of the dynamics. By introducing this additional truncation in the noise term of the CBO dynamics, we achieve that, in contrast to the original version, higher moments of the law of the particle system can be effectively bounded. As a result, our proposed variant exhibits enhanced convergence performance, allowing in particular for wider flexibility in choosing the noise parameter of the method, as we confirm experimentally. By analyzing the time-evolution of the Wasserstein-2 distance between the empirical measure of the interacting particle system and the global minimizer of the objective function, we rigorously prove convergence in expectation of the proposed CBO variant requiring only minimal assumptions on the objective function and on the initialization. Numerical evidence demonstrates the benefit of truncating the noise in CBO.
In this paper we provide a novel analytical perspective on the theoretical understanding of gradient-based learning algorithms by interpreting consensus-based optimization (CBO), a recently proposed multi-particle derivative-free optimization method, as a stochastic relaxation of gradient descent. Remarkably, we observe that through communication of the particles, CBO exhibits a stochastic gradient descent (SGD)-like behavior despite solely relying on evaluations of the objective function. The fundamental value of such a link between CBO and SGD lies in the fact that CBO is provably globally convergent to global minimizers for ample classes of nonsmooth and nonconvex objective functions, hence, on the one side, offering a novel explanation for the success of stochastic relaxations of gradient descent. On the other side, contrary to the conventional wisdom according to which zero-order methods ought to be inefficient or not to possess generalization abilities, our results unveil an intrinsic gradient descent nature of such heuristics. This viewpoint furthermore complements previous insights into the working principles of CBO, which describe the dynamics in the mean-field limit through a nonlinear nonlocal partial differential equation that allows one to alleviate complexities of the nonconvex function landscape. Our proofs leverage a completely nonsmooth analysis, which combines a novel quantitative version of the Laplace principle (log-sum-exp trick) and the minimizing movement scheme (proximal iteration). In doing so, we furnish useful and precise insights that explain how stochastic perturbations of gradient descent overcome energy barriers and reach deep levels of nonconvex functions. Instructive numerical illustrations support the provided theoretical insights.
In this paper we provide a rigorous convergence analysis for the renowned particle swarm optimization method by using tools from stochastic calculus and the analysis of partial differential equations. Based on a continuous-time formulation of the particle dynamics as a system of stochastic differential equations, we establish convergence to a global minimizer of a possibly nonconvex and nonsmooth objective function in two steps. First, we prove consensus formation of an associated mean-field dynamics by analyzing the time-evolution of the variance of the particle distribution, which acts as Lyapunov function of the dynamics. We then show that this consensus is close to a global minimizer by employing the asymptotic Laplace principle and a tractability condition on the energy landscape of the objective function. These results allow for the usage of memory mechanisms, and hold for a rich class of objectives provided certain conditions of well-preparation of the hyperparameters and the initial datum. In a second step, at least for the case without memory effects, we provide a quantitative result about the mean-field approximation of particle swarm optimization, which specifies the convergence of the interacting particle system to the associated mean-field limit. Combining these two results allows for global convergence guarantees of the numerical particle swarm optimization method with provable polynomial complexity. To demonstrate the applicability of the method we propose an efficient and parallelizable implementation, which is tested in particular on a competitive and well-understood high-dimensional benchmark problem in machine learning.
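For readers unfamiliar with the method analyzed above, the following is a minimal sketch of the textbook discrete particle swarm update with a local-best memory and a global best. The paper works with a continuous-time stochastic differential equation formulation; the discrete form, the coefficients, and the name `pso_minimize` here are illustrative assumptions.

```python
import random


def pso_minimize(f, dim, n_particles=30, steps=200, inertia=0.7,
                 c_local=1.5, c_global=1.5, init_radius=3.0, seed=0):
    """Textbook particle swarm optimization; returns (best position, best value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-init_radius, init_radius) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]        # each particle's memory of its best position
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(steps):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus random attraction to the local best and the global best.
                vel[i][k] = (inertia * vel[i][k]
                             + c_local * r1 * (best_pos[i][k] - pos[i][k])
                             + c_global * r2 * (g_pos[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            val = f(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Example: shifted sphere function with minimizer (1, 1).
sphere = lambda x: sum((xi - 1.0) ** 2 for xi in x)
best_x, best_f = pso_minimize(sphere, dim=2)
```

The inner loop over particles only couples through the global best, so the objective evaluations within one step can be parallelized.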
We consider the damped Newton method for strongly monotone and Lipschitz continuous operator equations in a variational setting. We provide a very accessible justification of why the undamped Newton method performs better than its damped counterparts in a vicinity of a solution. Moreover, in the given setting, an adaptive step-size strategy will be presented, which guarantees the global convergence and favours an undamped update if admissible.
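The idea of favouring the undamped update whenever it is admissible can be sketched for a scalar equation as follows. The acceptance test used here, a sufficient decrease of the residual with step halving, is a generic backtracking rule chosen for illustration and is an assumption, not the step-size strategy of the paper; the name `damped_newton` is hypothetical.

```python
import math


def damped_newton(F, dF, x0, tol=1e-12, max_iter=100):
    """Newton iteration x <- x + t * d that tries the undamped step t = 1 first."""
    x = x0
    for _ in range(max_iter):
        r = F(x)
        if abs(r) <= tol:
            break
        direction = -r / dF(x)
        t = 1.0  # undamped update is tried first
        # Halve the step until the residual decreases by a factor (1 - t/2).
        while abs(F(x + t * direction)) > (1.0 - 0.5 * t) * abs(r) and t > 1e-12:
            t *= 0.5
        x += t * direction
    return x


# F(x) = 2x + sin(x) has derivative in [1, 3]: strongly monotone and Lipschitz, root at 0.
root = damped_newton(lambda x: 2.0 * x + math.sin(x),
                     lambda x: 2.0 + math.cos(x), x0=10.0)
```

Far from the root the first full step overshoots and is halved once; close to the root the full step is always accepted, so the fast local convergence of the undamped method is retained.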
In this paper we consider the problem of the optimal control of an ensemble of affine-control systems. After proving the well-posedness of the minimization problem under examination, we establish a $\Gamma$-convergence result that allows us to substitute the original (and usually infinite) ensemble with a sequence of finite increasing-in-size sub-ensembles. The solutions of the optimal control problems involving these sub-ensembles provide approximations in the $L^2$-strong topology of the minimizers of the original problem. Using again a $\Gamma$-convergence argument, we manage to derive a Maximum Principle for ensemble optimal control problems with end-point cost. Moreover, in the case of finite sub-ensembles, we can address the minimization of the related cost through numerical schemes. In particular, we propose an algorithm that consists of a subspace projection of the gradient field induced on the space of admissible controls by the approximating cost functional. In addition, we consider an iterative method based on the Pontryagin Maximum Principle. Finally, we test the algorithms on an ensemble of linear systems in $\mathbb{R}^2$.
We consider a general nonsymmetric second-order linear elliptic PDE in the framework of the Lax-Milgram lemma. We formulate and analyze an adaptive finite element algorithm with arbitrary polynomial degree that steers the adaptive mesh-refinement and the inexact iterative solution of the arising linear systems. More precisely, the iterative solver employs, as an outer loop, the so-called Zarantonello iteration to symmetrize the system and, as an inner loop, a uniformly contractive algebraic solver, e.g., an optimally preconditioned conjugate gradient method or an optimal geometric multigrid algorithm. We prove that the proposed inexact adaptive iteratively symmetrized finite element method (AISFEM) leads to full linear convergence and, for sufficiently small adaptivity parameters, to optimal convergence rates with respect to the overall computational cost, i.e., the total computational time. Numerical experiments underline the theory.
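The outer loop described above can be illustrated in finite dimensions: for a nonsymmetric matrix $A$ with positive definite symmetric part $S$, the Zarantonello iteration solves $S(x_{k+1} - x_k) = \delta\,(b - A x_k)$, so that each step only requires a symmetric solve. In this sketch the symmetric systems are solved exactly by Gaussian elimination, whereas the paper uses an inexact contractive solver; the damping parameter `delta` and both function names are illustrative assumptions.

```python
def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (a[i][n] - sum(a[i][j] * y[j] for j in range(i + 1, n))) / a[i][i]
    return y


def zarantonello(A, b, delta=0.8, tol=1e-12, max_iter=500):
    """Iterate S (x_new - x) = delta * (b - A x), with S the symmetric part of A."""
    n = len(b)
    S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    x = [0.0] * n
    for _ in range(max_iter):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(r) for r in residual) <= tol:
            break
        update = solve_dense(S, residual)  # symmetric positive definite solve
        x = [xi + delta * ui for xi, ui in zip(x, update)]
    return x


# Nonsymmetric 2x2 example with exact solution (1, 1).
solution = zarantonello([[2.0, 1.0], [-1.0, 2.0]], [3.0, 1.0])
```

The iteration contracts only if `delta` is small enough relative to the coercivity and continuity constants of the problem; for this example the value 0.8 gives a contraction factor of about 0.45 per step.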
In this paper, we propose consensus-based optimization for saddle point problems (CBO-SP), a novel multi-particle metaheuristic derivative-free optimization method capable of provably finding global Nash equilibria. Following the idea of swarm intelligence, the method employs a group of interacting particles, which perform a minimization over one variable and a maximization over the other. This paradigm permits a passage to the mean-field limit, which makes the method amenable to theoretical analysis and allows one to obtain rigorous convergence guarantees under reasonable assumptions about the initialization and the objective function, which most notably include nonconvex-nonconcave objectives.
Subgradient methods are the natural extension to the non-smooth case of the classical gradient descent for regular convex optimization problems. However, in general, they are characterized by slow convergence rates, and they require decreasing step-sizes to converge. In this paper we propose a subgradient method with constant step-size for composite convex objectives with ℓ1-regularization. If the smooth term is strongly convex, we can establish a linear convergence result for the function values. This fact relies on an accurate choice of the element of the subdifferential used for the update, and on proper actions adopted when non-differentiability regions are crossed. Then, we propose an accelerated version of the algorithm, based on conservative inertial dynamics and on an adaptive restart strategy, that is guaranteed to achieve a linear convergence rate in the strongly convex case. Finally, we test the performances of our algorithms on some strongly and non-strongly convex examples.
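One plausible reading of the two ingredients named above can be sketched for an objective $f(x) + \gamma\|x\|_1$ with smooth $f$: the update uses the minimal-norm element of the subdifferential, and a coordinate that would cross zero is stopped at zero. Both rules are assumptions made for illustration and are not claimed to be the algorithm of the paper; the function names are hypothetical.

```python
def min_norm_subgradient(x, grad, gamma):
    """Minimal-norm element of the subdifferential of f(x) + gamma * ||x||_1."""
    s = []
    for xi, gi in zip(x, grad):
        if xi > 0:
            s.append(gi + gamma)
        elif xi < 0:
            s.append(gi - gamma)
        else:
            # At a kink the subdifferential is [gi - gamma, gi + gamma];
            # its element closest to zero is the soft-thresholding of gi.
            s.append(max(abs(gi) - gamma, 0.0) * (1.0 if gi > 0 else -1.0))
    return s


def l1_subgradient_descent(grad_f, x0, gamma, step, n_iter=200):
    """Constant step-size subgradient iteration with a stop-at-zero rule."""
    x = list(x0)
    for _ in range(n_iter):
        s = min_norm_subgradient(x, grad_f(x), gamma)
        new_x = []
        for xi, si in zip(x, s):
            yi = xi - step * si
            if xi * yi < 0:
                yi = 0.0  # the step would cross the kink: stop at zero instead
            new_x.append(yi)
        x = new_x
    return x


# Strongly convex example f(x) = 0.5 * ||x - a||^2 with a = (3, 0.5) and gamma = 1;
# the minimizer is the soft-thresholding of a, namely (2, 0).
a = [3.0, 0.5]
x_star = l1_subgradient_descent(lambda x: [xi - ai for xi, ai in zip(x, a)],
                                x0=[-1.0, 2.0], gamma=1.0, step=0.5)
```

In this example the second coordinate reaches zero exactly and stays there, while the error in the first coordinate halves at every step, which is the linear convergence behaviour expected in the strongly convex case.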