Machine Learning

  1. [NeurIPS 25] Stab-SGD: Noise-Adaptivity in Smooth Optimization with Stability Ratios

    Neural Information Processing Systems, San Diego, 2025


    Abstract: In the context of smooth stochastic optimization with first-order methods, we introduce the stability ratio of gradient estimates as a measure of the local relative noise level, ranging from zero for pure noise to one for negligible noise. We show that a schedule-free variant of stochastic gradient descent (Stab-SGD), obtained by simply shrinking the learning rate by the stability ratio, achieves true adaptivity to noise levels (i.e. without tuning hyperparameters to the gradient’s variance), with all key properties of a good schedule-free algorithm: neither plateau nor explosion at initialization, and no saturation of the loss. We believe this theoretical development reveals the importance of estimating the local stability ratio in the construction of well-behaved (last-iterate) schedule-free algorithms, particularly when hyperparameter-tuning budgets are a small fraction of the total budget, since noise-adaptivity and cheaper horizon-free tuning are most crucial in this regime.

    Full text :  [ OpenReview ]
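    The abstract's recipe — shrink the learning rate by a local stability ratio — can be sketched on a toy quadratic. The estimator below (signal energy of the averaged minibatch gradient over total gradient energy, which lies in [0, 1]) is a hypothetical stand-in, not the paper's definition:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_grad(x, noise=1.0):
        # Gradient of f(x) = 0.5 * ||x||^2, plus additive Gaussian noise.
        return x + noise * rng.standard_normal(x.shape)

    def stability_ratio(grads):
        # Hypothetical estimator: fraction of gradient energy that is signal.
        # By Jensen's inequality, ||mean g||^2 <= mean ||g||^2, so the ratio is in [0, 1].
        g_bar = grads.mean(axis=0)
        total = (grads ** 2).sum(axis=1).mean()
        return float(np.dot(g_bar, g_bar) / (total + 1e-12))

    def stab_sgd(x0, base_lr=0.5, steps=200, batch=8):
        x = x0.copy()
        for _ in range(steps):
            grads = np.stack([noisy_grad(x) for _ in range(batch)])
            rho = stability_ratio(grads)
            x -= base_lr * rho * grads.mean(axis=0)  # lr shrunk by stability ratio
        return x

    x = stab_sgd(np.full(10, 5.0))
    ```

    Early on the ratio is close to one (gradient dominates noise) and the step size is essentially `base_lr`; near the optimum the ratio collapses and the effective step size shrinks automatically, which is the noise-adaptive behavior the abstract describes.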

  2. [ICLR 24] Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

    International Conference on Learning Representations, Vienna, 2024


    Abstract: We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global optimality of non-convex optimization, this new form of convergence, and the techniques introduced to prove such convergence, pave the way for a usable deep learning convergence theory in the near future, without overparameterization assumptions relating the number of parameters and training samples. We define these architectures from a simple computation graph and a mechanism to lift it, thus increasing the number of parameters, generalizing the idea of increasing the widths of multi-layer perceptrons. We show that architectures similar to most common deep learning models are present in this class, obtained by sparsifying the weight tensors of usual architectures at initialization. Leveraging tools of algebraic topology and random graph theory, we use the computation graph’s geometry to propagate properties guaranteeing convergence to any precision for these large sparse models.

    Full text :  [ OpenReview ]
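    The paper defines lifts on general computation graphs; as a minimal illustration of only the "sparsify the weight tensors at initialization" idea, one can widen a layer and mask its weights with a fixed random pattern:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sparse_lift(width, in_dim, density=0.1):
        # Hypothetical sketch: widen a layer (large `width`) and sparsify its
        # weight tensor at initialization with a fixed random binary mask.
        mask = rng.random((width, in_dim)) < density
        w = rng.standard_normal((width, in_dim)) * mask
        return w, mask

    def forward(w, x):
        # Masked dense layer followed by a ReLU.
        return np.maximum(w @ x, 0.0)

    w, mask = sparse_lift(width=256, in_dim=32)
    h = forward(w, rng.standard_normal(32))
    ```

    Increasing `width` while keeping `density` fixed is the crude analogue of "lifting" here: the parameter count grows, but each unit stays sparsely connected.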

  3. [NeurIPS 22] Convergence beyond the overparameterized regime with Rayleigh quotients

    Neural Information Processing Systems, New Orleans, 2022


    Abstract: We present a new strategy to prove the convergence of Deep Learning architectures to a zero training (or even testing) loss by gradient flow. Our analysis is centered on the notion of Rayleigh quotients to prove Kurdyka-Łojasiewicz inequalities for a broader set of neural network architectures and loss functions. We show that Rayleigh quotients provide a unified view for several convergence analysis techniques in the literature. Our strategy produces a proof of convergence for various examples of parametric learning. In particular, our analysis does not require the number of parameters to tend to infinity, nor the number of samples to be finite, thus extending to test loss minimization and beyond the over-parameterized regime.

    Full text & code :  [ OpenReview ] [ Github ]
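    The role of a Rayleigh-quotient lower bound can be sketched with the standard Polyak-Łojasiewicz argument (a simplification of the paper's setting, which handles more general quotients and Kurdyka-Łojasiewicz exponents):

    ```latex
    % Sketch: if a Rayleigh-quotient-like quantity is bounded below along
    % the gradient-flow trajectory \dot\theta = -\nabla L(\theta), then
    R(\theta) \;=\; \frac{\|\nabla L(\theta)\|^2}{L(\theta)} \;\ge\; \mu \;>\; 0
    \quad\Longrightarrow\quad
    \frac{\mathrm{d}}{\mathrm{d}t} L(\theta(t))
      \;=\; -\|\nabla L(\theta(t))\|^2
      \;\le\; -\mu\, L(\theta(t))
    \quad\Longrightarrow\quad
    L(\theta(t)) \;\le\; L(\theta(0))\, e^{-\mu t}.
    ```

    Lower-bounding such a quotient along the trajectory is exactly a Łojasiewicz-type inequality, which is why no overparameterization assumption is needed for this style of argument.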

  4. [NeurReps 22] Periodic Signal Recovery with Regularized Sine Neural Networks

    Neural Information Processing Systems, NeurReps Workshop, New Orleans, 2022


    Abstract: We consider the problem of learning a periodic one-dimensional signal with neural networks, and designing models that are able to extrapolate the signal well beyond the training window. First, we show that multi-layer perceptrons with ReLU activations are provably unable to perform this task, and perform poorly in practice even close to the training window. Then, we propose a novel architecture using sine activation functions along with a well-chosen non-convex regularization, that is able to extrapolate the signal with low error well beyond the training window. Our architecture is several orders of magnitude better than its competitors for distant extrapolation (beyond 100 periods of the signal), while being able to accurately recover the frequency spectrum of the signal in a multi-tone setting.

    Full text & code :  [ OpenReview ] [ Github ]
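    Why sinusoidal features extrapolate where ReLU features cannot is easy to see with a plain least-squares fit on a sine/cosine dictionary — this is not the paper's trained architecture or its non-convex regularizer, just the underlying intuition that a correct frequency fitted on a short window stays correct arbitrarily far away:

    ```python
    import numpy as np

    # Training window [0, 2] of a 1 Hz sine; test ~100 periods away.
    x_train = np.linspace(0.0, 2.0, 200)
    y_train = np.sin(2 * np.pi * x_train)

    freqs = np.arange(1, 6)  # hypothetical frequency grid (Hz)

    def features(x):
        # Sine/cosine dictionary; linear in the amplitudes.
        ang = 2 * np.pi * np.outer(x, freqs)
        return np.hstack([np.sin(ang), np.cos(ang)])

    coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

    x_far = np.linspace(100.0, 102.0, 200)  # far outside the training window
    err_far = np.max(np.abs(features(x_far) @ coef - np.sin(2 * np.pi * x_far)))
    ```

    Because the true frequency lies in the dictionary, the fitted model stays exact 100 periods out, whereas any piecewise-linear (ReLU) fit eventually diverges linearly from a bounded periodic signal.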

Patents

  1. Linear neural reconstruction for deep neural network compression

    WO 2020/236976 A1, filed with Interdigital, Palo Alto (CA)

    Nov 2020 (WIPO)


    Patent for the neural-network compression algorithm developed during an internship with the Technicolor AI Lab / Interdigital. Layer-wise weight compression is recast as a sequence of convex activation-reconstruction problems, preserving the output of the deep neural network while reducing its size, both in memory and on disk.

    Google Patents page :  [ summary ][ pdf ]
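    The activation-reconstruction idea can be sketched as follows: once a compressed structure is fixed, fitting new weights so the layer reproduces its outputs on sample activations is a convex least-squares problem. The rank truncation below is an illustrative stand-in, not the specific compression the patent claims:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((512, 64))   # sample activations entering the layer
    W = rng.standard_normal((64, 64))    # original layer weights

    def reconstruct(X, W, rank):
        # Compress the layer's *outputs* Y = X @ W (here by SVD truncation),
        # then solve the convex least-squares problem
        #   min_Wh || X @ W - X @ Wh ||_F
        # for new weights Wh that best reproduce the compressed outputs.
        Y = X @ W
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        Y_k = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        W_hat, *_ = np.linalg.lstsq(X, Y_k, rcond=None)
        return W_hat
    ```

    Applying this layer by layer keeps each subproblem convex and keeps the network's outputs approximately unchanged, which is the property the patent blurb describes.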

Cybersecurity

  1. [Euro S&P 25] Attacking and Fixing the Android Protected Confirmation Protocol

    IEEE European Symposium on Security and Privacy, Venice, 2025


    Abstract: Android Protected Confirmation (APC) is an authentication protocol designed by Google. It leverages the extra security of the Trusted Execution Environment (TEE) to secure transactions even in the presence of a compromised OS. The intended security guarantee for APC is that if a transaction has been signed under APC, then the user must have previously given their explicit consent, even if an attacker has gained root access to the victim’s Android OS. In this paper, we present a security analysis of APC in the Universal Composability (UC) framework. We uncover two attacks on the design of the protocol which allow a root adversary to issue transactions without the user consenting to them. We provide an attack implementation on a Google Pixel phone, and propose lightweight fixes. Finally, we specify the ideal UC functionality capturing the intended security guarantees for APC, and prove that the fixed protocol UC-realizes it.

    Full text :  [ HAL ][ CISPA Link ]

  2. [ASIA CCS 20] Return-oriented programming on RISC-V

    ACM Asia Conference on Computer and Communications Security, Taipei, 2020


    Abstract: We provide the first analysis on the feasibility of Return-Oriented Programming (ROP) on RISC-V, a new instruction set architecture targeting embedded systems. We show the existence of a new class of gadgets, using several Linear Code Sequences And Jumps (LCSAJ), undetected by current Galileo-based ROP gadget searching tools. We argue that this class of gadgets is rich enough on RISC-V to mount complex ROP attacks, bypassing traditional mitigations like DEP, ASLR, stack canaries, G-Free and some compiler-based backward-edge CFI, by jumping over any guard inserted by a compiler to protect indirect jump instructions. We provide examples of such gadgets, as well as a proof-of-concept ROP chain, using C code injection to mount a privilege escalation attack on two standard Linux operating systems. Additionally, we discuss some of the required mitigations to prevent such attacks and provide a new ROP gadget finder algorithm that handles this new class of gadgets.

    Full text :  [ ACM Link ][ ArXiv ]