Paper deep dive
Stability-Preserving Online Adaptation of Neural Closed-loop Maps
Danilo Saccani, Luca Furieri, Giancarlo Ferrari-Trecate
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/26/2026, 1:35:17 AM
Summary
The paper introduces a stability-preserving online adaptation mechanism for neural network-based controllers in nonlinear systems. By modeling controllers as causal operators with bounded lp-gain, the authors derive conditions for safe online updates, preventing the destabilization typically caused by switching between policies. They propose two practical schemes—time-scheduled and state-triggered—that guarantee closed-loop stability while allowing for continuous performance optimization.
Entities (6)
Relation Signals (3)
Danilo Saccani → authored → Stability-Preserving Online Adaptation of Neural Closed-loop Maps
confidence 100% · Stability-Preserving Online Adaptation of Neural Closed-loop Maps Danilo Saccani, Luca Furieri, Giancarlo Ferrari-Trecate
Neural Network Controller → achieves → lp-stability
confidence 90% · guarantee the closed-loop remains lp-stable after any number of updates
Internal Model Control (IMC) → parametrizes → Neural Network Controller
confidence 85% · IMC architecture parametrizing all stability-preserving controllers
Cypher Suggestions (2)
Find all researchers who authored the paper · confidence 95% · unvalidated
MATCH (r:Researcher)-[:AUTHORED]->(p:Paper {title: 'Stability-Preserving Online Adaptation of Neural Closed-loop Maps'}) RETURN r.name
Identify control architectures used in the paper · confidence 90% · unvalidated
MATCH (c:ControlSystem)-[:ACHIEVES]->(s:StabilityCriterion) RETURN c.name, s.name
Abstract
Abstract:The growing complexity of modern control tasks calls for controllers that can react online as objectives and disturbances change, while preserving closed-loop stability. Recent approaches for improving the performance of nonlinear systems while preserving closed-loop stability rely on time-invariant recurrent neural-network controllers, but offer no principled way to update the controller during operation. Most importantly, switching from one stabilizing policy to another can itself destabilize the closed-loop. We address this problem by introducing a stability-preserving update mechanism for nonlinear, neural-network-based controllers. Each controller is modeled as a causal operator with bounded $\ell_p$-gain, and we derive gain-based conditions under which the controller may be updated online. These conditions yield two practical update schemes, time-scheduled and state-triggered, that guarantee the closed-loop remains $\ell_p$-stable after any number of updates. Our analysis further shows that stability is decoupled from controller optimality, allowing approximate or early-stopped controller synthesis. We demonstrate the approach on nonlinear systems with time-varying objectives and disturbances, and show consistent performance improvements over static and naive online baselines while guaranteeing stability.
Tags
Links
- Source: https://arxiv.org/abs/2603.22469v1
- Canonical: https://arxiv.org/abs/2603.22469v1
Full Text
55,687 characters extracted from source content.
Stability-Preserving Online Adaptation of Neural Closed-loop Maps. Danilo Saccani, Luca Furieri, Giancarlo Ferrari-Trecate. This work was supported as a part of NCCR Automation, a National Centre of Competence in Research, funded by the Swiss National Science Foundation (grant number 51NF40_225155), the Swiss National Science Foundation Ambizione (grant number PZ00P2_208951) and the NECON project (grant number 200021-219431). D. Saccani and G. Ferrari-Trecate are with the Institute of Mechanical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland (email: {danilo.saccani, giancarlo.ferraritrecate}@epfl.ch). L. Furieri is with the Department of Engineering Science, University of Oxford, United Kingdom (luca.furieri@eng.ox.ac.uk). Abstract: The growing complexity of modern control tasks calls for controllers that can react online as objectives and disturbances change, while preserving closed-loop stability. Recent approaches for improving the performance of nonlinear systems while preserving closed-loop stability rely on time-invariant recurrent neural-network controllers, but offer no principled way to update the controller during operation. Most importantly, switching from one stabilizing policy to another can itself destabilize the closed-loop. We address this problem by introducing a stability-preserving update mechanism for nonlinear, neural-network-based controllers. Each controller is modeled as a causal operator with bounded $\ell_p$-gain, and we derive gain-based conditions under which the controller may be updated online. These conditions yield two practical update schemes, time-scheduled and state-triggered, that guarantee the closed-loop remains $\ell_p$-stable after any number of updates. Our analysis further shows that stability is decoupled from controller optimality, allowing approximate or early-stopped controller synthesis.
We demonstrate the approach on nonlinear systems with time-varying objectives and disturbances, and show consistent performance improvements over static and naive online baselines while guaranteeing stability.

I. Introduction

Modern control systems operate in increasingly dynamic settings, enabled by advances in sensing, onboard computation, and model-based design. While closed-loop stability is a core requirement, it alone is not enough, as applications also demand good tracking, efficiency, and constraint handling under time-varying operating conditions. Nonlinear Optimal Control (NOC) provides a natural framework to improve performance by minimizing a task-dependent cost [2]. However, enforcing stability while optimizing a general nonlinear cost is difficult, especially for complex and time-varying dynamics [20, 22]. Classical approaches such as dynamic programming and the maximum principle [2] face significant computational limitations, and existing stability guarantees typically require restrictive assumptions on the cost, which limits applicability in practice. Nonlinear Model Predictive Control (NMPC) offers an approximate solution to the NOC problem by implementing an implicit policy obtained from a finite-horizon problem solved at each time step in receding-horizon fashion [20]. While NMPC provides a structured way to handle constraints and nonlinearities, it typically imposes strict conditions on the cost to guarantee stability, which makes it difficult to apply in scenarios with rapidly changing objectives [10]. Neural networks (NNs) can parametrize rich families of high-performance feedback policies [4], but certifying closed-loop stability remains a nontrivial task. Indeed, generic NN architectures do not lend themselves to the utilization of Lyapunov or small-gain conditions [24].
Recent works utilize System Level Synthesis (SLS) [12, 7], Internal Model Control (IMC) [23, 8], and Youla parametrizations [25, 1, 9] to improve nonlinear closed-loop performance while guaranteeing $\ell_p$-stability. The key idea is to search over a space of stable $\mathcal{L}_p$ operators used to parametrize the controller. In practice, these operators are implemented as time-invariant neural parametrizations whose stability is certified for any choice of weights. The main limitation is that the controller is fixed after training: the weights are not updated online, so significant offline training is required to cover changing operating conditions. A key obstacle to online adaptation is that replacing one stabilizing policy with another may destabilize the transient closed loop, even when both policies are stabilizing in isolation, as known from switched-systems theory [15, 18, 5]. In this work, building on the neural IMC architecture of [8], we propose gain-budgeted triggering mechanisms, in both time-scheduled and state-triggered form, that certify when a newly optimized controller may safely replace the current one. The main contributions of this work are: (i) a gain-budgeted update condition, derived from finite-gain and small-gain arguments, that guarantees the closed-loop remains $\ell_p$-stable under repeated online controller updates; (ii) two implementable triggering mechanisms, one time-based and one state-based, that enforce this condition while trading off computational effort against adaptivity; and (iii) evidence on two nonlinear benchmarks of consistent improvements over offline baselines and receding-horizon open-loop controllers.

Notation. We denote by $\mathbb{N}$ the set of non-negative integers and by $\mathbb{N}_{\geq a}$ the integers $\{a, a+1, \ldots\}$. The set of all sequences $v = (v_0, v_1, v_2, \ldots)$, where $v_t \in \mathbb{R}^n$ for all $t \in \mathbb{N}$, is denoted as $\ell^n$.
Moreover, $v$ belongs to $\ell_p^n \subset \ell^n$ with $p \in [1, \infty]$ if $\|v\|_p = \left(\sum_{t=0}^{\infty} |v_t|^p\right)^{\frac{1}{p}} < \infty$, where $|\cdot|$ denotes an arbitrarily chosen vector norm. We say that $v \in \ell_\infty^n$ if $\sup_t \|v_t\| < \infty$. We write $v_{0:T}$ to denote the truncation of $v$ with $t$ ranging from $0$ to $T$. An operator $\mathbf{A} : \ell^n \to \ell^m$ is said to be causal if $\mathbf{A}(x) = (A_0(x_0), A_1(x_{0:1}), \ldots, A_t(x_{0:t}), \ldots)$. If $A_t(x_{0:t}) = A_t(0, x_{0:t-1})$, then $\mathbf{A}$ is said to be strictly causal. An operator $\mathbf{A} : \ell^n \to \ell^m$ is $\ell_p$-stable if it is causal and $\mathbf{A}(a) \in \ell_p^m$ for all $a \in \ell_p^n$. Equivalently, we write $\mathbf{A} \in \mathcal{L}_p$. We say that an $\mathcal{L}_p$ operator $\mathbf{A} : w \mapsto u$ has a finite $\mathcal{L}_p$-gain $\gamma_p(\mathbf{A}) > 0$ if $\|u\|_p \leq \gamma_p(\mathbf{A}) \|w\|_p$ for all $w \in \ell_p^n$. When $p$ is clear from the context, we simply write the gain as $\gamma(\mathbf{A}) > 0$ and the norm as $\|v\|$.

II. Preliminaries

Let us consider the following discrete-time, time-varying nonlinear system:
$x_t = f_{t-1}(x_{t-1}, u_{t-1}) + w_t, \quad t = 1, 2, \ldots,$ (1)
where $x_t \in \mathbb{R}^n$ is the state vector, $u_t \in \mathbb{R}^m$ is the control input, and $w_t \in \mathbb{R}^n$ represents an unknown process noise embedding the initial value of the system state by setting $w_0 = x_0$. Rewriting system (1) in operator form, we obtain:
$x = \mathbf{F}(x, u) + w,$ (2)
where $\mathbf{F} : \ell^n \times \ell^m \to \ell^n$ is a strictly causal operator defined as $\mathbf{F}(x, u) = (0, f_0(x_0, u_0), \ldots, f_{t-1}(x_{t-1}, u_{t-1}), \ldots)$. We model the system as the input-to-state operator
$\mathbf{F} : (u, w) \mapsto x,$ (3)
and assume $\mathbf{F}$ is causal and admits a finite $\mathcal{L}_p$-gain $\gamma(\mathbf{F}) < \infty$. This is satisfied when the plant is stable, or when $f_t$ represents the dynamics of a system in closed loop with a stabilizing controller $K'$ (the latter case is the one considered in the experiments in Section V). Additionally, we assume that $w \in \ell_p$ and that the process noise $w_t$ follows a distribution $\mathcal{D}$.
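To make the causal rollout of (1)–(2) and the convention $w_0 = x_0$ concrete, here is a minimal simulation sketch (the helper names `rollout` and `policy` are ours, not from the paper's code):

```python
import numpy as np

def rollout(f, policy, w, n):
    """Simulate system (1) under the paper's convention w_0 = x_0:
    the first disturbance sample carries the initial state."""
    T = len(w)
    x = np.zeros((T, n))
    x[0] = w[0]                              # w_0 = x_0
    u = []
    for t in range(1, T):
        u_prev = policy(x[:t])               # causal: u_{t-1} uses x_{0:t-1} only
        u.append(u_prev)
        x[t] = f(t - 1, x[t - 1], u_prev) + w[t]
    return x, u

# Toy scalar stable plant f(x, u) = 0.5 x + u under the zero policy.
x, u = rollout(lambda t, x, u: 0.5 * x + u,
               lambda hist: np.zeros(1),
               w=np.ones((5, 1)), n=1)
```

With constant unit disturbances the state converges toward the fixed point $x = 0.5x + 1$, i.e. toward 2.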
To regulate system (1), we consider a nonlinear, time-varying state-feedback controller of the form:
$u = \mathbf{K}(x) = (K_0(x_0), K_1(x_{0:1}), \ldots, K_t(x_{0:t}), \ldots),$ (4)
where $\mathbf{K} : \ell^n \to \ell^m$ is a causal operator to be designed. Due to causality, each disturbance sequence $w \in \ell^n$ induces unique trajectories in the closed-loop system (1)–(4) (see e.g. [8]). For a given system $\mathbf{F}$ and controller $\mathbf{K}$, we denote by $\Phi^x[\mathbf{F}, \mathbf{K}]$ and $\Phi^u[\mathbf{F}, \mathbf{K}]$ the induced closed-loop operators mapping $w \mapsto x$ and $w \mapsto u$, respectively. Thus, $x = \Phi^x[\mathbf{F}, \mathbf{K}](w)$, $u = \Phi^u[\mathbf{F}, \mathbf{K}](w)$, for all $w \in \ell^n$.

Figure 1: IMC architecture parametrizing all stability-preserving controllers via a free operator $\mathbf{M} \in \mathcal{L}_p$.

Our objective is to design a policy $\mathbf{K}(x)$ addressing the following problem:

Problem 1. Design $\mathbf{K}$ by solving the infinite-horizon Nonlinear Optimal Control (NOC) problem:
$\min_{\mathbf{K}(\cdot)} \; \mathbb{E}_w[l(x, u)]$ (5a)
s.t. $x_t = f_{t-1}(x_{t-1}, u_{t-1}) + w_t, \quad w_0 = x_0,$ (5b)
$u_t = K_t(x_{0:t}), \quad \forall t = 0, 1, \ldots,$ (5c)
$(\Phi^x[\mathbf{F}, \mathbf{K}], \Phi^u[\mathbf{F}, \mathbf{K}]) \in \mathcal{L}_p,$ (5d)
where $l : \ell^n \times \ell^m \to \mathbb{R}$ is any piecewise differentiable loss satisfying $l(x, u) \geq 0$. This loss can be defined through a (possibly time-varying) stage cost, e.g., $l(x, u) = \sum_{t=0}^{\infty} c_t(x_t, u_t)$, and is meant to quantify infinite-horizon closed-loop performance. □

The expectation $\mathbb{E}_w[\cdot]$ captures the impact of disturbances $w$ on the loss, while (5d) imposes a hard $\ell_p$-stability constraint on the closed-loop system. It has been shown in [7, 8] that the NOC problem admits an equivalent formulation by exploiting an IMC control architecture based on an operator $\mathbf{M} : \ell^n \to \ell^m$ to be designed. Specifically, all and only stability-preserving controllers, i.e., controllers verifying (5d), are parametrized by $\mathbf{M} \in \mathcal{L}_p$.
The control architecture proposed in [7, 8] is shown in Figure 1 and incorporates a model of the system dynamics to estimate the disturbance $w$. The following theorem summarizes these results:

Theorem 1 (adapted from Theorem 1 in [8]). Assume that the operator $\mathbf{F}$ is $\ell_p$-stable, and consider the evolution of (2) where $u$ is defined as
$u = \mathbf{M}(x - \mathbf{F}(x, u)),$ (6)
for a causal operator $\mathbf{M} : \ell^n \to \ell^m$. Let $\mathbf{K}$ be the operator for which $u = \mathbf{K}(x)$ is equivalent to (6). The following two statements hold true. (i) If $\mathbf{M} \in \mathcal{L}_p$, then the closed-loop system is $\ell_p$-stable. (ii) If there is a causal policy $\mathbf{C}$ such that $\Phi^x[\mathbf{F}, \mathbf{C}], \Phi^u[\mathbf{F}, \mathbf{C}] \in \mathcal{L}_p$, then $\mathbf{M} = \Phi^u[\mathbf{F}, \mathbf{C}]$ gives $\mathbf{K} = \mathbf{C}$.

Theorem 1 implies that exploring the space of operators $\mathbf{M} \in \mathcal{L}_p$ is sufficient to identify all and only stability-preserving policies, i.e. all policies satisfying (5d). This suggests that solving Problem 1 is equivalent to replacing $u = \mathbf{K}(x)$ in (5c) with (6) and $\min_{\mathbf{K}(\cdot)}$ in (5a) with $\min_{\mathbf{M} \in \mathcal{L}_p}$. However, solving Problem 1 through numerical methods requires some simplifications. First, searching within the infinite-dimensional space $\mathcal{L}_p$ is infeasible. This suggests considering subsets $\mathcal{O} \subset \mathcal{L}_p$ of operators $\mathbf{M}(\theta)$ depending on finitely many parameters $\theta \in \mathbb{R}^d$. Various methods for defining different sets $\mathcal{O}$ are provided in [21, 3]. With the structures proposed in these papers, the operators $\mathbf{M}$ can be modelled using time-invariant nonlinear dynamical systems embedding NNs. The formulations in [21, 17, 26] also allow imposing an explicit upper bound on the operator's $\ell_2$-gain, a feature we will exploit in Section IV for control design. We highlight that the resulting operators can be written as $\mathbf{M}(\theta, w) = (M_0(\theta, w_0), \ldots, M_t(\theta, w_{0:t}), \ldots)$, where $M_t$ is obtained by unrolling a time-invariant dynamical system.
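A minimal single-step sketch of the IMC law (6), assuming a perfect model so that the residual $x_t - f_{t-1}(x_{t-1}, u_{t-1})$ recovers $w_t$ exactly (all names are ours; the memoryless stand-in for $\mathbf{M}$ is purely illustrative, since the paper's $\mathbf{M}$ is a stateful neural operator):

```python
def imc_step(f, M_step, x_hist, u_hist, t):
    """One step of the IMC law (6): reconstruct the disturbance
    w_t = x_t - f_{t-1}(x_{t-1}, u_{t-1}) (exact under a perfect model,
    since F is strictly causal) and feed it to the free operator M."""
    if t == 0:
        w_hat = x_hist[0]                      # w_0 = x_0 by convention
    else:
        w_hat = x_hist[t] - f(t - 1, x_hist[t - 1], u_hist[t - 1])
    return M_step(w_hat)

# Toy check: scalar plant with x_1 = 0.5*x_0 + u_0 + w_1, where x_0 = 1,
# u_0 = 0, w_1 = 0.2, and a static-gain stand-in M(w) = -w.
f = lambda t, x, u: 0.5 * x + u
u1 = imc_step(f, lambda w: -w, x_hist=[1.0, 0.7], u_hist=[0.0], t=1)
```

Here the residual equals the true disturbance 0.2, so the returned input is $-0.2$.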
In combination with this parametrization, to make the NOC problem (5) tractable, [8] proposes the following finite-dimensional optimization problem:
$\min_{\theta \in \mathbb{R}^d} \; \frac{1}{S} \sum_{s=1}^{S} \tilde{l}(x^s_{0:T}, u^s_{0:T})$ (7a)
s.t. $x^s_t = f_{t-1}(x^s_{t-1}, u^s_{t-1}) + w^s_t, \quad w^s_0 = x^s_0, \quad u^s_t = M_t(\theta, w^s_{0:t}), \quad \forall t = 0, \ldots, T.$ (7b)
In (7), compared to (5a), the expectation is replaced with an empirical average over $S$ samples and the horizon is truncated at $T$. Crucially, stability is decoupled from optimality; indeed, from part (i) of Theorem 1, closed-loop $\ell_p$-stability is guaranteed even when (7) provides an approximate local minimizer. Although effective, this approach suffers from key limitations. First, no matter how large $T$ is chosen, it may still be insufficient for time-varying costs. Second, the existing operator parametrizations leverage time-invariant dynamics, associated to a single value of the parameters, significantly limiting the ability of the closed-loop behavior to adapt to changing costs and environmental conditions. To address these points, in this paper, we investigate the following question: under which conditions can we safely replace the current controller with a newly updated controller $\mathbf{M}^{(i)} \in \mathcal{O}$, $i = 1, 2, \ldots$, without compromising stability? Answering this question enables performing online optimization to continuously update control policies.

III. Stability-Preserving Controller Updates via Triggering Conditions

We propose to design a new class of time-varying controllers obtained by concatenating, over time, operators $\mathbf{M}(\theta) \in \mathcal{O} \subseteq \mathcal{L}_p$, where each $\mathbf{M}(\theta)$ is time-invariant and parametrized by a finite-dimensional parameter vector $\theta \in \mathbb{R}^d$. Our goal is to derive conditions under which the resulting switched closed loop remains $\ell_p$-stable. We denote by $t_i \in \mathbb{N}$, $i = 1, 2, \ldots$, the (possibly infinitely many) discrete time instants at which the operator $\mathbf{M}(\theta)$ is updated.
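The empirical objective (7a) can be sketched as follows, with a toy stage cost and batch of our choosing:

```python
def empirical_loss(stage_cost, rollouts):
    """Empirical objective (7a): the expectation in (5a) is replaced by a
    mean over S sampled rollouts, each truncated at horizon T."""
    S = len(rollouts)
    total = 0.0
    for xs, us in rollouts:          # one (x_{0:T}, u_{0:T}) pair per sample
        total += sum(stage_cost(t, x, u) for t, (x, u) in enumerate(zip(xs, us)))
    return total / S

# Toy quadratic stage cost, S = 2 rollouts with T + 1 = 3 steps each.
c = lambda t, x, u: x**2 + 0.1 * u**2
batch = [([1.0, 0.5, 0.25], [0.0, 0.0, 0.0]),
         ([2.0, 1.0, 0.5], [0.0, 0.0, 0.0])]
loss = empirical_loss(c, batch)
```

In the paper this average is computed over trajectories simulated through the REN parametrization of $M_t(\theta, \cdot)$ and differentiated with backpropagation through time; the plain-Python version above only illustrates the structure of (7a).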
Given an update schedule $\{t_i\}_{i \geq 0}$ and corresponding operators $\{\mathbf{M}^{(i)}\}_{i \geq 0}$, we define the resulting time-varying controller as $u = \tilde{\mathbf{M}}(x, w)$, where for any $t \in [t_i, t_{i+1})$ the composite operator $\tilde{\mathbf{M}}$ is defined as
$(\tilde{\mathbf{M}}(x, w))_t := M^{(i)}_{t - t_i}(\theta^{(i)}, z^{(i)}_{0:t-t_i}),$
where $z^{(i)} := (x_{t_i}, w_{t_i+1}, w_{t_i+2}, \ldots)$ and the operator $\mathbf{M}^{(i)}$ is (re)initialized with a zero internal state at $t_i$. This generalizes the $w_0 = x_0$ convention: the first input is the state at $t_i$, followed by disturbances. We call $\tilde{\mathbf{M}}$ the concatenation of the $\mathbf{M}^{(i)}$'s. At each update step, a new control law, to be used over the interval $t_i : t_{i+1} - 1$, is obtained by solving the optimization problem:
$\min_{\theta^{(i)} \in \mathbb{R}^d} \; \frac{1}{S} \sum_{s=1}^{S} \tilde{l}(x^s_{t_i:t_i+H}, u^s_{t_i:t_i+H})$ (8)
s.t. $x^s_{t_i} = x_{t_i}, \quad z^s_{t_i} = x_{t_i},$
$x^s_{t_i+k} = f_{t_i+k-1}(x^s_{t_i+k-1}, u^s_{t_i+k-1}) + w^s_{t_i+k},$
$z^s_{t_i+k} = w^s_{t_i+k}, \quad k = 1, \ldots, H, \quad s = 1, \ldots, S,$
$u^s_{t_i+k} = M^{(i)}_k(\theta^{(i)}, z^s_{t_i:t_i+k}), \quad k = 0, \ldots, H, \quad s = 1, \ldots, S,$
where the cost $\tilde{l}$ is the truncation to horizon $H \in \mathbb{N}$ of the infinite-horizon cost $l(x, u)$. Although solving problem (8) at any given time $t_i$ yields an operator $\mathbf{M}^{(i)}(\theta^{(i)}) \in \mathcal{O} \subseteq \mathcal{L}_p$, which would guarantee $\ell_p$-stability if applied from $t_i$ onward, repeatedly solving (8) and varying the controller over time results in a switched closed loop. It is well known that switching between individually stabilizing controllers can itself destabilize the closed loop [14, 16, 11], which necessitates a principled update mechanism. To make online updating safe, we introduce a structured update mechanism. The idea is to allow a new operator $\mathbf{M}^{(i)}$ to take effect only when a prescribed triggering condition is satisfied.
As desired, this will ensure that the closed loop induced by the concatenated controller $\tilde{\mathbf{M}}$ remains $\ell_p$-stable, even though the controller is updated over time through the sub-operators $\mathbf{M}^{(i)}(\theta) \in \mathcal{O}$. We consider triggering instants $t_i \in \mathbb{N}$ defined by
$t_0 = 0, \quad t_{i+1} = t_i + \mu(x_{t_i}), \quad i \in \mathbb{N}_{\geq 0},$ (9)
where $\mu : \mathbb{R}^n \to \mathbb{N}_{\geq 1}$ is a triggering function to be designed for ensuring closed-loop stability. Our main result is as follows.

Theorem 2 (Stable update of the controller). For a given $\bar{\gamma}$, let $\mathbf{M}^{(i)} \in \mathcal{O}$ be the control operators updated at the triggering instants $t_i$ given by (9), and verifying $\gamma(\mathbf{M}^{(i)}) \leq \bar{\gamma}$ for all $i \geq 0$. If the $\mathcal{L}_p$-gains $\gamma(\mathbf{M}^{(i)})$ satisfy
$\gamma(\mathbf{F}) \cdot (\gamma(\mathbf{M}^{(i)}) + 1)\, \varepsilon^{(i)} \leq r^{(i)}, \quad \forall i \geq 1,$ (10)
for some non-negative scalar sequence $r = \{r^{(i)}\}_{i=1}^{\infty} \in \ell_p$ and thresholds $\varepsilon^{(i)} > 0$, and if the controller update is triggered by
$|x_{t_i}| \leq \varepsilon^{(i)},$ (11)
then, for every disturbance sequence $w \in \ell_p$, the closed-loop trajectories generated by the concatenated controller $\tilde{\mathbf{M}}$ satisfy $x \in \ell_p$ and $u \in \ell_p$. In particular, the resulting time-varying closed loop is $\ell_p$-stable.

The proof is reported in Appendix -A. Theorem 2 highlights a trade-off between the update condition, which bounds the state at switching times, and the intensity of the control action, governed by the gain of $\mathbf{M}^{(i)}$. Indeed, (10) yields
$\varepsilon^{(i)} \leq \frac{r^{(i)}}{\gamma_p(\mathbf{F})\,(\gamma_p(\mathbf{M}^{(i)}) + 1)},$
and, therefore, the looser the update triggers (larger $\varepsilon^{(i)}$), the gentler the controllers (smaller $\gamma_p(\mathbf{M}^{(i)})$). Conversely, larger controller gains amplify both state and input on each inter-update window. Note that (11) is an admissibility condition: an update at time $t_i$ is only allowed if the condition is met. We do not require $t_i$ to be the first time this condition becomes true; any $t_i$ satisfying (11) is permissible.
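Conditions (10)–(11) of Theorem 2 amount to a simple admissibility check; a sketch, assuming the Euclidean norm for $|\cdot|$ (variable names are ours, not from the paper's code):

```python
import numpy as np

def update_admissible(gamma_F, gamma_Mi, eps_i, r_i, x_ti):
    """Check Theorem 2's gain-budget condition (10),
    gamma(F) * (gamma(M_i) + 1) * eps_i <= r_i,
    together with the state trigger (11), |x_{t_i}| <= eps_i."""
    gain_budget_ok = gamma_F * (gamma_Mi + 1.0) * eps_i <= r_i   # condition (10)
    state_small_ok = np.linalg.norm(x_ti) <= eps_i               # condition (11)
    return bool(gain_budget_ok and state_small_ok)

# Example: gamma(F) = 2, gamma(M_i) = 3, eps_i = 0.1, so the left-hand
# side of (10) equals 0.8, admissible for any budget r_i >= 0.8.
ok = update_admissible(2.0, 3.0, 0.1, 1.0, np.array([0.05, 0.05]))
```

Both a too-small budget $r^{(i)}$ and a too-large state at $t_i$ make the check fail, in which case the current controller is simply kept.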
In the next section, we show how to select $\{r^{(i)}\}_{i \geq 1}$, $\{\varepsilon^{(i)}\}_{i \geq 1}$, and the triggering mechanism (9) so as to manage this trade-off.

IV. Implementation

In this section, we derive practical update rules from Theorem 2. An update is admissible if the state at switching is sufficiently small and the gain of the new operator is compatible with that state through (10)–(11). The key design quantities are the thresholds $\varepsilon^{(i)}$ and the sequence $r = \{r^{(i)}\}$: larger thresholds make updates easier to trigger but force smaller gains, whereas smaller thresholds allow more aggressive controllers but delay the update. We consider a time-scheduled scheme and a state-triggered scheme. Both require an upper bound $\hat{\gamma} \geq \gamma(\mathbf{F})$ on the gain of the stable or pre-stabilized plant; conservatism only makes updates more cautious. In the $\ell_2$ case, such a bound can be obtained from a storage inequality under the convention $w_0 = x_0$. Since stability is enforced by the gain condition rather than by optimality, problem (8) may be solved only approximately, for instance using shorter horizons, fewer samples, or budgeted warm-started gradient steps. A practical discussion on choosing $r$ can be found in Appendix -B.

IV-A. Algorithm 1: Predefined Update Times

We first consider a time-scheduled update mechanism, where $\mu$ in (9) is assumed to be constant, in which the designer selects (i) an update period $t_{\mathrm{opt}} \in \mathbb{N}_{\geq 1}$ and (ii) a nonnegative budget sequence $\{r^{(i)}\}_{i \geq 1} \in \ell_p$. Updates are only attempted every $t_{\mathrm{opt}}$ steps: $t_0 = 0$, $t_{i+1} = t_i + t_{\mathrm{opt}}$, $i \in \mathbb{N}_{\geq 1}$. At each scheduled time $t_i$, we (a) measure the current state and set $\varepsilon^{(i)} := |x_{t_i}|$, and (b) check if an update is admissible under Theorem 2. Rearranging (10) yields an upper bound on the allowable gain of the new controller:
$0 < \gamma_p(\mathbf{M}^{(i)}) \leq \frac{r^{(i)}}{\gamma_p(\mathbf{F})\, \varepsilon^{(i)}} - 1.$
(12) We also require $\gamma_p(\mathbf{M}^{(i)}) \leq \bar{\gamma}$, where $\bar{\gamma}$ is the uniform gain bound assumed in Theorem 2. If at $t_i$ there is a $\gamma_p(\mathbf{M}^{(i)})$ satisfying (12), we solve (8) (with that gain bound enforced) to obtain new parameters $\theta^{(i)}$ for $\mathbf{M}^{(i)}$, and we apply this controller on $[t_i, \ldots, t_{i+1} - 1]$. If it is infeasible (e.g., because $|x_{t_i}|$ is too large or $r^{(i)}$ is too small), we keep the previous controller for the next interval, $\mathbf{M}^{(i)} = \mathbf{M}^{(i-1)}$, which preserves $\ell_p$-stability. Whenever there is a $\gamma_p(\mathbf{M}^{(i)})$ satisfying (12), Theorem 2 guarantees that we can update the controller again and the resulting time-varying closed loop remains $\ell_p$-stable. In summary, $t_{\mathrm{opt}} \in \mathbb{N}_{\geq 1}$ is freely chosen (with no minimum dwell-time constraint) and determines how often we attempt an update, and $r^{(i)}$ sets how aggressive the $i$-th update is allowed to be: for a given state norm $|x_{t_i}|$, larger $r^{(i)}$ allows larger controller gain and thus potentially faster transients.

IV-B. Algorithm 2: State-Dependent Triggering

The second scheme removes the fixed update period. Instead, the designer specifies (i) an admissible gain level $\gamma_p(\mathbf{M})$ that upper-bounds the $\mathcal{L}_p$-gain of any controller to be used, and (ii) a nonnegative sequence $\{r^{(i)}\}_{i \geq 1} \in \ell_p$. Given $\gamma_p(\mathbf{M})$ and $r^{(i)}$, we define the admissible state threshold for the $i$-th update as
$\varepsilon^{(i)} = \frac{r^{(i)}}{\gamma_p(\mathbf{F})\,(\gamma_p(\mathbf{M}) + 1)},$
which is exactly the feasibility condition of Theorem 2. We then set $t_i$ to be the first time after $t_{i-1}$ such that $|x_{t_i}| \leq \varepsilon^{(i)}$. At that instant $t_i$, we solve (8) to obtain a new controller $\mathbf{M}^{(i)}$ with $\gamma_p(\mathbf{M}^{(i)}) \leq \gamma_p(\mathbf{M})$, and keep it active until the next trigger time $t_{i+1}$. This "self-triggered" rule inverts Algorithm 1.
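Algorithm 2's rule can be sketched as a search for the first admissible instant (scalar state and illustrative names, not from the paper's code):

```python
def next_trigger_time(x_traj, t_prev, r_i, gamma_F, gamma_M):
    """Algorithm 2 (state-dependent triggering): the i-th update fires at the
    first time t > t_prev with |x_t| <= eps_i = r_i / (gamma_F*(gamma_M + 1)),
    the feasibility threshold of Theorem 2."""
    eps_i = r_i / (gamma_F * (gamma_M + 1.0))
    for t in range(t_prev + 1, len(x_traj)):
        if abs(x_traj[t]) <= eps_i:
            return t, eps_i
    return None, eps_i      # no admissible update within the recorded horizon

# With gamma(F) = 2 and gamma(M) = 3, a budget r_i = 0.8 gives eps_i = 0.1.
t1, eps1 = next_trigger_time([1.0, 0.5, 0.2, 0.09, 0.01], 0,
                             r_i=0.8, gamma_F=2.0, gamma_M=3.0)
```

The decaying state trajectory crosses the threshold 0.1 at step 3, which is where the update is allowed to fire.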
Rather than updating on a fixed schedule and then checking if the state is small enough, we wait until the state is small enough to admit an update of the operator $\mathbf{M}$. This can reduce unnecessary recomputations when the state is still large. The only design degrees of freedom are therefore the gain bound $\gamma_p(\mathbf{M})$ and the budget sequence $\{r^{(i)}\}$, which together determine when updates occur. Again, enforcing the gain cap $\gamma_p(\mathbf{M}^{(i)}) \leq \gamma_p(\mathbf{M})$ and the trigger condition implies $\ell_p$-stability by Theorem 2.

V. Numerical Example

In this section, we demonstrate the application of the proposed approach to robotic tasks. First, we apply our solution to the mountains problem from [8, 19] to highlight the advantages of our method. Then, we present a tracking problem in a time-varying environment with dynamic obstacles. We consider a point-mass system of mass $m \in \mathbb{R}_+$ subject to nonlinear friction forces:
$\begin{bmatrix} q_{t+1} \\ v_{t+1} \end{bmatrix} = \begin{bmatrix} q_t + T_s v_t \\ v_t + T_s m^{-1}(F_t + G(v_t)) \end{bmatrix} + w_t,$ (13)
where $q_t \in \mathbb{R}^2$ and $v_t \in \mathbb{R}^2$ represent the position and velocity of the system, respectively, while $F_t \in \mathbb{R}^2$ is the control input. The sampling time is $T_s = 0.05$ seconds. The term $w_t \in \mathbb{R}^4$ represents process noise. The nonlinear damping term is defined as $G(v_t) = -b_1 v_t + b_2 \tanh(v_t)$, where $0 < b_2 < b_1$. To parametrize the operators $\mathbf{M}^{(i)} \in \mathcal{O}$, we employed a Recurrent Equilibrium Network (REN) [21] with prescribed gain. Problem (8) is solved using backpropagation through time to compute gradients (see [8]), and the Adam optimizer to update the parameters via stochastic gradient descent. The approach is implemented using PyTorch, and the code to reproduce the examples is available at: https://github.com/DecodEPFL/Online_neurSLS.git.

Figure 2: Qualitative closed-loop behavior of Algorithm 1 in the two case studies. (Left) Mountains problem at $\tau = 0.8$ s. Green: obstacles.
Colored lines: predicted trajectories over $[\tau, \tau + 1.25]$; black: executed trajectory over $[0, \tau)$; gray dashed: predicted continuation after $\tau$. Colored disks show agent positions and radii. (Right) Dynamic-obstacles problem at $\tau_1 = 9.3$ s. Gray: executed trajectory; red dash-dot: reference; blue: predicted trajectory over $[\tau_1, \tau_1 + 0.5]$; green: obstacle positions at $\tau_1$.

V-A. Mountains Problem

The mountains scenario depicted in Figure 2 features two agents aiming to collaboratively navigate through a narrow valley. Each agent $j \in \{1, 2\}$ evolves according to (13) and has a target position $\bar{q}_j \in \mathbb{R}^2$ that must be reached with zero velocity (i.e., $\bar{v}_j = 0_2$). We stack both agents' states, so the overall state is $x_t = [q_{t,1}, v_{t,1}, q_{t,2}, v_{t,2}]^\top \in \mathbb{R}^8$ and $\bar{x}$ stacks the target positions and zero velocities. We study the regulation problem in error coordinates with respect to the desired equilibrium $\bar{q}_j, \bar{v}_j$. We apply a baseline linear feedback $K' = \mathrm{diag}(k_1, k_2)$ with gains $k_1, k_2 > 0$ around the equilibrium, and we set $F_{t,j} = K'(\bar{q}_j - q_{t,j}) + u_{t,j}$, where $u_{t,j}$ is an auxiliary input. We treat $u_t = [u_{t,1}, u_{t,2}]^\top$ as the control signal to be optimized. Under this pre-stabilization, the closed-loop error dynamics can be written in the form of (1), with state $e_t = x_t - \bar{x} \in \mathbb{R}^8$, input $u_t = [u_{t,1}, u_{t,2}]^\top \in \mathbb{R}^4$, and disturbance $w_t \in \mathbb{R}^8$. The associated input-to-state operator $\mathbf{F}_e : (u, w) \mapsto e$ has finite $\ell_2$-gain. An upper bound on this gain is obtained from a discrete-time bounded-real LMI [6]; see Appendix -C1. In this example, we set $p = 2$, and we model the disturbance sequence $w$ as Gaussian noise with exponential decay.
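A one-step sketch of the point-mass model (13): $T_s = 0.05$ s matches the paper, while the mass and damping coefficients below are illustrative placeholders of our choosing, subject only to the paper's constraint $0 < b_2 < b_1$:

```python
import numpy as np

# Illustrative constants: the paper fixes Ts = 0.05 s and requires 0 < b2 < b1;
# the values of m, b1, b2 are placeholders, not taken from the paper.
Ts, m, b1, b2 = 0.05, 1.0, 1.0, 0.5

def point_mass_step(q, v, F, w):
    """One step of the point-mass model (13) with nonlinear damping
    G(v) = -b1*v + b2*tanh(v); w stacks position and velocity noise."""
    G = -b1 * v + b2 * np.tanh(v)
    q_next = q + Ts * v + w[:2]
    v_next = v + (Ts / m) * (F + G) + w[2:]
    return q_next, v_next

# With unit initial velocity along x, zero force and zero noise, the
# position advances by Ts while the damping slows the velocity down.
q1, v1 = point_mass_step(np.zeros(2), np.array([1.0, 0.0]),
                         np.zeros(2), np.zeros(4))
```

In the experiments this map is wrapped by the pre-stabilizing feedback $F_{t,j} = K'(\bar{q}_j - q_{t,j}) + u_{t,j}$, so that the learned operator only supplies the auxiliary input $u_{t,j}$.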
The loss function is defined as:
$\tilde{l}(e_{0:T}, u_{0:T}) = \sum_{t=0}^{T} \left( l_{\mathrm{traj}}(e_t, u_t) + l_{\mathrm{ca}}(e_t) + l_{\mathrm{obs}}(e_t) \right),$
where $l_{\mathrm{traj}}(e, u) = [e^\top\ u^\top]\, Q\, [e^\top\ u^\top]^\top$ with $Q \succeq 0$ penalizing deviation from the targets and control effort. The terms $l_{\mathrm{ca}}(e)$ and $l_{\mathrm{obs}}(e)$ are barrier functions that prevent collisions between agents and obstacles (see [8] for their definitions). We compare with the approach proposed in [8], where the controller is parametrized by an operator $\mathbf{M} \in \mathcal{L}_2$ obtained offline over a horizon $T = 100$, matching the total simulation time used for our online controller. Figure 2 (left) illustrates closed-loop trajectories obtained using Algorithm 1, which attempts an update to a new operator $\mathbf{M}^{(i)}$ at fixed intervals of $t_{\mathrm{opt}} = 2$ time steps. Problem (8) is solved online to update the operator with a horizon of $H = 25$ and $S = 3$ disturbance realizations. Two scenarios are simulated: the nominal scenario replicating the example in [8], and a perturbed scenario involving abrupt impulse disturbances. Specifically, the local disturbance $w_{t,j}$ is replaced by:
$w_{t,j} + 0.3\,\delta(t - 1) + 0.3\,\delta(t - 8), \quad j = 1, 2,$ (14)
where $\delta$ is the Kronecker delta. Over 50 runs, our method reduces the average cost by 35.1% (nominal) and 40.3% (perturbed) versus [8], indicating improved adaptation to abrupt, previously unseen disturbance events; boxplots are reported in Appendix -C.

V-B. Dynamic Obstacles

In the second scenario (Figure 2), an agent tracks a circular path while avoiding dynamic obstacles. We track a reference trajectory $x_{\mathrm{ref},t}$ by applying a baseline stabilizing feedback $K'$ together with a feedforward term $u_{\mathrm{ref},t}$. We define the tracking error $e_t = x_t - x_{\mathrm{ref},t} \in \mathbb{R}^4$ and write the total input as $F_t = u_{\mathrm{ref},t} - K' e_t + u_t$, where $u_t$ is an auxiliary control signal that we optimize online.
Under this pre-stabilization, the closed-loop error dynamics for $e_t$ can be written as a nonlinear system $e_t = f_e(e_{t-1}, u_{t-1}) + w_t$, which matches the general form of (1) with state $e_t$, input $u_t$, and disturbance $w_t$. As in the previous example, the gain of the associated input-to-state operator $\mathbf{F}_e : (u, w) \mapsto e$ is computed by upper-bounding the worst-case realization of the nonlinearity. Two dynamic obstacles move on the $y$-axis following a perturbed sinusoidal trajectory:
$y^{\mathrm{obs}}_{t,j} = A \sin\left((2\pi \psi_j + \eta_j)\, t + \phi_j\right) + y^{\mathrm{obs}}_{0,j}, \quad j = 1, 2,$ (15)
where $\psi_j$ is the nominal frequency, $\eta_j \sim \mathcal{N}(0, 0.01)$ is a random frequency perturbation, and $\phi_j$ is a random phase. The finite-horizon objective $\tilde{l}$ in (8) is now time-varying because the obstacles move. We define
$\tilde{l}(e_{0:T}, u_{0:T}) = \sum_{t=0}^{T} \left( l_{\mathrm{traj}}(e_t, u_t) + l_{\mathrm{obs}}(e_t, t) \right),$
where $l_{\mathrm{traj}}$ is a positive semidefinite quadratic penalty on tracking error and control effort, and $l_{\mathrm{obs}}(e_t, t)$ penalizes collisions with the moving obstacle. The system is simulated for 300 time steps under a disturbance sequence generated at each step with bounded amplitude and interpreted as an element of $\ell_2$ by zero extension beyond the simulation horizon. In other words, the disturbance is persistent over the considered horizon, but has finite energy on the full infinite sequence. Accordingly, in this experiment we set $p = 2$, consistently with the certified $\ell_2$-gain of the REN controller class. We compare against (i) the offline controller of [8] and (ii) a receding-horizon open-loop (RHO) planner that solves at time $t$ the problem
$\min_{u_{0:H-1}} \; \sum_{s=0}^{S} \tilde{l}(e^s_{0:H}, u_{0:H}) \quad \text{s.t.} \quad e^s_{k+1} = f_e(e^s_k, u_k) + w^s_k, \quad e^s_0 = e_t, \quad \forall s, \quad w^s \sim \mathcal{D},$
and applies the first input $u_t = u^*_0$, advances the state, and re-solves at $t + 1$.
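The RHO baseline can be sketched as follows; the paper does not specify the open-loop solver, so random shooting is used here purely as an illustrative stand-in, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def rho_step(f_e, cost, e_t, H, S, n_candidates=64, u_scale=1.0):
    """One step of the receding-horizon open-loop (RHO) baseline: score
    candidate open-loop input sequences against S sampled disturbance
    rollouts and return the first input of the best sequence."""
    best_u0, best_J = 0.0, np.inf
    for _ in range(n_candidates):
        u_seq = u_scale * rng.standard_normal(H)      # candidate open-loop plan
        J = 0.0
        for _ in range(S):                            # empirical average, w^s ~ D
            e, w = e_t, 0.01 * rng.standard_normal(H)
            for k in range(H):
                J += cost(e, u_seq[k]) / S
                e = f_e(e, u_seq[k]) + w[k]
        if J < best_J:
            best_J, best_u0 = J, u_seq[0]
    return best_u0          # apply u_t = u*_0, then advance and re-solve at t+1

# Toy scalar error dynamics with a quadratic cost.
u0 = rho_step(lambda e, u: 0.8 * e + u,
              lambda e, u: e**2 + 0.1 * u**2, 1.0, H=5, S=2)
```

Unlike the proposed closed-loop updates, the plan here is open-loop over the horizon: the inputs cannot react to the sampled disturbances within a rollout, which is one reason the paper's method outperforms this baseline.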
For a fair comparison, all approaches use the same horizon ($H = 10$) and cost weights. Algorithm 1 attempts updates of $\mathbf{M}$ every $t_{\mathrm{opt}} = 1$ time step, using a budgeted, inexact solution (100 epochs of gradient descent warm-started from the previous parameters). This is admissible under Theorem 2 because stability is guaranteed even without reaching local optimality. The right panel in Figure 2 shows the closed-loop trajectory (in gray) and the predicted trajectories (in blue). Persistent disturbances make perfect tracking unachievable, and apparent overlaps between moving obstacles and past trajectory segments are only visualization artifacts. Over 50 runs, the proposed approach achieves lower cost than both RHO and the offline controller from [8], while also exhibiting smaller variability; the corresponding boxplots are reported in Appendix -C. Overall, the results indicate that online adaptation with closed-loop prediction improves both performance and reliability relative to the two baselines.

VI. Conclusions

We introduced a framework for iteratively updating neural-network-based controllers while maintaining closed-loop stability via gain-based update conditions. This opens several research directions. First, robustness margins against model-plant mismatch could be embedded directly into the update logic, preserving stability under imperfect models. Second, online optimization could be used to update both the control operator and the system model used by the controller. Finally, connecting these ideas to reinforcement learning by enforcing our gain-based admissibility condition during policy improvement may offer stability guarantees during learning, not only after convergence.

References

[1] N. H. Barbara, R. Wang, and I. R. Manchester (2023) Learning over contracting and Lipschitz closed-loops for partially-observed nonlinear systems. In 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 1028–1033. Cited by: §I.
[2] D. P.
Bertsekas (1995) Dynamic programming and optimal control. Athena Scientific.

[3] F. Bonassi, C. Andersson, P. Mattsson, and T. B. Schön (2024) Structured state-space models are deep Wiener models. IFAC-PapersOnLine 58 (15), p. 247–252.

[4] G. Cybenko (1989) Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems.

[5] C. De Persis, R. De Santis, and A. S. Morse (2003) Switched nonlinear systems with state-dependent dwell-time. Systems & Control Letters 50 (4), p. 291–302.

[6] C. E. de Souza and L. Xie (1992) On the discrete-time bounded real lemma with application in the characterization of static state feedback $H_\infty$ controllers. Systems & Control Letters 18 (1).

[7] L. Furieri, C. L. Galimberti, and G. Ferrari-Trecate (2022) Neural system level synthesis: learning over all stabilizing policies for nonlinear systems. In 2022 IEEE 61st Conference on Decision and Control (CDC), p. 2765–2770.

[8] L. Furieri, C. L. Galimberti, and G. Ferrari-Trecate (2024) Learning to boost the performance of stable nonlinear systems. IEEE Open Journal of Control Systems.

[9] C. L. Galimberti, L. Furieri, and G. Ferrari-Trecate (2025) Parametrizations of all stable closed-loop responses: from theory to neural network control design. Annual Reviews in Control 60, p. 101012.

[10] L. Grüne and J. Pannek (2017) Nonlinear model predictive control. Springer.

[11] W. P. Heemels, K. H. Johansson, and P. Tabuada (2012) An introduction to event-triggered and self-triggered control. In 2012 IEEE 51st Conference on Decision and Control (CDC).

[12] D. Ho (2020) A system level approach to discrete-time nonlinear systems. In 2020 American Control Conference (ACC).

[13] Z. Jiang and Y.
Wang (2001) Input-to-state stability for discrete-time nonlinear systems. Automatica 37 (6), p. 857–869.

[14] D. Liberzon and A. S. Morse (1999) Basic problems in stability and design of switched systems. IEEE Control Systems Magazine.

[15] D. Liberzon Switching in systems and control. Vol. 190, Springer.

[16] H. Lin and P. J. Antsaklis (2009) Stability and stabilizability of switched linear systems: a survey of recent results. IEEE Transactions on Automatic Control 54 (2), p. 308–322.

[17] L. Massai and G. Ferrari-Trecate (2025) Free parametrization of $\ell_2$-bounded state space models. arXiv preprint arXiv:2503.23818.

[18] A. S. Morse (1997) Control using logic-based switching. Trends in Control: A European Perspective, p. 69–113.

[19] D. Onken, L. Nurbekyan, X. Li, S. W. Fung, S. Osher, and L. Ruthotto (2021) A neural network approach applied to multi-agent optimal control. In 2021 European Control Conference (ECC).

[20] J. B. Rawlings, D. Q. Mayne, and M. M. Diehl (2017) Model predictive control: theory, computation, and design. 2nd edition, Nob Hill Publishing.

[21] M. Revay, R. Wang, and I. R. Manchester (2023) Recurrent equilibrium networks: flexible dynamic models with guaranteed stability and robustness. IEEE Transactions on Automatic Control.

[22] D. Saccani, L. Fagiano, M. N. Zeilinger, and A. Carron (2023) Model predictive control for multi-agent systems under limited communication and time-varying network topology. In 2023 62nd IEEE Conference on Decision and Control (CDC), p. 3764–3769.

[23] D. Saccani, L. Massai, L. Furieri, and G. Ferrari-Trecate (2024) Optimal distributed control with stability guarantees by training a network of neural closed-loop maps. In 2024 IEEE 63rd Conference on Decision and Control (CDC), p. 3776–3781.

[24] E. D. Sontag (1993) Neural networks for control.
In Essays on Control: Perspectives in the Theory and its Applications.

[25] R. Wang and I. R. Manchester (2022) Youla-REN: learning nonlinear feedback policies with robust stability guarantees. In 2022 American Control Conference (ACC), p. 2116–2123.

[26] M. Zakwan and G. Ferrari-Trecate (2024) Neural port-Hamiltonian models for nonlinear distributed control: an unconstrained parametrization approach. arXiv preprint arXiv:2411.10096.

-A Proof of Theorem 2 and relation to Input-to-State Stability

We first establish an upper bound for the state and input norms when applying $\mathcal{M}^{(0)}$ on the first interval $[t_0,t_1-1]$, where $t_0=0$. From the definition of $\mathcal{F}$ in (2) and the controller $\tilde{\mathcal{M}}$, the controller input is $z^{(0)}=(x_0,w_1,\dots)$. From (1), one has $w_0=x_0$, so $z^{(0)}=w$. Let $\bar x_0=x_{0:t_1-1}$ and $\bar w_0=w_{0:t_1-1}$. Then

$$\|u_{0:t_1-1}\|\le\gamma(\mathcal{M}^{(0)})\,\|\bar w_0\|,\qquad (16)$$

$$\|\bar x_0\|\le\gamma(\mathcal{F})\big(\|u_{0:t_1-1}\|+\|\bar w_0\|\big).\qquad (17)$$

Substituting (16) into (17) yields

$$\|\bar x_0\|\le\underbrace{\gamma(\mathcal{F})\big(\gamma(\mathcal{M}^{(0)})+1\big)}_{C_0}\,\|\bar w_0\|.\qquad (18)$$

Now consider the evolution on a subsequent window $i\ge1$ on the interval $[t_i,t_{i+1}-1]$. Let $\bar x_i=x_{t_i:t_{i+1}-1}$, $\bar u_i=u_{t_i:t_{i+1}-1}$, $\bar w_i=w_{t_i:t_{i+1}-1}$, and $\bar z_i=(x_{t_i},w_{t_i+1},\dots,w_{t_{i+1}-1})$. We obtain

$$\|\bar x_i\|_p\le\gamma_p(\mathcal{F})\,|x_{t_i}|+\gamma_p(\mathcal{F})\big(\|\bar u_i\|_p+\|\bar w_i\|_p\big).\qquad (19)$$

The controller's output $\bar u_i$ is bounded by its gain with zero initial condition:

$$\|\bar u_i\|_p\le\gamma_p(\mathcal{M}^{(i)})\,\|\bar z_i\|_p.\qquad (20)$$
The input $\bar z_i$ is bounded using the triangle inequality:

$$\|\bar z_i\|_p\le|x_{t_i}|+\|w_{t_i+1:t_{i+1}-1}\|_p\le|x_{t_i}|+\|\bar w_i\|_p.\qquad (21)$$

Substituting (20) and (21) into (19) and grouping terms by $|x_{t_i}|$ and $\|\bar w_i\|_p$ yields

$$\|\bar x_i\|_p\le\gamma_p(\mathcal{F})\big(\gamma_p(\mathcal{M}^{(i)})+1\big)|x_{t_i}|+\gamma_p(\mathcal{F})\big(\gamma_p(\mathcal{M}^{(i)})+1\big)\|\bar w_i\|_p.$$

Now, apply the triggering condition (11), $|x_{t_i}|\le\epsilon^{(i)}$, and the gain condition (10), $\gamma_p(\mathcal{F})\big(\gamma_p(\mathcal{M}^{(i)})+1\big)\epsilon^{(i)}\le r^{(i)}$:

$$\|\bar x_i\|_p\le r^{(i)}+C\|\bar w_i\|_p,\qquad (22)$$

where $C=\gamma_p(\mathcal{F})(\bar\gamma+1)$, using $\bar\gamma$ as an upper bound on the controller gains. We now analyze the total norm of $x=(\bar x_0,\bar x_1,\dots)$.

Case $1\le p<\infty$. Summing the $p$-th power of (22) over all windows $i$:

$$\|x\|_p^p=\|\bar x_0\|_p^p+\sum_{i=1}^{\infty}\|\bar x_i\|_p^p\le(C_0\|\bar w_0\|_p)^p+\sum_{i=1}^{\infty}\big(r^{(i)}+C\|\bar w_i\|_p\big)^p.\qquad (23)$$

Let $\tilde w=(\|\bar w_1\|_p,\|\bar w_2\|_p,\dots)$. The sum in (23) is the $p$-th power of the $p$-norm of a sum of sequences in $\ell_p$, so by the Minkowski inequality

$$\|x\|_p^p\le(C_0\|\bar w_0\|_p)^p+\big(\|r+C\tilde w\|_p\big)^p\le(C_0\|\bar w_0\|_p)^p+\big(\|r\|_p+C\|\tilde w\|_p\big)^p.$$

Using $(a+b)^p\le2^{p-1}(a^p+b^p)$, which holds for all real $p\ge1$, on the second term:

$$\|x\|_p^p\le(C_0\|\bar w_0\|_p)^p+2^{p-1}\big(\|r\|_p^p+C^p\|\tilde w\|_p^p\big).$$

The windows $\bar w_i$ are disjoint. Let $w'=(\|\bar w_0\|,\tilde w)$. Then $\|w'\|_p^p=\sum_{i=0}^{\infty}\|\bar w_i\|_p^p=\|w\|_p^p$. Since $2^{p-1}\ge1$, the total norm is bounded by

$$\|x\|_p^p\le2^{p-1}\big((C_0\|\bar w_0\|_p)^p+\|r\|_p^p+C^p\|\tilde w\|_p^p\big)\le2^{p-1}\big(\max(C_0,C)^p\|w\|_p^p+\|r\|_p^p\big).$$

Since $r,w\in\ell_p$, the right-hand side is finite, so we conclude that $\|x\|_p<\infty$. The same arguments applied to (20) show $\|u\|_p<\infty$. Thus, the system is $\ell_p$-stable.

Case $p=\infty$.
The bound (22), which also holds for $p=\infty$, becomes

$$\|\bar x_i\|_\infty\le r^{(i)}+C\|\bar w_i\|_\infty.$$

Taking the supremum over all $i\in\mathbb{N}_{\ge1}$:

$$\sup_{i\in\mathbb{N}_{\ge1}}\|\bar x_i\|_\infty\le\sup_{i\in\mathbb{N}_{\ge1}}r^{(i)}+\sup_{i\in\mathbb{N}_{\ge1}}\big(C\|\bar w_i\|_\infty\big).$$

The total state norm is $\|x\|_\infty=\max\big(\|\bar x_0\|_\infty,\ \sup_{i\in\mathbb{N}_{\ge1}}\|\bar x_i\|_\infty\big)$. From (18), $\|\bar x_0\|_\infty\le C_0\|\bar w_0\|_\infty$, and hence

$$\|x\|_\infty\le\max\Big(C_0\|\bar w_0\|_\infty,\ \|r\|_\infty+C\sup_{i\in\mathbb{N}_{\ge1}}\|\bar w_i\|_\infty\Big).\qquad (24)$$

Since $\sup_{i\in\mathbb{N}_{\ge0}}\|\bar w_i\|_\infty\le\|w\|_\infty$ and $r,w\in\ell_\infty$, the right-hand side of (24) is finite. Thus, the system is $\ell_\infty$-stable.

Therefore, for any $p\in[1,\infty]$ and any disturbance sequence $w\in\ell_p$, the corresponding closed-loop trajectories satisfy $x\in\ell_p$ and $u\in\ell_p$. Hence, the closed loop induced by the concatenated controller $\tilde{\mathcal{M}}$ is $\ell_p$-stable. ∎

Next, we discuss how imposing additional conditions on $r$ can yield an input-to-state-type bound for the resulting closed loop, on top of $\ell_p$-stability.

Corollary 1 (ISS under time-scheduled persistent updates). Consider Algorithm 1 with $p=\infty$ and fixed update times $t_i=i\,t_{\mathrm{opt}}$, $i\in\mathbb{N}$, and assume that the conditions of Theorem 2 hold at every scheduled update. Suppose that

$$r^{(i)}=d^{(i)}\,\gamma(\mathcal{F})\big(\gamma(\mathcal{M}^{(0)})+1\big)|x_0|+g\|w\|_\infty,\qquad i\ge1,$$

for some positive, strictly decreasing scalar sequence $\{d^{(i)}\}_{i\ge0}$ such that $d^{(0)}\ge1$ and $\lim_{i\to\infty}d^{(i)}=0$, and for some constant $g>0$. Then the resulting closed loop is Input-to-State Stable (ISS) [13]; that is, there exist functions $\beta\in\mathcal{KL}$ and $\alpha\in\mathcal{K}_\infty$ such that, for any initial condition $x_0$ and any bounded disturbance sequence $w\in\ell_\infty$, the state trajectory satisfies

$$|x_t|\le\beta(|x_0|,t)+\alpha(\|w\|_\infty),\qquad\forall t\in\mathbb{N}.$$

Proof: Let $D:=\gamma(\mathcal{F})\big(\gamma(\mathcal{M}^{(0)})+1\big)$ and $G:=\gamma(\mathcal{F})(\bar\gamma+1)$.
From the proof of Theorem 2, if $t\in[t_i,t_{i+1})$ with $i\ge1$, then

$$|x_t|\le r^{(i)}+G\|w\|_\infty=D\,d^{(i)}|x_0|+(g+G)\|w\|_\infty.$$

For $t\in[t_0,t_1)$, (18) gives

$$|x_t|\le D|x_0|+G\|w\|_\infty\le D\,d^{(0)}|x_0|+(g+G)\|w\|_\infty,$$

since $d^{(0)}\ge1$ and $g>0$. Now define a continuous, nonincreasing function $\bar d\colon[0,\infty)\to(0,\infty)$ as follows: set $\bar d(t)=d^{(0)}$ for $t\in[0,t_1]$, and for each $i\ge1$ let $\bar d$ be linear on $[t_i,t_{i+1}]$ with $\bar d(t_i)=d^{(i-1)}$ and $\bar d(t_{i+1})=d^{(i)}$. Since $\{d^{(i)}\}$ is strictly decreasing and $t_i\to\infty$, the function $\bar d$ is continuous, nonincreasing, and satisfies $\bar d(t)\to0$ as $t\to\infty$. Moreover, for every $t\in[t_i,t_{i+1})$ one has $\bar d(t)\ge d^{(i)}$. Therefore, for all $t\ge0$,

$$|x_t|\le D\,\bar d(t)\,|x_0|+(g+G)\|w\|_\infty.$$

Define $\beta(s,t):=D\,\bar d(t)\,s$ and $\alpha(s):=(g+G)s$. Then $\alpha\in\mathcal{K}_\infty$. Also, for each fixed $t\ge0$, $\beta(\cdot,t)\in\mathcal{K}$, and for each fixed $s\ge0$, $\beta(s,\cdot)$ is continuous, nonincreasing, and converges to 0 as $t\to\infty$. Hence $\beta\in\mathcal{KL}$, and the claimed ISS estimate follows. ∎

-B Implementation details

-B1 Design of the sequence $r$

The sequence $r=\{r^{(i)}\}_{i\ge1}$ regulates how aggressive updates are allowed to be over time. The design of $r$ can leverage a priori information obtained through offline closed-loop simulations. Let $\tilde x_{0:T}$ be a nominal state trajectory generated under a representative disturbance and a baseline controller $\mathcal{M}^{(0)}$. We define a time-indexed budget profile

$$0\le r_t=\rho\,\|\tilde x_t\|,\qquad t=0,\dots,T,$$

with $\rho>0$ a scaling factor. To obtain an infinite-horizon sequence compatible with Theorem 2, the profile is extended for $t>T$ by a nonnegative tail $\{r_t\}_{t>T}$ chosen so that $\{r_t\}_{t\ge0}\in\ell_p$. For instance, when $1\le p<\infty$, one may append any summable tail, e.g.
$r_t=r_T\,\eta^{t-T}$ for $t>T$ and some $\eta\in(0,1)$, while for $p=\infty$ any bounded continuation is admissible (and a decaying continuation can be used if one also wants the update budget to vanish asymptotically). Online, at each actual update time $t_i$, we set $r^{(i)}:=r_{t_i}$. In this way, large nominal deviations $\|\tilde x_t\|$ early in the system evolution yield larger $r^{(i)}$, allowing updates even for relatively large states (or higher controller gains), while smaller $\|\tilde x_t\|$ later in the system evolution yield smaller $r^{(i)}$, enforcing more conservative updates near the origin. This provides a simple, simulation-driven procedure to generate a sequence $r$ consistent with Theorem 2.

-B2 Controller updates through optimization

The method for updating the controller depends significantly on the chosen update strategy and the available computational resources. Indeed, solving problem (8) can be computationally demanding. A practical way to reduce the computational burden is to solve a simplified instance of (8) (shorter horizon $H$, fewer disturbance samples). Equivalently, one may warm-start from the previously found parameters $\theta$ and take a budgeted number of gradient steps during the sampling interval; the resulting inexact iterate can be deployed without risking destabilization. Indeed, the proposed method ensures stability without requiring optimality and without requiring specific assumptions on the cost function, except that it must be differentiable (see Problem 1).

-C Numerical Example

Figures 3 and 4 report the boxplots of the total closed-loop cost for the mountains and dynamic-obstacles scenarios, respectively.

Figure 3: Mountains problem - Total cost (log scale) over 50 runs: nominal scenario (center) and perturbed scenario with impulse disturbances (14) (right).
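The budget-profile construction of Appendix -B1 can be sketched as follows. This is a minimal sketch under stated assumptions: the placeholder nominal trajectory and the specific values of $\rho$, $\eta$, and the tail length are illustrative, and the tail is truncated for finite computation (the theory uses an infinite geometric tail, which is summable for any $p\ge1$).

```python
import numpy as np

def budget_profile(x_nom, rho=0.1, eta=0.9, n_tail=200):
    """Build r_t = rho * ||x~_t|| over the nominal horizon t = 0..T and
    append a geometric (hence l_p-summable) tail r_t = r_T * eta^(t-T)."""
    r = rho * np.linalg.norm(x_nom, axis=1)          # r_t for t = 0..T
    tail = r[-1] * eta ** np.arange(1, n_tail + 1)   # truncated decaying tail
    return np.concatenate([r, tail])

def r_at_update(r, t_i):
    """Online rule: r^(i) := r_{t_i} at the actual update time t_i."""
    return r[min(t_i, len(r) - 1)]

# Placeholder nominal closed-loop trajectory under M^(0): an exponentially
# decaying 2-state trajectory standing in for an offline simulation.
x_nom = np.exp(-0.05 * np.arange(100))[:, None] * np.array([[1.0, -0.5]])
r = budget_profile(x_nom)
```

As described in the text, early update times see a large budget (the nominal state is far from the origin), while later updates are throttled as $r_t$ decays, keeping the sequence in $\ell_p$ as Theorem 2 requires.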
Figure 4: Dynamic obstacles problem - Total cost (log scale) over 50 runs for the offline controller in [8], a receding-horizon open-loop (RHO) planner, and the proposed approach.

-C1 Computation of the upper bound of the $\ell_2$-gain

For the pre-stabilized error dynamics, we consider the least favorable slope of the friction term

$$G(v)=-b_1v+b_2\tanh(v),\qquad\sigma_{wc}:=-(b_1-b_2),$$

which corresponds to the minimum-damping case, since $-b_1\le\frac{\partial G}{\partial v}\le-(b_1-b_2)$. Using this worst-case slope, we obtain a linear error model of the form

$$e_{t+1}=A_{wc}e_t+B_uu_t+w_{t+1}.$$

Let $\bar B:=[\,B_u\ \ I\,]$ and $\xi_t:=[\,u_t^\top\ \ w_{t+1}^\top\,]^\top$. We compute an upper bound on the $\ell_2$-gain of $\mathcal{F}_e\colon(u,w)\mapsto e$ by finding $P\succ0$, $\eta>0$, and $\rho>0$ such that

$$\begin{bmatrix}A_{wc}^\top PA_{wc}-P+\eta I & A_{wc}^\top P\bar B\\ \bar B^\top PA_{wc} & \bar B^\top P\bar B-\rho I\end{bmatrix}\preceq0.\qquad (25)$$

Then, with $V(e)=e^\top Pe$, (25) implies

$$V(e_{t+1})-V(e_t)\le-\eta\|e_t\|^2+\rho\big(\|u_t\|^2+\|w_{t+1}\|^2\big).$$

Summing over time and using the convention $w_0=e_0$ yields

$$\eta\|e\|_2^2\le\lambda_{\max}(P)\|w_0\|^2+\rho\|u\|_2^2+\rho\|w_{1:\infty}\|_2^2\le c\big(\|w\|_2^2+\|u\|_2^2\big),$$

where $c:=\max\{\lambda_{\max}(P),\rho\}$. Therefore,

$$\|e\|_2\le\hat\gamma(\mathcal{F}_e)\big(\|w\|_2+\|u\|_2\big),\qquad\hat\gamma(\mathcal{F}_e):=\max\Big\{1,\sqrt{\tfrac{c}{\eta}}\Big\}.$$
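The feasibility of (25) and the resulting gain bound can be checked numerically once a candidate certificate is available. The sketch below uses an illustrative scalar system ($A_{wc}=0.8$, $B_u=1$, so $\bar B=[\,1\ \ 1\,]$) and a hand-picked certificate $(P,\eta,\rho)$, both assumptions for demonstration; in practice $P$, $\eta$, $\rho$ would be found with a semidefinite-programming solver, and the matrices would come from the linearized friction model.

```python
import numpy as np

def lmi_residual(A, Bbar, P, eta, rho):
    """Largest eigenvalue of the block matrix in (25);
    feasibility of the LMI requires this to be <= 0."""
    n, m = A.shape[0], Bbar.shape[1]
    top = np.hstack([A.T @ P @ A - P + eta * np.eye(n), A.T @ P @ Bbar])
    bot = np.hstack([Bbar.T @ P @ A, Bbar.T @ P @ Bbar - rho * np.eye(m)])
    return float(np.max(np.linalg.eigvalsh(np.vstack([top, bot]))))

def l2_gain_bound(P, eta, rho):
    """gamma_hat(F_e) = max(1, sqrt(c/eta)) with c = max(lambda_max(P), rho)."""
    c = max(float(np.max(np.linalg.eigvalsh(P))), rho)
    return max(1.0, np.sqrt(c / eta))

# Illustrative scalar error model with the worst-case slope absorbed in A_wc.
A = np.array([[0.8]])
Bbar = np.array([[1.0, 1.0]])     # [B_u  I] with B_u = 1, n = 1
P = np.array([[1.0]])             # hand-picked certificate
eta, rho = 0.1, 7.0

res = lmi_residual(A, Bbar, P, eta, rho)   # <= 0: (25) is satisfied
gamma = l2_gain_bound(P, eta, rho)         # certified l2-gain upper bound
```

Since the bound $\hat\gamma=\max\{1,\sqrt{c/\eta}\}$ shrinks as $\eta$ grows and as $\lambda_{\max}(P)$ and $\rho$ shrink, a solver would typically minimize $c/\eta$ subject to (25) to get the tightest certified gain.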