Paper deep dive
Wake Up to the Past: Using Memory to Model Fluid Wake Effects on Robots
Luca Vendruscolo, Eduardo Sebastián, Amanda Prorok, Ajay Shankar
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/26/2026, 1:27:17 AM
Summary
This paper investigates the modeling of fluid wake effects on autonomous robots (aerial and aquatic) using data-driven approaches. It identifies that memory-less models fail in agile scenarios due to finite propagation times of wake disturbances. Through an empirical study of seven architectures across four domains, the authors demonstrate that incorporating historical state data and explicit transport delay prediction significantly improves wake-effect prediction accuracy.
Entities (6)
Relation Signals (3)
Luca Vendruscolo → affiliatedwith → University of Cambridge
confidence 100% · Authors are with the Department of Computer Science and Technology, University of Cambridge, UK.
Agile MLP → performspoorlyin → Agile scenarios
confidence 90% · Such models often perform poorly in agile scenarios: since the wake effect has a finite propagation time
Mamba → usedtomodel → Fluid Wake Effect
confidence 90% · We explore seven data-driven models designed to capture the spatio-temporal evolution of fluid wake effects
Cypher Suggestions (2)
Identify researchers affiliated with the University of Cambridge. · confidence 100% · unvalidated
MATCH (r:Researcher)-[:AFFILIATED_WITH]->(o:Organization {name: 'University of Cambridge'}) RETURN r.name
Find all model architectures used to study fluid wake effects. · confidence 90% · unvalidated
MATCH (m:ModelArchitecture)-[:USED_TO_MODEL]->(p:PhysicalPhenomenon {name: 'Fluid Wake Effect'}) RETURN m.name
Abstract
Autonomous aerial and aquatic robots that attain mobility by perturbing their medium, such as multicopters and torpedoes, produce wake effects that act as disturbances for adjacent robots. Wake effects are hard to model and predict due to the chaotic spatio-temporal dynamics of the fluid, entangled with the physical geometry of the robots and their complex motion patterns. Data-driven approaches using neural networks typically learn a memory-less function that maps the current states of the two robots to a force observed by the "sufferer" robot. Such models often perform poorly in agile scenarios: since the wake effect has a finite propagation time, the disturbance observed by a sufferer robot is some function of relative states in the past. In this work, we present an empirical study of the properties a wake-effect predictor must satisfy to accurately model the interactions between two robots mediated by a fluid. We explore seven data-driven models designed to capture the spatio-temporal evolution of fluid wake effects in four different media. This allows us to introspect the models and analyze the reasons why certain features enable improved accuracy in prediction across predictors and fluids. As experimental validation, we develop a planar rectilinear gantry for two spinning monocopters to test with real-world data under feedback control. The conclusion is that supplying a history of previous states as input, together with transport delay prediction, substantially helps to learn an accurate wake-effect predictor.
Tags
Links
- Source: https://arxiv.org/abs/2603.22472v1
- Canonical: https://arxiv.org/abs/2603.22472v1
Full Text
48,558 characters extracted from source content.
Wake Up to the Past: Using Memory to Model Fluid Wake Effects on Robots
Luca Vendruscolo, Eduardo Sebastián, Amanda Prorok, Ajay Shankar
Authors are with the Department of Computer Science and Technology, University of Cambridge, UK. *Author for correspondence; e-mails: lv407, es2121, asp45, as3233@cst.cam.ac.uk.
Abstract
Autonomous aerial and aquatic robots that attain mobility by perturbing their medium, such as multicopters and torpedoes, produce wake effects that act as disturbances for adjacent robots. Wake effects are hard to model and predict due to the chaotic spatio-temporal dynamics of the fluid, entangled with the physical geometry of the robots and their complex motion patterns. Data-driven approaches using neural networks typically learn a memory-less function that maps the current states of the two robots to a force observed by the "sufferer" robot. Such models often perform poorly in agile scenarios: since the wake effect has a finite propagation time, the disturbance observed by a sufferer robot is some function of relative states in the past. In this work, we present an empirical study of the properties a wake-effect predictor must satisfy to accurately model the interactions between two robots mediated by a fluid. We explore seven data-driven models designed to capture the spatio-temporal evolution of fluid wake effects in four different media. This allows us to introspect the models and analyze the reasons why certain features enable improved accuracy in prediction across predictors and fluids. As experimental validation, we develop a planar rectilinear gantry for two spinning monocopters to test with real-world data under feedback control. The conclusion is that supplying a history of previous states as input, together with transport delay prediction, substantially helps to learn an accurate wake-effect predictor. Website: sites.google.com/view/wake-up-to-the-past
I Introduction
Robots that fly or swim generate momentum by pushing against the air or water around them.
This interaction perturbs the medium and produces fluid wake effects—such as the downwash from a multicopter or the trailing vortex of a torpedo—that act as significant disturbances for adjacent robots [13]. Accurately modeling and compensating for these effects is thus crucial for safe and reliable operation of a fleet in close proximity [30, 25]. While humans can manually correct for these forces during specific maneuvers like mid-flight refueling or ship docking, robot deployments with unknown motion patterns require predictors that are accurate and computationally lean, as real-time low-latency disturbance compensation is crucial. The primary challenge in modeling these effects lies in the chaotic spatio-temporal dynamics of the fluid, which are inextricably linked to the physical geometry and complex motion of the robots [5, 13]. Existing observation schemes typically include current velocity and relative velocity as variables that inform the evolution of the interaction between the robots [34, 19]. However, instantaneous information is insufficient for capturing the mechanical aspects of the medium, in particular, transport delays in the propagation of wake effects. Because wake effects have a finite propagation time, the disturbance currently affecting a sufferer robot is in fact a function of the "wake source" robot's relative state at a point in the past. Without temporal context, the sufferer is unable to correlate the current relative state with the current disturbance. Consequently, memory-less models perform poorly in agile scenarios. In spite of this dependency, there is a dearth of analyses that incorporate history and recurrence in wake-effect prediction models for robots. The goal of this paper is to fill this gap and shed light on the properties a data-driven predictor must satisfy to guarantee accurate modeling of wake effects at a low computational cost, such that the predictor can be deployed in real-time feedback.
Specifically, our contributions are: • An empirical study of wake effects across seven data-driven architectures and four domains. The architectures range from simple history-based MLPs to advanced sequential models such as Mamba [14], to determine which temporal mechanisms best capture fluid dynamics. The domains range from 2D particle-based simulations to datasets from computational fluid dynamics simulators with aerial and aquatic robots. We identify two key enablers for accurate predictions: a representation of past states in the input, and a module that explicitly predicts transport delay. • The design of a planar rectilinear gantry for two tethered monocopters, on which we collect real-world interaction data under time-varying wake source thrust and validate the prediction accuracy of each architecture. Related Work. Modeling inter-robot wake interaction effects has seen substantial interest in recent years, particularly in the aerial domain. Multirotor downwash, for instance, can exhibit high variability in turbulence depending on the flight regime [41], and has non-trivial effects on neighboring robots [13]. These effects have been studied extensively using numerical models [18, 33, 41] as well as some empirical validation [17, 19]. Computational and numeric fluid modeling has also similarly been applied for aquatic surface and sub-surface vehicles [35, 12]. Of particular interest is the ability to incorporate a given model into a real-time feedback control and planning framework, i.e., the ability to reject the disturbance. To this end, recent work has modeled these effects on multirotors as forces and torques acting on a sufferer, and has developed methods to predict them using numerical models [19], and neural networks with differential [24, 32, 31, 8] and geometric priors [34, 18].
In aquatic domains, distributed networks of sensors have been integrated with hydrodynamic models to supply the information needed to predict the forces suffered by underwater vehicles [20, 27]. Early research even considered wake-effect predictors for control compensation in aquatic manipulation [23, 26]. From a biological perspective, studies on fish schooling [25] have shown the importance of wake-effect prediction not only for stability, but also for energy efficiency in robot motion. Inspired by these findings, current methods for aquatic robots explore computational fluid simulators and pressure-based analytical methods for soft robots that mimic the mechanical structure of the animals [39, 38, 29]. Similar approaches are found in the field of bio-inspired flapping-wing robots [2, 22, 28, 6]. While the aforementioned data-driven approaches have shown impressive results, we observe that the regimes where such models typically tend to perform well are slow-moving and near-steady-state, laminar flow. This assumption admits modeling the observed effect at a given time as a function of the relative state at that time. However, wake effects have a finite propagation time in any medium. Thus, in highly dynamic settings, the impact on a sufferer is a result of some relative state in the past, and the instantaneous state is a poor approximation for modeling. This leads to the premise of this work, which we will analyze in the remainder of the paper.
II Preliminaries
Consider two robots, a wake source and a sufferer, moving in a fluid medium. The wake source's actuation at some time t perturbs the medium, creating a wake effect (e.g., downwash or trailing vortices). This perturbation propagates through the fluid at a finite speed and eventually reaches the sufferer at time t + Δt, which we model as a disturbance force F_db ∈ ℝ³.
The objective of a wake-effect predictor is to learn a mapping Φ that predicts the instantaneous disturbance force F_db(t + Δt) experienced by the sufferer. We assume that the wake source and sufferer can cooperate, and thus the wake source can communicate unobservable information (such as its thrust vector) to the sufferer when necessary. Similar to prior work, we also assume that the sufferer has a measure of F_db through disturbance predictors [34], direct sensing [19, 11], or simulators [18]. For the most generalizable setting, we collect all relevant data into a time-indexed dataset comprised of the full state of the two robots and F_db. The full state information is used to build an observation vector o(t) ∈ ℝ^d, which is then preprocessed and/or concatenated with past data (depending on the model architecture) to form an input vector O(t). We train a predictor over this dataset using a supervised loss to learn the mapping Φ with a neural network, F̂_db(t) = f(O(t), θ), where θ denotes the learnable parameters of the model and the hat (ˆ) denotes a predicted quantity. We use a weighted mean-squared error (MSE) loss of the form ℒ = w_x MSE(f̂_x, f_x) + w_y MSE(f̂_y, f_y) + w_z MSE(f̂_z, f_z), where f_x, f_y are the horizontal force components of F_db parallel to the sufferer's geometric plane and f_z is the perpendicular component, expressed in the sufferer's frame of reference. A higher weight (e.g., 10×) is typically applied to the horizontal components to compensate for their smaller magnitudes relative to vertical forces. The goal of this work is to study the properties that O and f must satisfy in order to make accurate predictions of F_db. In particular, we test the key hypothesis that encoding temporal context and modeling the propagation delay Δt are essential in highly agile and dynamic settings.
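As a concrete reference for the loss above, here is a minimal NumPy sketch of the weighted component-wise MSE. The (10, 10, 1) weighting is an illustrative choice matching the paper's "e.g., 10×" remark, not the authors' exact values:

```python
import numpy as np

def weighted_mse_loss(f_pred, f_true, weights=(10.0, 10.0, 1.0)):
    """Weighted MSE over the three force components:
    L = w_x*MSE(f_x) + w_y*MSE(f_y) + w_z*MSE(f_z).

    f_pred, f_true: arrays of shape (batch, 3) holding (f_x, f_y, f_z)
    in the sufferer's frame. The larger horizontal weights compensate
    for the smaller magnitude of in-plane forces.
    """
    f_pred = np.asarray(f_pred, dtype=float)
    f_true = np.asarray(f_true, dtype=float)
    per_axis_mse = np.mean((f_pred - f_true) ** 2, axis=0)  # shape (3,)
    return float(np.dot(weights, per_axis_mse))
```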
The specific observation vector O varies across domains, but generally encodes relative position and velocity between robots, actuation inputs (e.g., thrust or rudder commands), and domain-specific kinematic quantities. The dimensionality of the observations ranges from 3 (fish schooling) to 14 (3D multirotor computational fluid dynamics).
III Model Architectures
The modeling of fluid wake effects presents a significant challenge due to the chaotic spatio-temporal dynamics of the interactions between robots and medium. We investigate a diverse range of data-driven predictors categorized by their mechanism for handling temporal context. In this section, we describe each of the models, detailing their relevant hyperparameters and the rationale behind their architecture (summarized in Table I). The goal is to build a common framework to make results comparable across domains. We remark that the specific values of the hyperparameters of all models (per model and for each domain) have been found through Bayesian optimization using Optuna [1], keeping the total number of trainable parameters below 100K to ensure the models are computationally lean and, potentially, implementable on board a robot.
TABLE I: Summary of model architectures. Parameter counts refer to trainable weights.
Architecture | Parameters | Temporal Mechanism | Optimized Hyperparameters | Category
Agile MLP [18] | 18K | Instantaneous input features | #layers, hidden dimensions | Memoryless
History MLP [4] | 49K | Flattened window of past snapshots | #snapshots, hidden dimensions | Explicit History
Delay Embedding [40] | 35K | Learned Gaussian kernel over history | μ₀, σ₀, selector dimensions | Explicit History
GRU [9] | 71K | Learned hidden state | hidden dimensions, dropout | Recurrent
TCN [3] | 73K | Causal dilated 1D convolutions | #layers, kernel size, dropout | Recurrent
Mamba [14] | 9K | Selective state-space model | d_model, d_state, convolution width | Recurrent
RC (ESN) [16] | 2K* | Hidden state from fixed random reservoir | reservoir size, spectral radius, leak rate | Recurrent
Cross-Attention [37] | 16K | Weighted combination of processed tokens | embed dimensions, #heads | Attention-based
*RC (ESN) uses an additional 1,008K frozen weights in the reservoir.
III-A Memoryless Baseline
Agile MLP [18] is a memory-less feedforward network that maps the d-dimensional equivariant input feature vector directly to a 3D force prediction in the equivariant frame. It assumes the fluid wake is a function of the instantaneous relative state (geometry, velocity, thrust). The architecture is a single MLP with ReLU activations. It has recently been shown that this parameterization outperforms other existing memory-less models in pair-wise downwash prediction in quadrotors [24, 34, 32, 31], justifying it as a primary baseline.
III-B Explicit History Models
These models explicitly provide a history of past snapshots to capture temporal changes without recurrence.
III-B1 History MLP
The History MLP concatenates multiple past snapshots into a single flattened input vector. By observing a sliding window of recent history, the model can infer which wake patterns generated in the past are arriving at the sufferer now. It utilizes evenly-spaced snapshots covering a tunable time window.
The architecture mirrors Agile MLP but with an input layer sized for the flattened history.
III-B2 Delay Embedding
The Delay Embedding model learns to pick a snapshot at time τ in the past rather than processing a full window of snapshots. Since the wake travels at a finite speed, this model uses a selector to identify the specific past state that caused the wake effect. A selector MLP maps past snapshots to Gaussian attention parameters (delay center μ and kernel width σ) used to weigh historical observations. In practice, it employs a "delay gap" to mask the most recent snapshots, forcing the model to rely on past observations.
III-C Recurrent Models
These models maintain an internal hidden state that summarizes the history of observations, rather than holding an explicit time series in the input.
III-C1 Gated Recurrent Unit (GRU)
GRUs process observations sequentially, maintaining a compressed summary of the history in a hidden state. Learned gates (reset and update) allow the model to selectively remember stable wake patterns or rapidly update the state during agile maneuvers. The architecture includes a linear input projection layer, a GRU, and an output projection layer.
III-C2 Temporal Convolutional Network
The Temporal Convolutional Network (TCN) processes trajectories using causal dilated 1D convolutions. By using exponentially increasing dilations, the TCN builds a wide receptive field that can detect temporal patterns at multiple timescales simultaneously. Its architecture stacks multiple layers to cover a receptive field well beyond the typical transport delays.
III-C3 Mamba
Mamba is a selective state-space model that makes the recurrence parameters dependent on the input. This allows the model to selectively retain or discard information at every timestep, providing a learned counterpart to fixed-recurrence systems. The model dimension and state dimension are tuned per domain via Optuna.
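The Gaussian selection mechanism described for the Delay Embedding model can be sketched in a few lines of NumPy. This is an illustration only: in the actual model a selector MLP produces μ and σ, whereas here they are fixed, and the buffer shapes are assumptions:

```python
import numpy as np

def gaussian_delay_weights(n_snapshots, dt, mu, sigma, delay_gap=0):
    """Gaussian attention over past snapshots.

    Lags run from 0 (most recent snapshot) to (n_snapshots-1)*dt seconds
    in the past. The 'delay gap' masks the freshest snapshots, forcing
    the weights onto genuinely past observations, as described above.
    """
    lags = np.arange(n_snapshots) * dt
    w = np.exp(-0.5 * ((lags - mu) / sigma) ** 2)
    w[:delay_gap] = 0.0          # mask the most recent snapshots
    return w / w.sum()

def select_snapshot(history, weights):
    """Soft-select one effective past observation.
    history: (n_snapshots, d) array, row 0 = most recent snapshot."""
    return weights @ history
```

With μ near the true transport delay, the soft selection approximates picking the single past state that caused the current disturbance.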
III-C4 Reservoir Computing with Echo State Networks
Reservoir Computing with Echo State Networks (RC-ESN) uses a large, fixed, randomly wired recurrent reservoir and only learns the readout layer. Inspired by cognitive models of the brain, a high-dimensional random dynamical system naturally creates "echoes" of past inputs, allowing a linear readout to pick the most predictive temporal features. The architecture uses 1000 reservoir neurons with a sparsely connected recurrent matrix. Only the linear readout layer is trained, although it is important to remark that the reservoir involves a set of frozen weights that contribute to computation, even if they are not learned.
III-D Cross-Attention
The Cross-Attention model allows the sufferer to selectively query relevant moments from the wake source's past trajectory. By dynamically weighing the relative states, the model accounts for the motion of both robots during the interaction. In practice, the network is based on multi-head cross-attention with fixed sinusoidal positional encodings.
IV Evaluations
Figure 2: Performance comparison of each of the methods on the four domains. (left) A CFD dataset for two quadrotors [18]. (center left) A numerical model of ship-ship interactions [36]. (center right) A dataset that captures hydrodynamic interactions between two fish swimming in line [10]. (right) Closed-loop tracking on a custom 2D downwash simulator.
We now evaluate each of the temporal-context models, as well as memory-less baselines, on four different evaluation domains: a CFD dataset for two quadrotors [18], a model for ship-to-ship interactions in encounter and overtaking maneuvers [36], a dataset that captures hydrodynamic interactions between two fish swimming in line [10], and a custom 2D multicopter downwash simulator. In the first three domains, we focus our study on the accuracy of the models in predicting the wake-effect disturbance.
In the 2D simulator, we additionally consider the closed-loop trajectory tracking performance when the predictions of the model are incorporated into the control loop. All the results are averaged over 3 seeds per model. Finally, we evaluate the models on our real-world experimentation setup using two tethered monocopters on a gantry. The teaser figure presents a glimpse of the conclusions of the study: we already see that models that incorporate temporal context outperform those that do not. The bar plots for each of the four domains show the performance of the memory-less MLP baseline, compared against the best performing memory-based architecture in each domain. We now analyze the results for all methods and domains in detail (we provide more materials on the website).
IV-A Agile quadrotor interactions
We begin our analysis on the influence of temporal context using datasets from prior work, FlareDW [18], which utilizes a custom high-fidelity CFD simulator to model interactions between two P600 quadrotors (3 kg mass, 0.6 m rotor-to-rotor distance). It enables us to explore agile, non-stationary downwash disturbances at velocities ranging from 0.5 m/s to 4.0 m/s, well beyond the low-speed regimes (< 0.5 m/s) typically addressed in prior work. Data collection, recorded at 200 Hz, encompasses four distinct maneuvers: Fly Below, Fly Above, Swapping (both moving), and Fast Swapping (up to 8 m/s relative velocity). The resulting dataset, derived from 30 minutes of simulated flight, incorporates absolute velocities, thrust vectors and relative states to capture the distorted, time-delayed force profile observed at higher speeds. Figure 2 shows the performance of each of the models on this dataset. We report root mean squared error (RMSE) in predicting downwash force, measured in Newtons. All temporal models except RC outperform the baseline MLP.
We hypothesize that RC struggles to retain information from the wide variety of scenarios in the FlareDW database using only the 2K trainable parameters. Cross-Attention and Delay Embedding outperform the other methods by providing the predictor with a direct estimate of the wake source's state at the time of wake creation. History MLP, despite access to history, can struggle when the true delay falls between its sampled snapshots. The sequential models (GRU, TCN, Mamba) are competitive, but the relevant past state is diluted through recurrent processing, yielding noisier estimates.
IV-B Ship encounter and overtaking
Next, we evaluate wake effects in maritime environments using a numerical model of ship-to-ship interaction forces [36]. This model specifically investigates sway forces and yaw moments experienced by vessels during encounter and overtaking maneuvers in restricted channels. Data are generated using a discrete vortex distribution numerical technique and slender body theory, which is particularly suited for high-speed, fine-form craft. Crucially, this numerical framework has been empirically validated in prior research [21], showing reasonably good agreement with experimental results. The dataset focuses on the influence of water depth-to-draught ratio, lateral separation distance, and relative ship speeds. The models are trained to predict the sway force and yaw moment experienced by the sufferer ship. Unlike the original formulation [36], which assumes fixed parallel paths, we introduce time-varying lateral targets for the sufferer that are not included in the observation vector. This creates a temporal ambiguity that a memory-less model cannot resolve: from a single observation, a model without temporal context cannot distinguish whether the sufferer is veering due to hydrodynamic interaction forces or because it is tracking a new lateral target. Only models with access to temporal context can disambiguate by observing the sufferer's trajectory over time.
Figure 2 shows the performance of each of the models in this domain. As expected, we observe that models that incorporate history generally outperform the baseline MLP. However, in contrast to the results from FlareDW (Section IV-A), here we observe that History MLP performs the best. Unlike the downwash domains where the transport delay is roughly constant, the ship encounter produces a continuously varying delay as the vessels approach and recede, making it harder for explicit-delay models to learn a single representative lag. A simple sliding window suffices, as the relative observation features already encode the encounter phase.
IV-C Fish Schooling
Next, we use data from an experimental setup for fish schooling that uses a robotic platform designed to replicate a simplified two-fish subsystem [10]. The system consists of an actively pitching NACA 0012 airfoil positioned upstream and a compliant Mylar flag located 5.4 cm downstream. These components interact via two distinct pathways: a hydrodynamic pathway mediated by vortex shedding from the airfoil, and an electromechanical pathway where the flag's leading rib is actuated to follow the airfoil's motion with a controlled time delay. We use only the single-pathway (hydrodynamic) condition, in which the flag responds passively, to isolate wake-mediated coupling. Data collection includes high-resolution camera tracking of component movements at 60 Hz, downsampled to 30 Hz for training, and laser Doppler velocimetry (LDV) to record streamwise flow velocity at an intermediate point between the bodies. The input features are airfoil pitching angle, LDV velocity, and flag front displacement; the prediction target is flag tip displacement. Causal interactions and associated time lags are then disentangled using an information-theoretic approach based on transfer entropy, which identified a causal delay of ≈0.33 s that subsequently informs our 0.5 s history window.
This is an interesting domain since the disturbances are sinusoidal, and follow a nearly constant period and transport delay. The performance of each of the models is shown in Figure 2. Similar to the prior two domains, models that incorporate temporal context are the best-performing. To investigate further, in Figure 3 we additionally report the coefficient of determination for the best (GRU) and the worst performer (baseline MLP). Although the baseline MLP achieves a small RMSE, we observe that it does so with an R² of only 0.57. In addition, Figure 3 also shows that the memory-less MLP has two correlation modes (clusters) due to the phase ambiguity described below. We attribute this to two factors. First, the wake vortices take ≈0.33 s to travel from the airfoil to the flag, so the MLP correlates the current foil angle with a flag response that was caused by a past state it cannot observe. Second, the sinusoidal pitching introduces a phase ambiguity: the same foil angle occurs twice per cycle, once in ascent and again in descent, producing different wake structures each time. Without temporal context, the MLP cannot distinguish these phases. Furthermore, the airfoil occasionally makes motions that deviate from the standard oscillation, producing non-periodic wake structures that a memory-less model cannot anticipate. History-aware models resolve these limitations by observing the trajectory of the foil angle over time, recovering the phase and detecting transient deviations. Recurrent networks, including GRUs, are inherently designed to process sequential data, making them effective at recognizing repeating patterns over time. Figure 3 shows that GRU, with R² = 0.96, presents a more uniform spread about y = x, which indicates an unbiased modeling of the wake effect. Compared to the two previous scenarios, models that explicitly predict transport delay, namely Delay Embedding and Cross-Attention, are not as effective in this domain.
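A short NumPy check (with illustrative sampling values, not the paper's data) shows why constant-frequency oscillations are hard for delay estimators: a sinusoidal response correlates with the source just as well at the true delay as at the delay shifted by a whole period, so correlation-driven delay estimates have multiple equivalent peaks:

```python
import numpy as np

# Sinusoidal "foil angle" and a wake response delayed by 0.33 s.
# dt, period, and true_delay are illustrative values only.
dt, period, true_delay = 0.01, 0.37, 0.33
t = np.arange(0.0, 20.0, dt)
source = np.sin(2 * np.pi * t / period)
response = np.sin(2 * np.pi * (t - true_delay) / period)

def delay_correlation(delay):
    """Correlation of the response with the source shifted by `delay`."""
    k = int(round(delay / dt))
    return float(np.mean(response[k:] * source[:len(t) - k]))

# The true delay and an alias one full period later score the same,
# which is why a learned delay parameter can settle on the wrong peak.
c_true = delay_correlation(true_delay)
c_alias = delay_correlation(true_delay + period)
```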
We investigate this by analyzing the Delay Embedding model, which exposes its learned delay through the μ parameter. Figure 3 compares the final predicted delay across different random initializations of μ. Because the disturbance is nearly sinusoidal, shifted copies of the signal at multiples of the half-period correlate almost as well as the true delay, creating alias correlation peaks that act as local minima during optimization. As a result, the model converges to the true physical delay only when μ is initialized in its basin of attraction; otherwise, it settles on an alias peak with lower R². This sensitivity is specific to domains with constant-frequency oscillations. In domains with richer temporal structure, such as FlareDW and the 2D downwash simulator, the Delay Embedding model reliably converges to the true physical delay regardless of the initialization of μ.
Figure 3: Fish schooling analysis. (left) Scatter plot of true vs. predicted disturbances. The memoryless MLP (R² = 0.57) exhibits two distinct modes, failing to distinguish periodic phases, while the GRU achieves accurate, unbiased predictions. (right) Convergence basin of the Delay Embedding model. Depending on the random initialisation of μ, the model converges to either the true physical delay (0.33 s, higher R²) or an alias peak (0.148 s, lower R²).
IV-D 2D Downwash Simulator
We now introduce a custom 2D multicopter downwash simulation that models the velocity field generated by a wake source multicopter. The simulator models the multicopters as planar rigid bodies (XZ plane) that tilt and produce 2D thrust vectors. The velocity field v = (v_x, v_z) evolves on a 2D Eulerian grid with cell sizes Δx, Δz through four operations applied sequentially at timestep Δt, similar to what is done in [7, 15]. First, the wake source at position (p_x, p_z) with pitch θ injects momentum using a Gaussian spatial profile, Δv_ij = S (u/u_hover) exp(−r_ij²/(2σ²)) Δt (sin θ, −cos θ)ᵀ, where r_ij² = (x_i − p_x)² + (z_j − p_z)², S is the injection strength, σ is the spatial width, u and u_hover denote the wake source's thrust and the thrust at hover, and (sin θ, −cos θ)ᵀ is a unit vector opposite to the body-frame thrust vector. Second, the field advects downward at a constant speed w_a = 6 m/s using an upwind finite-difference scheme, followed by a semi-Lagrangian horizontal advection that traces each cell back to its upstream source position x_src = i − v_x[i,j]·Δt/Δx. Third, diffusion is modeled as a Gaussian blur with standard deviation ℓ_d = √(2κΔt), equivalent to solving ∂v/∂t = κ∇²v for one timestep, with κ = 0.3 m²/s the diffusivity parameter. Finally, energy dissipation is captured by an exponential decay, v ← v e^(−λΔt). The downward advection produces an emergent transport delay of Δz/w_a ≈ 0.38 s for a default vertical separation of Δz = 2.3 m. The disturbance force at the sufferer is computed as quadratic drag, F_db = ½ ρ C_d ‖v‖ v, evaluated at the sufferer's position via bilinear interpolation. Each multicopter uses a Linear Quadratic Regulator (LQR) for flight, and the predicted interaction forces are incorporated as feedforward thrust and pitch corrections. This allows us to evaluate closed-loop tracking performance under three conditions: a baseline with no compensation, using an oracle with access to the true forces, and using predictions from each of the trained models. The observation is an 8D vector comprising relative states and thrust components, and the output is the 2D disturbance force, all recorded at 100 Hz. Figure 2 shows the tracking performance of the sufferer when following random pre-planned trajectories. We observe again that models that incorporate temporal context outperform the baseline MLP.
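The four field operations (inject, advect down, diffuse, decay) can be sketched in NumPy roughly as below. This is a simplified reading of the simulator, not the authors' implementation: advection shifts whole cells, the Gaussian blur is approximated with a cheap fixed 3-point kernel, and boundary handling is crude:

```python
import numpy as np

def simulator_step(v, src_pos, theta, thrust_ratio, params):
    """One timestep of a simplified 2D downwash field update:
    inject -> advect down -> diffuse -> decay.

    v: (2, Nx, Nz) velocity field (v_x, v_z). params is a dict with
    dt, dz, S, sigma, w_a, lam, and grid coordinate arrays X, Z
    (all names are this sketch's own conventions).
    """
    dt = params["dt"]

    # 1) Momentum injection with a Gaussian spatial profile:
    #    dv = S * (u/u_hover) * exp(-r^2/(2 sigma^2)) * dt * (sin th, -cos th)
    r2 = (params["X"] - src_pos[0]) ** 2 + (params["Z"] - src_pos[1]) ** 2
    g = params["S"] * thrust_ratio * np.exp(-r2 / (2 * params["sigma"] ** 2)) * dt
    v[0] += g * np.sin(theta)
    v[1] += g * -np.cos(theta)

    # 2) Downward advection at constant speed w_a (whole-cell shift here,
    #    in place of the paper's upwind / semi-Lagrangian schemes).
    shift = int(round(params["w_a"] * dt / params["dz"]))
    if shift:
        v = np.roll(v, -shift, axis=2)
        v[:, :, -shift:] = 0.0   # inflow boundary: quiescent fluid above

    # 3) Diffusion: a 3-point blur standing in for the exact Gaussian
    #    of std sqrt(2 kappa dt).
    k = np.array([0.25, 0.5, 0.25])
    for ax in (1, 2):
        v = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), ax, v)

    # 4) Exponential energy dissipation: v <- v * exp(-lam * dt).
    v *= np.exp(-params["lam"] * dt)
    return v
```

The emergent transport delay then follows from step 2 alone: a parcel injected at the source needs Δz/w_a seconds to reach a sufferer Δz below.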
To further validate that the models learn the correct temporal structure, Figure 4 visualizes the average attention weights of the Cross-Attention model across all episodes. Since the vertical separation between the two multicopters is constant, the transport delay is fixed across all episodes and attention weights can be meaningfully averaged. The attention profile reveals a clear peak at 0.5 s in the past, close to the true physical transport delay Δz/w_a ≈ 0.38 s. Furthermore, the distribution of weights is concentrated in a window around the predicted delay, with negligible weight assigned to more recent or distant time steps. We also compare the true vs learned delay in the Delay Embedding model for five different Δz (Figure 4, right). The model clearly learns to attend to the past states of the wake source, albeit with an added ≈0.12 s shift across all tests (similar to Cross-Attention). We hypothesize that this shift is a mechanism learned in both models for conservativeness and in-distribution generalization: looking slightly beyond the true transport delay can reveal more information about the relative motion of the source. Finally, we leverage the customization capabilities of the simulator to test how well the different models generalize to fluid conditions outside the training distribution. We train all models with wake speed, diffusivity, and vertical separation sampled uniformly at ±10% of their nominal values (w_a = 6.0 m/s, κ = 0.30 m²/s, Δz = 2.3 m) and evaluate closed-loop RMSE under increasingly perturbed conditions: ±10%, ±50%, and ±75%. As shown in Figure 5, performance retains the same relative ordering across conditions.
Therefore, the performance gains achieved by the memory-based models do not come from overfitting to the training dataset, nor merely from greater representational capacity, but because memory-based models are better suited than memory-less models to capture the underlying wake-effect dynamics.

Figure 4: (left) Average attention weights of the Cross-Attention model over all episodes in the 2D downwash simulator. The profile shows a clear peak near the true physical delay (red dashed line at 0.38 s), demonstrating that the model attends to the correct time lag. (right) Learned versus physical delay for the Delay Embedding model across five values of $\Delta z$.

Figure 5: Out-of-distribution closed-loop ablation on the 2D downwash simulator. Models are trained with physics parameters sampled at ±10% of their nominal values and evaluated at ±10%, ±50%, and ±75% perturbation levels. Red lines denote the baseline closed-loop RMSE from a 3-seed evaluation trained and evaluated at the nominal (unperturbed) conditions. Degradation preserves the relative ranking across conditions.

IV-E Real tethered monocopters

Finally, we perform validations on real hardware using a custom in-house rectilinear 2D gantry mechanism. The system, shown in Figure 6, tethers two monocopters to enable constrained motion in the XZ plane (similar to the 2D simulation framework above). Motion in the vertical axis is supported by two SBR12 linear rails with metal bearings that have very low static and dynamic friction. Lateral movement is effected by a Nema 23 stepper motor with a belt drive. Each monocopter is built around an off-the-shelf brushless DC outrunner motor, driven by a commercial off-the-shelf brushless electronic speed controller (ESC). Feedback control and measurement are implemented on a Raspberry Pi 5, which interfaces with an Arduino UNO (ATmega328P) microcontroller to generate PWM drive signals for the ESCs.
A VL53L0X Time-of-Flight sensor mounted at the base measures the altitude, while the lateral position is measured using open-loop stepper step counting with a calibrated constant of 4200 steps/m. We use a Kalman filter to estimate position and velocity at 32 Hz during each episode. Acceleration is obtained offline via a Rauch-Tung-Striebel smoother, which runs a forward-backward pass over the recorded data to produce near-zero-lag estimates. The total round-trip communication and sensing latency is empirically determined to be ≈18 ms.

Figure 6: The tethered monocopter gantry system. (top left) The wake source can move freely in the X axis, while the sufferer can move freely in the XZ plane. The sensors interface with an Arduino UNO and a Raspberry Pi 5 to implement feedback control with/without disturbance rejection. (top right) Four overlaid snapshots from an episode where the sufferer attempts to track a reference trajectory (dashed red) and deviates significantly (green curve) due to the time-varying turbulence caused by the wake source. (bottom left) Memory-based predictors achieve better open-loop prediction performance than the memory-less baseline, matching the numerical and simulated evaluations in Figure 2. (bottom right) Mean altitude trajectory across n=5 repeated PID-only episodes on the same randomization seed. The dashed red line indicates the target altitude; the solid green curve shows the mean position; the shaded region denotes ±1σ. The characteristic downwash-induced dip is clearly visible.

During each episode, both the lateral velocity of the carriage and the RPM of the wake-source monocopter are varied continuously, creating a rapidly changing downwash field. We do not model the response time of the motor and the ESC to a new PWM command. Thus, it is likely that the commands sent to the wake source sometimes change faster than the mechanical spool-up rate of the system.
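The consequence of this unmodeled motor/ESC response can be illustrated with a toy first-order lag (the time constant and gain below are assumed values, not measured ones): two PWM histories that end on the same command produce very different rotor speeds, which is exactly the information a memory-less model cannot recover.

```python
import numpy as np

# Toy illustration: rotor speed follows a first-order lag behind the PWM
# command, so the instantaneous command does not determine instantaneous RPM.
def rpm_response(pwm, dt=0.01, tau=0.08, gain=100.0):
    rpm, out = 0.0, []
    alpha = dt / (tau + dt)                  # discrete first-order filter
    for u in pwm:
        rpm += alpha * (gain * u - rpm)      # RPM relaxes toward gain * u
        out.append(rpm)
    return np.array(out)

# Two command histories that end on the same PWM value...
a = rpm_response(np.concatenate([np.full(50, 0.2), np.full(5, 0.8)]))
b = rpm_response(np.full(55, 0.8))
# ...end at very different RPMs; only a model with history can tell them apart.
```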
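Separately, the offline estimation step described above (a forward Kalman pass followed by a backward Rauch-Tung-Striebel pass) can be sketched for a 1D altitude channel as follows; the constant-acceleration model and the noise levels q, r are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def rts_smooth(zs, dt, q=1e-4, r=1e-4):
    """Forward Kalman filter on 1D position measurements zs under an assumed
    constant-acceleration model, then a backward RTS pass. Returns smoothed
    [position, velocity, acceleration] per timestep."""
    F = np.array([[1, dt, dt**2 / 2], [0, 1, dt], [0, 0, 1]])
    H = np.array([[1.0, 0.0, 0.0]])
    Q = q * np.array([[dt**5 / 20, dt**4 / 8, dt**3 / 6],
                      [dt**4 / 8,  dt**3 / 3, dt**2 / 2],
                      [dt**3 / 6,  dt**2 / 2, dt       ]])
    x, P = np.zeros(3), 10.0 * np.eye(3)
    xf, Pf, xp, Pp = [], [], [], []          # filtered / predicted moments
    for z in zs:                             # forward pass
        xpr, Ppr = F @ x, F @ P @ F.T + Q
        K = Ppr @ H.T / (H @ Ppr @ H.T + r)  # Kalman gain (3x1)
        x = xpr + (K * (z - H @ xpr)).ravel()
        P = (np.eye(3) - K @ H) @ Ppr
        xf.append(x); Pf.append(P); xp.append(xpr); Pp.append(Ppr)
    xs_s = [xf[-1]]                          # backward RTS pass
    for k in range(len(zs) - 2, -1, -1):
        C = Pf[k] @ F.T @ np.linalg.inv(Pp[k + 1])
        xs_s.insert(0, xf[k] + C @ (xs_s[0] - xp[k + 1]))
    return np.array(xs_s)
```

Because the backward pass conditions every estimate on the whole episode, the smoothed states are effectively zero-lag, which is what makes differentiating to acceleration tractable offline.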
The unmodeled ESC response makes the instantaneous RPM (and therefore the downwash) a function of a sliding window over the last n PWM commands. This effect compounds with the wake transport delay, further increasing the temporal context required for accurate prediction. A memory-less model observing only the current PWM command cannot recover either lag, whereas temporal models can infer the true motor state from the recent history of commands. This also explains why Delay Embedding performs very close to the baseline MLP: it attends to a single past snapshot at the learned delay, which cannot capture the moving-average nature of the motor response.

V Discussion

Our empirical results establish that models with temporal context consistently and substantially outperform memory-less baselines in modeling fluid wake effects. While prior work often assumes steady-state or laminar conditions where instantaneous observations suffice, our study demonstrates that in agile regimes the finite propagation time of disturbances makes memory-less approximations inadequate. The consistent advantage of memory-based architectures over the memory-less baseline confirms that the sufferer robot's state is inextricably linked to the wake source's past actions. However, the optimal temporal mechanism varies depending on the physical characteristics of the medium and the motion patterns involved. A key finding is the efficacy of explicit delay prediction. In FlareDW and the 2D Simulator, Cross-Attention and Delay Embedding excel by providing a direct estimate of the wake source's state at the time of wake creation, giving the predictor an unambiguous causal signal. Explicit-delay models struggle when the delay is not clearly identifiable: in Fish Schooling, periodicity creates alias peaks, and in Ship Encounter the changing inter-vessel distance makes the delay non-stationary. Recurrent models like GRUs offer a robust alternative by implicitly encoding the relevant history in a hidden state.
RC (ESN), despite its large reservoir, is constrained by its low trainable capacity and succeeds only when the temporal structure is simple. A key trade-off is evident in the computational cost. While temporal models provide the accuracy needed for safe proximity flight, they require more data and compute for training and real-time inference. History MLP is the simplest temporal model and performs consistently well across all domains, offering a practical default for adding wake-effect compensation to any downwash problem with a lower computational footprint than the sequential alternatives. Consequently, we summarize the findings of the paper in two complementary features that an accurate wake-effect predictor must have: (i) some explicit representation of temporal context in its input, and (ii) a module that explicitly predicts transport delay. A limitation we observe in this study is the absence of a clear ‘winner’ among the models with temporal context across domains. The combination of both mechanisms is an interesting line for further analysis.

VI Conclusion

This paper has demonstrated that capturing the spatio-temporal evolution of fluid wake effects is essential for maintaining the stability of autonomous aerial and aquatic robots. Our core findings reveal that the inclusion of temporal context significantly improves force prediction accuracy compared to memory-less baselines, as these models successfully account for the finite propagation time of fluid disturbances. We have shown that data-driven models are capable of learning physical transport delays directly from data without explicit physics programming. While no single neural architecture outperforms all others across every experimental domain, the need for models to support a history of previous states and to include a transport delay predictor emerges as a universal requirement.
Our evaluations confirm that the benefits of temporal context are most pronounced in agile motion regimes where current-state features are insufficient to encode the fluid dynamics. Future work will focus on real-world validation of these models within full closed-loop controller integrations and on extending the framework to handle interactions between more than two agents. The latter is especially critical for swarm autonomy, where emergent interactions are typically more complex than the sum of their individual parts.

References

[1] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama (2019) Optuna: a next-generation hyperparameter optimization framework. In ACM International Conference on Knowledge Discovery & Data Mining, pp. 2623–2631.
[2] S. Armanini, M. Karásek, G. De Croon, and C. De Visser (2017) Onboard/offboard sensor fusion for high-fidelity flapping-wing robot flight data. Journal of Guidance, Control, and Dynamics 40 (8), pp. 2121–2132.
[3] S. Bai, J. Z. Kolter, and V. Koltun (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271.
[4] H. Bourlard and C. J. Wellekens (1988) Links between Markov models and multilayer perceptrons. Advances in Neural Information Processing Systems 1.
[5] C. Brouzet, C. Raufaste, and M. Argentina (2025) Undulatory underwater swimming: linking vortex dynamics, thrust and wake structure with a biorobotic fish. Journal of Fluid Mechanics 1015, p. A53.
[6] X. Cai, D. Kolomenskiy, T. Nakata, and H. Liu (2021) A CFD data-driven aerodynamic model for fast and precise prediction of flapping aerodynamics in various flight velocities. Journal of Fluid Mechanics 915, p. A114.
[7] K. Chang, S. Chen, M. Wang, X. Xue, and Y. Lan (2023) Numerical simulation and verification of rotor downwash flow field of plant protection UAV at different rotor speeds. Frontiers in Plant Science 13, p. 1087636.
[8] K. Y. Chee, P. Hsieh, G. J. Pappas, and M. A. Hsieh (2025) Flying quadrotors in tight formations using learning-based model predictive control. In IEEE International Conference on Robotics and Automation, pp. 10951–10957.
[9] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[10] R. Das, S. D. Peterson, and M. Porfiri (2025) Disentangling coexisting sensory pathways of interaction in schooling fish. Flow.
[11] E. Davis and P. E. Pounds (2017) Direct sensing of thrust and velocity for a quadrotor rotor array. IEEE Robotics and Automation Letters 2 (3), pp. 1360–1366.
[12] S. Gaggero, S. Brizzolara, et al. (2007) Exact modeling of trailing vorticity in panel method for marine propeller. International Conference on Marine Research and Transportation 98.
[13] J. Gielis, A. Shankar, R. Kortvelesy, and A. Prorok (2023) Modeling aggregate downwash forces for dense multirotor flight. In International Symposium on Experimental Robotics, pp. 393–404.
[14] A. Gu and T. Dao (2024) Mamba: linear-time sequence modeling with selective state spaces. In Conference on Language Modeling.
[15] C. He and J. Zhao (2009) Modeling rotor wake dynamics with viscous vortex particle method. AIAA Journal 47 (4), pp. 902–915.
[16] H. Jaeger (2002) Adaptive nonlinear system identification with echo state networks. Advances in Neural Information Processing Systems 15.
[17] K. P. Jain, T. Fortmuller, J. Byun, S. A. Mäkiharju, and M. W. Mueller (2019) Modeling of aerodynamic disturbances for proximity flight of multirotors. In IEEE International Conference on Unmanned Aircraft Systems, pp. 1261–1269.
[18] P. Kharitenko, Y. Fan, X. Liu, and Y. Wang (2025) A spatiotemporal downwash modeling for agile close-proximity multirotor flight. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 807–812.
[19] A. Kiran, N. Ayanian, and K. Breuer (2025) Influence of static and dynamic downwash interactions on multi-quadrotor systems. In Robotics: Science and Systems.
[20] M. Krieg, K. Nelson, and K. Mohseni (2019) Distributed sensing for fluid disturbance compensation and motion control of intelligent robots. Nature Machine Intelligence 1 (5), pp. 216–224.
[21] V. KS (2002) Identification of trends in extremes of sway-yaw interference for several ships meeting in restricted waters. Schiffahrts-Verlag Hansa 49, pp. 174–191.
[22] D. F. Kurtulus (2009) Ability to forecast unsteady aerodynamic forces of flapping airfoils by artificial neural network. Neural Computing and Applications 18 (4), pp. 359–368.
[23] B. Lévesque and M. J. Richard (1994) Dynamic analysis of a manipulator in a fluid environment. The International Journal of Robotics Research 13 (3), pp. 221–231.
[24] J. Li, L. Han, H. Yu, Y. Lin, Q. Li, and Z. Ren (2023) Nonlinear MPC for quadrotors in close-proximity flight with neural network downwash prediction. In IEEE Conference on Decision and Control, pp. 2122–2128.
[25] L. Li, X. Zheng, R. Mao, and G. Xie (2021) Energy saving of schooling robotic fish in three-dimensional formations. IEEE Robotics and Automation Letters 6 (2), pp. 1694–1699.
[26] T. W. McLain and S. M. Rock (1998) Development and experimental validation of an underwater manipulator hydrodynamic model. The International Journal of Robotics Research 17 (7), pp. 748–759.
[27] K. Nelson and K. Mohseni (2020) Hydrodynamic force decoupling using a distributed sensory system. IEEE Robotics and Automation Letters 5 (2), pp. 3235–3242.
[28] C. T. Orlowski and A. R. Girard (2011) Modeling and simulation of nonlinear dynamics of flapping wing micro air vehicles. AIAA Journal 49 (5), pp. 969–981.
[29] R. Pramanik, R. Verstappen, and P. Onck (2024) Computational fluid–structure interaction in biology and soft robots: a review. Physics of Fluids 36 (10).
[30] A. Shankar, H. Woo, and A. Prorok (2023) Docking multirotors in close proximity using learnt downwash models. In International Symposium on Experimental Robotics, pp. 427–437.
[31] G. Shi, W. Hönig, X. Shi, Y. Yue, and S. Chung (2021) Neural-Swarm2: planning and control of heterogeneous multirotor swarms using learned interactions. IEEE Transactions on Robotics 38 (2), pp. 1063–1079.
[32] G. Shi, W. Hönig, Y. Yue, and S. Chung (2020) Neural-Swarm: decentralized close-proximity multirotor control using learned interactions. In IEEE International Conference on Robotics and Automation, pp. 3241–3247.
[33] D. Shukla and N. Komerath (2019) Low Reynolds number multirotor aerodynamic wake interactions. Experiments in Fluids 60 (4), p. 77.
[34] H. Smith, A. Shankar, J. Gielis, J. Blumenkamp, and A. Prorok (2023) SO(2)-equivariant downwash models for close proximity flight. IEEE Robotics and Automation Letters 9 (2), pp. 1174–1181.
[35] S. Tavakoli, P. Shaghaghi, S. Mancini, F. De Luca, and A. Dashtimanesh (2022) Wake waves of a planing boat: an experimental model. Physics of Fluids 34 (3).
[36] K. S. Varyani, R. C. McGregor, P. Krishnankutty, and A. Thavalingam (2002) New empirical and generic models to predict interaction forces for several ships in encounter and overtaking manoeuvres in a channel. International Shipbuilding Progress 49 (4), pp. 237–262.
[37] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30.
[38] Y. Xu and K. Mohseni (2013) Fish lateral line inspired hydrodynamic feedforward control for autonomous underwater vehicles. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3565–3870.
[39] Y. Xu and K. Mohseni (2013) Fish lateral line inspired hydrodynamic force estimation for autonomous underwater vehicle control. In IEEE Conference on Decision and Control, pp. 6156–6161.
[40] G. Yang (2019) Wide feedforward or recurrent neural networks of any architecture are Gaussian processes. Advances in Neural Information Processing Systems 32.
[41] H. Zhang, Y. Lan, N. Shen, J. Wu, T. Wang, J. Han, and S. Wen (2020) Numerical analysis of downwash flow field from quadrotor unmanned aerial vehicles. International Journal on Precision Agricultural Aviation 3 (4), pp. 1–7.