
Paper deep dive

Learning Where the Physics Is: Probabilistic Adaptive Sampling for Stiff PDEs

Akshay Govind Srinivasan, Balaji Srinivasan

Year: 2026 · Venue: arXiv preprint · Area: cs.CE · Type: Preprint · Embeddings: 33

Abstract

Modeling stiff partial differential equations (PDEs) with sharp gradients remains a significant challenge for scientific machine learning. While Physics-Informed Neural Networks (PINNs) struggle with spectral bias and slow training times, Physics-Informed Extreme Learning Machines (PIELMs) offer a rapid, closed-form linear solution but are fundamentally limited by physics-agnostic, random initialization. We introduce the Gaussian Mixture Model Adaptive PIELM (GMM-PIELM), a probabilistic framework that learns a probability density function representing the "location of physics" for adaptively sampling kernels of PIELMs. By employing a weighted Expectation-Maximization (EM) algorithm, GMM-PIELM autonomously concentrates radial basis function centers in regions of high numerical error, such as shock fronts and boundary layers. This approach dynamically improves the conditioning of the hidden layer without the expensive gradient-based optimization (of PINNs) or Bayesian search. We evaluate our methodology on 1D singularly perturbed convection-diffusion equations with diffusion coefficient $\nu=10^{-4}$. Our method achieves $L_2$ errors up to 7 orders of magnitude lower than baseline RBF-PIELMs, successfully resolving exponentially thin boundary layers while retaining the orders-of-magnitude speed advantage of the ELM architecture.

Tags

ai-safety (imported, 100%) · csce (suggested, 92%) · preprint (suggested, 88%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 96%

Last extracted: 3/13/2026, 12:18:15 AM

Summary

The paper introduces GMM-PIELM, a probabilistic framework for solving stiff partial differential equations (PDEs) by adaptively sampling radial basis function kernels. By interpreting the PDE residual field as an unnormalized probability density function, the method uses an Expectation-Maximization (EM) algorithm to concentrate hidden-unit centers in regions of high numerical error, such as boundary layers, significantly improving accuracy and conditioning compared to standard Physics-Informed Extreme Learning Machines (PIELMs).

Entities (5)

GMM-PIELM · method · 100%
PIELM · method · 100%
PINNs · method · 100%
Expectation-Maximization · algorithm · 98%
Convection-Diffusion Equation · problem · 95%

Relation Signals (3)

GMM-PIELM uses Expectation-Maximization

confidence 100% · By employing a weighted EM algorithm, GMM-PIELM autonomously concentrates radial basis function centers

GMM-PIELM improves PIELM

confidence 95% · GMM-PIELM autonomously concentrates radial basis function centers in regions of high numerical error... dynamically improves the conditioning of the hidden layer

GMM-PIELM solves Convection-Diffusion Equation

confidence 95% · We evaluate our methodology on 1D singularly perturbed convection-diffusion equations

Cypher Suggestions (2)

Find all methods that improve upon PIELM · confidence 90% · unvalidated

MATCH (m:Method)-[:IMPROVES]->(p:Method {name: 'PIELM'}) RETURN m.name

List all problems solved by GMM-PIELM · confidence 90% · unvalidated

MATCH (m:Method {name: 'GMM-PIELM'})-[:SOLVES]->(p:Problem) RETURN p.name

Full Text

32,328 characters extracted from source content.


Learning Where the Physics Is: Probabilistic Adaptive Sampling for Stiff PDEs

Akshay Govind Srinivasan, Balaji Srinivasan
Indian Institute of Technology, Madras, Chennai, TN 600107, India
me22b102@smail.iitm.ac.in, sbalaji@dsai.iitm.ac.in

Abstract

Modeling stiff partial differential equations (PDEs) with sharp gradients remains a significant challenge for scientific machine learning. While Physics-Informed Neural Networks (PINNs) struggle with spectral bias and slow training times, Physics-Informed Extreme Learning Machines (PIELMs) offer a rapid, closed-form linear solution but are fundamentally limited by physics-agnostic, random initialization. We introduce the Gaussian Mixture Model Adaptive PIELM (GMM-PIELM), a probabilistic framework that learns a probability density function representing the "location of physics" for adaptively sampling kernels of PIELMs. By employing a weighted EM algorithm, GMM-PIELM autonomously concentrates radial basis function centers in regions of high numerical error, such as shock fronts and boundary layers. This approach dynamically improves the conditioning of the hidden layer without the expensive gradient-based optimization (of PINNs) or Bayesian search. We evaluate our methodology on 1D singularly perturbed convection-diffusion equations with diffusion coefficient $\nu = 10^{-4}$. Our method achieves $L_2$ errors up to 7 orders of magnitude lower than baseline RBF-PIELMs, successfully resolving exponentially thin boundary layers while retaining the orders-of-magnitude speed advantage of the ELM architecture.

1 Introduction

Modeling stiff ODEs/PDEs with sharp gradients poses severe resolution challenges (Madhavi et al., 2025). While high-fidelity classical methods exist, they are often cost-prohibitive due to stability constraints and complex meshing requirements (Yagawa and Okuda, 2011). This motivates the development of mesh-free, model-based solvers to efficiently resolve stiff dynamics.
Physics-Informed Neural Networks (PINNs) (Raissi et al., 2019) offer a versatile mesh-free framework but suffer from slow training and hyperparameter sensitivity. Recent benchmarks indicate they often require significantly more wall-clock time than classical solvers to reach comparable accuracy, particularly for stiff PDEs (McGreivy and Hakim, 2024; Karnakov et al., 2024). Furthermore, spectral bias and optimization pathologies frequently lead to the under-resolution of sharp gradients, even in overparameterized networks (Krishnapriyan et al., 2021).

Physics-Informed Extreme Learning Machines (PIELMs) (Dwivedi and Srinivasan, 2020) substitute iterative backpropagation with a closed-form linear least-squares solution, achieving orders-of-magnitude acceleration. Yet the efficacy of PIELMs is fundamentally limited by their initialization: random, physics-agnostic hidden features often suffer from spectral mismatch with the underlying stiff dynamics, leading to ill-conditioned representations (Huang et al., 2006; Dwivedi et al., 2025b). Although Radial Basis Function variants like RBF-PIELMs (Dwivedi et al., 2025a; Srinivasan et al., 2025) introduce interpretable, localized receptivity, they typically rely on static node allocations that lack the flexibility to track evolving discontinuities and require manually designed, problem-specific heuristics. Consequently, there remains a critical need for principled, data-driven adaptation mechanisms that can dynamically align basis functions with the moving wavefronts of singular perturbation problems while retaining the computational efficiency of linear solvers.

To tackle these challenges, this work introduces the Gaussian Mixture Model Adaptive PIELM (GMM-PIELM), a probabilistic training framework that treats the PDE residual field as an error density over the computational domain.
We postulate that the $\log(1 + |\text{residual}|)$ field can be interpreted as an unnormalized probability density function representing the "location of physics". We then learn this distribution using an Expectation-Maximization (EM) based algorithm. GMM-PIELM concentrates hidden-unit centers in regions of high numerical error. This adaptive redistribution improves the conditioning of the hidden-layer system and enhances expressivity exactly where the solution exhibits stiff behavior, while retaining the linear least-squares solve of the ELM architecture (unlike Tang et al. (2023)). Compared to KAPI-ELM (Dwivedi et al., 2025b), which relies on iterative Bayesian optimization for center placement, our method relies on unsupervised learning of the probability density of the residual field. As demonstrated on 1D singularly perturbed convection-diffusion equations, the proposed method attains much lower $L_2$ errors than RBF-PIELM while capturing the complex boundary layer.

Contributions

This paper presents an EM-based algorithm to adaptively sample kernels for PIELMs. Specifically:

• We present a fast and interpretable residual-field-based expectation-maximization algorithm to adaptively sample kernels for solving stiff PDEs using RBF-PIELMs.
• We evaluate the capability and efficiency of the algorithm against RBF-PIELMs using 1D boundary layer problems with a single and a double boundary layer as benchmarks.

2 Mathematical Formulation

We consider a stationary PDE defined on a bounded domain $\Omega \subset \mathbb{R}^d$ with boundary $\partial\Omega$:

$$\mathcal{L}[u](x) = f(x), \quad x \in \Omega; \qquad \mathcal{B}[u](x) = g(x), \quad x \in \partial\Omega \tag{1}$$

where $\mathcal{L}$ is a linear differential operator, $\mathcal{B}$ is a boundary operator, $f(x)$ is a source term, and $g(x)$ is the boundary data.
In RBF-PIELM, we approximate the solution $u(x)$ using a Single-Hidden-Layer Feedforward Network (SLFN) with $N$ hidden neurons:

$$\hat{u}(x;\beta) = \sum_{j=1}^{N} \beta_j \phi_j(x) = \sum_{j=1}^{N} \beta_j \exp\!\left(-\frac{\|x - x_0^j\|^2}{2 s_j^2}\right) \tag{2}$$

where $\beta_j$ are the hidden-unit weights and $\phi_j(x)$ are non-linear activations; $x_0^j$ and $s_j^2$ are the parameters of the RBF kernel. Sampling these points randomly poses a challenge, especially for stiff problems (refer to Appendix Section A.1). Dwivedi et al. (2025a) and Srinivasan et al. (2025) use a manually designed heuristic to sample these points, but such heuristics require prior knowledge of the underlying physics and are difficult to adapt to dynamic problems.

2.1 GMM-PIELM: Gaussian Mixture Model-Based Kernel Adaptation

The accuracy of the approximation $\hat{u}$ in the previous section is heavily contingent on the placement of the basis centers $x_0^j$ and the choice of widths $s_j$, especially for stiff problems. We address this by proposing a probabilistic framework that allocates resources proportional to the local physical complexity of the problem.

Probabilistic Formulation

Let $\mathcal{R}(x;\theta) = \mathcal{L}[\hat{u}_\theta](x) - f(x)$ denote the PDE residual field for the current approximation $\hat{u}_\theta$. Standard physics-informed approaches minimize the global $L_2$ norm of this residual, effectively averaging errors across the domain. However, for singularly perturbed problems, this global metric is dominated by smooth regions, often washing out the high-frequency errors localized at shocks or boundary layers (Krishnapriyan et al., 2021). To address this, we postulate that the $\log(1 + |\text{residual}|)$ field acts as an unnormalized probability density function (PDF) representing the spatial concentration of approximation error. We provide justification for the choice of log-transform in Appendix Section A.5.
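Once the centers and widths are fixed, the ansatz of Eq. (2) is linear in the output weights $\beta$. As a rough illustration (not the authors' code; the function and variable names here are our own), the Gaussian RBF feature matrix can be built in a few lines of NumPy:

```python
import numpy as np

def rbf_basis(x, centers, widths):
    """Gaussian RBF features phi_j(x) = exp(-|x - c_j|^2 / (2 s_j^2)), as in Eq. (2)."""
    # x: (M,) evaluation points; centers: (N,) kernel centers x_0^j; widths: (N,) s_j
    diff = x[:, None] - centers[None, :]              # (M, N) pairwise differences
    return np.exp(-diff**2 / (2.0 * widths[None, :]**2))

# The approximation u_hat(x) = Phi @ beta is then linear in beta.
x = np.linspace(0.0, 1.0, 5)
Phi = rbf_basis(x, centers=np.array([0.25, 0.75]), widths=np.array([0.1, 0.1]))
```

Because the hidden features enter linearly, solving for $\beta$ reduces to the least-squares problem discussed in Appendix A.1.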
We define the Residual Energy Density $p_{\mathrm{res}}(x)$ as:

$$p_{\mathrm{res}}(x) = \frac{\log(1 + |\mathcal{R}(x;\theta)|)}{Z}, \quad \text{where } Z = \int_\Omega \log(1 + |\mathcal{R}(z;\theta)|)\,dz. \tag{3}$$

Intuitively, $p_{\mathrm{res}}(x)$ maps the "location of physics," highlighting regions where the spectral bandwidth is insufficient to capture the underlying dynamics (Nabian and Meidani, 2019; Wu et al., 2023).

GMM-PIELM

Under this hypothesis, the optimal allocation of hidden neurons corresponds to a distribution of basis centers $\{x_0^j\}_{j=1}^{N}$ that maximizes the likelihood of sampling from this residual landscape. Hence we model this distribution as a mixture of Gaussians:

$$x_0^j \sim p(x;\Theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k), \tag{4}$$

where $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ are the mixing coefficients, means, and covariances, respectively. To fit this model to the evolving solution, we employ a weighted EM framework to maximize the residual-weighted log-likelihood. In the E-step, we evaluate the current approximation and compute the "responsibility" $q_{ik}$, the posterior probability that the $i$-th collocation point $x_i$ (with residual weight $w_i = \log(1 + |\mathcal{R}(x_i)|)$) belongs to the $k$-th Gaussian component:

$$q_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{\ell=1}^{K} \pi_\ell\, \mathcal{N}(x_i \mid \mu_\ell, \Sigma_\ell)}. \tag{5}$$

In the subsequent M-step, we update the GMM parameters to concentrate the components in regions of high error. The updated means $\mu_k^*$ and covariances $\Sigma_k^*$ are computed as:

$$N_k = \sum_{i=1}^{N_{\mathrm{eval}}} w_i q_{ik}, \quad \mu_k^* = \frac{1}{N_k} \sum_{i=1}^{N_{\mathrm{eval}}} w_i q_{ik}\, x_i, \quad \Sigma_k^* = \frac{1}{N_k} \sum_{i=1}^{N_{\mathrm{eval}}} w_i q_{ik}\, (x_i - \mu_k^*)(x_i - \mu_k^*)^\top \tag{6}$$

where $N_{\mathrm{eval}}$ is the number of evaluation points on a dense grid. The derivation of the parameter updates is provided in Appendix Section A.4.
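The E- and M-steps above amount to a standard EM iteration in which each collocation point carries the residual weight $w_i = \log(1 + |\mathcal{R}(x_i)|)$. A minimal 1D sketch (scalar variances instead of full covariances, and our own naming; this is an illustration of the update rule, not the paper's implementation):

```python
import numpy as np

def weighted_em_step(x, w, pi, mu, var):
    """One weighted EM iteration for a 1D GMM (Eqs. 5-6).
    x: (N,) collocation points; w: (N,) residual weights w_i = log(1+|R(x_i)|);
    pi, mu, var: (K,) current mixing coefficients, means, and variances."""
    # E-step: responsibilities q_ik proportional to pi_k * N(x_i | mu_k, var_k)
    dens = np.exp(-(x[:, None] - mu[None, :])**2 / (2 * var[None, :])) \
           / np.sqrt(2 * np.pi * var[None, :])
    q = pi[None, :] * dens
    q /= q.sum(axis=1, keepdims=True)
    # M-step: residual-weighted updates of Eq. (6)
    Nk = (w[:, None] * q).sum(axis=0)
    mu_new = (w[:, None] * q * x[:, None]).sum(axis=0) / Nk
    var_new = (w[:, None] * q * (x[:, None] - mu_new[None, :])**2).sum(axis=0) / Nk
    pi_new = Nk / Nk.sum()
    return pi_new, mu_new, var_new

# Toy example: residual mass concentrated near x = 0.9 pulls both components toward it.
x = np.linspace(0.0, 1.0, 200)
w = np.exp(-(x - 0.9)**2 / 0.002)          # stand-in for log(1 + |R|)
pi, mu, var = weighted_em_step(x, w, np.array([0.5, 0.5]),
                               np.array([0.3, 0.7]), np.array([0.05, 0.05]))
```

The weighting makes low-residual regions nearly invisible to the M-step, which is exactly the mechanism that drags basis centers into the boundary layer.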
Finally, we augment the adaptive sampling with a set of uniformly distributed centers to ensure global domain coverage and prevent basis depletion in low-residual regions. We then determine the local kernel width $s_j$ from the k-nearest-neighbor distance to ensure consistent overlap even in highly clustered arrangements.

3 Numerical Experiments

We evaluate the capability of the GMM-PIELM framework against RBF-PIELM on the singularly perturbed 1D steady-state convection-diffusion equation with single and double boundary layers. It is a canonical benchmark for validating numerical schemes designed for stiff dynamics (Roos et al., 2008). The underlying PDE is defined on the domain $x \in (0,1)$ as:

$$-\nu u_{xx} + u_x = 0 \tag{7}$$

where $\nu > 0$ represents the diffusion coefficient. The exact analytical solution is given in Dwivedi et al. (2025b). We assess performance in the stiff regime of $\nu = 10^{-4}$. This equation serves as a fundamental model for high-Péclet-number transport phenomena encountered in fluid mechanics, isolating the physics of boundary layer formation where advective forces dominate diffusive ones.

The primary computational challenge in this regime is the formation of an exponentially thin boundary layer of width $\delta \sim \mathcal{O}(\nu)$ near the outlet ($x = 1$). Resolving this feature typically necessitates prohibitive mesh densities ($N \gg 1/\nu$) for uniform solvers, while learning-based methods like PINNs suffer from pronounced spectral bias, often failing to capture the high-frequency transition without specialized loss weighting (Krishnapriyan et al., 2021; Wang et al., 2022). Similarly, standard PIELMs with randomized initialization frequently lack sufficient basis support in this narrow region, leading to ill-conditioned linear systems and degraded accuracy (refer to Appendix Section A.1). Refer to Appendix Section A.6 for implementation details.
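To see how thin the layer is, one can evaluate the closed-form solution of Eq. (7). The paper cites the exact solution from Dwivedi et al. (2025b) without restating the boundary data, so the Dirichlet conditions $u(0)=0$, $u(1)=1$ below are an assumption (a common choice for this benchmark); the sketch uses an overflow-safe form of $(e^{x/\nu}-1)/(e^{1/\nu}-1)$:

```python
import numpy as np

nu = 1e-4  # stiff regime used in the paper

def exact_single_layer(x, nu):
    """Solution of -nu*u'' + u' = 0 on (0,1) with ASSUMED Dirichlet data
    u(0)=0, u(1)=1. Algebraically equal to (e^{x/nu}-1)/(e^{1/nu}-1), but
    written so the exponentials decay instead of overflowing for small nu."""
    e = np.exp((x - 1.0) / nu)   # decays away from the outlet x = 1
    e0 = np.exp(-1.0 / nu)       # underflows harmlessly to 0 for stiff nu
    return (e - e0) / (1.0 - e0)

x = np.linspace(0.0, 1.0, 5)
u = exact_single_layer(x, nu)    # ~0 everywhere except a jump to 1 at x = 1
```

For $\nu = 10^{-4}$ the solution is indistinguishable from zero until roughly $x > 1 - 10^{-3}$, which is why uniform center placement leaves the layer unsupported.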
Results

We observe from Figures 1(a) and 1(c) that our method (GMM-PIELM) is able to capture the sharp boundary layer at $x = 1$ ($x = 0, 1$ in the case of the double boundary layer), as opposed to our baseline (RBF-PIELM). Our solution achieves up to 7 orders of magnitude better accuracy compared to RBF-PIELM (refer to Table 1). Additionally, we observe from Figures 1(b) and 1(d) that the learned GMM distribution has a similar structure to the error.

Table 1: Comparison of baseline vs. GMM-adaptive RBF-PIELM for the single and double boundary layer problems ($\nu = 10^{-4}$), showcasing significant improvement in accuracy.

Boundary Condition    Method                 Mean $L_2$ Error (RMSE)   Time (s)
Single ($N = 300$)    Baseline RBF-PIELM     $5.00 \times 10^{-1}$     0.071
                      Ours (GMM-adaptive)    $2.73 \times 10^{-8}$     0.696
Double ($N = 500$)    Baseline RBF-PIELM     $1.01 \times 10^{-5}$     0.312
                      Ours (GMM-adaptive)    $1.04 \times 10^{-9}$     1.776

Figure 1: Performance analysis of GMM-PIELM on the 1D convection-diffusion equation with single and double boundary layers. Panels: (a) solution profile for the single boundary layer; (b) error density and learned distribution for the single boundary layer; (c) solution profile for the double boundary layer; (d) error density and learned distribution for the double boundary layer.

4 Conclusion

This work introduces the Gaussian Mixture Model Adaptive PIELM (GMM-PIELM), a framework that treats the PDE residual as a probability density representing the "location of physics." By interpreting the $\log(1 + |\text{residual}|)$ field as an unnormalized PDF, we use an EM algorithm to dynamically concentrate hidden-unit centers and widths in regions of high numerical error. This approach addresses the fundamental initialization limitations of standard Physics-Informed Extreme Learning Machines; instead of relying on physics-agnostic random sampling or heuristic node placement, the algorithm tracks sharp gradients in the solution automatically.
Numerical experiments on 1D singularly perturbed convection-diffusion equations demonstrate that GMM-PIELM achieves $L_2$ errors up to 7 orders of magnitude lower than baseline RBF-PIELMs, successfully resolving thin boundary layers ($\nu = 10^{-4}$) that standard methods fail to capture. While the adaptive process incurs a moderate computational overhead, it retains the fundamental speed advantages of the ELM architecture over gradient-based training. Consequently, GMM-PIELM offers a highly efficient and robust alternative for solving stiff, multi-scale physical systems that typically defy traditional mesh-less methods.

Future Work

Future work will focus on extending this probabilistic framework to time-dependent PDEs, allowing the GMM centroids to track moving wavefronts in real time. Additionally, we aim to investigate the scalability and stability of this approach for high-dimensional problems and complex geometries to enable its use across a multitude of applications.

References

V. Dwivedi, B. Sixou, and M. Sigovan (2025a) Curriculum learning-driven PIELMs for fluid flow simulations. Neurocomputing 616, p. 130924.
V. Dwivedi, B. Srinivasan, et al. (2025b) Kernel-adaptive PI-ELMs for forward and inverse problems in PDEs with sharp gradients. arXiv preprint arXiv:2507.10241.
V. Dwivedi and B. Srinivasan (2020) Physics informed extreme learning machine (PIELM) – a rapid method for the numerical solution of partial differential equations. Neurocomputing 391, p. 96–118.
G. Huang, Q. Zhu, and C. Siew (2006) Extreme learning machine: theory and applications. Neurocomputing 70, p. 489–501.
P. Karnakov, S. Litvinov, and P. Koumoutsakos (2024) Solving inverse problems in physics by optimizing a discrete loss: fast and accurate learning without neural networks. PNAS Nexus 3 (1), p. pgae005.
A. Krishnapriyan, A. Gholami, S. Zhe, R. Kirby, and M. W. Mahoney (2021) Characterizing possible failure modes in physics-informed neural networks. In Advances in Neural Information Processing Systems, Vol. 34, p. 26548–26560.
M. V. D. N. S. Madhavi, M. S. Ali, S. L. Dhondge, V. N. Deshmukh, G. R. Gandhe, and P. V. S. Sairam (2025) Advanced numerical methods for solving nonlinear partial differential equations in fluid mechanics: applications in aerospace engineering. International Journal of Applied Mathematics 38 (3s).
N. McGreivy and A. Hakim (2024) Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations. Nature Machine Intelligence 6 (10), p. 1256–1269.
M. A. Nabian and R. Meidani (2019) Efficient training of physics-informed neural networks via adaptive importance sampling. Computer-Aided Civil and Infrastructure Engineering 34 (12), p. 1026–1037.
M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, p. 686–707.
H. Roos, M. Stynes, and L. Tobiska (2008) Robust numerical methods for singularly perturbed differential equations: convection-diffusion-reaction and flow problems. 2nd edition, Springer Series in Computational Mathematics, Vol. 24, Springer-Verlag, Berlin, Heidelberg.
A. G. Srinivasan, V. Dwivedi, and B. Srinivasan (2025) Deep vs. shallow: benchmarking physics-informed neural architectures on the biharmonic equation. arXiv preprint arXiv:2510.04490.
K. Tang, X. Wan, and C. Yang (2023) DAS-PINNs: a deep adaptive sampling method for solving high-dimensional partial differential equations. Journal of Computational Physics 476, p. 111868.
S. Wang, X. Yu, and P. Perdikaris (2022) When and why PINNs fail to train: a neural tangent kernel perspective. Journal of Computational Physics 449, p. 110768.
C. Wu, M. Zhu, Q. Tan, Y. Ktha, and E. Lu (2023) A comprehensive review of physics-informed neural networks: perspectives on the failure modes and solution strategies. arXiv preprint arXiv:2303.08823.
G. Yagawa and H. Okuda (2011) Computational performance of free mesh method applied to continuum mechanics problems. Interaction and Multiscale Mechanics 4 (1), p. 1–28.

Appendix A

A.1 The Condition Number Problem

To determine $\beta = [\beta_1, \dots, \beta_L]^T$, we substitute the ansatz $\hat{u}$ into the PDE and boundary conditions and evaluate them at a set of collocation points. Let $\{x_f^{(i)}\}_{i=1}^{N_f}$ be interior points and $\{x_b^{(k)}\}_{k=1}^{N_b}$ be boundary points. For a linear operator $\mathcal{L}$, the residual constraints are linear in $\beta$:

$$\sum_{j=1}^{L} \beta_j\, \mathcal{L}[\phi_j](x_f^{(i)}) = f(x_f^{(i)}) \tag{8}$$

$$\sum_{j=1}^{L} \beta_j\, \mathcal{B}[\phi_j](x_b^{(k)}) = g(x_b^{(k)}) \tag{9}$$

This system can be written in matrix form as $H\beta = T$, where:

$$H = \begin{bmatrix} \mathcal{L}[\phi_1](x_f^{(1)}) & \dots & \mathcal{L}[\phi_L](x_f^{(1)}) \\ \vdots & \ddots & \vdots \\ \mathcal{L}[\phi_1](x_f^{(N_f)}) & \dots & \mathcal{L}[\phi_L](x_f^{(N_f)}) \\ \lambda\, \mathcal{B}[\phi_1](x_b^{(1)}) & \dots & \lambda\, \mathcal{B}[\phi_L](x_b^{(1)}) \\ \vdots & \ddots & \vdots \end{bmatrix}, \quad T = \begin{bmatrix} f(x_f^{(1)}) \\ \vdots \\ f(x_f^{(N_f)}) \\ \lambda\, g(x_b^{(1)}) \\ \vdots \end{bmatrix} \tag{10}$$

Here, $\lambda$ is a penalty weight for boundary conditions. Since usually $N_f + N_b > L$, the system is overdetermined.
The optimal weights in the least-squares sense are obtained via the Moore-Penrose pseudo-inverse:

$$\beta^* = H^\dagger T = (H^T H)^{-1} H^T T \tag{11}$$

The accuracy of the least-squares solution depends heavily on the condition number $\kappa(H)$. In stiff PDEs, if the collocation points are uniformly distributed, the rows of $H$ corresponding to the smooth regions are well-behaved, but the rows corresponding to the boundary layer (where gradients are huge) are sparse. If only a few points fall into the layer, the matrix $H$ fails to capture the high-frequency dynamics, essentially treating the layer as noise. This results in a poor approximation $\beta^*$ that smooths over the shock. To fix this, we must increase the row density in the shock region, thereby explicitly constraining the solver to respect the physics in that zone.

A.2 Algorithmic Implementation

This probabilistic feedback loop is formalized in Algorithm 1. The procedure alternates between a fast linear solve (ELM step) and a statistical adaptation of the basis functions (EM step) until the residual energy stabilizes.
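The system of Eqs. (8)-(11) can be assembled explicitly for the 1D operator $-\nu u_{xx} + u_x$ of Eq. (7), since the derivatives of a Gaussian kernel are analytic. The sketch below is illustrative only: a milder $\nu$, uniform centers, and assumed Dirichlet data $u(0)=0$, $u(1)=1$ (the paper takes its boundary data from Dwivedi et al. (2025b)):

```python
import numpy as np

nu = 1e-2  # milder than the paper's 1e-4, so the toy system stays tractable

def phi(x, c, s):   return np.exp(-(x - c)**2 / (2 * s**2))
def dphi(x, c, s):  return -(x - c) / s**2 * phi(x, c, s)
def d2phi(x, c, s): return ((x - c)**2 / s**4 - 1.0 / s**2) * phi(x, c, s)

xf = np.linspace(0.0, 1.0, 200)        # interior collocation points
xb = np.array([0.0, 1.0])              # boundary points
c  = np.linspace(0.0, 1.0, 40)         # uniform centers (the baseline choice)
s  = np.full_like(c, 2.5 / len(c))     # constant widths, as in the A.6 baseline
lam = 10.0                             # boundary penalty weight (our choice)

# Interior rows apply L[phi_j] = -nu*phi'' + phi'; boundary rows apply lam*phi_j.
L_rows = -nu * d2phi(xf[:, None], c, s) + dphi(xf[:, None], c, s)
B_rows = lam * phi(xb[:, None], c, s)
H = np.vstack([L_rows, B_rows])
T = np.concatenate([np.zeros(len(xf)), lam * np.array([0.0, 1.0])])

beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # least-squares solve of Eq. (11)
kappa = np.linalg.cond(H)                      # conditioning diagnostic kappa(H)
```

Inspecting `kappa` under uniform vs. clustered center placements is exactly the diagnostic A.1 appeals to: starving the layer of basis support degrades both the conditioning and the recovered $\beta^*$.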
Algorithm 1: GMM-PIELM Adaptive Solving

1: Input: operators $\mathcal{L}, \mathcal{B}$; iterations $T$; mixing ratio $\alpha$
2: Initialize: centers $x_0^{(0)}$ (uniform), widths $s_j^{(0)}$ (constant)
3: for $t = 0$ to $T$ do
4:   // Step 1: Linear solve (PIELM)
5:   Construct matrices $H^{in}, H^{bd}$ using current $\{x_0^j, s_j\}$
6:   Solve $w^{(t)} = \arg\min_w \|Hw - b\|_2^2$
7:   Construct approximation $\hat{u}^{(t)}(x) = \sum_j w_j\, \phi(x; x_0^j, s_j)$
8:   // Step 2: Assessment
9:   Evaluate residual $\mathcal{R}(x;\theta) = \mathcal{L}[\hat{u}^{(t)}] - f$ on a dense grid
10:  Compute density $p_{\mathrm{res}}(x) \propto \log(1 + |\mathcal{R}(x;\theta)|)$
11:  if $t < T$ then
12:    // Step 3: Adaptation (EM & hybrid sampling)
13:    Fit GMM parameters $\Theta$ to samples from $p_{\mathrm{res}}(x)$
14:    Sample $x_{\mathrm{gmm}} \sim p(x;\Theta)$ and $x_{\mathrm{uni}} \sim \mathrm{Uniform}(\Omega)$
15:    Update centers: $x_0^{(t+1)} \leftarrow \alpha\, x_{\mathrm{gmm}} \cup (1-\alpha)\, x_{\mathrm{uni}}$
16:    Update widths: $s_j^{(t+1)} \leftarrow \beta \cdot \mathrm{dist}_k(x_0^j) + \epsilon$
17:  end if
18: end for
19: Return final approximation $\hat{u}^{(T)}$

A.3 Compute Specifications

All experiments were run on a CPU only, namely a Ryzen 7 5800H with 16 GB RAM. All processes are run on a single thread and a single process.

A.4 Derivation of EM Steps

The objective of the GMM-PIELM framework is to adaptively distribute the centers of the radial basis functions to match the spatial distribution of the PDE error. We postulate that the log-transformed PDE residual field represents an unnormalized probability density function (PDF) indicating the "location of physics." Let the Residual Energy Density $p_{\mathrm{res}}(x)$ be defined as:

$$p_{\mathrm{res}}(x) = \frac{\log(1 + |\mathcal{R}(x;\theta)|)}{Z}, \quad \text{where } Z = \int_\Omega \log(1 + |\mathcal{R}(z;\theta)|)\,dz. \tag{12}$$
We approximate this target distribution using a Gaussian Mixture Model (GMM), $p(x;\Theta)$, parameterized by $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$:

$$p(x;\Theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{13}$$

We seek the parameters $\Theta$ that make the model distribution $p(x;\Theta)$ best approximate the target residual distribution $p_{\mathrm{res}}(x)$. The information-theoretic measure of the difference between two probability distributions is the Kullback-Leibler (KL) divergence. We minimize the KL divergence from the true distribution $p_{\mathrm{res}}$ to the model $p_\Theta$:

$$\Theta^* = \arg\min_\Theta D_{KL}(p_{\mathrm{res}} \,\|\, p_\Theta) \tag{14}$$

$$= \arg\min_\Theta \int_\Omega p_{\mathrm{res}}(x) \ln\!\left(\frac{p_{\mathrm{res}}(x)}{p(x;\Theta)}\right) dx \tag{15}$$

$$= \arg\min_\Theta \left[ \int_\Omega p_{\mathrm{res}}(x) \ln p_{\mathrm{res}}(x)\,dx - \int_\Omega p_{\mathrm{res}}(x) \ln p(x;\Theta)\,dx \right] \tag{16}$$

The first term is the (negative) entropy of the residual field. Since the residual $\mathcal{R}(x)$ is fixed during the adaptation step (it depends only on the previous iteration's weights), this term is constant with respect to $\Theta$. Minimizing the KL divergence is therefore equivalent to maximizing the second term (the cross-entropy):

$$\Theta^* = \arg\max_\Theta \int_\Omega p_{\mathrm{res}}(x) \ln p(x;\Theta)\,dx \tag{17}$$

We approximate the integral over a dense set of collocation points $\{x_i\}_{i=1}^{N}$ via a Riemann sum. Substituting $p_{\mathrm{res}}(x_i) \propto \log(1 + |\mathcal{R}(x_i)|)$:

$$\Theta^* \approx \arg\max_\Theta \sum_{i=1}^{N} \log(1 + |\mathcal{R}(x_i)|)\, \ln p(x_i;\Theta) \tag{18}$$

Defining the weight $w_i = \log(1 + |\mathcal{R}(x_i)|)$, we arrive at the weighted log-likelihood objective function:

$$\mathcal{L}(\Theta) = \sum_{i=1}^{N} w_i \ln\!\left( \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right) \tag{19}$$

Direct maximization of $\mathcal{L}(\Theta)$ is intractable due to the summation inside the logarithm. We therefore employ the EM algorithm, introducing latent variables $z_i \in \{1, \dots, K\}$ indicating the component assignment for point $x_i$.
The expected complete-data log-likelihood (the Q-function) for the weighted dataset is:

$$Q(\Theta, \Theta^{(t)}) = \sum_{i=1}^{N} w_i \sum_{k=1}^{K} q_{ik} \left[ \ln \pi_k + \ln \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right] \tag{20}$$

where $q_{ik} = P(z_i = k \mid x_i, \Theta^{(t)})$ is the responsibility computed in the E-step (Eq. 5). We maximize $Q$ with respect to $\mu_k$ and $\Sigma_k$ by setting the gradients to zero. Consider the terms in $Q$ dependent on $\mu_k$:

$$J(\mu_k) = \sum_{i=1}^{N} w_i q_{ik} \left( -\tfrac{1}{2} (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \right) \tag{21}$$

Taking the derivative with respect to $\mu_k$:

$$\frac{\partial J}{\partial \mu_k} = \sum_{i=1}^{N} w_i q_{ik}\, \Sigma_k^{-1} (x_i - \mu_k) \tag{22}$$

Setting $\frac{\partial J}{\partial \mu_k} = 0$ and multiplying by $\Sigma_k$:

$$\sum_{i=1}^{N} w_i q_{ik} (x_i - \mu_k) = 0 \tag{23}$$

$$\sum_{i=1}^{N} w_i q_{ik}\, x_i = \mu_k \sum_{i=1}^{N} w_i q_{ik} \tag{24}$$

Solving for $\mu_k$:

$$\mu_k^* = \frac{\sum_{i=1}^{N} w_i q_{ik}\, x_i}{\sum_{i=1}^{N} w_i q_{ik}} \tag{25}$$

Consider the terms dependent on $\Sigma_k$:

$$J(\Sigma_k) = \sum_{i=1}^{N} w_i q_{ik} \left( -\tfrac{1}{2} \ln |\Sigma_k| - \tfrac{1}{2} (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \right) \tag{26}$$

Using the matrix derivative identities $\frac{\partial \ln |\Sigma|}{\partial \Sigma} = \Sigma^{-1}$ and $\frac{\partial\, a^T \Sigma^{-1} a}{\partial \Sigma} = -\Sigma^{-1} a a^T \Sigma^{-1}$:

$$\frac{\partial J}{\partial \Sigma_k} = \sum_{i=1}^{N} w_i q_{ik} \left( -\tfrac{1}{2} \Sigma_k^{-1} + \tfrac{1}{2} \Sigma_k^{-1} (x_i - \mu_k)(x_i - \mu_k)^T \Sigma_k^{-1} \right) \tag{27}$$

Setting to zero and multiplying by $\Sigma_k$ on both sides:

$$\sum_{i=1}^{N} w_i q_{ik}\, I = \sum_{i=1}^{N} w_i q_{ik} (x_i - \mu_k)(x_i - \mu_k)^T \Sigma_k^{-1} \tag{28}$$

$$\Sigma_k \left( \sum_{i=1}^{N} w_i q_{ik} \right) = \sum_{i=1}^{N} w_i q_{ik} (x_i - \mu_k)(x_i - \mu_k)^T \tag{29}$$

Solving for $\Sigma_k$:

$$\Sigma_k^* = \frac{\sum_{i=1}^{N} w_i q_{ik} (x_i - \mu_k^*)(x_i - \mu_k^*)^T}{\sum_{i=1}^{N} w_i q_{ik}} \tag{30}$$

These expressions match Equation 6 in the main text.
A.5 Justification of the log-transform

The density $p_{\mathrm{res}}(x) \propto \log(1 + |\mathcal{R}(x)|)$ is a critical mechanism for dynamic range compression, particularly in the stiff regimes targeted by our framework. Empirical tests using the raw squared residual field as the density concentrated all of the centers at the boundary, since the residuals in singularly perturbed problems vary by several orders of magnitude between the boundary layer and the rest of the domain. This led to a "collapse" of the basis distribution in which global domain information was lost and the linear system became ill-conditioned in smoother regions. By applying the log-transform, we damp these extreme variations, allowing the Gaussian Mixture Model to resolve the "shoulders" of the error distribution rather than just the singularity. This ensures that while high-gradient regions are prioritized, sufficient information and basis support are maintained across the entire domain to yield a stable and accurate solution.

A.6 Implementation Details

Experimental Setup

All experiments were implemented using scikit-learn for GMM estimation and scipy for linear algebra operations, with a fixed random seed of 42. For both single and double boundary layer experiments, we sample a fixed set of $N_{\mathrm{eval}} = 1500$ collocation points uniformly from the domain $\Omega = (0,1)$ to construct the linear system. Boundary conditions are enforced via soft constraints by appending the boundary coordinates to the collocation set.

Baseline RBF-PIELM

The baseline model initializes the hidden layer with $N$ neurons ($N = 300$ for single, $N = 500$ for double boundary layers). The centers are sampled uniformly:

$$x_0^j \sim \mathrm{Uniform}(0,1), \quad j = 1, \dots, N. \tag{31}$$

The widths are set to a constant value $s_j = \frac{|\Omega|}{N} \times 2.5$, consistent with the overlap analogy presented in Dwivedi et al. (2025a).

Proposed: Adaptive GMM-PIELM

The adaptive method refines the network architecture over $T$ iterations.
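The dynamic-range-compression argument of A.5 is easy to check numerically: with residuals spanning several orders of magnitude, a raw-residual density puts essentially all of its mass on one point, while the $\log(1+|\mathcal{R}|)$ weighting keeps the "shoulders" alive. A toy illustration (the residual values below are made up for the demonstration):

```python
import numpy as np

# Residuals spanning six orders of magnitude between the layer and the smooth region
r = np.array([1e-3, 1e-2, 1.0, 1e3])

raw = r / r.sum()                          # raw density: one point takes ~all the mass
logw = np.log1p(r) / np.log1p(r).sum()     # log(1 + |R|) density: compressed range

ratio_raw = raw.max() / raw.min()          # dynamic range of the raw density (1e6)
ratio_log = logw.max() / logw.min()        # dynamic range after the log-transform
```

Under the raw density, the smooth-region points receive a vanishing share of the sampling mass, which is precisely the "collapse" described above; the log-transform reduces the spread by roughly two orders of magnitude in this toy case.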
In each iteration, we compute the residual field $\mathcal{R}(x;\theta)$ on the dense grid and construct the residual energy density $p_{\mathrm{res}}(x)$ as defined in Eq. 12. We then fit a GMM to data sampled from $p_{\mathrm{res}}(x)$ and generate a new set of centers $\{x_0^j\}_{j=1}^{N}$ using a hybrid sampling strategy:

• Residual-Guided ($100\alpha$%): sampled from the fitted GMM components to target high-error regions (boundary layers).
• Global Uniform ($100(1-\alpha)$%): sampled uniformly from $\Omega$ to act as a safety net against basis depletion in smooth regions.

To accommodate the non-uniform clustering of centers, the local widths $s_j$ are adapted using k-nearest neighbors (k-NN):

$$s_j = \beta \cdot \mathrm{dist}_k(x_0^j) + \epsilon, \tag{32}$$

where $\mathrm{dist}_k$ is the Euclidean distance to the $k$-th neighbor ($k = 2$) and $\beta$ is an overlap scaling factor.

Hyperparameters

Table 2: Hyperparameter settings for implementation.

Parameter                     Single BL    Double BL
Diffusivity ($\nu$)           $10^{-4}$    $10^{-4}$
Number of Neurons ($N$)       300          500
GMM Components ($K$)          8            16
Hybrid Ratio ($\alpha$)       0.7          0.7
Adaptation Iterations ($T$)   3            3
Sigma Scaling ($\beta$)       1.1          1.5
Random Seed                   42           42
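The hybrid sampling step ($\alpha$ of the centers from the fitted GMM, $1-\alpha$ uniform) can be sketched as below. The helper name and the single-component GMM parameters are our own illustration; the seed is fixed to 42 as in Table 2:

```python
import numpy as np

rng = np.random.default_rng(42)     # fixed seed, as in Table 2

def hybrid_centers(n, alpha, mu, var, pi, domain=(0.0, 1.0)):
    """Draw n new centers: a fraction alpha from a fitted 1D GMM (residual-guided),
    the rest uniformly over the domain (safety net against basis depletion)."""
    n_gmm = int(round(alpha * n))
    comp = rng.choice(len(pi), size=n_gmm, p=pi)               # pick mixture components
    x_gmm = rng.normal(mu[comp], np.sqrt(var[comp]))           # sample each Gaussian
    x_uni = rng.uniform(domain[0], domain[1], size=n - n_gmm)  # uniform safety net
    return np.clip(np.concatenate([x_gmm, x_uni]), *domain)    # keep centers in Omega

# Toy GMM: one narrow component sitting on a boundary layer near x = 1
centers = hybrid_centers(300, alpha=0.7,
                         mu=np.array([0.99]), var=np.array([1e-4]), pi=np.array([1.0]))
```

With $\alpha = 0.7$ and $N = 300$, roughly 210 centers land inside the layer while 90 preserve global coverage, matching the single-boundary-layer configuration of Table 2.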