Paper deep dive
Graph-Instructed Neural Networks for parametric problems with varying boundary conditions
Francesco Della Santa, Sandra Pieraccini, Maria Strazzullo
Abstract
This work addresses the accurate and efficient simulation of physical phenomena governed by parametric Partial Differential Equations (PDEs) characterized by varying boundary conditions, where parametric instances modify not only the physics of the problem but also the imposition of boundary constraints on the computational domain. In such scenarios, classical Galerkin projection-based reduced order techniques encounter a fundamental bottleneck: parametric boundaries typically necessitate a re-formulation of the discrete problem for each new configuration, and these approaches are often unsuitable for real-time applications. To overcome these limitations, we propose a novel methodology based on Graph-Instructed Neural Networks (GINNs). The GINN framework effectively learns the mapping between the parametric description of the computational domain and the corresponding PDE solution. Our results demonstrate that the proposed GINN-based models can efficiently represent highly complex parametric PDEs, serving as a robust and scalable asset for several application-oriented settings when compared with fully connected architectures.
Tags
Links
- Source: https://arxiv.org/abs/2603.08304v1
- Canonical: https://arxiv.org/abs/2603.08304v1
PDF not stored locally. Use the link above to view on the source site.
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%
Last extracted: 3/13/2026, 12:42:53 AM
Summary
The paper introduces Graph-Instructed Neural Networks (GINNs), specifically termed μBC-GINNs, to address the simulation of parametric Partial Differential Equations (PDEs) with varying boundary conditions. Unlike traditional Galerkin projection-based reduced order models (ROMs) that struggle with re-formulating discrete problems for changing boundaries, the GINN framework learns a mapping between parametric domain descriptions and PDE solutions, offering a scalable, robust, and real-time surrogate model.
Entities (5)
Relation Signals (3)
μBC-GINN → is a type of → Graph-Instructed Neural Networks
confidence 100% · we adopt the Graph-Instructed Neural Network (GINN) architecture for defining our models... denoted as μBC-GINN.
Graph-Instructed Neural Networks → solves → Parametric Partial Differential Equations
confidence 95% · The GINN framework effectively learns the mapping between the parametric description of the computational domain and the corresponding PDE solution.
Reduced Order Models → is inefficient for → Parametric Partial Differential Equations
confidence 90% · classical Galerkin projection-based reduced order techniques encounter a fundamental bottleneck... Parametric boundaries typically necessitate a re-formulation
Cypher Suggestions (2)
Find all neural network architectures proposed for solving parametric PDEs. · confidence 90% · unvalidated
MATCH (n:Methodology)-[:SOLVES]->(p:MathematicalModel {name: 'Parametric Partial Differential Equations'}) RETURN n.name
Identify relationships between different surrogate modeling techniques. · confidence 85% · unvalidated
MATCH (a)-[r]->(b) WHERE a.entity_type = 'Methodology' OR b.entity_type = 'Methodology' RETURN a, r, b
Full Text
93,439 characters extracted from source content.
Graph-Instructed Neural Networks for parametric problems with varying boundary conditions
Francesco Della Santa (a,b,*), Sandra Pieraccini (a,b), Maria Strazzullo (a,b)
(a) Department of Mathematical Sciences, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
(b) Gruppo Nazionale per il Calcolo Scientifico INdAM, Piazzale Aldo Moro 5, 00185, Rome, Italy

Abstract. This work addresses the accurate and efficient simulation of physical phenomena governed by parametric Partial Differential Equations (PDEs) characterized by varying boundary conditions, where parametric instances modify not only the physics of the problem but also the imposition of boundary constraints on the computational domain. In such scenarios, classical Galerkin projection-based reduced order techniques encounter a fundamental bottleneck: parametric boundaries typically necessitate a re-formulation of the discrete problem for each new configuration, and these approaches are often unsuitable for real-time applications. To overcome these limitations, we propose a novel methodology based on Graph-Instructed Neural Networks (GINNs). The GINN framework effectively learns the mapping between the parametric description of the computational domain and the corresponding PDE solution. Our results demonstrate that the proposed GINN-based models can efficiently represent highly complex parametric PDEs, serving as a robust and scalable asset for several application-oriented settings when compared with fully connected architectures.

Keywords: Parametric PDEs, Graph Neural Networks, Boundary Conditions, Surrogate Models, Deep Learning

1. Introduction
Parametric partial differential equations (PDEs) are essential for reliably describing complex physical phenomena in many scientific and industrial contexts, playing a ubiquitous role in applied sciences.
The computational costs related to numerical simulations of parametric systems increase for complex problems featuring high variability with respect to the parametric instance. Clearly, the computational burden may be unacceptable when numerical simulations are related to time-consuming activities, such as parameter estimation, uncertainty quantification, or control. For this reason, many efforts have been made to conceive numerical methods to efficiently solve parametric PDEs.
An intriguing enhancement to the simulation of parametric PDEs is the efficient treatment of parametric boundary conditions (BCs), with the final goal of a fast adaptation to several behaviors of the model without any re-meshing or re-assembling of the system. This contribution focuses on these models, where the parameter defines the portion of the boundary of the computational domain where the type and the value of the BCs might vary. The formulation directly adapts from [1], where the authors address a parametric change of the position of a Neumann boundary control. We here propose to extend the model to various combinations of BCs of Neumann, Dirichlet, and mixed types.
Varying parametric settings can be used to describe many interesting phenomena. For example, the variation of baffle geometries can increase the performance of heat exchangers. Indeed, modifying the position and the length of the baffles drastically affects the heat conduction [2, 3]. Another example is the simulation of flows in porous fractured media, which is influenced by the position and the lengths of the intersections among the fractures [4, 5]. In addition, groundwater flow models often involve boundary conditions with uncertain locations, such as spring zones or extraction wells [6, 7]. Varying boundary models are crucial in this setting.
Moreover, in aerodynamics, variable actions of deflectors and flaps integrated into turbine blade profiles are commonly used to control loads and improve the energetic performance [8, 9]. These models can also find useful applications in urban and building management, where the position of the openings directly affects the dispersion of pollutants and/or heat [10, 11].
(* Corresponding author. Preprint submitted to arXiv, March 10, 2026. arXiv:2603.08304v1 [math.NA], 9 Mar 2026.)
When the parameter represents different physical or geometrical affine deformations, reduced order methods (ROMs) [12, 13] have been successfully used in many settings, such as medical, industrial, and environmental applications (for an overview on ROMs and their applications, see, e.g., [14, 15, 16, 17] and the references therein). ROMs build a low-dimensional framework based on properly chosen parametric instances of a reference model to solve each new parametric instance in a fast and reliable manner. However, ROMs might fail and are not straightforwardly applicable to these varying boundary PDEs, being inefficient and inaccurate with respect to the reference solutions.
In this work, we propose a novel NN-based strategy that addresses, in real time, parametric PDE problems with varying boundary conditions on a domain with fixed geometry. The method consists of building an NN model that predicts the values of the PDE solution at the nodes of a fixed mesh, given any configuration of BCs and physical parameters belonging to an admissible set on which the NN has been trained; we refer to such models as Parametric Boundary Conditions Neural Networks (μBC-NNs). There are no specific restrictions on the NN architecture, except for the input and output layers; indeed, the input layer must be able to "read" all the parameters and BCs information, while the output layer must return the solution values at each mesh node.
Due to the strict connection of the method with the definition of a mesh on the domain, we adopt the Graph-Instructed Neural Network (GINN) architecture for defining our models (see [18, 19, 20]), denoted as μBC-GINN.
Deep Learning (DL) and Neural Network (NN) models prove to be effective instruments for building surrogate models of highly complex, parametrized problems. These NN-based approaches can be of many different types, ranging from Physics-Informed models that exploit the differential equations of the problem [21, 22, 23, 24, 25], to Neural Operators [26, 27, 28], to data-driven approaches [29, 30, 31, 32, 33]. However, for ML-guided techniques in the context of varying boundary conditions, the investigation is still limited. For example, we refer to [34] for the application of Neural Operators to data collecting different BCs. We strongly differentiate from the previous literature in several important aspects:
- the proposed model of varying boundary PDEs in a parametric setting, which describes not only the action of different BCs on the spatial domain but also the changing of the physics of the problem;
- the integration of parameter variation in the GINN architecture, building a scalable and robust real-time surrogate model for different BCs actions based on sparse local information, linked to the mesh discretization.
The effectiveness of the proposed approach is validated through several numerical tests and compared with more traditional, non-GINN NN models based on 1D-Convolutional layers and Fully Connected (FC) layers. In particular, these comparisons highlight the advantages of using GINN architectures for this problem, also with a reduced number of training simulations.
Indeed, embedding the mesh structure into the model through Graph-Instructed layers, together with the sparsity of the corresponding weight matrices, allows the construction of deeper models with improved performance compared to other models with a comparable number of parameters but based on 1D-Convolutional and FC layers.
The paper is outlined as follows. In Section 2, we present the varying boundary PDEs at the continuous and discrete levels. Section 3 is devoted to a literature review on ROMs and surrogate models for PDEs, highlighting the limitations of current approaches in the proposed setting. In Section 4 we introduce the usage of NNs and their extension to the treated parametric setting, with a focus on GINNs. In Section 5, we test the robustness and the effectiveness of the μBC-GINN through several tests: a linear PDE with parametric varying Neumann boundaries, a linear PDE with parametric varying mixed boundaries, and a nonlinear PDE with parametric varying mixed boundaries. Conclusions follow in Section 6.

2. Problem Formulation
This section provides the continuous and discrete formulation of the problem with varying boundary conditions and their main features.

2.1. Continuous problem
Let us consider a parametric PDE $\mathcal{G} : U \times \mathcal{P} \to U^*$, for a suitable Hilbert space $U$ and a proper parametric space $\mathcal{P}$. We denote the solution of the PDE as $u$. The general problem formulation reads: given a parameter $\mu \in \mathcal{P}$ and a forcing term $f \in U^*$, find $u = u(\mu) \in U$ such that
$$\mathcal{G}(u, \mu) = f(\mu) \quad \text{in } U^*, \qquad (1)$$
for $\mathcal{G}(u, \mu) \in U^*$, possibly nonlinear in $u$, equipped with boundary conditions. Problem (1) can be interpreted in a variational formulation as: for $\mu \in \mathcal{P}$, find $u \in U$ such that
$$g(u, v; \mu) - f(v; \mu) := \langle \mathcal{G}(u, \mu), v \rangle - \langle F(\mu), v \rangle = 0 \quad \text{for all } v \in U, \qquad (2)$$
with $\langle \cdot, \cdot \rangle$ denoting the duality pairing between $U^*$ and $U$, and $F(\mu) \in U^*$ comprising the contributions of the forcing term and the boundary conditions.
In particular, defining $\Gamma^{\mu_b} \subseteq \partial\Omega$ the parameter-varying boundary, we consider a parameter of the form $\mu = (\mu_\varphi, \mu_b, \mu_v) \in \mathcal{P} = \mathbb{R}^{p_\varphi} \times F_{pwc}(\Gamma^{\mu_b}) \times F_{pwc}(\Gamma^{\mu_b})$, with $p_\varphi \ge 1$, and $F_{pwc}(\Gamma^{\mu_b})$ being the space of piece-wise constant functions defined on $\Gamma^{\mu_b}$. Specifically, $\mu_\varphi$ acts on the physics of the problem, $\mu_b$ is a piece-wise constant function that determines where and what kind of boundary conditions are applied (e.g., value 0 for Dirichlet, value 1 for Neumann), and $\mu_v$ assigns the value of the boundary condition, say $z_N(\mu)$ and $z_D(\mu)$ for Neumann and Dirichlet conditions, respectively. The whole boundary is the union of the portion affected by the action of $\mu_b$ and $\mu_v$, and a possibly empty portion with fixed boundary conditions $\Gamma_{fix}$, i.e., $\partial\Omega = \Gamma^{\mu_b} \cup \Gamma_{fix}$. In other words, $\Gamma_{fix}$ is a boundary region, of Neumann or Dirichlet type, that is fixed and does not change its nature and properties with respect to $\mu_b$ and $\mu_v$, with boundary value $z_{fix}(\mu)$, which can only depend on the physical parameter $\mu_\varphi$. The parametric varying boundary $\Gamma^{\mu_b}$ is defined by two boundary regions, $\Gamma^{\mu_b}_N$ and $\Gamma^{\mu_b}_D$, which may not be connected, featuring Neumann and Dirichlet boundary conditions, respectively. The presented formulation admits that either $\Gamma^{\mu_b}_N$ or $\Gamma^{\mu_b}_D$ can possibly be empty. The description of $\Gamma^{\mu_b}$ is encoded in $\mu_b$. Clearly, $\Gamma_{fix} \cap \Gamma^{\mu_b}_N \cap \Gamma^{\mu_b}_D = \emptyset$. From this definition, we specify the forcing term as
$$\langle F(\mu), v \rangle = \langle f(\mu), v \rangle + \langle f_N(\mu), v \rangle + \langle f_D(\mu), v \rangle + \langle f_{fix}(\mu), v \rangle, \qquad (3)$$
where
$$\langle f_N(\mu), v \rangle = \int_{\Gamma^{\mu_b}_N} z_N(\mu)\, v \, ds, \quad \langle f_D(\mu), v \rangle = \int_{\Gamma^{\mu_b}_D} z_D(\mu)\, v \, ds, \quad \langle f_{fix}(\mu), v \rangle = \int_{\Gamma_{fix}} z_{fix}(\mu)\, v \, ds. \qquad (4)$$
Figure 1 schematically represents a generic spatial domain $\Omega$ with the boundary $\partial\Omega$ under the action of two different parameters $\mu^* = (\mu^*_\varphi, \mu^*_b, \mu^*_v)$ and $\mu = (\mu_\varphi, \mu_b, \mu_v)$.

Remark 2.1. We remark that the continuous formulation reflects the problems treated in the numerical experiments.
As a consequence, the discrete formulation, described in the following Section 2.2, also reflects this characteristic. However, the proposed strategies can be straightforwardly extended to other complex scenarios, where the physical parameter might be non-constant in space, as well as $z_N(\mu)$ and $z_D(\mu)$ being more involved, depending on the quality of the data or the discretization used.

2.2. Discrete formulation
We now focus on the algebraic discrete formulation of the weak problem (2) and on the discrete representation of the parametric space $\mathcal{P}$, which we denote with $\mathcal{P}_h$ and which will be detailed later for the sake of presentation. Given $\mu \in \mathcal{P}_h$ and a tessellation, i.e., a mesh of the physical domain $\Omega$ with $N_h$ nodes, and considering $N_{dof}$ degrees of freedom (dofs), one wants to find the approximated solution vector $\mathbf{u} = \mathbf{u}(\mu) \in \mathbb{R}^{N_{dof}}$ such that
$$\mathbf{G}(\mathbf{u}; \mu) = \mathbf{F}(\mu), \qquad (5)$$
where $\mathbf{G} : \mathbb{R}^{N_{dof}} \times \mathcal{P} \to \mathbb{R}^{N_{dof}}$ and $\mathbf{F}(\mu) \in \mathbb{R}^{N_{dof}}$ are the discrete parametric system and the parametric forcing term of the system considered. In this setting, $N_{dof}$ and $N_h$ might not coincide. We remark that $\mathbf{u}(\mu) = [u_1(\mu), \dots, u_{N_{dof}}(\mu)]^T$ represents the value of the discrete solution for each dof. Moreover, we use $\mathbf{u}_h(\mu) = [u^1_h(\mu), \dots, u^{N_h}_h(\mu)]^T$ to denote the values of the solution in the nodes of the mesh, collected in the set $M$. Since μBC-NNs do not depend on the numerical discretization strategy employed to solve (5), we here skip the details concerning the numerical approximation, and we postpone them to the numerical investigation section. The major aspect of the discretization is to provide a suitable discrete interpretation of $\mu \in \mathcal{P}$ for the surrogate model.

Figure 1: Representation of a general domain $\Omega$ for two values of $\mu_b$, i.e., $\mu_b = \mu^*_b$ (left) and $\mu_b = \mu_b$ (right). The solid, the dotted, and the dashed-dotted lines represent $\Gamma^{\mu_b}_D$, $\Gamma^{\mu_b}_N$, and $\Gamma_{fix}$, respectively.
Indeed, at the discrete level, we can define $\mu = (\mu_\varphi, \mu_b, \mu_v)$ in $\mathcal{P}_h \subset \mathbb{R}^P$, with $P = p_\varphi + p_b + p_v$, consisting of:
- $\mu_\varphi \in \mathbb{R}^{p_\varphi}$, where $p_\varphi$ is the number of physical parameters of the problem;
- $\mu_b \in \mathbb{R}^{p_b}$, where $p_b$ is the number of mesh nodes in $\Gamma^{\mu_b}$. We recall that this parameter indicates where the boundary conditions change and the type of boundary condition, Dirichlet or Neumann. We will give more details about the encoding strategy in Section 5.
- $\mu_v \in \mathbb{R}^{p_v}$, which indicates the value of the boundary condition related to the nodes lying on $\Gamma^{\mu_b}$. In particular, $p_v$ depends on the number $n_v$ of values in $\mathbb{R}^k$ that the function $\mu_v$ assumes on $\Gamma^{\mu_b}$, where $k > 1$ in case of vectorial PDEs with BCs characterized by vectors in $\mathbb{R}^k$; for example, with scalar BCs, if $\Gamma^{\mu_b}$ is partitioned into one homogeneous Dirichlet BC, one homogeneous Neumann BC, and one non-homogeneous Neumann BC, we have $p_v = n_v \cdot k = 3 \cdot 1 = 3$.
For the sake of clarity, in Figure 2, we illustrate an example for a mesh on a square domain. The black crosses denote the nodes of $M$ located in $\Gamma_{fix}$ (the continuous black line), where the boundary conditions are prescribed, while the circle and square dots denote the nodes of $M$ that may vary the boundary conditions according to the parameters $\mu_b = (\mu_b^{(1)}, \dots, \mu_b^{(7)}) \in \mathbb{R}^7$ and $\mu_v$; i.e., they are the nodes on $\Gamma^{\mu_b}$ (continuous, grey, thick line). For the example illustrated in Figure 2, squares denote Neumann BCs, while circles denote Dirichlet BCs; the color of the dots represents the BC (scalar) value contained in $\mu_v \in \mathbb{R}^3$.
In the next section, we propose a literature overview of intrusive and non-intrusive strategies to solve parametric PDEs, providing the core motivation for the employment of μBC-NNs in this varying boundary setting.
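The discrete parameter encoding described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the concrete values and the 0/1 convention for $\mu_b$ are assumptions for the Figure 2 example with $p_\varphi = 1$, $p_b = 7$, and $n_v = 3$ scalar BC values), not the authors' implementation:

```python
import numpy as np

# Hypothetical discrete parameter mu = (mu_phi, mu_b, mu_v) for the Figure 2
# example: 1 physical parameter, 7 mesh nodes on the varying boundary, and
# n_v = 3 scalar BC values (so p_v = 3 * 1 = 3). Values are made up.
mu_phi = np.array([0.5])                    # physical parameter(s), in R^{p_phi}
mu_b = np.array([1, 1, 0, 0, 0, 1, 1])      # per-node BC type: 1 = Neumann, 0 = Dirichlet
mu_v = np.array([0.0, 0.0, 2.0])            # the n_v distinct BC values on Gamma^{mu_b}

mu = np.concatenate([mu_phi, mu_b, mu_v])   # mu in R^P, with P = p_phi + p_b + p_v
assert mu.shape == (1 + 7 + 3,)
```

The point of the sketch is only the bookkeeping: the full parameter vector stacks the physical parameters, one type indicator per varying-boundary node, and the list of distinct BC values.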
3. State of the art for efficiently solving parametric PDEs
In several fields based on PDEs, standard discretization techniques can result in high computational costs due to the large number of degrees of freedom $N_{dof}$, and may be unsuitable for real-time and many-query applications. This section proposes an overview of the numerical surrogate models proposed to address the specific issue of efficient multiple parametric evaluation.

Figure 2: Schematic representation of a mesh discretization on a square domain $\Omega$ with a portion of the boundary under the action of the parameters $\mu_b$ and $\mu_v$. This portion is $\Gamma^{\mu_b}$, the continuous, grey, thick line. The nodes that may change features are denoted with square and circle dots, while the nodes of the boundary with fixed features are denoted with black crosses. The color of the square and circle dots denotes the value of the BCs described by $\mu_v$.

3.1. Surrogate Models based on ROMs
Classically, ROMs have been conceived to build a linear low-dimensional framework of dimension $N \ll N_{dof}$ based on properly chosen parametric solutions to provide a surrogate reduced representation of $\mathbf{u}_h$, say $\mathbf{u}_N = \mathbf{u}_N(\mu)$, with $\mathbf{u} \approx V \mathbf{u}_N$, where $V \in \mathbb{R}^{N_{dof} \times N}$ collects, column-wise, the basis functions of the reduced space. We stress that $\mu$ has always been interpreted as a physical or geometrical parameter for fixed boundaries in a compact subset $D$ of $\mathbb{R}^P$, with $P \ge 1$. The ROM strategy is based on an efficient separation into:
- an offline phase, which builds the matrix $V$ and assembles quantities that are not related to $\mu$;
- an online stage, where the reduced system is assembled and solved for each new parametric instance $\mu^*$, finding $\mathbf{u}_N(\mu^*)$ efficiently, exploiting the pre-computed offline quantities.
This offline-online decomposition is based on the affinity assumption.
Namely, for the parameter $\mu \in D$, the system (2) must be represented as
$$g(u, v; \mu) - f(v; \mu) = \sum_{i=1}^{Q_g} \theta_i(\mu)\, g_i(u, v) - \sum_{i=1}^{Q_f} \theta_i(\mu)\, f_i(v) = 0, \qquad (6)$$
for $Q_g$ and $Q_f$ in $\mathbb{N}$, with $\theta_i : D \to \mathbb{R}$ smooth real functions and $g_i(\cdot, \cdot)$ and $f_i(\cdot)$ forms not depending on the parameter, comprising forcing terms and boundary conditions action. Assumption (6) is crucial, but it is not verified in our case. Indeed, the forcing terms in (4) depend on $\mu_b$ and $\mu_v$ and, thus, need to be re-assembled for each new parametric instance. Many strategies can be employed to tackle this issue. For example, the Empirical Interpolation Method (EIM) [35] can be used to recover the affine decomposition of the system, allowing the application of standard ROM techniques such as Proper Orthogonal Decomposition (POD) and Greedy [12]. Another approach builds the reduced vector coefficient $\mathbf{u}_N$ not relying on a linear reduced system, but non-intrusively, as done in POD with interpolation [36, 37] or by POD-NN [38]. Despite the non-intrusiveness, these approaches still rely on linear compressions and approximations. This might represent a bottleneck for problems with low reducibility [39]. For this reason, nonlinear ROMs have been extensively and successfully employed. They are based on a paradigm change in the reduced representation, given by $\mathbf{u} \approx \psi(\mathbf{u}_N)$, with $\psi : \mathbb{R}^N \to \mathbb{R}^{N_{dof}}$ being a nonlinear map. Among nonlinear ROMs, we mention local POD [40], kernel POD [41], registration methods [42], and shifted POD [43]; moreover, nonlinear ROMs based on autoencoders have emerged and have been applied in several fields, see, e.g., [44, 45, 46, 47, 48, 49]. However, in the proposed context, several issues arise in terms of the suitability of ROMs.
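To make the offline-online split behind (6) concrete, here is a toy sketch for a linear problem: the operators $g_i$ and forcing terms $f_i$ are replaced by random matrices and vectors, and the coefficients $\theta_i$ are made up. All names are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Offline (parameter-independent): reduced basis V and projected quantities.
N_dof, N, Q_g, Q_f = 50, 4, 2, 2
V = np.linalg.qr(rng.standard_normal((N_dof, N)))[0]   # orthonormal reduced basis
A_full = [rng.standard_normal((N_dof, N_dof)) for _ in range(Q_g)]
f_full = [rng.standard_normal(N_dof) for _ in range(Q_f)]
A_red = [V.T @ A @ V for A in A_full]                  # N x N, precomputed once
f_red = [V.T @ f for f in f_full]

# --- Online (cheap, N x N): assemble and solve for each new parameter mu.
def solve_reduced(mu, theta_g, theta_f):
    """Solve sum_i theta_i(mu) A_red_i u_N = sum_i theta_i(mu) f_red_i, then lift."""
    A = sum(t(mu) * Ai for t, Ai in zip(theta_g, A_red))
    f = sum(t(mu) * fi for t, fi in zip(theta_f, f_red))
    u_N = np.linalg.solve(A, f)
    return V @ u_N                                     # back to R^{N_dof}

theta_g = [lambda m: 1.0, lambda m: m]                 # smooth coefficients (made up)
theta_f = [lambda m: 1.0, lambda m: m**2]
u_h = solve_reduced(0.3, theta_g, theta_f)
assert u_h.shape == (N_dof,)
```

The bottleneck discussed in the text is exactly that, for varying BCs, the terms playing the role of `f_red` depend on $\mu_b$ and $\mu_v$ and cannot be precomputed offline.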
For example, classical ROMs can be applied only when $\Gamma^{\mu_b}_D = \emptyset$, and even in this simpler setting the EIM strategies might lead to a large number of interpolation points around $\Gamma^{\mu_b}_N$ to properly describe the system at hand, yielding inefficient online solutions, as already shown in [1] for varying Neumann control problems. In the same work, the complex behavior related to different Neumann boundary conditions results in a large reduced space, even once the affine assumption is recovered, and tailored strategies based on local POD must be employed to increase the accuracy of the solution. Another challenge adds to the problem under investigation: even when $\Gamma^{\mu_b}_D \ne \emptyset$, we deal with a great variability in the parametric boundary conditions. The primary issue is the difficult representation of a globally coherent latent space for the solution varying with respect to $\mu$. We believe that this challenge affects the direct applicability of ROMs based on autoencoder architectures. Similarly, other nonlinear reduction techniques, such as registration-based approaches or shifted POD, rely on the existence of smooth alignment maps between solution fields. However, the nature of such mappings is not evident in the case we address. Finally, we remark that changing the location of Dirichlet boundary conditions changes the number $N_{dof}$ from one parameter to another. Thus, it cannot be addressed by any Galerkin projection-based ROM approach and, indeed, lacks an investigation in the ROMs community.

3.2. Surrogate Models based on ML
Lately, ML-based surrogate models have been extensively exploited in numerical simulations, and NNs have been applied as direct solvers for approximated systems, offering highly efficient solutions in many applications [29, 30, 31, 33].
One of the most interesting innovations in using NNs for the PDE solution was the integration of the residual of the PDE into the loss function, as introduced in [50], i.e., working with Physics-Informed Neural Networks (PINNs). While PINNs proved successful in many contexts [21, 22, 23, 24, 25], they suffer in representing solutions with high frequencies [51], convergence issues due to different loss components [51, 52], and complex dynamics [53]. All these issues are amplified in the extension to parametric problems and do not make PINNs a suitable option for the problem we are dealing with. Recently, there has been growing interest in Neural Operators (NOs), conceived as an extension of PINNs to learn maps between function spaces [54]. NOs proved really effective in many contexts, see, e.g., [26, 27, 28]. Often these architectures are based on convolutional layers and, thus, strongly rely on structured data. This hypothesis is restrictive in a PDE context, based on simulations on usually unstructured meshes, or for a real-data scenario. For this reason, the application of Graph Neural Networks (GNNs) has become more and more popular in the context of PDEs. GNNs naturally suit the mesh design of nodes and edges. Recently, many contributions have been made to the field of GNN-based surrogates, see [55, 56, 57]. In the parametric setting, we list these notable contributions: [58, 44, 49]. However, the problem of varying boundaries poses several challenges, as already highlighted in Section 3.1. For these reasons, we decided to address the peculiar parametric nature of varying boundary conditions with novel architectures based on Graph-Instructed Neural Networks, which will be detailed in the following section.
4. μBC-NNs: NNs for Parametric PDEs with Parametric Boundary Conditions
In this section, we describe the NN architectures that we use as surrogate models for predicting a solution $\mathbf{u}_h(\mu)$, given the parameters $\mu \in \mathcal{P}_h$ and the mesh with $N_h \in \mathbb{N}$ nodes for the domain $\Omega$ (see Section 2.2). A Parametric Boundary Conditions NN (μBC-NN) is an NN characterized by an architecture that is able to "read" the discrete formulation, described in Section 2.2, of a parametric PDE problem, and that returns an array $\hat{\mathbf{u}}_h(\mu) \in \mathbb{R}^{N_h \times m}$ such that $\hat{\mathbf{u}}_h(\mu) \approx \mathbf{u}_h(\mu)$, where $m \ge 1$ allows for vector-valued solutions. While the output encoding is trivial (the output layer just returns a vector if $m = 1$, a matrix otherwise), the input encoding must consider all the information that the parameter $\mu \in \mathcal{P}_h$ implicitly contains about the problem, and how it can vary in $\mathcal{P}_h$. After preliminary investigations, the input encoding adopted for a generic problem (see Section 2.2) is based on a characterization of all the mesh nodes with a vector of values $(\beta^{(i)}, d^{(i)}, n^{(i)}, \varphi^{(i)})$, for $i = 1, \dots, N_h$, such that:
- $d^{(i)} \in \{0, 1\}$ and $n^{(i)} \in \{0, 1\}$ describe whether the node is a Dirichlet node ($d^{(i)} = 1$) or a Neumann node ($n^{(i)} = 1$), respectively. If both are zero, the node is not a boundary node.
- $\beta^{(i)} \in \mathbb{R}^k$ is the vector of the BC values (Dirichlet or Neumann, depending on $d^{(i)}$ and $n^{(i)}$). We have $k > 1$ if BC values are denoted by vectors (i.e., when solving vectorial PDEs). By convention, $\beta^{(i)} = 0$ if $d^{(i)} = n^{(i)} = 0$ (i.e., for a non-boundary mesh node).
- $\varphi^{(i)} \in \mathbb{R}^q$ is the vector of physical parameter values at the $i$-th mesh node. If these parameters do not depend on the mesh node's coordinates in the domain (see Section 2), then $q = p_\varphi$ and we have $\varphi^{(i)} = \varphi = \mu_\varphi$ constant for each $i = 1, \dots, N_h$.
From preliminary analyses and experiments, the redundancy of information helps the NN training; therefore, this input encoding is also used for the case of constant $\varphi^{(i)}$ (i.e., repeating $\mu_\varphi$ for each node). By convention, if there are no physical parameter values, each vector $\varphi^{(i)}$ becomes a scalar value $\varphi = 1$. In summary, the input of a μBC-NN is a matrix
$$\mu^{(enc)} := \begin{bmatrix} {\beta^{(1)}}^T & d^{(1)} & n^{(1)} & {\varphi^{(1)}}^T \\ \vdots & \vdots & \vdots & \vdots \\ {\beta^{(N_h)}}^T & d^{(N_h)} & n^{(N_h)} & {\varphi^{(N_h)}}^T \end{bmatrix} \in \mathbb{R}^{N_h \times (k + 2 + q)}, \qquad (7)$$
where the order of the parameters $\beta^{(i)}, d^{(i)}, n^{(i)}, \varphi^{(i)}$ illustrated in (7) is the one adopted in the implementations for the numerical experiments.
Concerning the hidden layers of a μBC-NN, there are no specific restrictions. Actually, any sequence of NN layers that is able to transform inputs like (7) into arrays $\hat{\mathbf{u}}_h(\mu) \in \mathbb{R}^{N_h \times m}$ can be used. Nonetheless, not all NN layers are able to process inputs like (7); in this work, for doing so, we consider the following types of layers: Graph-Instructed (GI) layers and 1-dimensional Convolutional (Conv1D) layers. These layers share the characteristic of seeing the $N_h$-by-$(k+2+q)$ input matrix as a sequence of $N_h$ vectors in $\mathbb{R}^{k+2+q}$, and they process this sequence according to these principles:
- GI layers: process and transform the input sequence of vectors following the adjacency of nodes in the mesh (for more details, see [18, 19, 20] and Section 4.1 in the following);
- Conv1D layers: process and transform the input sequence of vectors by index proximity, depending on the size of the kernel (for more details, see [59]). Since the mesh nodes' indexing $i = 1, \dots, N_h$ may not reflect proximity in the domain, we use Conv1D layers with kernel size equal to $N_h$ in order to gather all the mesh nodes together.
From these two types of layers, we develop two distinct NN architecture archetypes, respectively: one based on GI layers and one based on Conv1D and Fully Connected (FC) layers.
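The assembly of an input matrix like (7) can be sketched as follows. This is a minimal illustration on a made-up five-node mesh with scalar BCs ($k = 1$) and one physical parameter ($q = 1$); the node roles and values are assumptions for the example, not data from the paper:

```python
import numpy as np

# Tiny mesh with N_h = 5 nodes: node 0 is a Dirichlet node, node 4 a Neumann
# node, nodes 1-3 are interior. Scalar BCs (k = 1), one physical parameter (q = 1).
N_h, k, q = 5, 1, 1
beta = np.array([[2.0], [0.0], [0.0], [0.0], [1.5]])   # BC values (0 at interior nodes)
d = np.array([1, 0, 0, 0, 0])                          # Dirichlet indicator per node
n = np.array([0, 0, 0, 0, 1])                          # Neumann indicator per node
phi = np.full((N_h, q), 0.7)                           # mu_phi repeated at every node

# Column order [beta | d | n | phi], as adopted in (7).
mu_enc = np.hstack([beta, d[:, None], n[:, None], phi])
assert mu_enc.shape == (N_h, k + 2 + q)

# Convention check: non-boundary nodes carry beta = 0 and d = n = 0.
assert all(beta[i, 0] == 0.0 for i in range(N_h) if d[i] == 0 and n[i] == 0)
```

Note how the constant physical parameter is deliberately repeated at every node, following the redundancy convention described above.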
Between the two NN architectures, for approximately the same number of parameters, the one based on GI layers can achieve significantly greater depth, i.e., a substantially larger number of layers. This enables a more effective exploitation of deep (and residual) architectures. This is made possible by the sparsity of the mesh adjacency matrix, which allows the construction of GI layers with sparse weight tensors, thereby increasing model depth without increasing the parameter count. Moreover, numerical experiments show that the GI-based NN archetype is also the one with the best performance (see Section 5), thanks to the embedding of the mesh's geometrical properties into the NN architecture.
We continue this section by briefly recalling the mechanism of GI layers; then, we describe the NN architecture archetypes based on GI layers and Conv1D-FC layers, respectively (Section 4.2). In the end, we conclude by describing the loss function used for training these NN models (Section 4.3).

4.1. Graph-Instructed Layers and Graph-Instructed NNs
A GINN is an NN characterized by Graph-Instructed (GI) layers. These layers are defined by an alternative graph-convolution operation introduced for the first time in [18]. Briefly, given a graph $G = (V, E)$, with nodes $V$, edges $E$, without self-loops, and an associated input feature $x_i \in \mathbb{R}$ for each graph node $v_i \in V$, the graph-convolution operation of a GI layer consists in a message-passing procedure where each node $v_i \in V$ sends to its neighbors a message equal to the input feature $x_i$ rescaled by a weight $w_i \in \mathbb{R}$ assigned to the node by the GI layer; then, the output feature corresponding to each node is obtained by summing up all the messages received from the neighbors, adding the bias, and applying the activation function.
Therefore, the message-passing interpretation can be summarized by Figure 3 and the following node-wise equation
$$x'_i = \sigma\Big( \sum_{j \in N_{in}(i) \cup \{i\}} x_j w_j + b_i \Big), \qquad (8)$$
where $x'_i$ is the output feature corresponding to node $v_i$ and $N_{in}(i)$ is the set of indices such that $j \in N_{in}(i)$ if and only if $e_{ij} = \{v_i, v_j\}$ is an edge of the graph (i.e., $e_{ij} \in E$). The symbol $\sigma$ denotes the activation function $\sigma : \mathbb{R} \to \mathbb{R}$.

Figure 3: Visual representation of (8). Example with $i = 1$ and a non-directed graph of four nodes.

Assuming to apply (8) to each node of the graph, we can formally define the GI layers through a compact formulation. Given the adjacency matrix $A \in \mathbb{R}^{V \times V}$ of the graph $G$, with $V = |V|$, the basic and simplest version of a GI layer with respect to $G$ is an NN layer that processes one input feature per node and returns one output feature per node, and that is described by a function $L_{GI} : \mathbb{R}^V \to \mathbb{R}^V$, such that
$$L_{GI}(\mathbf{x}) = \boldsymbol{\sigma}\big( \widehat{W}^T \mathbf{x} + \mathbf{b} \big) = \boldsymbol{\sigma}\big( (\mathrm{diag}(\mathbf{w})(A + I_V))^T \mathbf{x} + \mathbf{b} \big), \qquad (9)$$
where $\mathbf{x} \in \mathbb{R}^V$ denotes the vector of input features and:
- $\mathbf{w} \in \mathbb{R}^V$ is the weight vector, with the component $w_i$ associated to node $v_i \in V$, for each $i = 1, \dots, V$. The vector $\mathbf{w}$ is used to build the weight matrix $\widehat{W} := \mathrm{diag}(\mathbf{w})(A + I_V)$. We remark that the diagonal matrix $\mathrm{diag}(\mathbf{w})$ in the definition of $\widehat{W}$ is used only for describing in matrix form the multiplication of the $i$-th row of $(A + I_V)$ by the weight $w_i$, for each $i = 1, \dots, V$;
- $\boldsymbol{\sigma} : \mathbb{R}^V \to \mathbb{R}^V$ is the element-wise application of the activation function $\sigma$;
- $\mathbf{b} \in \mathbb{R}^V$ is the bias vector, such that $b_i$ is the bias associated to node $v_i$, for each $i = 1, \dots, V$.
Looking more closely at equation (9), we observe that it is equivalent to the action of a "constrained" FC layer, where the weights outgoing from the same unit are shared, and the weight is zero whenever two units correspond to graph nodes that are not connected (see Figure 4); more precisely:

$$\widehat{w}_{ij} = \begin{cases} w_i, & \text{if } a_{ij} \neq 0 \text{ or } i = j, \\ 0, & \text{otherwise,} \end{cases} \qquad (10)$$

where $a_{ij}$ and $\widehat{w}_{ij}$ denote the $(i, j)$-th elements of $A$ and $\widehat{W}$, respectively.

GI layers characterized by (9) can be generalized to read any arbitrary number $K \geq 1$ of input features per node and to return any arbitrary number $F \geq 1$ of output features per node. This property is crucial for using GI layers as building blocks of our μBC-NN models, since they can easily process and transform matricial inputs like (7). The action of a general GI layer is then a function $L_{GI} : \mathbb{R}^{V \times K} \to \mathbb{R}^{V \times F}$, with $K \geq 1$ and $F \geq 1$. Additionally, pooling and other operations can be added; for more details about GI layers, see [18, 19, 20]. In particular, we point out that the number of weights of a GI layer is equal to $VKF + VF$, while the number of weights of an FC layer of $n_{\mathrm{out}}$ units reading the outputs of a layer of $n_{\mathrm{in}}$ units is equal to $n_{\mathrm{out}} n_{\mathrm{in}} + n_{\mathrm{out}}$; therefore, in the case $n_{\mathrm{in}} = n_{\mathrm{out}} = V$ and $KF + F < V + 1$ (satisfied for sufficiently large graphs, in general), the GI layer has fewer weights to train than the FC layer.

Figure 4: Visual representation of a GI layer as a "constrained" FC layer (subfigure (A)), with weight matrix defined by (10) (subfigure (B)). This figure is based on the same graph illustrated in Figure 3.

Moreover, the adjacency matrices are typically sparse and, therefore, the weight tensor is typically sparse too.
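The parameter-count comparison above is simple arithmetic and can be checked directly; the helper names below are ours.

```python
def gi_layer_weights(V, K, F):
    """GI layer reading K and returning F features per node:
    V*K*F weights plus V*F biases."""
    return V * K * F + V * F

def fc_layer_weights(n_in, n_out):
    """Fully connected layer: n_out*n_in weights plus n_out biases."""
    return n_out * n_in + n_out

# With n_in = n_out = V, the GI layer is smaller iff K*F + F < V + 1.
# For V = 1141 (the Experiment 1 mesh) and illustrative K = F = 4,
# K*F + F = 20, which is far below V + 1 = 1142.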
It is then possible to exploit the sparsity of this tensor to reduce the memory cost of the GINN implementation, storing only the trainable parameters (see [20]).

We now draw the reader's attention to the fact that the GI layer definition is very general and does not require a non-weighted graph. Indeed, one of the main advantages of using a GI layer is that it exploits the zeros of the adjacency matrix $A$, "pruning" the connections between layer units according to the graph edges; therefore, the zero elements of $A$ are in the same positions independently of whether the graph edges carry weights. This observation is important for building GINNs with respect to the mesh of a given domain. In particular, we prefer a weighted graph because we want to emphasize the difference between connections according to the distance of the mesh nodes in the domain $\Omega$. Using a weighted graph with weights inversely proportional to the mesh edge lengths, the action of the GI layers' trainable weights is rescaled by the non-trainable edge weights $\omega_{ij} \in (0, 1]$, corresponding to the non-zero elements of the graph's adjacency matrix.

We conclude this subsection by recalling that, in a GINN, the relationship between the input features of a node and the output features of another node depends on the GINN's depth and on the distance between these nodes in the graph. Specifically, from Proposition 1 in [18], the input features corresponding to node $v_i$ contribute to the computation of the output features corresponding to node $v_j$ if the number $H$ of consecutive GI layers of the GINN is greater than or equal to the distance between $v_i$ and $v_j$ in the graph $G$ (in number of edges), i.e., if $H \geq \mathrm{dist}_G(v_i, v_j)$. Additionally, it is trivial to prove that this input-output relationship holds only if $H \geq \mathrm{dist}_G(v_i, v_j)$.
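A possible construction of such a weighted adjacency matrix is sketched below. The text only states that the weights $\omega_{ij} \in (0, 1]$ are inversely proportional to the mesh edge lengths; the specific normalization used here (weight 1 for the shortest edge) is our assumption.

```python
import numpy as np

def weighted_mesh_adjacency(nodes, edges):
    """Weighted graph adjacency for a mesh, for use in GI layers.

    nodes : (V, d) array of mesh node coordinates
    edges : iterable of (i, j) index pairs (mesh edges, no self-loops)

    Edge weights omega_ij in (0, 1] are inversely proportional to edge
    lengths (assumption: the shortest edge gets weight 1). The zero
    pattern, which the GI layer exploits for pruning, is unaffected
    by the choice of weights.
    """
    V = len(nodes)
    A = np.zeros((V, V))
    lengths = {(i, j): np.linalg.norm(nodes[i] - nodes[j]) for i, j in edges}
    l_min = min(lengths.values())
    for (i, j), l in lengths.items():
        A[i, j] = A[j, i] = l_min / l  # inverse proportionality, in (0, 1]
    return A
```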
This dependency of the input-output relationship on the GINN's depth and on the distances between nodes in the graph is important for selecting the depth hyper-parameter $H$ of the GINN.

4.2. Architecture Archetypes for μBC-NNs

Here, we describe two NN architecture archetypes, based on GI layers and on Conv1D-FC layers respectively, for building μBC-NNs, i.e., surrogate models approximating the solution $u_h(\mu)$ of a parametric PDE with parametrized BCs, given the vector of parameters $\mu \in \mathcal{P}_h$ and a fixed mesh of $N_h$ nodes (i.e., a graph with $|V| = V = N_h$). All the hyper-parameters of these architectures (e.g., activation functions, number of filters, etc.) have been chosen after preliminary analyses and studies. For each parametric PDE problem, we recall that the two NN architecture archetypes are characterized by the same input encoding $\mu^{(\mathrm{enc})} \in \mathbb{R}^{N_h \times (k + 2\ell)}$ (see (7)) and the same output encoding (a matrix in $\mathbb{R}^{N_h \times m}$); therefore, we focus on the hidden layers that characterize and distinguish the architectures. The two archetypes also share the use of Batch-Normalization layers in the residual blocks and a set of layers for preprocessing and postprocessing operations (see Section 4.2.3 below).

4.2.1. μBC-GINN: μBC-NN Based on GI Layers

This NN archetype is made of residual blocks (see [60]) of GI layers based on the weighted adjacency matrix of the problem's mesh (weights $\omega_{ij} \in (0, 1]$, inversely proportional to edge lengths); we therefore denote these NNs as μBC-GINNs. An important characteristic of our μBC-GINNs is the re-feeding of the (preprocessed) inputs every $\rho \in \mathbb{N}$ hidden GI layers; indeed, during preliminary analyses and studies, this strategy considerably improved the training performance of the μBC-GINN model. The main idea of input re-feeding is to let the model remember the BC nature of the mesh nodes and combine it with the values computed by the intermediate layers.
Let $H \in \mathbb{N}$ be the hyper-parameter characterizing the number of hidden GI layers of the μBC-GINN model; we set $H = \big[\tfrac{3}{2}\,\mathrm{diam}(G)\big]$, where $G$ is the mesh graph and $\mathrm{diam}(G)$ is the maximum distance between two nodes in $G$. This choice of $H$ is motivated by the dependency of the outputs on the input features, based on the GINN's depth as previously mentioned (see Section 4.1); indeed, with the first $H_1 = \mathrm{diam}(G)$ layers, we have the guarantee that any input feature contributes to the computation of the predicted output features of each graph node, while with the remaining $H_2 = \big[\tfrac{1}{2}\,\mathrm{diam}(G)\big]$ layers, we give the NN extra layers for transforming the features and improving the approximation of the target solutions.

In summary, the architecture of a μBC-GINN is characterized by the hyper-parameters $\rho$ and $H$, and by the hyper-parameters shared by all the GI layers, i.e., the activation function $\sigma$ and the number $F \in \mathbb{N}$ of output features per node.

4.2.2. μBC-NN Based on Conv1D and FC Layers

This NN archetype is made of residual blocks (see [60]) of FC layers with $N_h$ units each; we denote these models as μBC-FCNNs. Before the FC layers, however, Conv1D layers are adopted to process the input $\mu^{(\mathrm{enc})} \in \mathbb{R}^{N_h \times (k + 2\ell)}$. In particular, the number of Conv1D layers used to transform an input $\mu^{(\mathrm{enc})} \in \mathbb{R}^{N_h \times (k + 2\ell)}$ into a vector in $\mathbb{R}^{N_h}$ is equal to the number of input features (i.e., $k + 2\ell$ Conv1D layers). For the μBC-FCNN models, $H \in \mathbb{N}$ denotes the hyper-parameter characterizing the number of FC layers in the model's main structure. The number $H$ of FC layers is set to $H \geq 3$, and such that the total number of trainable parameters of the μBC-FCNN is approximately the same as, or slightly greater than, the number of trainable parameters of a μBC-GINN designed for the same PDE problem.
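This depth can be computed directly from the mesh graph; below is a breadth-first-search sketch. The rounding convention for $[\cdot]$ on odd diameters is our assumption (we round $H_2$ up); for Experiment 1, where $\mathrm{diam}(G) = 54$, any convention gives the reported $H = 81$.

```python
from collections import deque
import math

def graph_diameter(adj):
    """Diameter (in number of edges) of a connected graph given as
    an adjacency dict {node: [neighbors]}, via BFS from every node."""
    def eccentricity(source):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(s) for s in adj)

def gi_depth(adj):
    """H = H1 + H2: H1 = diam(G) layers guarantee that every input
    feature reaches every output node; H2 = [diam(G)/2] extra layers
    refine the approximation (ceiling assumed for odd diameters)."""
    d = graph_diameter(adj)
    return d + math.ceil(d / 2)
```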
We set the lower bound of three FC layers to have at least one residual block of FC layers; indeed, due to the large number of weights that characterizes FC layers, for large meshes the number of weights of a μBC-FCNN increases rapidly, and deep models are almost never obtained. After this sequence of FC-based residual blocks, an extra FC layer is added. Finally, if $m = 1$ (i.e., $u_h(\mu) \in \mathbb{R}^{N_h}$), after the FC layer of index $H + 1$ there is one last FC layer. Otherwise, if $m > 1$, we continue with $m$ parallel sequences of $m$ FC layers, each with a residual connection to the $(H + 1)$-th FC layer; the $m$ branches' outputs are then concatenated into an $N_h$-by-$m$ matrix. Therefore, the architecture of a μBC-FCNN is characterized by the hyper-parameter $H$, the number $F$ of filters for the Conv1D layers, and the activation function $\sigma$ of the Conv1D/FC layers.

Remark 4.1. The input re-feeding characteristic of μBC-GINNs is not used in μBC-FCNNs. Indeed, the use of FC layers makes it prohibitive to re-insert the (preprocessed) input $\mu^{(\mathrm{enc})}$, because FC layers are not able to "read" such inputs. The only option would be to use Conv1D layers again, but this approach would increase the complexity of the NN architecture, and consequently the number of trainable parameters, too much.

4.2.3. Preprocessing and Postprocessing Operations

Both NN architecture archetypes share a set of layers that can be used for performing preprocessing operations on the inputs $\mu^{(\mathrm{enc})}$, preprocessing operations on the targets, and/or postprocessing operations on the outputs. The embedded preprocessing operations can be of many kinds, from classical ones (e.g., standardization or rescaling of the input/target values) to specific operations identified during preliminary analyses (e.g., changing constant homogeneous Dirichlet/Neumann conditions into constant unitary values, switching the 0/1 convention for the BC type to a ±1 convention, etc.).
The decision to embed the preprocessing operations is made in order to speed up the online phase of trained models and to ease the imposition of the Dirichlet BCs on the outputs. For more details about the advantages of embedded preprocessing, see Remark 4.2 below. On the other hand, the embedded postprocessing operations are divided into two types: the first type is the inverse of the preprocessing operations applied, during training, to the targets (if any); the second type constrains the NN's predictions to satisfy the Dirichlet BCs indicated by the inputs.

Remark 4.2. Although preprocessing operations are typically performed "outside" of the model and directly on the data, we preferred to include them "inside" the model in order to maintain the same input encoding $\mu^{(\mathrm{enc})}$ (see (7)) for all the μBC-NN models, avoiding the need to export alternate versions of the data or to export preprocessing operators for reproducibility purposes. This choice reduces the risk of a third user performing incorrect preprocessing and, in practical contexts, permits exporting and easily sharing a trained model. Moreover, embedded preprocessing operations are particularly helpful for the postprocessing operation dedicated to imposing the Dirichlet BCs. Indeed, with these operations embedded in the model, the input is always defined by the convention described in (7); therefore, the postprocessing operation used for imposing the Dirichlet BCs does not need to invert any preprocessing operation and only has to "read-and-apply" the BC values for the Dirichlet nodes.

4.3. Training Loss for μBC-NNs

Thanks to the embedded preprocessing and postprocessing operations, μBC-NN models can return an approximation of the target solution $u_h(\mu)$ without the need to apply any inverse transformation to their outputs. Moreover, they can automatically impose the Dirichlet BCs described in the input $\mu^{(\mathrm{enc})}$ on their predicted output $\widehat{u}_h(\mu)$.
For this reason, we train μBC-NNs on a modified version of the Mean Square Error (MSE) function that reduces the importance of prediction errors on Dirichlet nodes. To better understand the criteria behind the new loss function, we have to distinguish four types of mesh nodes, given a PDE problem:
- internal mesh nodes;
- boundary nodes always associated with Neumann BCs (denoted as only-Neumann nodes);
- boundary nodes always associated with Dirichlet BCs (denoted as only-Dirichlet nodes);
- boundary nodes that can be associated either with Dirichlet BCs or with Neumann BCs (denoted as BC-switch nodes).

In general, the mechanism of the modified MSE loss function can be described as follows: internal mesh nodes and only-Neumann nodes fully contribute to the loss value, only-Dirichlet nodes do not contribute to the loss value, and the contribution of BC-switch nodes is rescaled. Mathematically, for each batch $B$ of input-target pairs, the modified MSE loss function can be described as:

$$\mathrm{MSE}_{BC}(B) := \frac{1}{|B|} \sum_{(\mu, u_h) \in B} \left( \frac{1}{N_h} \sum_{i=1}^{N_h} \lambda_i \big( (u_h)_i - \widehat{u}_h(\mu)_i \big)^2 \right), \qquad (11)$$

where $\lambda_i = 1$ if the mesh node $x_i$ is internal or only-Neumann, $\lambda_i = 0$ if the mesh node $x_i$ is only-Dirichlet, and $\lambda_i = 10^{-1}$ if the mesh node $x_i$ is BC-switch. Concerning the boundary nodes, the idea behind the $\mathrm{MSE}_{BC}$ loss is to focus on predictions for the only-Neumann and BC-switch nodes, because the Dirichlet BCs are imposed. For this reason, we set the values of $\lambda_i$ following this criterion: we have $\lambda_i \in (0, 1]$ for BC-switch nodes (specifically, $\lambda_i = 10^{-1}$ after preliminary analyses) because, during the inference phase, these nodes can be either Dirichlet nodes (with value imposed by reading the input) or Neumann nodes (with value predicted by the NN).

5. Numerical Results

In this section, we test the proposed μBC-GINN architecture in three different settings, comparing the results with a μBC-FCNN with a number of trainable weights of the same order of magnitude.
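A minimal NumPy sketch of the loss (11), assuming the node types are available as labels (the function and label names are ours):

```python
import numpy as np

def mse_bc(u_true, u_pred, node_type, lam_switch=0.1):
    """Modified MSE of eq. (11) for a batch of predictions.

    u_true, u_pred : (batch, N_h) arrays of target/predicted nodal values
    node_type      : (N_h,) labels in {'internal', 'neumann',
                     'dirichlet', 'switch'}
    """
    node_type = np.asarray(node_type)
    lam = np.where(node_type == 'dirichlet', 0.0,            # lambda_i = 0
                   np.where(node_type == 'switch',
                            lam_switch,                       # lambda_i = 0.1
                            1.0))                             # lambda_i = 1
    return float(np.mean(np.mean(lam * (u_true - u_pred) ** 2, axis=1)))
```

Note that a prediction error located only on an only-Dirichlet node yields a zero loss, consistent with the fact that those values are imposed by the postprocessing layer rather than learned.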
The three settings are:
- Experiment 1, where the boundary parameters define only a Neumann-type boundary for a linear diffusion PDE;
- Experiment 2, where the boundary parameters define a mixed Dirichlet-Neumann boundary for an advection-diffusion PDE;
- Experiment 3, where the boundary parameters define a mixed Dirichlet-Neumann boundary for the Navier-Stokes equations.

The data needed for training the surrogate models have been obtained through FEniCS [61] and its dependencies, running the numerical simulations on a node of a remote cluster endowed with up to 60 GB of RAM and 1 CPU used (Intel Xeon Silver 4110, 2.10 GHz). The μBC-NNs have been developed using the Keras-TensorFlow module of Python [62, 63] and trained on a node of a remote cluster endowed with up to 180 GB of RAM, 10 CPUs used (Intel Xeon E5-2640 v3, 2.60 GHz), and 1 GPU used (GTX 980, 6 GB).

In what follows, we list the metrics used to assess the accuracy and robustness of the proposed approach based on μBC-GINNs, compared to μBC-FCNNs (representative of a generic NN-based approach). For a given solution $u_h = u_h(\mu) \in \mathbb{R}^{N_h}$ of the test set $\Xi$ and the μBC-NN prediction $\widehat{u}_h = \widehat{u}_h(\mu) \in \mathbb{R}^{N_h}$, we define the following absolute errors

$$e_{\ell^2}(\mu, u_h) = \| u_h - \widehat{u}_h \|_2, \quad e_{L^2}(\mu, u_h) = \| u_h - \widehat{u}_h \|_M, \quad e_{H^1}(\mu, u_h) = \| u_h - \widehat{u}_h \|_A, \qquad (12)$$

and relative errors

$$re_{\ell^2}(\mu, u_h) = \frac{\| u_h - \widehat{u}_h \|_2}{\| u_h \|_2}, \quad re_{L^2}(\mu, u_h) = \frac{\| u_h - \widehat{u}_h \|_M}{\| u_h \|_M}, \quad re_{H^1}(\mu, u_h) = \frac{\| u_h - \widehat{u}_h \|_A}{\| u_h \|_A}, \qquad (13)$$

where $M$ and $A$ are the mass and stiffness matrices of the $\mathbb{P}_1$ finite element PDE discretization, respectively. Namely, $e_{L^2}$ and $e_{H^1}$ provide the numerical approximation of the $L^2$-norm and the $H^1$-seminorm of the error.
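The discrete norms in (12) are weighted Euclidean norms, $\|e\|_M = \sqrt{e^T M e}$ and $\|e\|_A = \sqrt{e^T A e}$; a minimal sketch (the function name is ours):

```python
import numpy as np

def fe_errors(u_h, u_hat, M, A):
    """Absolute errors of eq. (12): Euclidean norm, M-norm, A-seminorm.

    M, A : mass and stiffness matrices of the P1 FE discretization;
    ||e||_M = sqrt(e^T M e) approximates the L2-norm of the error,
    ||e||_A = sqrt(e^T A e) the H1-seminorm.
    """
    e = u_h - u_hat
    return (float(np.linalg.norm(e)),      # e_l2
            float(np.sqrt(e @ (M @ e))),   # e_L2
            float(np.sqrt(e @ (A @ e))))   # e_H1
```

The relative errors of (13) follow by dividing each quantity by the corresponding norm of $u_h$.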
Of these quantities, we consider the mean over the whole test set $\Xi$, i.e., the mean absolute and relative errors:

$$me_{\ell^2}(\Xi) = \frac{1}{|\Xi|} \sum_{(\mu, u_h) \in \Xi} e_{\ell^2}(\mu, u_h), \qquad mre_{\ell^2}(\Xi) = \frac{1}{|\Xi|} \sum_{(\mu, u_h) \in \Xi} re_{\ell^2}(\mu, u_h),$$
$$me_{L^2}(\Xi) = \frac{1}{|\Xi|} \sum_{(\mu, u_h) \in \Xi} e_{L^2}(\mu, u_h), \qquad mre_{L^2}(\Xi) = \frac{1}{|\Xi|} \sum_{(\mu, u_h) \in \Xi} re_{L^2}(\mu, u_h),$$
$$me_{H^1}(\Xi) = \frac{1}{|\Xi|} \sum_{(\mu, u_h) \in \Xi} e_{H^1}(\mu, u_h), \qquad mre_{H^1}(\Xi) = \frac{1}{|\Xi|} \sum_{(\mu, u_h) \in \Xi} re_{H^1}(\mu, u_h). \qquad (14)$$

In particular, for each configuration of training options and hyper-parameters, the μBC-NN models are trained 5 times with varying random seeds, and the statistics of the mean errors (14) are analyzed and reported; this is necessary to better understand the robustness of a model with respect to its initialization. The different configurations considered are characterized by decreasing training and validation set sizes, while the test set is kept fixed (also to maintain a rigorous comparison of the performance). The training options adopted for all the experiments are the following:
- mini-batch size of 16 samples;
- early stopping with patience of 250 epochs, returning the best weights [64], over a maximum of 10 000 epochs;
- learning-rate reduction on plateaus, with patience of 100 epochs and factor 0.5 [65];
- Adam optimizer [66] (starting learning rate 0.001, moment decay rates $\beta_1 = 0.9$, $\beta_2 = 0.999$).

A valuable set of architecture hyper-parameters was identified during preliminary analyses and kept fixed; we postpone to future work an extensive analysis to identify the best combination of hyper-parameters for each model. Additionally, we measure not only the approximation performance but also the computational cost of training the models, in terms of the average time required by a model to perform a weight update with respect to a mini-batch of the training set.
Finally, we recall the different notation for the continuous and discrete formulations of a problem parametrization. The parameter $\mu \in \mathcal{P}$ characterizing the PDE problem in the continuous formulation is such that $\mu = (\mu_\varphi, \mu_b, \mu_v) \in \mathcal{P}$, with $\mathcal{P} = \mathbb{R}^{p_\varphi} \times F_{\mathrm{pwc}}(\Gamma_{\mu_b}) \times F_{\mathrm{pwc}}(\Gamma_{\mu_b})$ (see Section 2.1); on the other hand, the parameter $\mu \in \mathcal{P}_h$ characterizing the PDE problem in the discrete formulation is such that $\mu = (\mu_\varphi, \mu_b, \mu_v) \in \mathcal{P}_h$, with $\mathcal{P}_h = \mathbb{R}^{p_\varphi} \times \mathbb{R}^{p_b} \times \mathbb{R}^{p_v}$ (see Section 2.2), encoded as illustrated in (7), Section 4, for feeding the μBC-NNs.

5.1. Experiment 1 - Neumann Linear Case (Diffusion)

Let us consider $\Omega = [0, 1]^2 \setminus B((0.5, 0.5), 0.3)$, as depicted in Figure 5a, where $B((0.5, 0.5), 0.3)$ denotes the open ball of center $(0.5, 0.5)$ and radius $0.3$. Specifically, we represent a domain for a generic $\mu = (\mu_b, \mu_v) \in \mathcal{P}$, where $\mu_b$ defines $\Gamma_{\mu_b}$ as the inner circular boundary $\partial B((0.5, 0.5), 0.3)$ and partitions it into the portions $\Gamma^{\mu_b}_1$ and $\Gamma^{\mu_b}_0$, with $\Gamma^{\mu_b}_1 \cup \Gamma^{\mu_b}_0 = \Gamma_{\mu_b}$, on which we apply the following Neumann BCs via $\mu_v$: $\mu_v(\Gamma^{\mu_b}_1) = \mu_1 \in (0, 1]$ and $\mu_v(\Gamma^{\mu_b}_0) = 0$. For ease of notation, we dropped $\mu_\varphi$ from $\mu$ because the physical parameters are not parametrized in this experiment. As stated above, we denote by $\Gamma^{\mu_b}_1$ the portion of $\Gamma_{\mu_b}$ with non-homogeneous Neumann BCs of value $\mu_1$, while homogeneous Neumann BCs are applied on $\Gamma^{\mu_b}_0 = \Gamma_{\mu_b} \setminus \Gamma^{\mu_b}_1$. The external boundary is $\Gamma_{\mathrm{fix}} = \Gamma_N \cup \Gamma_D$, with $\Gamma_D = ((0, 1) \times \{0\}) \cup ((0, 1) \times \{1\})$ carrying homogeneous Dirichlet BCs, and $\Gamma_N = (\{0\} \times (0, 1)) \cup (\{1\} \times (0, 1))$ carrying homogeneous Neumann BCs. See Figure 5a for a visual example. In particular, the problem we are solving reads:

$$\begin{cases} -\Delta u = 0 & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} = \mu_1 & \text{on } \Gamma^{\mu_b}_1, \\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \Gamma^{\mu_b}_0 \text{ and on } \Gamma_N, \\ u = 0 & \text{on } \Gamma_D, \end{cases}$$

where $n$ is the outward normal vector to $\partial\Omega$.
We now present the rationale behind the variability distribution of $\mu = (\mu_b, \mu_v)$, used for generating the training, validation, and test data. The parameter $\mu_v(\Gamma^{\mu_b}_1) = \mu_1$ is randomly generated with uniform distribution between zero and one, i.e., $\mu_1 \sim U(0, 1)$. Each $\mu_b$ is generated by randomly selecting $N_c \in \mathbb{N}$ points on $\Gamma_{\mu_b}$, with $N_c$ between one and five, i.e., $N_c \sim U(1, 5)$. We denote these points by $\{c_i\}_{i=1}^{N_c} \subset \Gamma_{\mu_b}$. Then, for each point $c_i$, we randomly generate an interval on $\Gamma_{\mu_b}$ centered on $c_i$, which will denote a portion of the boundary with non-homogeneous Neumann BCs. To generate these intervals, we draw a length $l_i \sim U(0, L_c/5)$ for each $i = 1, \ldots, N_c$, where $L_c$ is the length of the circumference $\Gamma_{\mu_b}$; the $i$-th interval is then defined as $I_i = [c_i - l_i/2,\, c_i + l_i/2]$, given a parametrized representation of the circumference, and the union of all the intervals $I_i$ defines $\Gamma^{\mu_b}_1$. See Figure 5b for a visual example. It is worth noting that this procedure can generate overlapping intervals or degenerate intervals of null length (i.e., $I_i = \{c_i\}$).

The above generation procedure for $\mu_b$ can be easily translated to a discrete representation of the domain, after the creation of the mesh, for generating a vector $\mu_b$. Denoting by $x_1, \ldots, x_M$ the $M$ mesh points on the boundary $\Gamma_{\mu_b}$, the $N_c$ points are randomly sampled among them, while the interval lengths are measured in number of interval nodes, i.e., $l_i \sim U(1, \lceil M/5 \rceil)$. Therefore, in the discrete case (and in practice), the parameter $\mu_b$ is a vector in $\{0, 1\}^M \subset \mathbb{R}^M$ (i.e., $p_b = M$) such that $\mu_b^{(j)} = 1$ if $x_j \in \Gamma^{\mu_b}_1$, and $\mu_b^{(j)} = 0$ otherwise.

Remark 5.1 (ROMs for Neumann varying boundary conditions). Projection-based ROMs can be applied in the proposed context, but inefficiently.
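The discrete sampling of $\mu_b$ described above can be sketched as follows. This is a sketch under the stated distributions (the function name is ours); the wrap-around indexing reflects the fact that the circular boundary is a closed curve.

```python
import math
import numpy as np

def sample_mu_b(M, rng):
    """Discrete boundary parameter mu_b for Experiment 1 (a sketch).

    M : number of mesh nodes on the circular boundary Gamma_mu_b.
    Samples N_c ~ U{1,...,5} interval centers among the boundary nodes
    and interval lengths l_i ~ U{1,...,ceil(M/5)} measured in nodes;
    indices wrap around modulo M. Returns mu_b in {0,1}^M, with 1
    marking the nodes of the non-homogeneous Neumann portion.
    """
    mu_b = np.zeros(M, dtype=int)
    n_c = int(rng.integers(1, 6))            # N_c in {1, ..., 5}
    centers = rng.integers(0, M, size=n_c)   # centers among boundary nodes
    for c in centers:
        l = int(rng.integers(1, math.ceil(M / 5) + 1))
        idx = np.arange(c - l // 2, c - l // 2 + l) % M  # wrap-around
        mu_b[idx] = 1                        # overlaps merge naturally
    return mu_b
```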
Indeed, the problem at hand is not affine, and it would require additional hyper-reduction techniques, such as DEIM or EIM, to recover online efficiency. Moreover, as already observed experimentally in [1], DEIM performs poorly when the variation is located mostly along the boundary. Furthermore, the complex behaviour related to the different locations of the non-homogeneous Neumann boundaries requires a large number of basis functions to properly represent the phenomenon, even with clustering strategies and local POD.

5.1.1. Experiment 1 - Results

For this first experiment, we discretized the domain $\Omega$ with a mesh of $N_h = 1141$ nodes, $p_b = 88$ of which lie on the boundary $\Gamma_{\mu_b}$, generating a mesh graph $G$ with diameter $\mathrm{diam}(G) = 54$. The dataset is generated by running 5000 simulations through a $\mathbb{P}_1$ finite element discretization on the given mesh; in particular, each simulation is performed with respect to a randomly generated parameter vector $\mu = (\mu_b, \mu_v) = (\mu_b, \mu_1, 0) \in \{0, 1\}^{p_b} \times (0, 1] \times \{0\}$. Of these 5000 simulations, 2048 are selected randomly and used as a fixed test set; from the remaining data, we randomly extract an even number $T \in \mathbb{N}$ of training samples and $T/2$ validation samples. Each model (μBC-GINN and μBC-FCNN) is trained five times, with five different random seeds for weight initialization, varying the amount of training data $T = 1024, 512, 256, 128$.

Figure 5: Experiment 1. Spatial domain $\Omega$ for a specific parametric instance: schematic representation. (A) The homogeneous Dirichlet BCs, the non-homogeneous Neumann BCs, and the homogeneous Neumann BCs are applied on $\Gamma_D$ (solid line), $\Gamma^{\mu_b}_1$ (dashed line), and $\Gamma^{\mu_b}_0 \cup \Gamma_N$ (dash-dotted and dotted lines), respectively. (B) Example of selection of circle intervals.
The summation only represents the detection of the start and the end node of the intervals.

The model performance is evaluated by considering the average errors $me_{\ell^2}$, $me_{L^2}$, and $me_{H^1}$. These errors, together with information about training epochs and training time, are reported in Table 1 and illustrated in Figure 6. In Figure 7, we report some prediction examples taken from the test set $\Xi$. We do not report the relative errors for this test case because many solutions have nearly zero norm, making the relative errors meaningless (e.g., see Figure 7, bottom row).

| Model | n. weights | T | $me_{\ell^2}$ | $me_{L^2}$ | $me_{H^1}$ | batch tr. time (s) | tr. epochs |
|---|---|---|---|---|---|---|---|
| μBC-FCNN | 6.576e+06 | 128 | 8.593e-01 | 1.809e-02 | 1.215e+00 | 2.166e-02 | 471.4 |
| | | 256 | 7.692e-01 | 1.609e-02 | 1.103e+00 | 1.931e-02 | 622.4 |
| | | 512 | 7.643e-01 | 1.605e-02 | 1.083e+00 | 1.818e-02 | 580.0 |
| | | 1024 | 9.520e-01 | 2.009e-02 | 1.336e+00 | 1.747e-02 | 411.0 |
| μBC-GINN | 1.898e+06 | 128 | 1.150e-01 | 2.325e-03 | 1.769e-01 | 1.250e+00 | 1745.4 |
| | | 256 | 6.973e-02 | 1.396e-03 | 1.091e-01 | 8.090e-01 | 1535.8 |
| | | 512 | 3.476e-02 | 6.800e-04 | 5.643e-02 | 6.684e-01 | 1462.6 |
| | | 1024 | 2.093e-02 | 4.040e-04 | 3.503e-02 | 5.393e-01 | 1742.2 |

Table 1: Experiment 1 (Diffusion) - Average performance of the models, with respect to five random seeds for weight initialization, varying the training set size.

Looking at the behaviors and values of the average errors of the trained models, the advantage of using a μBC-GINN model instead of a μBC-FCNN becomes evident. Indeed, μBC-GINN models exhibit better predictive performance on the test set; moreover, this performance is stable with respect to the random initialization of the weights. In contrast, μBC-FCNNs only rarely achieve slightly better results for some initializations (and still not better than those of μBC-GINNs). In general, we observe that μBC-FCNNs suffer from poor generalization properties, even when the size of the training set is increased; indeed, the average value of the errors is almost constant.
On the contrary, μBC-GINNs exhibit a clear error reduction as the training set size increases and, moreover, they achieve remarkably low error values even when only a few hundred training samples are used.

Figure 6: Experiment 1 (Diffusion) - Error statistics with respect to five random seeds for weight initialization, varying the training set size. In blue, the μBC-GINN performance; in orange, the μBC-FCNN performance. Light colored areas represent the min-max range of values; dark colored areas represent the values between the first and third quartiles. The continuous lines represent the average errors (same as Table 1); the dotted lines represent the medians.

Regarding computational costs, we observe that μBC-GINNs have approximately one-sixth the number of trainable parameters of μBC-FCNNs (1.898e+06 weights instead of 6.576e+06; see Section 4.2 for details on the NN architectures); however, their training time is longer (approximately one order of magnitude greater), because the GI layers are based on sparse tensor/matrix operations and are stacked in a much larger number of layers ($H = 81$ GI layers for μBC-GINNs versus $H = 3$ FC layers for μBC-FCNNs; see Section 4.2).

5.2. Experiment 2 - Neumann and Dirichlet Linear Case (Advection-Diffusion)

We consider the physical domain $\Omega$ as the unit square. In this test case, the geometrical parameter affects both the Dirichlet and the Neumann boundaries. Specifically, we fix a boundary $\Gamma_{\mathrm{fix}} = \Gamma_{\mathrm{sides}} \cup \Gamma_{\mathrm{top}}$, with $\Gamma_{\mathrm{sides}} = (\{0\} \times [0.6, 1]) \cup (\{1\} \times [0.6, 1])$ and $\Gamma_{\mathrm{top}} = [0, 1] \times \{1\}$. On $\Gamma_{\mathrm{fix}}$, homogeneous Dirichlet boundary conditions are applied.
In terms of parameters, we define $\mu = (\mu_\varphi, \mu_b) = (\mu_1, \mu_2, \mu_b)$, where $\mu_1 \in [10^{-3}, 10^{0}]$ denotes the diffusion parameter, $\mu_2 \in [-(3/4)\pi, -(1/4)\pi]$ denotes the angle of the unitary advection vector $\alpha \in \mathbb{R}^2$ with respect to the horizontal axis, and $\mu_b$ defines $\Gamma_{\mu_b} := \partial\Omega \setminus \Gamma_{\mathrm{fix}}$ and characterizes the portions where homogeneous Dirichlet or homogeneous Neumann BCs hold (denoted as $\Gamma^{\mu_b}_D$ and $\Gamma^{\mu_b}_N$, respectively); for ease of notation, we dropped the function $\mu_v$ from $\mu$ because it is constantly equal to zero (only homogeneous BCs). See Figure 8a for a visual example of the domain $\Omega$ with the BCs defined by $\Gamma_{\mathrm{fix}}$ and $\Gamma_{\mu_b}$. Then, the problem we are solving is:

$$\begin{cases} -\mu_1 \Delta u + \alpha(\mu_2) \cdot \nabla u = 1 & \text{in } \Omega, \\ \dfrac{\partial u}{\partial n} = 0 & \text{on } \Gamma^{\mu_b}_N, \\ u = 0 & \text{on } \Gamma^{\mu_b}_D \text{ and on } \Gamma_{\mathrm{fix}}, \end{cases}$$

where $\alpha(\mu_2) = (\cos(\mu_2), \sin(\mu_2))$.

Figure 7: Experiment 1 (Diffusion) - Best, median, and worst prediction cases with respect to the test set $\Xi$, ranked with respect to $mre_{L^2}$. Black empty circles denote the boundary nodes with Neumann BCs of value $\mu_1 \in (0, 1]$; magenta crosses denote boundary nodes with fixed homogeneous Dirichlet BCs.

We now present the rationale behind the variability distribution of $\mu = (\mu_1, \mu_2, \mu_b)$, used for generating the training, validation, and test data. Each $\mu_1$ and $\mu_2$ is randomly generated with uniform distribution over the ranges defined above, i.e., $\mu_1 \sim U(10^{-3}, 10^{0})$ and $\mu_2 \sim U(-(3/4)\pi, -(1/4)\pi)$.
Now, for generating $\mu_b$, we split $\Gamma_{\mu_b}$ into three segments $\Gamma^{\mu_b}_1 = \{0\} \times [0, 0.6]$, $\Gamma^{\mu_b}_2 = \{1\} \times [0, 0.6]$, and $\Gamma^{\mu_b}_3 = [0, 1] \times \{0\}$; then, we randomly sample three points $c_1 = (0, y_1)$, $c_2 = (1, y_2)$, $c_3 = (x_3, 0)$ on $\Gamma^{\mu_b}_1$, $\Gamma^{\mu_b}_2$, $\Gamma^{\mu_b}_3$, respectively, and we generate three interval lengths $l_1, l_2, l_3$ such that $l_1, l_2 \sim U(0.4, 0.6)$ and $l_3 \sim U(2/3, 1)$. Therefore, for each sampling of $\mu$, we define $\Gamma^{\mu_b}_N := I_1 \cup I_2 \cup I_3$ and $\Gamma^{\mu_b}_D = \Gamma_{\mu_b} \setminus \Gamma^{\mu_b}_N$, where

$$I_1 = \{0\} \times \Big[ \max\Big\{0,\, y_1 - \tfrac{l_1}{2}\Big\},\; \min\Big\{0.6,\, y_1 + \tfrac{l_1}{2}\Big\} \Big],$$
$$I_2 = \{1\} \times \Big[ \max\Big\{0,\, y_2 - \tfrac{l_2}{2}\Big\},\; \min\Big\{0.6,\, y_2 + \tfrac{l_2}{2}\Big\} \Big],$$
$$I_3 = \Big[ \max\Big\{0,\, x_3 - \tfrac{l_3}{2}\Big\},\; \min\Big\{1,\, x_3 + \tfrac{l_3}{2}\Big\} \Big] \times \{0\}.$$

The above generation procedure for $\mu_b$ can be easily translated to a discrete representation of the domain, after the creation of the mesh, for generating a vector $\mu_b$. Denoting by $x_1, \ldots, x_M$ the $M$ mesh points on the boundary $\Gamma_{\mu_b}$, the points $c_1$, $c_2$, $c_3$ are randomly sampled among them, while the interval lengths are measured in number of interval nodes. Therefore, in the discrete case (and in practice), the parameter $\mu_b$ is a vector in $\{0, 1\}^M \subset \mathbb{R}^M$ (i.e., $p_b = M$) such that $\mu_b^{(j)} = 1$ if $x_j \in \Gamma^{\mu_b}_N$, and $\mu_b^{(j)} = 0$ otherwise. See Figure 8b for a visual example. It is worth noting that this generation procedure can generate intervals connected at the corners.

Figure 8: (Experiments 2 and 3) Spatial domain $\Omega$ for a specific parametric instance: schematic representation. (A) $\Gamma_{\mathrm{sides}}$ (dash-dotted line) and $\Gamma_{\mathrm{top}}$ (dashed line) form the fixed boundary $\Gamma_{\mathrm{fix}}$. The boundary parameter $\mu_b$ characterizes $\Gamma^{\mu_b}_N$ (dotted line) and $\Gamma^{\mu_b}_D = \Gamma_{\mu_b} \setminus \Gamma^{\mu_b}_N$ (solid line), respectively. (B) Example of selection of boundary intervals.
The summation only represents the detection of the start and the end node of the intervals.

5.2.1. Experiment 2 - Results

For this second experiment, we discretized the domain $\Omega$ with a mesh of $N_h = 3993$ nodes, $p_b = 256$ of which lie on the boundary $\Gamma_{\mu_b}$, generating a mesh graph $G$ with diameter $\mathrm{diam}(G) = 74$. The dataset is generated by running 10 000 simulations through a $\mathbb{P}_1$ finite element discretization on the given mesh; in particular, each simulation is performed with respect to a randomly generated parameter vector $\mu = (\mu_1, \mu_2, \mu_b) \in [10^{-3}, 10^{0}] \times [-(3/4)\pi, -(1/4)\pi] \times \{0, 1\}^{p_b}$. Of these simulation data, 2048 are selected randomly and used as a fixed test set; from the remaining data, we randomly extract an even number $T \in \mathbb{N}$ of training samples and $T/2$ validation samples. Each model (μBC-GINN and μBC-FCNN) is trained five times, with five different random seeds for weight initialization, varying the amount of training data $T = 1024, 512, 256, 128$.

The model performance is evaluated by considering the average errors illustrated in (14). These errors, together with information about training epochs and training time, are reported in Table 2 and illustrated in Figure 9. In Figure 10, we report some prediction examples taken from the test set $\Xi$. Also in this second experiment, looking at the behaviors and values of the average errors, the advantage of using a μBC-GINN model instead of a μBC-FCNN one is evident. Indeed, also in this case the μBC-GINN models exhibit better predictive performance on the test set, which is also stable with respect to the random initialization of the weights. In contrast, μBC-FCNNs show poor prediction performance in all the configurations analyzed. In general, we observe that μBC-FCNNs suffer from poor generalization properties, with an average value of the errors that is almost constant, independently of the training set size. On the contrary, μBC-GINNs (as in Experiment 1) exhibit a clear error reduction as the training set size increases, achieving low error values even when only a few hundred training samples are used.

Figure 9: Experiment 2 (Advection-Diffusion) - Error statistics with respect to five random seeds for weight initialization, varying the training set size. In blue, the μBC-GINN performance; in orange, the μBC-FCNN performance. Light colored areas represent the min-max range of values; dark colored areas represent the values between the first and third quartiles. The continuous lines represent the average errors (same as Table 2); the dotted lines represent the medians.

| Model | n. weights | T | $me_{\ell^2}$ | $me_{L^2}$ | $me_{H^1}$ | $mre_{\ell^2}$ | $mre_{L^2}$ | $mre_{H^1}$ | batch tr. time (s) | tr. epochs |
|---|---|---|---|---|---|---|---|---|---|---|
| μBC-FCNN | 8.017e+07 | 128 | 2.322e+01 | 3.283e-01 | 2.689e+01 | 7.179e+00 | 7.018e+00 | 8.385e+00 | 2.235e+01 | 431.0 |
| | | 256 | 8.206e+00 | 1.065e-01 | 1.227e+01 | 7.329e-01 | 6.991e-01 | 9.041e-01 | 2.232e+01 | 460.8 |
| | | 512 | 8.180e+00 | 1.063e-01 | 1.223e+01 | 7.251e-01 | 6.913e-01 | 8.985e-01 | 2.234e+01 | 632.8 |
| | | 1024 | 8.096e+00 | 1.048e-01 | 1.219e+01 | 7.327e-01 | 6.981e-01 | 9.092e-01 | 2.228e+01 | 628.4 |
| μBC-GINN | 9.248e+06 | 128 | 1.439e+00 | 1.795e-02 | 2.347e+00 | 1.812e-01 | 1.674e-01 | 2.582e-01 | 5.835e+00 | 1161.2 |
| | | 256 | 1.048e+00 | 1.306e-02 | 1.711e+00 | 1.279e-01 | 1.182e-01 | 1.800e-01 | 4.015e+00 | 1152.0 |
| | | 512 | 7.477e-01 | 9.286e-03 | 1.234e+00 | 9.436e-02 | 8.607e-02 | 1.395e-01 | 3.040e+00 | 1079.2 |
| | | 1024 | 5.458e-01 | 6.741e-03 | 9.134e-01 | 5.121e-02 | 4.522e-02 | 8.327e-02 | 2.557e+00 | 1244.4 |

Table 2: Experiment 2 (Advection-Diffusion) - Average performance of the models, with respect to five random seeds for weight initialization, varying the training set size.
Regarding computational costs, we observe that μBC-GINNs have a number of trainable parameters one order of magnitude smaller than μBC-FCNNs (9.248e+06 weights instead of 8.017e+07); moreover, in this experiment we notice the opposite behavior with respect to Experiment 1: the training time of μBC-GINNs is remarkably shorter than that of μBC-FCNNs (seconds instead of tens of seconds per mini-batch). The reason for this reversal is the larger number of mesh nodes; indeed, in this experiment we have N_h = 3993 nodes, while in Experiment 1 there were 1141. Therefore, μBC-FCNN models require much more computational resources to train FC layers made of N_h units, because FC layers scale much worse than GI layers with respect to this quantity, both in number of weights and in training time.

5.3. Experiment 3 - Neumann and Dirichlet nonlinear case (Navier-Stokes)

We consider the steady, incompressible Navier-Stokes equations on the domain Ω = [0, 1]², with boundary conditions defined and parametrized as in the second experiment (see Section 5.2 and Figure 8), except for the different values of the BCs. Specifically, the problem reads as

$$
\begin{cases}
-\dfrac{1}{\mu_\varphi}\,\Delta \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u} + \nabla p = 0 & \text{in } \Omega,\\[2pt]
\nabla\cdot\mathbf{u} = 0 & \text{in } \Omega,\\[2pt]
\mathbf{u} = \mathbf{u}_{\mathrm{in}} & \text{on } \Gamma_{\mathrm{top}},\\[2pt]
\mathbf{u} = 0 & \text{on } \Gamma_{\mathrm{sides}} \text{ and on } \Gamma^{\mu_b}_{D},\\[2pt]
-p\,\mathbf{n} + \dfrac{1}{\mu_\varphi}\,\dfrac{\partial \mathbf{u}}{\partial \mathbf{n}} = 0 & \text{on } \Gamma^{\mu_b}_{N},
\end{cases}
\tag{15}
$$

where μ_φ ∈ ℝ denotes the Reynolds number (Re), u and p are the velocity and pressure fields of the fluid, respectively, and

$$
\mathbf{u}_{\mathrm{in}}(x_1, x_2) =
\begin{cases}
\left(\dfrac{x_1}{0.06},\, 0\right) & \text{for } x_1 \le 0.06 \text{ in } \Gamma_{\mathrm{top}},\\[4pt]
\left(\dfrac{1 - x_1}{0.06},\, 0\right) & \text{for } x_1 \ge 0.94 \text{ in } \Gamma_{\mathrm{top}},\\[4pt]
(1,\, 0) & \text{elsewhere in } \Gamma_{\mathrm{top}}.
\end{cases}
\tag{16}
$$

For ease of notation, we dropped the function μ_v in μ because we have homogeneous BCs only.
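The piecewise inlet profile in (16) is a unit plug flow with linear ramps of width 0.06 at both corners of the top boundary. A minimal sketch (the function name is ours, not from the paper's code):

```python
import numpy as np

def u_in(x1):
    """Inlet velocity on Gamma_top, per Eq. (16): linear ramps of width
    0.06 at both ends of the top boundary, uniform (1, 0) in between."""
    if x1 <= 0.06:
        return np.array([x1 / 0.06, 0.0])
    if x1 >= 0.94:
        return np.array([(1.0 - x1) / 0.06, 0.0])
    return np.array([1.0, 0.0])
```

Note that the profile is continuous: both ramps reach the value 1 exactly at x1 = 0.06 and x1 = 0.94, matching the central plateau.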
Figure 10: Experiment 2 (Advection-Diffusion) - Best, median, and worst prediction cases with respect to the test set Ξ, ranked by mre L2 (panels show, for each case, the FE solution u_h, the prediction û_h, and the absolute error |u_h − û_h|). Magenta dots denote fixed homogeneous Dirichlet BCs, while red dots denote the homogeneous Dirichlet BC defined by the boundary parametrization of the problem.

Concerning the rationale behind the variability distribution of μ = (μ_φ, μ_b) used for generating the training, validation, and test data: μ_b is generated as in the previous experiment (see Section 5.2), while the parameter μ_φ is such that Re is randomly chosen between 100 and 500, i.e., μ_φ ∼ U(100, 500).

5.3.1. Experiment 3 - Results

For this third experiment, we discretized the domain Ω with a mesh of N_h = 1444 nodes, with p_b = 144 nodes on the boundary Γ^{μ_b}, generating a mesh graph G with diameter diam(G) = 47. The dataset is generated by running 10 000 simulations through a P2–P1 finite element discretization on the given mesh; in particular, each simulation corresponds to a randomly generated parameter vector μ = (μ_φ, μ_b) ∈ [100, 500] × {0, 1}^{p_b}. Among the 10 000 simulations, we removed those that did not reach convergence (due to randomly generated limit-case scenarios), obtaining a total of 9 118 simulations. Given these simulation data, 2048 of them are selected randomly and used as a fixed test set; from the remaining data, we randomly extract an even number T ∈ ℕ of training samples and T/2 validation samples. Each model (μBC-GINN and μBC-FCNN) is trained five times, with five different random seeds for weight initialization, varying the amount of training data T = 1024, 512, 256, 128.
The model performance is evaluated through the average errors defined in (14), computed for the first and second components of the velocity and for the pressure field. These errors, together with information about training epochs and training time, are reported in Tables 3-5 and illustrated in Figures 11-13. In Figure 14, we report some prediction examples taken from the test set Ξ.

| Model | n. weights | tr. size (T) | me ℓ2 | me L2 | me H1 | mre ℓ2 | mre L2 | mre H1 | batch tr. time (s) | tr. epochs |
|---|---|---|---|---|---|---|---|---|---|---|
| μBC-FCNN | 3.354e+07 | 128 | 1.309e+00 | 2.698e-02 | 2.190e+00 | 1.333e-01 | 1.422e-01 | 1.229e-01 | 4.711e-01 | 541.4 |
| | | 256 | 1.297e+00 | 2.668e-02 | 2.180e+00 | 1.317e-01 | 1.401e-01 | 1.221e-01 | 4.688e-01 | 470.4 |
| | | 512 | 1.331e+00 | 2.727e-02 | 2.253e+00 | 1.351e-01 | 1.431e-01 | 1.261e-01 | 4.693e-01 | 578.2 |
| | | 1024 | 1.348e+00 | 2.758e-02 | 2.284e+00 | 1.369e-01 | 1.449e-01 | 1.280e-01 | 4.655e-01 | 449.6 |
| μBC-GINN | 2.655e+07 | 128 | 7.518e-01 | 1.565e-02 | 1.248e+00 | 7.639e-02 | 8.232e-02 | 6.987e-02 | 2.311e+00 | 471.4 |
| | | 256 | 6.625e-01 | 1.368e-02 | 1.115e+00 | 6.727e-02 | 7.188e-02 | 6.241e-02 | 1.884e+00 | 518.4 |
| | | 512 | 5.629e-01 | 1.141e-02 | 9.688e-01 | 5.705e-02 | 5.979e-02 | 5.418e-02 | 1.564e+00 | 599.8 |
| | | 1024 | 4.926e-01 | 9.875e-03 | 8.579e-01 | 4.999e-02 | 5.181e-02 | 4.803e-02 | 1.504e+00 | 489.4 |

Table 3: Experiment 3 (Navier-Stokes) - Average performance of the models, with respect to five random seeds for weight initialization, varying the training set size. Results for the first component of the solution u.

| Model | n. weights | tr. size (T) | me ℓ2 | me L2 | me H1 | mre ℓ2 | mre L2 | mre H1 | batch tr. time (s) | tr. epochs |
|---|---|---|---|---|---|---|---|---|---|---|
| μBC-FCNN | 3.354e+07 | 128 | 8.112e-01 | 1.587e-02 | 1.460e+00 | 1.288e-01 | 1.350e-01 | 1.234e-01 | 4.711e-01 | 541.4 |
| | | 256 | 8.101e-01 | 1.583e-02 | 1.461e+00 | 1.284e-01 | 1.345e-01 | 1.232e-01 | 4.688e-01 | 470.4 |
| | | 512 | 8.330e-01 | 1.625e-02 | 1.504e+00 | 1.320e-01 | 1.380e-01 | 1.269e-01 | 4.693e-01 | 578.2 |
| | | 1024 | 8.502e-01 | 1.658e-02 | 1.536e+00 | 1.348e-01 | 1.409e-01 | 1.296e-01 | 4.655e-01 | 449.6 |
| μBC-GINN | 2.655e+07 | 128 | 3.949e-01 | 7.914e-03 | 6.878e-01 | 6.242e-02 | 6.705e-02 | 5.788e-02 | 2.311e+00 | 471.4 |
| | | 256 | 3.562e-01 | 7.180e-03 | 6.143e-01 | 5.634e-02 | 6.088e-02 | 5.174e-02 | 1.884e+00 | 518.4 |
| | | 512 | 3.093e-01 | 6.193e-03 | 5.383e-01 | 4.889e-02 | 5.248e-02 | 4.528e-02 | 1.564e+00 | 599.8 |
| | | 1024 | 2.804e-01 | 5.581e-03 | 4.910e-01 | 4.430e-02 | 4.728e-02 | 4.128e-02 | 1.504e+00 | 489.4 |

Table 4: Experiment 3 (Navier-Stokes) - Average performance of the models, with respect to five random seeds for weight initialization, varying the training set size. Results for the second component of the solution u.

In this third and last experiment, the results are similar to those obtained in the first experiment. Specifically, the behavior and values of the average errors again show the advantage of using μBC-GINNs instead of μBC-FCNNs. The μBC-GINN models have better predictive performance on the test set for both u and the pressure p. As in the previous experiments, this performance is stable with respect to the random initialization of the weights. In contrast, μBC-FCNNs only rarely achieve slightly better results for some initializations (and still not better than those of μBC-GINNs). As in Experiment 1, μBC-FCNNs suffer from poor generalization for all training set sizes, with an average error value that is almost constant. On the contrary, as already noticed in the previous experiments, μBC-GINNs exhibit a clear error reduction as the training set size increases and low error values even with a few hundred training samples.
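The "me" (mean error) and "mre" (mean relative error) statistics reported in the tables come from (14), which is defined earlier in the paper. Their discrete ℓ2 versions can be sketched as below; this is a generic illustration (the function name is ours), and the L2/H1 variants would additionally weight the nodal values with the finite element mass and stiffness matrices.

```python
import numpy as np

def mean_errors(U_true, U_pred):
    """Mean error (me) and mean relative error (mre) in the discrete l2
    norm, averaged over a test set of solution snapshots
    (rows = samples, columns = mesh nodes)."""
    err = np.linalg.norm(U_true - U_pred, axis=1)   # per-sample l2 error
    rel = err / np.linalg.norm(U_true, axis=1)      # per-sample relative error
    return err.mean(), rel.mean()
```

For example, a prediction that uniformly underestimates the true field by 10% yields mre = 0.1 regardless of mesh size.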
Regarding computational costs, in this case μBC-GINNs have the same order of magnitude in the number of trainable parameters as μBC-FCNNs, although still slightly fewer (2.655e+07 weights instead of 3.354e+07). On the other hand, the training time of μBC-GINNs is longer (approximately one order of magnitude greater). This similarity in computational costs between Experiment 1 and Experiment 3 is due to the similar number of mesh nodes N_h (1141 in Experiment 1 and 1444 in Experiment 3). For more details about the scalability of the models, see Section 5.4 below.

Figure 11: Experiment 3 (Navier-Stokes) - Error statistics with respect to five random seeds for weight initialization, varying the training set size (panels: legend, me ℓ2, mre ℓ2, me L2, mre L2, me H1, mre H1). In blue, the μBC-GINN performance; in orange, the μBC-FCNN performance. Light colored areas represent the min-max range of values, dark colored areas the values between the first and third quartiles. Continuous lines represent the average errors (same as Table 3), dotted lines the medians. Results for the first component of the solution u.

Figure 12: Experiment 3 (Navier-Stokes) - As in Figure 11, with average errors matching Table 4. Results for the second component of the solution u.

Figure 13: Experiment 3 (Navier-Stokes) - As in Figure 11, with average errors matching Table 5. Results for the pressure solution p.

| Model | n. weights | tr. size (T) | me ℓ2 | me L2 | me H1 | mre ℓ2 | mre L2 | mre H1 | batch tr. time (s) | tr. epochs |
|---|---|---|---|---|---|---|---|---|---|---|
| μBC-FCNN | 3.354e+07 | 128 | 5.109e-01 | 9.276e-03 | 9.968e-01 | 1.744e-01 | 1.659e-01 | 1.881e-01 | 4.711e-01 | 541.4 |
| | | 256 | 5.239e-01 | 9.512e-03 | 1.023e+00 | 1.806e-01 | 1.710e-01 | 1.958e-01 | 4.688e-01 | 470.4 |
| | | 512 | 5.545e-01 | 1.003e-02 | 1.087e+00 | 1.919e-01 | 1.806e-01 | 2.094e-01 | 4.693e-01 | 578.2 |
| | | 1024 | 5.976e-01 | 1.076e-02 | 1.179e+00 | 2.070e-01 | 1.936e-01 | 2.278e-01 | 4.655e-01 | 449.6 |
| μBC-GINN | 2.655e+07 | 128 | 1.899e-01 | 3.784e-03 | 3.303e-01 | 6.741e-02 | 6.950e-02 | 6.575e-02 | 2.311e+00 | 471.4 |
| | | 256 | 1.711e-01 | 3.409e-03 | 2.981e-01 | 6.087e-02 | 6.262e-02 | 5.959e-02 | 1.884e+00 | 518.4 |
| | | 512 | 1.590e-01 | 3.146e-03 | 2.803e-01 | 5.659e-02 | 5.774e-02 | 5.613e-02 | 1.564e+00 | 599.8 |
| | | 1024 | 1.489e-01 | 2.934e-03 | 2.633e-01 | 5.294e-02 | 5.380e-02 | 5.267e-02 | 1.504e+00 | 489.4 |

Table 5: Experiment 3 (Navier-Stokes) - Average performance of the models, with respect to five random seeds for weight initialization, varying the training set size. Results for the pressure solution p.

5.4. General Comments on the Experimental Results

Overall, across the three experiments, a clear and consistent trend emerges in favor of μBC-GINNs over μBC-FCNNs. Concerning predictive performance, μBC-GINNs systematically achieve lower average errors on the test set, for all the errors considered. Moreover, their performance is remarkably stable with respect to the random initialization of the weights, indicating a more robust and reliable training process. In contrast, μBC-FCNNs exhibit higher variability across different initializations and only exceptionally attain slightly improved results, which, however, never surpass those obtained with μBC-GINNs.
A second key aspect concerns generalization with respect to the training set size. In all experiments, μBC-FCNNs display poor scalability in terms of accuracy: the average error remains almost constant as the number of training samples increases, highlighting limited generalization capabilities. On the contrary, μBC-GINNs consistently show a clear error decay as the training set size grows, together with satisfactory accuracy levels even in low-data regimes (i.e., when only a few hundred training samples are available). This behavior suggests that the graph-instructed architecture is significantly more data-efficient and better suited to capturing the underlying structure of the problem.

From a computational perspective, the comparison is more nuanced and strongly influenced by the number of mesh nodes N_h. In terms of trainable parameters, μBC-GINNs always require fewer weights than μBC-FCNNs, with differences ranging from moderate to one order of magnitude, depending on the experiment. However, training times do not follow a uniform trend: for smaller meshes, μBC-GINNs may require longer training times despite having fewer parameters, whereas for larger meshes they become significantly more efficient. This behavior reflects the different scaling properties of FC layers and GI layers with respect to N_h, with FC layers exhibiting markedly worse scalability in both parameter count and computational cost. See Figure 15 for an illustration of the training-time scalability.

Taken together, these results indicate that μBC-GINNs provide a more robust, scalable, and data-efficient modeling strategy, particularly in regimes characterized by increasing mesh resolution or limited training data availability.
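The different scaling of FC and GI layers discussed above can be made concrete with a back-of-the-envelope parameter count. The formulas below are illustrative only: a generic dense node-to-node layer versus a generic sparse graph layer with one weight per mesh-graph edge, not the exact μBC-FCNN/μBC-GINN parameter formulas.

```python
def fc_layer_params(n_h):
    """Dense layer mapping N_h nodal values to N_h nodal values:
    quadratic growth in the number of mesh nodes (weights plus biases)."""
    return n_h * n_h + n_h

def sparse_graph_layer_params(n_edges, n_h):
    """Illustrative sparse graph layer: one weight per mesh-graph edge
    plus per-node terms, i.e. linear in mesh size for meshes with
    bounded connectivity (assumed generic form)."""
    return n_edges + n_h

# For Experiment 2 (N_h = 3993), a single dense node-to-node layer alone
# already needs ~1.6e7 weights.
print(fc_layer_params(3993))  # prints 15948042
```

This quadratic-versus-linear gap is consistent with the training-time trend across the three mesh sizes shown in Figure 15.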
6. Conclusions

In this work, we propose a novel methodological setting based on GINNs, the μBC-GINNs, to tackle parametric PDEs with varying BCs, where the parameter acts both on the physics of the problem and on the imposition of the BCs, going beyond the classical surrogate and reduced models proposed in the literature. We remark that the varying-boundary problem has received limited attention in both the ROM and surrogate-model communities. For the first time, an architecture built on the local and sparse information of the mesh discretization is employed, establishing a real-time approach that maps the parametric information to the PDE solution. Our numerical experiments demonstrate that the μBC-GINNs are an accurate and efficient surrogate model for these types of problems, thanks to a thorough validation in several settings based on linear and nonlinear PDEs.

Figure 14: Experiment 3 (Navier-Stokes) - Best, median, and worst prediction cases with respect to the test set Ξ, ranked by mre L2 averaged over the three solution fields (panels show, for each case, u_h, û_h, and |u_h − û_h|). Magenta dots denote fixed homogeneous Dirichlet BCs, while red dots denote the homogeneous Dirichlet BC defined by the boundary parametrization of the problem.

Moreover, the GINN-based framework yields more robust and accurate results than its fully connected counterpart, the μBC-FCNNs. This work represents a first methodological step, and its results suggest that μBC-GINNs are a promising asset for many-query scenarios. A natural step forward in our research is the extension of μBC-GINNs to more challenging problems where fast-response simulations are needed, such as design and control. Another interesting research direction is related to varying geometries and complex topologies.
We believe that this work paves the way for a more sustainable and efficient integration of mathematical models directly into simulation science and its applications.

Figure 15: Statistics of the average training time per mini-batch (in seconds), varying the experiment (Experiments 1, 2, and 3 are denoted by (D), (AD), and (NS), respectively), i.e., varying the number N_h of mesh nodes. In blue, the training times of μBC-GINNs; in orange, the training times of μBC-FCNNs.

Acknowledgments

This study was carried out within the "20227K44ME - Full and Reduced order modelling of coupled systems: focus on non-matching methods and automatic learning (FaReX)" project, funded by the European Union – Next Generation EU within the PRIN 2022 program (D.D. 104 - 02/02/2022 Ministero dell'Università e della Ricerca). This manuscript reflects only the authors' views and opinions, and the Ministry cannot be considered responsible for them. Moreover, MS thanks the INdAM-GNCS Project "Metodi numerici efficienti per problemi accoppiati in sistemi complessi" (CUP E53C24001950001). F.D. acknowledges that this study was carried out within the FAIR - Future Artificial Intelligence Research project and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3 - D.D. 1555 11/10/2022, PE00000013). This manuscript reflects only the authors' views and opinions; neither the European Union nor the European Commission can be considered responsible for them.