Paper deep dive
Automated Grammar-based Algebraic Multigrid Design With Evolutionary Algorithms
Dinesh Parthasarathy, Wayne Mitchell, Arjun Gambhir, Harald Köstler, Ulrich Rüde
Abstract
Abstract: Although multigrid is asymptotically optimal for solving many important partial differential equations, its efficiency relies heavily on the careful selection of the individual algorithmic components. In contrast to recent approaches that can optimize certain multigrid components using deep learning techniques, we adopt a complementary strategy, employing evolutionary algorithms to construct efficient multigrid cycles from proven algorithmic building blocks. Here, we will present its application to generate efficient algebraic multigrid methods with so-called *flexible cycling*, that is, level-specific smoothing sequences and non-recursive cycling patterns. The search space with such non-standard cycles is intractable to navigate manually, and is generated using genetic programming (GP) guided by context-free grammars. Numerical experiments with the linear algebra library *hypre* demonstrate the potential of these non-standard GP cycles to improve multigrid performance both as a solver and a preconditioner.
Tags
Links
- Source: https://arxiv.org/abs/2603.17641v1
- Canonical: https://arxiv.org/abs/2603.17641v1
Full Text
AUTOMATED GRAMMAR-BASED ALGEBRAIC MULTIGRID DESIGN WITH EVOLUTIONARY ALGORITHMS

DINESH PARTHASARATHY∗, WAYNE MITCHELL†, ARJUN GAMBHIR†, HARALD KÖSTLER∗, AND ULRICH RÜDE∗‡

Key words. Evolutionary Algorithms, Genetic Programming, Multigrid Methods, Artificial Intelligence, Linear Solvers

MSC codes. 65M55, 68W50, 35-04

1. Introduction. Solving partial differential equations (PDEs) numerically on modern computers has become ubiquitous in science and engineering. However, developing scientific codes for this purpose demands a highly interdisciplinary effort, requiring experts from physics, numerical analysis, and high-performance computing. Over the past several decades, significant progress in developing such codes has produced a rich software ecosystem for solving PDEs.
In this article we will discuss tools to synthesize such codes in the framework of time-dependent PDEs of the form

(1.1)  $u_t + \mathcal{N}(u)(\vec{x},t) = f(\vec{x},t)$ in $\Omega \times [0,T]$,

where $\mathcal{N}$ is a spatial derivative operator. *Domain-specific languages* allow such systems to be expressed in a high-level *pen-and-paper* form, and can be automatically translated into efficient implementations [1, 12, 52, 39]. Furthermore, *multi-physics frameworks* allow users to tackle complex, coupled systems of PDEs, providing a variety of space and time discretization choices [10, 3, 7, 11]. For example, a semi-discrete form of (1.1) could be

(1.2)  $M_h \dot{U}(t) + N_h(U)(t) = F_h(t)$ in $\Omega_h \times [0,T]$,

where $M_h$ is the mass matrix and $N_h$ represents the discretized differential operator. Considering a special case of linear PDEs, $N_h(U) = L_h U$, with an implicit backward Euler time integrator, each time step involves solving a large sparse linear system

$\underbrace{(M_h + \Delta t\, L_h)}_{A}\, \underbrace{U^{(n+1)}}_{x} = \underbrace{M_h U^{(n)} + \Delta t\, F_h}_{b}$,

of the form

(1.3)  $Ax = b$.

In practice the solution is often offloaded to a set of specialized *solver libraries*, such as those in [32, 28, 2, 44]. Consequently, designing efficient linear solvers is crucial, as these kernels typically dominate the overall execution time of simulation codes. Although the PDE software stack is often designed around the principle of separation of concerns, spanning physical modeling, algorithm design, and parallelization techniques, in practice these aspects remain deeply intertwined.

∗ Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany (dinesh.parthasarathy@fau.de, harald.koestler@fau.de, ulrich.ruede@fau.de).
† Lawrence Livermore National Laboratory, USA (mitchell82@llnl.gov, arjun@llnl.gov).
‡ CERFACS, Toulouse, France and Department of Applied Mathematics, VSB-Technical University of Ostrava, Ostrava, Czech Republic

arXiv:2603.17641v1 [cs.CE] 18 Mar 2026
Superior performance can, in fact, often only be achieved with an integrated co-design of application and solver [36, 16]. However, it is challenging for application users to design efficient, application-specific solvers without a multi-disciplinary background. Algebraic multigrid (AMG) methods have become one of the standard workhorses for solving systems as in (1.3): they can achieve linear complexity with respect to the number of unknowns, and are therefore well suited for large-scale simulations. We provide a concise overview of AMG in subsection 2.1. However, an effective method (i) exploits the algebraic properties of the problem, such as the matrix sparsity pattern, condition number, and strength of connections between unknowns; (ii) has good stability and convergence properties with the chosen numerical precision; and (iii) is easy to parallelize and scale to large problems. Hence, designing efficient AMG methods for a specific problem requires expert knowledge and can be time-intensive. We address the scope of designing efficient AMG methods automatically, tailored to the problem at hand, so that non-specialists can deploy them easily within their simulation codes. As a possible solution, we propose an automated approach, using context-free grammars (CFGs) that encode the syntactic and algorithmic constraints of AMG methods, and guide an evolutionary algorithm to design solvers for a specific problem. This is cast as a program-synthesis task using grammar-guided genetic programming (GP) (see subsection 2.3 for a short primer), where AMG programs are generated not only by tuning parameters but also by altering the algorithmic scheme within the limits imposed by the grammar. In particular, we formulate a grammar that allows (i) *arbitrary* and *non-recursive* cycling, and (ii) *use of different relaxation schemes* at each step along the cycle.
This approach relies solely on the solution time and convergence rates of the candidate solvers as fitness, without additional gradient information, and thus, unlike deep learning approaches, can be integrated non-intrusively with existing solver frameworks like *hypre* (Figure 1). We apply this approach to design efficient multigrid cycles for a given AMG setup in *hypre*, (i) as a solver in a stationary 3D anisotropic Poisson problem (section 4), and (ii) as a preconditioner for conjugate gradient (CG) in a time-dependent radiation-diffusion/electron-conduction benchmark problem (section 5). The GP-designed AMG methods outperform both the default and a tuned AMG. The GP-generated methods also generalize well across problem sizes (subsection 4.2.2) and changes in the system matrix during time-stepping (subsection 5.2.1). We also show that our generated AMG methods remain compatible with other *hypre* interfaces; more specifically, we test *hypre*'s hybrid solver interface¹ that switches adaptively during the simulation from a cheap diagonal preconditioner to a more expensive AMG preconditioner, based on the solver convergence at the current iterate (subsection 5.2.3). The main contributions of this work are:

¹ https://hypre.readthedocs.io/en/latest/solvers-hybrid.html

Fig. 1: A high-level overview of the proposed automated AMG design workflow: given an input problem, the approach uses grammars, evolutionary algorithms, and high-performance solvers to generate flexible AMG cycles. Nodes in the resulting cycles are color-coded to indicate different smoothing operations.

• An evolutionary approach for automated AMG design that integrates non-intrusively with existing linear solver libraries like *hypre*, where programs are constructed in a bottom-up approach, hard-constrained by AMG rules defined in a CFG, and therefore automatically respecting algorithmic integrity and software compatibility.
• Discovery of novel, flexible cycling strategies that outperform default and tuned AMG configurations.

• Demonstration of good weak scalability and robustness to system matrix perturbations for the GP-designed flexible AMG methods.

Previous works have explored using deep learning to learn individual multigrid components, including smoothers [33], intergrid operators [30, 41, 35, 34], coarsening schemes [57], and strong threshold parameters [19, 46]. Other approaches attempt to learn the entire multigrid solver, for instance, by modeling it using U-Nets with learnable parameters [31]. We view these works as complementary to ours; while they focus on learning individual multigrid components, our approach constructs efficient AMG cycles from existing and proven algorithmic building blocks. Also, GP operates on a representation of the solver as a *computer program*, agnostic to the underlying technology, whether neural-network-based, mesh-based, or both, and thus can co-exist with and complement other learning approaches. The use of CFGs with GP for multigrid design was first introduced by Schmitt et al. [55, 54] for geometric multigrid, and we build on these concepts in our current work for AMG design.

2. Background.

2.1. Algebraic Multigrid (AMG). Iterative methods are well-suited to solve large sparse linear systems of the form in (1.3). At every iteration $i$, the solution is updated as

(2.1)  $x^{(i+1)} = \underbrace{(I - B^{-1}A)}_{T}\, x^{(i)} + B^{-1} b$,

where $B$ is an approximation to $A$ resulting in an iteration operator $T$, determined by the choice of the underlying method. When the exact solution $x^*$ is unknown, the norm of the residual $r^{(i)} = b - Ax^{(i)}$ can be used as a measure of solution accuracy. The iteration stops when a prescribed stopping criterion, for example, $\|r^{(i)}\|_2 \le \varepsilon$, is fulfilled. Multigrid methods are a class of algorithms that apply such iterative schemes on a hierarchy of grids, $L, L-1, \ldots, 2, 1, 0$, with $L+1$ levels ordered from the finest to the coarsest.
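The basic iteration (2.1) with its residual-based stopping rule can be sketched in a few lines of NumPy. This is only an illustration: the weighted-Jacobi choice of $B$ and all names are ours, not the *hypre* implementation.

```python
import numpy as np

def stationary_solve(A, b, apply_B_inv, x0, eps=1e-8, max_iter=500):
    """Basic iteration x^(i+1) = x^(i) + B^{-1} r^(i), i.e. (2.1),
    stopped once ||r^(i)||_2 <= eps."""
    x = x0.copy()
    for i in range(max_iter):
        r = b - A @ x                      # residual r^(i) = b - A x^(i)
        if np.linalg.norm(r) <= eps:
            return x, i
        x = x + apply_B_inv(r)             # correction with approximate inverse
    return x, max_iter

# Example: weighted Jacobi, B = (1/omega) * diag(A), on a small SPD system.
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
omega = 0.8
x, n_iter = stationary_solve(A, b, lambda r: omega * r / np.diag(A), np.zeros(3))
```

For a diagonally dominant matrix such as this one, the weighted-Jacobi iteration contracts the error at every step, so the loop terminates well before `max_iter`.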
At a given grid level $l$, two complementary operations are applied: (i) *smoothing*, the application of an iteration operator $T = S_l$ that eliminates high-frequency error modes, and (ii) *coarse-grid correction*, where $T = C_l$, which involves resolving lower-frequency errors by transferring information to a coarser grid, solving a coarser system, and using this to improve the fine-grid solution. The multigrid operator is given by

(2.2)  $M_l = S_l^{\nu_2}\, C_l\, S_l^{\nu_1}$,

where $\nu_1, \nu_2$ are the numbers of *pre-smoothing* and *post-smoothing* steps. For a two-grid method, $C_l = I - P_l (A_{l-1})^{-1} R_l A_l$, with $P_l, R_l$ denoting the prolongation and restriction operators. In a multi-level method, $(A_{l-1})^{-1}$ is approximated using multigrid $M_{l-1}$, applied recursively across multiple levels. In AMG, the intergrid operators and the coarse-grid operators $A_l$ at all levels are constructed directly from the linear system during the *setup phase*. The resulting multigrid hierarchy is then employed with a suitable smoother and a cycle type (V-, W-, or F-cycles) in the *solve phase*. The latter is the focus of this paper. The cycle type, for example, determines the order in which information at different grid levels is accessed and exchanged. Standard cycle types are recursive, but in this paper, we explore non-standard ones, referred to as *flexible cycles* (subsection 3.1). The efficiency of the AMG solve phase for a given system to a specified residual tolerance may be measured by (i) the time to solution $T$, (ii) the number of AMG iterations $N$, or (iii) the convergence rate $\rho = \left( \|r^{(N)}\|_2 / \|r^{(0)}\|_2 \right)^{1/N}$. For additional details on AMG concepts please refer to [18, 56, 27].

2.2. Parallel smoothers in *hypre*. When solving linear systems on a parallel computer with $p$ processors in *hypre*, the matrix equation (1.3) is partitioned as

(2.3)  $\begin{pmatrix} A_{11} & \cdots & A_{1p} \\ \vdots & \ddots & \vdots \\ A_{p1} & \cdots & A_{pp} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_p \end{pmatrix}$,

where each processor stores a single block row.
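The complementarity of smoothing and coarse-grid correction in (2.2) can be checked numerically. The sketch below builds the two-grid operator $M = S\,C\,S$ for a small 1D Poisson problem with hand-coded linear interpolation, a geometric stand-in for the AMG setting, and verifies that it contracts much faster than the smoother alone; all of it is illustrative rather than production code.

```python
import numpy as np

n, nc = 7, 3
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson stencil [-1, 2, -1]

# Linear interpolation P; coarse points sit at fine nodes 1, 3, 5.
P = np.zeros((n, nc))
for j, i in enumerate([1, 3, 5]):
    P[i, j] = 1.0
    P[i - 1, j] += 0.5
    P[i + 1, j] += 0.5
R = 0.5 * P.T                                          # full-weighting restriction
Ac = R @ A @ P                                         # Galerkin coarse operator

omega = 2.0 / 3.0
S = np.eye(n) - omega * np.diag(1.0 / np.diag(A)) @ A  # weighted-Jacobi smoother
C = np.eye(n) - P @ np.linalg.inv(Ac) @ R @ A          # coarse-grid correction
M = S @ C @ S                                          # two-grid operator (2.2), nu1 = nu2 = 1

rho_tg = max(abs(np.linalg.eigvals(M)))                # two-grid convergence factor
rho_smoother = max(abs(np.linalg.eigvals(S)))
```

Here `rho_tg` comes out far below `rho_smoother`: the smoother alone stalls on smooth error modes, while the coarse-grid correction removes exactly those modes.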
Our GP-designed AMG methods choose from several parallel smoothers of the form $B = \mathrm{diag}(B_1, B_2, \ldots, B_p)$ (as notated in (2.1)). For each block $A_k$, considering a standard splitting $A_k = D_k - L_k - U_k$ (where $D_k$ is the diagonal and $-L_k, -U_k$ are the lower and upper triangular parts of $A_k$), a parallel *hybrid* Gauss-Seidel forward (GSF) smoother, for example, is defined as

(2.4)  $B^{\omega_i,\omega_o}_{\mathrm{GSF}} = \frac{1}{\omega_o}\,\mathrm{diag}\!\left( \frac{1}{\omega_i} D_1 - L_1,\; \frac{1}{\omega_i} D_2 - L_2,\; \ldots,\; \frac{1}{\omega_i} D_p - L_p \right)$,

where $\omega_i$ is the inner relaxation parameter for local updates in each processor, and $\omega_o$ is the outer relaxation parameter for global Jacobi-like updates at the processor boundaries [63]. Parallel versions of hybrid Gauss-Seidel backward (GSB) and Gauss-Seidel symmetric (GSS) can similarly be constructed, and these hybrid smoothers are later denoted by their iteration operators $S^H_{\mathrm{GSF}}, S^H_{\mathrm{GSB}}, S^H_{\mathrm{GSS}}$. A parallel weighted-Jacobi smoother can be constructed using a single relaxation parameter $\omega$ across the entire domain, and is denoted by $S_{\mathrm{Jacobi}}$. Instead of using relaxation parameters, an alternative way to improve convergence is by adding a diagonal matrix $D^{\ell_1} = \mathrm{diag}(d^{\ell_1}_{11}, d^{\ell_1}_{22}, \ldots)$ to $B$, where $d^{\ell_1}_{ii} = \sum_{j\in\Gamma} |a_{ij}|$ and $\Gamma$ are the sets of columns in the off-diagonal blocks of $A$. These are called $\ell_1$ smoothers, compatible with the relaxation schemes mentioned earlier, and the respective iteration operators are denoted by $S^{\ell_1}_{\mathrm{GSF}}, S^{\ell_1}_{\mathrm{GSB}}, S^{\ell_1}_{\mathrm{GSS}}, S^{\ell_1}_{\mathrm{Jacobi}}$. Each of these *weighted smoothers* and *$\ell_1$ smoothers* is applied lexicographically by default, but can also apply C-F smoothing; that is, smoothing first on the coarse points and then on the fine points [6].

2.3. Grammar-guided genetic programming (G3P). Evolutionary algorithms (EAs) are a class of optimization techniques inspired by the principles of evolution by natural selection. Genetic programming (GP) is a subclass of EAs focused on evolving computer programs.
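Before moving on, the $\ell_1$ construction above can be made concrete with a minimal serial sketch of one $\ell_1$-Jacobi sweep, where the processor blocks are emulated by index ranges. This is our own illustration, not *hypre*'s implementation.

```python
import numpy as np

def l1_jacobi_sweep(A, b, x, blocks):
    """One l1-Jacobi sweep: B = D + D^{l1}, where d^{l1}_ii sums |a_ij| over
    the columns j outside the block (processor) that owns row i."""
    d_l1 = np.zeros(len(b))
    for lo, hi in blocks:
        for i in range(lo, hi):
            d_l1[i] = np.abs(A[i, :lo]).sum() + np.abs(A[i, hi:]).sum()
    return x + (b - A @ x) / (np.diag(A) + d_l1), d_l1

# 4x4 example split into two "processor" blocks: rows 0-1 and rows 2-3.
A = np.array([[ 4.0, -1.0, -1.0,  0.0],
              [-1.0,  4.0,  0.0, -1.0],
              [-1.0,  0.0,  4.0, -1.0],
              [ 0.0, -1.0, -1.0,  4.0]])
b = np.ones(4)
x1, d_l1 = l1_jacobi_sweep(A, b, np.zeros(4), blocks=[(0, 2), (2, 4)])
```

The $\ell_1$ term makes $B$ row-wise dominant over the coupling that the block-diagonal smoother ignores, which is what yields the smoother's unconditional convergence for SPD matrices.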
The process begins with an initial population of $\rho_0$ randomly generated computer programs that incrementally become better over generations $t = 0, \ldots, t_{\max}$ by applying a fitness measure $f$. The population $P_t$ at every generation $t$ has $\mu$ programs, with each program $x \in P_t = \{x^t_1, \ldots, x^t_\mu\}$ being evaluated on a fitness measure $f(x)$ over a set of problems $D$. This is the most expensive step in the optimization, and hence the computation is distributed across MPI processes in a high-performance computing cluster. $N_{\mathrm{pop}}$ is the number of MPI processes for the *population-level parallelism*, wherein individuals are distributed across multiple compute nodes, and $n_p$ is the number of MPI processes for the *program-level parallelism*, wherein each program is evaluated in parallel within a compute node. Using the population fitness collection $F_t = \{f^t_1, \ldots, f^t_\mu\}$, the fittest individuals are selected to produce offspring $\tilde{P}_t$ of size $\lambda$ by applying a genetic operator $V$, which either performs (i) *crossover* with probability $p_c$: exchanges parts of the program between two fit parents, or (ii) *mutation* with probability $1 - p_c$: introduces changes to an existing program for exploration.

Algorithm 2.1 Grammar-Guided Genetic Programming (G3P)
1: Input: $G, D, E_{\mathrm{params}}, N_{\mathrm{pop}}, n_p$
2: Output: $P_{t_{\max}}$
3: $P_0 = \{x^0_1, \ldots, x^0_{\rho_0}\} \leftarrow I(G, \rho_0)$  // grammar-based initialization
4: $F_0 = \{f^0_1, \ldots, f^0_{\rho_0}\} \leftarrow \mathrm{Evaluate}(P_0, D, N_{\mathrm{pop}}, n_p)$  // measure population fitness
5: for $t = 0$ to $t_{\max} - 1$ do
6:   $\tilde{P}_t = \{x^t_1, \ldots, x^t_\lambda\} \leftarrow V_G(P_t, F_t, p_c, \lambda)$  // grammar-constrained offspring
7:   $\tilde{F}_t \leftarrow \mathrm{Evaluate}(\tilde{P}_t, D, N_{\mathrm{pop}}, n_p)$  // measure offspring fitness
8:   $P_{t+1} = \{x^{t+1}_1, \ldots, x^{t+1}_\mu\} \leftarrow S(P_t, \tilde{P}_t, F_t, \tilde{F}_t, \mu)$  // select new population
9:   $F_{t+1} \leftarrow \mathrm{Evaluate}(P_{t+1})$  // measure population fitness
10: end for
11: Return $P_{t_{\max}}$

From the current population $P_t$ and the offspring $\tilde{P}_t$, the next generation of $\mu$ programs is chosen with a suitable selection mechanism $S$.
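Stripped of the grammar machinery and MPI distribution, the control flow of Algorithm 2.1 is the standard evolutionary loop: initialize, evaluate, vary, select. The toy sketch below condenses it, with a bit-string problem standing in for AMG programs; every name here is illustrative.

```python
import random

def g3p_loop(init, evaluate, crossover, mutate, mu=20, lam=20, p_c=0.7, t_max=30, seed=0):
    """Skeleton of Algorithm 2.1 with elitist (mu + lambda) selection."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(mu)]                 # line 3: initialization
    fit = [evaluate(x) for x in pop]                     # line 4: fitness
    for _ in range(t_max):                               # line 5
        off = []
        for _ in range(lam):                             # line 6: variation V
            if rng.random() < p_c:
                a, b = rng.sample(pop, 2)
                off.append(crossover(a, b, rng))
            else:
                off.append(mutate(rng.choice(pop), rng))
        off_fit = [evaluate(x) for x in off]             # line 7
        ranked = sorted(zip(pop + off, fit + off_fit), key=lambda t: t[1])
        pop = [x for x, _ in ranked[:mu]]                # line 8: selection S
        fit = [f for _, f in ranked[:mu]]
    return pop[0], fit[0]

# Toy problem: evolve a bit list toward all ones (fitness = number of zeros).
best, best_fit = g3p_loop(
    init=lambda rng: [rng.randint(0, 1) for _ in range(16)],
    evaluate=lambda x: x.count(0),
    crossover=lambda a, b, rng: a[:8] + b[8:],
    mutate=lambda x, rng: [bit ^ 1 if rng.random() < 0.1 else bit for bit in x],
)
```

The actual setup replaces the lambdas with grammar-derived AMG programs, the fitness with solver timings, and the plain sort with multi-objective selection.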
This process is repeated iteratively until the last generation $t_{\max}$ [51, 9, 8, 49]. However, the initial population and genetic operations could generate invalid solutions within the population. When programs are required to conform to a specific structure or a predefined set of rules (for example, AMG programs), such information can be injected into the evolutionary process using CFGs [20]. Such a grammar $G$ provides a formal system to embed a set of pre-defined rules that, when applied, generate *valid* candidate programs. G3P is an extension of traditional GP systems that uses such a grammar $G$ (Algorithm 2.1). $E_{\mathrm{params}} = \{\mu, \lambda, \rho_0, S, p_c, t_{\max}\}$ is the set of parameters used for the evolutionary algorithm. G3P uses a population initialiser $I$ that generates $\rho_0$ programs by applying rules from the grammar $G$. Furthermore, the genetic operator $V$ is augmented to $V_G$, wherein crossover and mutation operations are restricted to respect the rules of the grammar and generate valid offspring. Using such a grammar $G$ can help inject domain knowledge within GP systems. This improves search efficiency, helps discover better solutions, and guarantees the validity of generated programs [62, 43], in the sense of (i) *syntactic correctness*, ensuring that the programs compile without errors, and (ii) adherence to *algorithmic structures* based on expert domain knowledge. In this paper, we formulate a grammar $G$ for generating AMG methods in *hypre* (Figure 2). However, the numerical convergence of these methods is not guaranteed by the grammar, and is instead optimized by the GP search.

3. Method. Let $D = \{(A_i, b_i)\}_{i=1}^N$ be a set of linear systems from a discretized parametric PDE and/or time-stepping sequence in a simulation code. Let $H$ be a set of AMG algorithms generated from a grammar $G$. $E(h; A, b)$ is the *wall-clock time* for solving a system $(A, b)$ using an algorithm $h$ on the target hardware. We want to find the fastest algorithm $h^\star \in H$ for $D$, which can be formally stated as

(3.1)  $h^\star = \arg\min_{h\in H} \sum_{(A,b)\in D} E(h; A, b)$.
We explore $H$ using G3P with a suitable fitness measure. A naive choice for the fitness would be the *wall-clock time* $E$, but we define a vector-valued fitness $F = [T/N, \rho]^T$ and perform a multi-objective optimization. Considering the *cost per iteration* $T/N$ and *convergence rate* $\rho$ as two separate objectives creates diversity in the population, encourages exploration, and generates better solutions [22, 26]. G3P can be viewed as the following operation:

(3.2)  $\mathrm{G3P}: (\tilde{D}, G, F, E_{\mathrm{params}}) \longrightarrow P^\star \subset P_{t_{\max}}$,

where $\tilde{D}$ is a set of cheap proxy problem(s) for $D$, with $|\tilde{D}| \ll |D|$, hence directly impacting the optimization cost (lines 4 and 7 in Algorithm 2.1). From the Pareto front, or a set of optimal solutions $P^\star \subset P_{t_{\max}}$, several algorithms are hand-picked by the user to find the best algorithm $\tilde{h}$ for solving $D$ such that

(3.3)  $\tilde{h} = \arg\min_{h\in P^\star} \sum_{(A,b)\in D} |E(h; A, b) - E(h^\star; A, b)| = \arg\min_{h\in P^\star} \sum_{(A,b)\in D} E(h; A, b)$.

3.1. Flexible AMG cycles. We extend equation (2.2) and define a *flexible* multigrid operator as

(3.4)  $M^{\mathrm{flex}}_l = \prod_j T^j_l$, with $T^j_l \in \{S^j_l, C^j_l\}$, $l > 0$,

representing an arbitrary sequence of smoothing and coarse-grid correction steps. Here, $S^j_l$ is any valid smoother, $C^j_l = I - \alpha^{(j)} P_l (B^j_{l-1})^{-1} R_l A_l$ is the coarse-grid correction step with a scaling factor $\alpha$, and $(B^j_{l-1})^{-1}$ is the application of a flexible multigrid $M^{\mathrm{flex},j}_{l-1} = I - (B^j_{l-1})^{-1} A_{l-1}$ to approximate the coarse system $(A_{l-1})^{-1}$. This can be extended to multiple levels, to obtain a multi-level *flexible* multigrid operator.

Why is genetic programming required? Let us restrict ourselves to a V-cycle-like topology with *flexible smoothing*. Let $S$ be a finite set of admissible smoothers, $m = |S|$. Let $\nu_{\max}$ be the maximum number of pre-/post-smoothing steps allowed in $M^{\mathrm{flex}}_l$, that is, $\nu^{\mathrm{pre}}_l, \nu^{\mathrm{post}}_l \le \nu_{\max}$. The number of distinct smoothing sequences at each level is $N^l_{\mathrm{smooth}} = \left( \sum_{\nu=1}^{\nu_{\max}} m^\nu \right)^2 = \left( \frac{m^{\nu_{\max}+1}-m}{m-1} \right)^2 \approx m^{2\nu_{\max}}$.
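Plugging in the numbers used below ($m = 6$ smoothers, $\nu_{\max} = 3$, $L = 5$ flexible levels) makes the scale of this search space concrete; a small sketch with the exact geometric-sum count next to the paper's $m^{2 L \nu_{\max}}$ estimate:

```python
# Search-space size for flexible smoothing in a V-cycle-like topology.
m, nu_max, L = 6, 3, 5   # smoothers, max pre-/post-smoothing steps, flexible levels

# Distinct pre-/post-smoothing sequences per level: (sum_{nu=1}^{nu_max} m^nu)^2
n_smooth_per_level = sum(m**nu for nu in range(1, nu_max + 1)) ** 2

n_cycles_exact = n_smooth_per_level ** L        # exact count for L flexible levels
n_cycles_estimate = m ** (2 * L * nu_max)       # leading-order estimate, ~2e23
```

Even the conservative leading-order estimate, $6^{30} \approx 2.2 \times 10^{23}$, is hopeless for manual enumeration, which is the point of the argument that follows.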
Thus, the number of AMG cycles $N_{\mathrm{cycles}}$ for a hierarchy with $L+1$ levels is

(3.5)  $N_{\mathrm{cycles}} \approx m^{2 \cdot L \cdot \nu_{\max}}$.

For example, with $L = 5$, $\nu_{\max} = 3$, $m = 6$, we get $N_{\mathrm{cycles}} \approx 2 \times 10^{23}$. This is a highly conservative estimate assuming a fixed cycle topology, and is already intractable to design manually by hand. Hence, we use genetic programming to automatically design efficient, flexible AMG cycles.

3.2. Design constraints. Since the search space scales exponentially with the number of grid levels $L$ (equation (3.5)), we limit *flexible cycling* to only the top $N_{\mathrm{flex}}$ levels $L, L-1, \ldots, l_{\mathrm{std}}$, where $l_{\mathrm{std}} = L - N_{\mathrm{flex}} + 1$. A standard V-cycle operator $M_V$ is used for the remaining levels. This can be formally defined as

(3.6)  $M^{\mathrm{flex}}_l = \begin{cases} \prod_j T^j_l, \; T^j_l \in \{S^j_l, C^j_l\}, & l > l_{\mathrm{std}}, \\ M^V_l, & 0 < l \le l_{\mathrm{std}}. \end{cases}$

In addition to reducing the optimization cost, the modified formulation (3.6) also allows us to scale our generated solvers to changes in the number of grid levels, for example, during weak scaling (subsection 4.2.2) or solving different linear systems in a time-stepping code (subsections 5.2.1 and 5.2.2). We choose $N_{\mathrm{flex}} = 5$ and a set of smoothers

(3.7)  $S = \underbrace{\{S^H_{\mathrm{GSF}},\, S^H_{\mathrm{GSB}},\, S_{\mathrm{Jacobi}}\}}_{\text{weighted smoothers}} \;\cup\; \underbrace{\{S^{\ell_1}_{\mathrm{GSF}},\, S^{\ell_1}_{\mathrm{GSB}},\, S^{\ell_1}_{\mathrm{Jacobi}}\}}_{\ell_1\text{ smoothers}}$,

that can be applied in either *lexicographical* or *C-F* ordering, with relaxation parameters for weighted smoothers $\omega_i, \omega_o, \omega$ and scaling factors for coarse-grid correction $\alpha$ sampled from $\{0.1, 0.15, \ldots, 1.9\}$. The V-cycle from $l = l_{\mathrm{std}}$ uses $S^{\ell_1}_{\mathrm{GSF}}$ for pre-smoothing, $S^{\ell_1}_{\mathrm{GSB}}$ for post-smoothing, and Gaussian elimination at $l = 0$. The AMG hierarchy is generated using the HMIS coarsening strategy [24], an extended long-range interpolation operator [23], and a strength threshold of 0.25.

3.3. Context-free grammar for *flexible* AMG. We formulate a context-free grammar (CFG) comprising a list of production rules (Figure 2). Each production rule contains a production symbol $\langle\cdot\rangle$ (to the left) with the corresponding expression (to the right).
The production rules are applied from the start symbol $\langle G\rangle$ recursively until the resultant expression contains only *terminals*. The final expression maps to a unique instance of an AMG cycle. $\langle s_L\rangle$, $\langle s_{l_{\mathrm{std}}}\rangle$, and $\langle s_l\rangle$ represent a *state of approximation* in the AMG cycle at the finest level, the last flexible level, and intermediate levels $l_{\mathrm{std}} < l < L$, respectively (Fig. 2, left). For example, in order to reach the state $\langle s_L\rangle$, one of the following operations must have been applied: (i) any sort of smoothing at $l = L$, (ii) coarse-grid correction from $l = L-1$, or (iii) the starting initial guess (cycle terminates). Similarly, for the intermediate state levels $\langle s_l\rangle$, apart from smoothing or coarse-grid correction, *restriction* from level $l+1$ is allowed, and at $\langle s_{l_{\mathrm{std}}}\rangle$ a standard recursive multigrid is applied. These production rules interweave different grid levels in the AMG hierarchy with valid multigrid operations, generating *flexible* cycle structures. The smoothing choices, smoothing ordering, relaxation weights, and coarse-grid correction scaling factors are specified as *terminal symbols* (Figure 2, right), that is, concrete algorithmic choices which cannot be expanded further in the grammar.

⟨G⟩ ::= ⟨s_L⟩
⟨s_L⟩ ::= l1Relax(⟨S^{l1}⟩, ⟨R⟩, ⟨s_L⟩)
  | HybridRelax(⟨S^H⟩, ⟨w_i⟩, ⟨w_o⟩, ⟨R⟩, ⟨s_L⟩)
  | JacobiRelax(S_Jacobi, ⟨w⟩, ⟨R⟩, ⟨s_L⟩)
  | CoarseGridCorrection(⟨α⟩, ⟨s_{L−1}⟩)
  | InitialGuess
⟨s_{l_std}⟩ ::= RestrictAndSolve(M_V, ⟨s_{l_std+1}⟩)
⟨s_l⟩ ::= l1Relax(⟨S^{l1}⟩, ⟨R⟩, ⟨s_l⟩)
  | HybridRelax(⟨S^H⟩, ⟨w_i⟩, ⟨w_o⟩, ⟨R⟩, ⟨s_l⟩)
  | JacobiRelax(S_Jacobi, ⟨w⟩, ⟨R⟩, ⟨s_l⟩)
  | Restrict(⟨s_{l+1}⟩)
  | CoarseGridCorrection(⟨α⟩, ⟨s_{l−1}⟩)

Terminals:
⟨S^{l1}⟩ ::= S^{l1}_GSF | S^{l1}_GSB | S^{l1}_Jacobi
⟨S^H⟩ ::= S^H_GSF | S^H_GSB
⟨R⟩ ::= Lexicographic | C-F smoothing
⟨w_i⟩ ::= 0.1 | 0.15 | … | 1.9
⟨w_o⟩ ::= 0.1 | 0.15 | … | 1.9
⟨w⟩ ::= 0.1 | 0.15 | … | 1.9
⟨α⟩ ::= 0.1 | 0.15 | … | 1.9

Fig. 2: Production rules for the generation of *flexible* BoomerAMG methods.
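As an illustration of how such a grammar constrains the search, the sketch below samples a random flexible cycle with a fixed V topology from the terminal sets of Figure 2. It is a simplification of what EvoStencils does (the real grammar also permits non-V topologies), and the function and symbol names are made up.

```python
import random

SMOOTHERS = ["S_l1_GSF", "S_l1_GSB", "S_l1_Jacobi", "S_H_GSF", "S_H_GSB", "S_Jacobi"]
ORDERINGS = ["Lexicographic", "C-F"]
WEIGHTS = [round(0.1 + 0.05 * k, 2) for k in range(37)]  # 0.1, 0.15, ..., 1.9

def relax(rng):
    """One terminal smoothing step: smoother, ordering, relaxation weight."""
    return f"{rng.choice(SMOOTHERS)}({rng.choice(ORDERINGS)}, w={rng.choice(WEIGHTS)})"

def sample_cycle(level, l_std, rng, nu_max=3):
    """Random flexible cycle: per-level pre-/post-smoothing sequences wrapped
    around a scaled coarse-grid correction; standard V-cycle below l_std."""
    if level == l_std:
        return "RestrictAndSolve(M_V)"
    pre = [relax(rng) for _ in range(rng.randint(1, nu_max))]
    post = [relax(rng) for _ in range(rng.randint(1, nu_max))]
    cgc = f"CGC(alpha={rng.choice(WEIGHTS)}, {sample_cycle(level - 1, l_std, rng)})"
    return " -> ".join(pre + [cgc] + post)

cycle = sample_cycle(level=5, l_std=0, rng=random.Random(42))
```

In the actual G3P setup, crossover and mutation then swap and rewrite subtrees of such derivations while staying inside the grammar.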
Algorithm 3.1 Evaluate($P, \tilde{D}, N_{\mathrm{pop}}, n_p$)
1: $P = \bigcup_{k=1}^{N_{\mathrm{pop}}} P_k$  // split population $P$ across $N_{\mathrm{pop}}$ processes
2: $F = \emptyset$  // fitness collection for the population
3: for all $k = 1, \ldots, N_{\mathrm{pop}}$ in parallel do
4:   for all $x \in P_k$ do
5:     $\theta \leftarrow \mathrm{GetHypreArgs}(x)$  // convert genotype to phenotype
6:     $(T, N, \rho) \leftarrow \frac{1}{|\tilde{D}|} \sum_{(A,b)\in\tilde{D}} \mathrm{RunHypre}(\theta, A, b, n_p)$
7:     $f_x = \left[ \frac{T}{N}, \rho \right]^T$
8:     $F \leftarrow F \cup \{f_x\}$
9:   end for
10: end for
11: Return $F$

3.4. Software. The CFG has been implemented using EvoStencils², a Python package for the automated design of multigrid methods. Additional interfaces have been added and implemented in *hypre* to incorporate *flexible cycling* in BoomerAMG, the AMG implementation in *hypre*³. The CFG generates AMG expressions, which are then transformed into corresponding BoomerAMG inputs $\theta$ for these interfaces. The linear systems are imported via the Linear-Algebraic System Interface in *hypre*⁴ and solved with BoomerAMG arguments $\theta$ (Algorithm 3.1). Evolutionary algorithms from the DEAP library [29] are used, and the population-level parallelism $N_{\mathrm{pop}}$ is implemented using Python bindings for MPI⁵ (Figure 3). To eliminate redundant AMG setup times, a batch of individuals is lumped within a single program with a common AMG setup. The evolutionary parameters $E_{\mathrm{params}}$ are set to $\mu, \lambda = 256$, $\rho_0 = 2048$, the NSGA-II algorithm [25] for selection $S$, $p_c = 0.7$, and $t_{\max} = 100$.

Fig. 3: An overview of the software setup for the automated AMG design.

² https://github.com/jonas-schmitt/evostencils
³ https://github.com/dinesh-parthasarathy/hypre
⁴ https://hypre.readthedocs.io/en/latest/ch-ij.html
⁵ https://mpi4py.readthedocs.io/en/stable/

4. Automated AMG solver design for a 3D anisotropic Poisson problem. First, we use our methodology to design AMG solvers for the anisotropic Poisson problem on a unit cube domain:

(4.1)  $-\nabla\cdot(D\nabla u) = f$ in $\Omega = (0,1)^3$, with $u = 0$ on $\partial\Omega$,

where $D = \mathrm{diag}(c_1, c_2, c_3)$, $c_1, c_2, c_3 > 0$. The problem is discretized on an equally spaced 3D mesh having $N_d^3$ unknowns.
For each PDE formulation with $\varphi = (c_1, c_2, c_3, N_d)$, the matrix $A(\varphi) \in \mathbb{R}^{N_d^3 \times N_d^3}$ is obtained using a finite-difference scheme with a standard 7-point Laplacian stencil, leading to a linear system of the form (1.3). The resulting matrix is sparse, symmetric positive definite (SPD), and has a banded structure with at most 7 nonzero entries per row. From (4.1), we are interested in the problem set

(4.2)  $D = \left\{ A(\varphi) \mid \varphi = (c_1, c_2, c_3, N_d),\; (c_1, c_2, c_3) \in [10^{-5}, 1]^3,\; N_d \in \{100, 101, \ldots, 1000\} \right\}$

for automated AMG design. We set $f = 0$, use a random initial guess, and run the AMG solver for $N$ iterations until $\|r^{(N)}\|_2 \le \varepsilon = 10^{-8}$. AMG is designed with G3P for $D$ using a proxy $\tilde{D} = \{A(\varphi_{\mathrm{proxy}})\}$ with $\varphi_{\mathrm{proxy}} = (0.001, 1, 1, 100)$, solved over $n_p = 8$ MPI processes (see line 6, Algorithm 3.1).

4.1. Reference solvers. To benchmark the optimization results, we use a set of reference AMG solvers tuned for $\tilde{D}$. Only V-cycles are considered, as W-cycles with the default parameters were found to be not as competitive. For a V-cycle, all possible choices of pre-/post-smoothers, that is, $S_{\mathrm{pre}} \in S \cup \{S^H_{\mathrm{GSS}}, S^{\ell_1}_{\mathrm{GSS}}\}$ and $S_{\mathrm{post}} \in S \cup \{S^H_{\mathrm{GSS}}, S^{\ell_1}_{\mathrm{GSS}}\}$ (refer to (3.7)), were evaluated. Among *weighted smoothers*, the relaxation weights $\omega_i, \omega_o, \omega$ in symmetric smoothers were obtained by applying Lanczos or CG iterations as presented in [63], while an exhaustive search was performed for the non-symmetric variants. They were further fine-tuned for different relaxation orderings (C-F and lexicographic relaxation) and for different pre-/post-smoothing steps $\nu_1, \nu_2$. The resulting six fastest AMG solvers, alongside the default BoomerAMG configuration, are selected as reference solvers (Table 1).
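The matrix family $A(\varphi)$ can be reproduced in a few lines via Kronecker products of the 1D stencil; a dense NumPy sketch intended only for small $N_d$ (the experiments use hypre's distributed sparse formats instead):

```python
import numpy as np

def anisotropic_laplacian(c, n):
    """7-point finite-difference operator for -div(D grad u), D = diag(c1,c2,c3),
    on an n x n x n grid with homogeneous Dirichlet boundaries (dense; small n only)."""
    T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D [-1, 2, -1] stencil
    I = np.eye(n)
    h2 = 1.0 / (n + 1) ** 2                                # uniform mesh width squared
    return (c[0] * np.kron(np.kron(T, I), I)
            + c[1] * np.kron(np.kron(I, T), I)
            + c[2] * np.kron(np.kron(I, I), T)) / h2

A = anisotropic_laplacian((0.001, 1.0, 1.0), n=4)          # proxy-like anisotropy
```

The result has exactly the properties quoted above: symmetric positive definite, banded, and at most 7 nonzeros per row.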
| solver | ordering | S_pre | ν1 | S_post | ν2 | ω_i | ω_o |
|---|---|---|---|---|---|---|---|
| default | lexicographic | S^{l1}_GSF | 1 | S^{l1}_GSB | 1 | n/a | n/a |
| tuned 1 | C-F | S^{l1}_GSF | 1 | S^{l1}_GSF | 1 | n/a | n/a |
| tuned 2 | C-F | S^H_GSF | 1 | S^H_GSF | 1 | 1.1 | 0.9 |
| tuned 3 | lexicographic | S^{l1}_GSF | 1 | S^{l1}_GSF | 1 | n/a | n/a |
| tuned 4 | C-F | S^{l1}_Jacobi | 1 | S^{l1}_Jacobi | 1 | n/a | n/a |
| tuned 5 | lexicographic | S^H_GSF | 1 | S^H_GSF | 1 | 1.1 | 0.9 |
| tuned 6 | lexicographic | S^{l1}_GSF | 2 | S^{l1}_GSB | 1 | n/a | n/a |

Table 1: BoomerAMG parameters for default and tuned reference AMG solvers.

4.2. Performance Evaluation and Generalizability. The optimization starts with a random collection of AMG solvers generated from the grammar rules, and many of them diverge or converge slowly (top-left, Fig. 4). Applying genetic operators (crossover, mutation) to this initial population and downselecting the fittest individuals produces a collection of cheaper AMG solvers with faster convergence (top-right, Fig. 4). Repeating this iteratively evolves a population of solvers clustered close to the Pareto front. This optimization is independently repeated five times, each corresponding to different point colorings, demonstrating reproducibility (Fig. 4). Six GP solvers are picked from the final population (Figure 5) and evaluated for generalization to the problem set $D$ by picking problems with different anisotropies (subsection 4.2.1) and different numbers of unknowns (subsection 4.2.2), as compared to the proxy $\tilde{D}$. Finally, we do a more detailed comparison between a *flexible* GP-AMG cycle and a recursive V(1,1) cycle by examining cycle complexity and convergence properties (subsection 4.2.3).

Fig. 4: Evolution of AMG individuals with respect to *cost per iteration* and *convergence rate*, progressing from an initial random population to the final generation (clockwise from top left: initial population, generation 1, generation 10, generation 100). The dots represent the individuals, and are colored separately, representing each of the five independent runs of the G3P algorithm.

4.2.1.
Generalization to different anisotropies. From $D$, the problem size is fixed to $N_d = 100$, and the anisotropy coefficients $c_1, c_2, c_3$ are varied. We choose six problems, one isotropic, and the others with anisotropies of different strengths and directions, a subset of the complete evaluation set, but representative of different variants in $D$ with $N_d = 100$. The solve time $T$ and the number of iterations $N$ for each configuration are evaluated for the reference and GP solvers. The speedup columns, $\eta_1$ and $\eta_2$, measure the speedup of the fastest GP solver relative to the default solver and the fastest tuned solver, respectively (Table 2). GP solvers outperform the default solver in all configurations in both the solve time and iteration count.

Fig. 5: AMG cycling structure for the selected GP solvers ((a) GP-22, (b) GP-27, (c) GP-28, (d) GP-31, (e) GP-33, (f) GP-36), excluding the depiction of weighting parameters $\omega_i, \omega_o, \omega$, and $\alpha$.

| c1, c2, c3 | default T(ms)/N | tuned 1 | tuned 2 | tuned 3 | tuned 4 | tuned 5 | tuned 6 |
|---|---|---|---|---|---|---|---|
| 1, 1, 1 | 381 / 22 | 375 / 19 | 351 / 17 | 349 / 21 | 403 / 16 | 324 / 18 | 379 / 18 |
| 1, 1, 1e-3 | 399 / 19 | 276 / 11 | 289 / 11 | 279 / 14 | 295 / 9 | 280 / 13 | 258 / 10 |
| 1, 1, 1e-5 | 398 / 19 | 275 / 11 | 288 / 11 | 278 / 14 | 298 / 9 | 280 / 13 | 256 / 10 |
| 1, 1e-3, 1e-3 | 365 / 21 | 190 / 9 | 194 / 9 | 219 / 13 | 178 / 7 | 236 / 13 | 236 / 11 |
| 1, 1e-3, 1e-5 | 365 / 21 | 189 / 9 | 193 / 9 | 219 / 13 | 178 / 7 | 235 / 13 | 236 / 11 |
| 1, 1e-5, 1e-5 | 365 / 21 | 191 / 9 | 194 / 9 | 219 / 13 | 178 / 7 | 232 / 13 | 215 / 10 |

| c1, c2, c3 | GP-22 T(ms)/N | GP-27 | GP-28 | GP-31 | GP-33 | GP-36 | η1 | η2 |
|---|---|---|---|---|---|---|---|---|
| 1, 1, 1 | 322 / 13 | 340 / 16 | 270 / 14 | 208 / 12 | 245 / 16 | 259 / 21 | 1.83 | 1.56 |
| 1, 1, 1e-3 | 236 / 8 | 244 / 10 | 253 / 11 | 210 / 11 | 223 / 13 | 237 / 17 | 1.90 | 1.23 |
| 1, 1, 1e-5 | 236 / 8 | 244 / 10 | 252 / 11 | 210 / 11 | 222 / 13 | 238 / 17 | 1.90 | 1.22 |
| 1, 1e-3, 1e-3 | 228 / 9 | 233 / 11 | 237 / 12 | 214 / 12 | 246 / 16 | 245 / 19 | 1.71 | 0.83 |
| 1, 1e-3, 1e-5 | 228 / 9 | 233 / 11 | 236 / 12 | 214 / 12 | 244 / 16 | 257 / 20 | 1.71 | 0.83 |
| 1, 1e-5, 1e-5 | 201 / 8 | 233 / 11 | 235 / 12 | 214 / 12 | 243 / 16 | 257 / 20 | 1.82 | 0.89 |

Table 2: Performance (solve time T in ms / iterations N) of reference solvers (top) and GP solvers (bottom) for different anisotropy configurations from (4.1) with 10^6 unknowns.
Compared to the tuned solvers, GP solvers are superior for isotropic problems and for problems with weak coupling in one direction. For problems with weak coupling in multiple directions, some of the tuned solvers (tuned 1, 2, 4) are faster. This is possibly due to the choice of the proxy D̃, which has weak coupling in only one direction.

4.2.2. Weak scaling. Now, we fix the anisotropy coefficients as set in the proxy D̃ and vary N_d to generate more problems from D. We perform a weak scaling study with processor counts n_p = k^3, k ∈ {2, 4, 6, 8, 10, 12, 14, 16, 18, 20}, and choose the global problem size with N_d = 50k. The processor topology is constrained to a cubic domain.

Reference solvers (each cell: T (ms) / N):

N_d  | default  | tuned 1  | tuned 2  | tuned 3  | tuned 4  | tuned 5  | tuned 6
100  | 402 / 19 | 259 / 11 | 275 / 11 | 276 / 14 | 286 / 9  | 278 / 13 | 258 / 10
200  | 485 / 20 | 336 / 12 | 324 / 11 | 342 / 15 | 387 / 10 | 317 / 13 | 330 / 11
300  | 530 / 21 | 380 / 13 | 368 / 12 | 352 / 15 | 448 / 11 | 350 / 14 | 406 / 13
400  | 572 / 22 | 460 / 15 | 452 / 14 | 395 / 16 | 496 / 12 | 361 / 14 | 448 / 14
500  | 591 / 22 | 573 / 17 | 498 / 15 | 419 / 17 | 550 / 13 | 404 / 15 | 489 / 15
600  | 593 / 22 | 623 / 18 | 567 / 16 | 424 / 17 | 607 / 14 | 434 / 16 | 524 / 16
700  | 611 / 22 | 610 / 18 | 575 / 16 | 487 / 19 | 666 / 15 | 491 / 18 | 567 / 17
800  | 640 / 23 | 675 / 19 | 621 / 17 | 524 / 20 | 703 / 15 | 535 / 19 | 612 / 18
900  | 653 / 23 | 726 / 20 | 711 / 19 | 562 / 21 | 743 / 16 | 579 / 20 | 654 / 19
1000 | 657 / 23 | 772 / 21 | 720 / 19 | 627 / 22 | 837 / 17 | 590 / 21 | 698 / 20

GP solvers (each cell: T (ms) / N):

N_d  | GP-22     | GP-27    | GP-28    | GP-31    | GP-33    | GP-36    | η1   | η2
100  | 230 / 8   | 244 / 10 | 247 / 11 | 234 / 12 | 223 / 13 | 224 / 16 | 1.80 | 1.16
200  | 305 / 9   | 314 / 11 | 285 / 11 | 275 / 12 | 279 / 14 | 297 / 18 | 1.76 | 1.15
300  | 350 / 10  | 353 / 12 | 323 / 12 | 307 / 13 | 310 / 15 | 307 / 18 | 1.72 | 1.14
400  | 431 / 12  | 392 / 13 | 336 / 12 | 318 / 13 | 320 / 15 | 337 / 19 | 1.80 | 1.14
500  | 479 / 13  | 439 / 14 | 346 / 12 | 327 / 13 | 334 / 15 | 353 / 19 | 1.81 | 1.24
600  | 532 / 14  | 485 / 15 | 379 / 13 | 361 / 14 | 368 / 16 | 371 / 19 | 1.64 | 1.18
700  | 574 / 15  | 522 / 16 | 419 / 14 | 366 / 14 | 379 / 16 | 386 / 19 | 1.67 | 1.33
800  | 626 / 16  | 560 / 17 | 436 / 14 | 378 / 14 | 382 / 16 | 405 / 20 | 1.69 | 1.39
900  | 672 / 17  | 597 / 18 | 465 / 15 | 388 / 14 | 410 / 17 | 422 / 20 | 1.68 | 1.45
1000 | 1186 / 18 | 643 / 19 | 508 / 16 | 385 / 14 | 409 / 17 | 416 / 20 | 1.70 | 1.53

Table 3: Weak scaling results (solve time T in ms / iterations N) of the anisotropic Poisson problem, from the initial problem D̃ with 10^6 unknowns scaled to 10^9 unknowns, for reference solvers (top) and GP solvers (bottom).
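As a quick sanity check on this setup, choosing the global size N_d = 50k together with n_p = k^3 processes keeps the per-process load constant, which is what makes the study a weak scaling:

```python
# Weak scaling setup: (50k)^3 unknowns on n_p = k^3 MPI processes keeps the
# load fixed at 50^3 = 125,000 unknowns per process for every k.
for k in [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]:
    unknowns, processes = (50 * k) ** 3, k ** 3
    assert unknowns // processes == 125_000
```

The largest case, k = 20, gives N_d = 1000, i.e., 10^9 unknowns on 8000 processes, matching the end point of Table 3.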
The scaling experiment is conducted on the Ruby compute cluster at Lawrence Livermore National Laboratory (LLNL)^6 with MPI parallelism, and the results are summarized in Table 3. Except for GP-22 (for N_d = 900, 1000), all GP solvers outperform the default solver across all problem sizes. GP-31 is the fastest solver among all GP and reference solvers and scales the best algorithmically (12 iterations to 14 iterations). Hence, the speedup (η2) of GP-31 relative to the fastest tuned solver (tuned 5) improves with increasing problem size. For the largest problem with 10^9 unknowns, GP-28, -31, -33, and -36 are the four fastest solvers.

^6 https://hpc.llnl.gov/hardware/compute-platforms/ruby (decommissioned)

4.2.3. Comparison of a flexible AMG cycle with a standard V-cycle. G3P generates efficient AMG methods based on two fitness measures: cost per iteration T/N and convergence rate ρ. Here, we derive analytical estimates for both and compare the generated flexible AMG cycles with standard V(1,1) cycles.

Cost per iteration. Let W(R) be the computational cost of one iteration of a method R as given in (2.1). We measure cost in work units (WUs) and define 1 WU := W(A_L), that is, one application of the fine-grid operator A_L. Let nnz(A) denote the number of nonzeros of the matrix A; hence, the cost of applying A_l is approximated by nnz(A_l)/nnz(A_L) WUs. For the smoothers considered in this article, one relaxation sweep can be assumed to be equivalent to one application of A_l. Thus, the cost of one multigrid cycle M is approximated by

    W(M) ≈ Σ_{l=1}^{L} ν_l · nnz(A_l)/nnz(A_L) WUs,

where ν_l denotes the number of smoothing steps on level l. This can be further simplified for our proxy problem D̃, solved using BoomerAMG with the setup parameters stated in subsection 3.2, as

    W(M) ≈ ν_1 (1.2 × 10^-5) + ν_2 (9.5 × 10^-5) + ··· + ν_8 (1.9) + ν_9 (1.0) WUs.

Substituting the smoothing parameters gives

(4.3)    W(M_GP-31) ≈ 6.7 WUs,    W(M_V) ≈ 8.5 WUs.
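The work-unit model above is easy to evaluate for any smoothing schedule. The sketch below recomputes it for a 9-level hierarchy; only the per-level nnz ratios for l = 1, 2, 8, 9 are quoted in the text, so the middle-level ratios here are invented, and the totals are therefore not expected to reproduce (4.3) exactly.

```python
def cycle_cost_wu(nu, nnz_ratio):
    """W(M) ≈ sum_l nu_l * nnz(A_l)/nnz(A_L), measured in work units (WU)."""
    assert len(nu) == len(nnz_ratio)
    return sum(n * r for n, r in zip(nu, nnz_ratio))

# Levels l = 1 (coarsest) .. 9 (finest); ratios for l = 3..7 are hypothetical.
ratio = [1.2e-5, 9.5e-5, 7e-4, 5e-3, 4e-2, 0.2, 0.8, 1.9, 1.0]

w_v  = cycle_cost_wu([2] * 9, ratio)           # standard V(1,1): nu_l = 2 everywhere
w_gp = cycle_cost_wu([2] * 7 + [0, 4], ratio)  # flexible: skip l = 8, extra sweeps on l = 9
assert w_gp < w_v   # moving smoothing off the most expensive level reduces W(M)
```

This mirrors the trade-off exploited by GP-31: because nnz(A_8)/nnz(A_L) = 1.9 dominates the sum, dropping smoothing there buys several sweeps on the cheaper finest level.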
The cost reduction in GP-31 is obtained by eliminating smoothing on the most expensive level l = 8 with ν_8 = 0 and reallocating it to the finest level with ν_9 = 4 (see Figure 5d), whereas the standard V(1,1) cycle uses ν_8 = ν_9 = 2, maintaining a recursive structure.

Convergence rate. The eigenvalues of the iteration matrices for GP-31 and for standard V(1,1) cycles with the tuned ('tuned 1') and default configurations are plotted for a smaller problem with 10^3 unknowns but problem properties identical to D̃. We use the spectral radius as a measure of asymptotic convergence, that is, the worst-case error reduction over many iterations. The right side of Figure 6 indicates that GP-31 has better asymptotic convergence properties than the standard V(1,1) cycles, despite a lower cost per iteration, as shown in (4.3). This aligns qualitatively with the numerical results on D̃ (Figure 6, left).

Fig. 6: Convergence plots of GP-31, the default V(1,1), and the 'tuned 1' V(1,1) AMG solvers on problem D̃ (left), and the eigenvalue distribution of the respective iteration matrices on a smaller 10^3 variant (right).

5. Automated design of an AMG preconditioner for a multiphysics problem. Next, we employ AMG within a problem from the ARES multiphysics code at LLNL [21, 13, 47]. Solving the radiation diffusion equation within ARES is often computationally expensive, and its general form in relation to (1.1) can be written as

(5.1)    E_t + (c d κ T) E − ∇·(D(E) ∇E) = f    in Ω × [0, T].

Here, E is the radiation energy per unit volume, c is the speed of light, d is the density, κ is the opacity, T is the matter temperature, and D is the diffusion coefficient. This general form is nonlinear due to the dependency of the diffusion coefficient D(E) on the radiation energy density E from flux limiting.
After making a set of spatial discretization choices and treating D as fixed, (5.1) becomes linear, and we arrive at a semi-discrete form over N control-volume cells as

(5.2)    M_h Ė(t) + K_h E + C_h E = F_h    in Ω_h × [0, T],

with M_h := diag(V_1, V_2, ..., V_N), K_h E ≈ ∇_h·(D ∇_h E), and C_h := diag((c d_1 κ_1 T_1), ..., (c d_N κ_N T_N)), where V_i is the control-volume cell measure associated with degree of freedom (DOF) i. By employing the backward Euler method as the implicit time integration scheme, one solves a sparse linear system at each step, of the form

(5.3)    (M_h + Δt (K_h + C_h)) E^(n+1) = M_h E^(n) + Δt F_h^(n),

where we identify A := M_h + Δt (K_h + C_h), x := E^(n+1), and b := M_h E^(n) + Δt F_h^(n). We now employ this discretized system on an application problem that provides a stress test for the underlying linear solver, a benchmark dubbed the zrad3D problem. The equations being solved in this problem are the radiation diffusion and electron conduction equations, the latter discretized in a similar fashion to the radiation diffusion equation. The problem involves radiation flow through a 3D block with N_d^3 unknowns. Although topologically simple, this problem is set up with a large difference between radiation and matter temperatures, and with an appreciable temperature gradient across the block (Figure 7). This results in SPD matrices with strong connections between unknowns corresponding to off-diagonal matrix entries, resulting in ill-conditioned systems that can cause a diagonally scaled conjugate gradient (CG) method to converge very slowly, even for a relatively small problem size. The solver and physics packages stress tested in the zrad3D problem are identically deployed in inertial confinement fusion (ICF) simulations; therefore, results regarding the efficacy of an AMG preconditioner in zrad3D may help inform the solver strategy and performance of production ICF calculations. Hence, we design an efficient AMG preconditioner for zrad3D using G3P for linear systems
D = { A^(k) }_{k=1}^{40},  A^(k) ∈ ℝ^(N_d^3 × N_d^3),  N_d ∈ {100, 101, ..., 400},

from the first 40 time steps of the simulation, alternating between (5.3) for radiation diffusion when k is odd and its corresponding discretized form for electron conduction when k is even. We choose a right-hand side (RHS) with random entries to avoid the special case where the RHS lies in a low-dimensional subspace of the eigenmodes, potentially leading to artificially early Krylov convergence. The preconditioner is applied once per CG iteration, and we solve to a relative residual tolerance of ε = 10^-6. A proxy D̃ = { A^(1) | A^(1) ∈ ℝ^(100^3 × 100^3) }, A^(1) being the most ill-conditioned system in D for the smallest problem N_d = 100, is used for fitness computation in G3P and solved across n_p = 8 MPI processes. We measure the time T for the preconditioned CG solve (excluding AMG setup time), the number of CG iterations N, and the preconditioned CG convergence rate ρ to solve D̃ for each individual AMG preconditioner x, and assign its fitness f_x = [T/N, ρ]^T.

Fig. 7: Initial conditions of the zrad3D problem: (a) initial radiation temperature and (b) initial matter temperature (in units of KeV). A source is applied to the left side of the block as pictured. As the problem evolves, radiation flows through the block, and the matter and radiation fields come into equilibrium with each other.

5.1. Reference preconditioners. Similar to the choice of competitive reference solvers for the anisotropic Poisson problem in subsection 4.1, AMG preconditioners are tuned for the proxy zrad3D problem D̃. A V(1,1) AMG preconditioner with an l1 Jacobi smoother is found to yield the fastest preconditioned conjugate gradient (PCG) solver. This PCG solver will henceforth be called 'tuned', and the PCG solver with the default AMG preconditioner (listed as default in Table 1) is called 'default'.

5.2.
Performance Evaluation and Generalizability. The PCG programs with the CFG-generated AMG preconditioners are evolved for 100 generations, resulting in the final population shown in Figure 8. Eight GP-PCG programs, denoted by green squares, are picked from different regions of the final Pareto front. The default and the tuned PCG programs are represented by brown crosses, and their positions lie outside the Pareto front, implying their suboptimal performance compared to the generated GP-PCG programs.

Solver  | T (ms) | N
GP-57   | 214    | 5
GP-77   | 210    | 6
GP-87   | 230    | 7
GP-95   | 230    | 8
GP-104  | 244    | 9
GP-108  | 263    | 10
GP-114  | 305    | 12
GP-138  | 421    | 21
tuned   | 387    | 13
default | 448    | 10

Fig. 8: (Left) Final population of evolved PCG programs, shown as blue dots in the cost-per-iteration versus convergence-rate plane, with the resulting Pareto front highlighted as a red line. (Right) Comparison of solve time and CG iterations between the selected GP-PCG programs and the reference PCG configurations for the proxy D̃.

The performance of the picked GP-PCG programs and the reference programs on the proxy problem is also listed in Figure 8 (right). The fastest GP-PCG programs achieve speedups of approximately 2.1× and 1.8× compared to the default and tuned PCG programs, respectively. Furthermore, these programs are evaluated for generalizability to the problem set D in subsections 5.2.1 and 5.2.2. We also show that our GP solvers with novel AMG flexible cycling can be used with other wrappers in hypre (subsection 5.2.3), demonstrating their compatibility as valid building blocks in the larger software framework of hypre.

Fig. 9: Visual representation of the GP-57 AMG preconditioner cycle.

5.2.1. Generalization to different time steps. The selected GP-PCG programs, generated based on the proxy D̃, are now evaluated on matrices from the first 40 time steps of the zrad3D problem with 10^6 DOFs.
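Each of these time steps requires solving a system of the backward-Euler form (5.3). As a self-contained illustration (not the hypre/ARES implementation), the sketch below assembles a tiny dense 1D analogue of (M_h + Δt(K_h + C_h)) x = b and solves it with a hand-written, unpreconditioned CG; all matrices, sizes, and coefficients are hypothetical.

```python
import numpy as np

def backward_euler_step(M, K, C, E, F, dt, tol=1e-8, max_iter=500):
    """Solve (M + dt*(K + C)) E_next = M @ E + dt * F with plain CG,
    a toy stand-in for the AMG-preconditioned CG used in the paper."""
    A = M + dt * (K + C)   # SPD for SPD K and nonnegative diagonal M, C
    b = M @ E + dt * F
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_next = r @ r
        if np.sqrt(rs_next) <= tol * np.linalg.norm(b):
            break
        p = r + (rs_next / rs) * p
        rs = rs_next
    return x

# Hypothetical 1D analogue: lumped mass, diffusion stencil, absorption term.
n = 32
M = np.diag(np.full(n, 0.5))                          # control-volume measures V_i
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # discrete diffusion
C = np.diag(np.linspace(0.1, 1.0, n))                 # (c d_i kappa_i T_i) terms
rng = np.random.default_rng(0)
E0, F = rng.standard_normal(n), rng.standard_normal(n)
E1 = backward_euler_step(M, K, C, E0, F, dt=0.05)
```

The production runs replace the plain CG loop with hypre's PCG wrapped around a (GP-generated) AMG preconditioner; the system assembly per time step is the same in structure.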
The solve time and number of CG iterations are averaged over these 40 time steps for the GP-PCG solvers and the reference solvers (Figure 10). The solvers are ordered from fastest to slowest. As shown, 6 out of the 8 selected GP-PCG programs outperform both reference solvers, while 7 outperform the default solver. On average, the fastest GP-PCG program, GP-57 (Figure 9), achieves speedups of approximately 1.54× and 1.42× over the default and tuned solvers, respectively. GP-57 also shows a narrower distribution of solve times and CG iteration counts throughout the simulation compared to the reference solvers, highlighting its robustness, despite being optimized only on a single linear system in D̃.

Fig. 10: Performance of GP-PCG solvers compared to reference solvers, measured over the first 40 time steps of the zrad3D problem with 10^6 DOFs (average solve time and average CG iterations per time step, with min-max ranges).

5.2.2. Weak scaling. We now assess the performance of the GP-57, tuned, and default solvers on the full problem set D by performing a weak scaling study with N_d ∈ {100, 200, 400}, evaluated over the first 40 time steps of the zrad3D problem. The weak scaling study is done on a different CPU platform, RZHound^7 at LLNL, demonstrating solver portability. The local problem is fixed to 25,000 DOFs per MPI process and uniformly refined in all three directions. The weak scalability results are summarized in Table 4 with the corresponding total solve times and total CG iterations over the first 40 time steps. GP-57 achieves faster solve times with fewer CG iterations compared to both the default and tuned solvers. Also, the net speedups of GP-57, η1 and η2, relative to the default and tuned solvers, respectively, remain consistent and generalize well to larger problem sizes.

5.2.3.
Compatibility of GP solvers with other interfaces in hypre. To assess how strongly the system matrix A with entries a_ij is dominated by its diagonal, we define the quantity

    η(A) := ( Σ_{i=1}^{N_d^3} |a_ii| ) / ( Σ_{i=1}^{N_d^3} Σ_{j=1, j≠i}^{N_d^3} |a_ij| ),

which measures the relative contribution of the diagonal entries compared to the off-diagonals.

^7 https://hpc.llnl.gov/hardware/compute-platforms/rzhound

DOFs | default (T (s) / N) | tuned (T (s) / N) | GP-57 (T (s) / N) | η1   | η2
1M   | 9.0 / 267           | 8.8 / 373         | 5.4 / 172         | 1.66 | 1.63
8M   | 12.4 / 323          | 11.8 / 433        | 7.7 / 210         | 1.60 | 1.53
64M  | 15.2 / 387          | 14.7 / 519        | 9.2 / 240         | 1.66 | 1.60

Table 4: Weak scaling results of the default, tuned, and GP preconditioned CG solvers for the zrad3D problem evaluated over the first 40 time steps on the RZHound system.

η(A^(k)), when k is even, that is, for the electron conduction equation, increases during the simulation (Figure 11, left); hence, a simple diagonal preconditioner is sufficient for a time step k once η(A^(k)) is sufficiently large (Figure 11, right). This helps avoid expensive AMG setup costs. However, an AMG preconditioner is still required for solving the radiation diffusion problem and for the earlier part of the electron conduction problem. The hybrid solver interface in hypre^8 is designed for such cases: it automatically switches from simple diagonal scaling to a more expensive AMG preconditioner based on a user-defined convergence threshold, that is, if the convergence rate of CG with diagonal scaling degrades beyond this prescribed threshold, the solver sets up an AMG preconditioner; otherwise, it continues using the diagonal preconditioner. Since we use GP guided by a CFG that embeds the algorithmic components of BoomerAMG (Figure 2), the compatibility of the generated GP solvers with other interfaces in hypre is preserved. Hence, we can directly plug our generated GP solvers into the hybrid solver interface for our simulation, as shown in Figure 12.
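The quantity η(A) is cheap to compute from an assembled matrix. The sketch below evaluates it for a tiny hypothetical matrix and applies a simplified switching rule; note that hypre's hybrid solver actually switches on the observed CG convergence rate, not on η, so the threshold here is purely illustrative.

```python
import numpy as np

def eta(A):
    """eta(A) = sum_i |a_ii| / sum_{i != j} |a_ij|: total diagonal mass
    relative to total off-diagonal mass (cf. the definition above)."""
    diag = np.abs(np.diag(A)).sum()
    off = np.abs(A).sum() - diag
    return diag / off

# Tiny hypothetical system; the threshold 2.0 is invented for illustration.
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  5.0, -1.0],
              [ 0.0, -1.0,  4.0]])
use_diagonal_only = eta(A) > 2.0   # strongly diagonal: skip the AMG setup
```

For this matrix η(A) = 13/4, so the cheap diagonal preconditioner would be kept, mirroring the late electron-conduction time steps where AMG setup can be avoided.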
GP-57 is used to accelerate CG convergence beyond simple diagonal scaling, hence avoiding excessively large iteration counts. For the electron conduction problem (Figure 12, right), the use of AMG preconditioners is reduced as the system's conditioning improves, and beyond the 22nd time step, AMG is no longer required, thereby saving expensive setup costs.

Fig. 11: (Left) The relative contribution of diagonal versus off-diagonal elements in the system matrix throughout the simulation. (Right) CG convergence of the electron conduction problem with a simple diagonal preconditioner.

Fig. 12: CG convergence on the radiation diffusion (left) and electron conduction (right) systems across the simulation, using the hybrid solver interface with the GP-57 preconditioner and a convergence threshold set to 0.65.

^8 https://hypre.readthedocs.io/en/latest/solvers-hybrid.html

6. Cost of AMG evolution. The generation of AMG methods for the problems in section 4 and section 5 requires approximately 29 min and 55 min, respectively, when distributed over 32 nodes across 64 MPI processes. This corresponds to a total computational cost of 15.5 and 29.3 node-hours, respectively. Evolving a single generation of AMG solvers in section 4 takes 13-24 s, while evolving a single generation of AMG preconditioners for the time-stepping simulation in section 5 requires 23-49 s. It is worth reiterating that our approach works entirely without prior data or labeled examples of well-performing solvers, in contrast to many supervised learning techniques that depend on well-curated training datasets. Hence, in essence, the
intent behind our approach is to offload the manual effort and cost associated with designing application-specific solvers onto computational resources.

7. Conclusions. We have demonstrated an automated approach to designing AMG methods that can be integrated non-intrusively with existing solver frameworks such as hypre. By formulating a CFG that encodes the algorithmic and syntactic constraints of BoomerAMG, the AMG implementation in hypre, we generate novel flexible multigrid cycles for a fixed AMG setup using G3P. These flexible AMG methods are generated for a stationary test problem (section 4) and a time-stepping simulation code (section 5). In both cases, our approach generates flexible AMG methods that outperform not only the default AMG method but also hand-tuned AMG V-cycles, with faster solve times and improved convergence rates. Furthermore, these methods generalize well, maintaining their superior performance across different variants of the stationary problem (subsection 4.2.1), under system matrix perturbations during time-stepping (subsection 5.2.1), when scaled to larger problem sizes (subsections 4.2.2 and 5.2.2), and when embedded within other hypre interfaces (subsection 5.2.3).

The problems chosen are diffusion-dominated, where AMG is well understood and performs strongly; this choice was deliberate, for rigorous benchmarking. Our intent was to test whether a set of automatically evolved AMG programs, starting from a random initial population, could remain competitive with highly optimized multigrid baselines. Remarkably, not only do we remain competitive, but we are also able to surpass them, possibly attributable to the added cycling flexibility, a feature that is intractable to tune by hand. A natural extension of our approach is to customize the grammar to include the setup parameters, thereby generating complete AMG methods.
There is also potential to leverage our framework to design AMG methods for modern GPU clusters, since non-standard cycle types like κ-cycles have already proven beneficial on such platforms [5]. Adding more AMG ingredients can bring further improvements on GPUs, by finding algorithms that effectively exploit multiple precisions [58, 60] and incur low communication scaling costs [45]. Beyond classical multigrid approaches, the grammar could be extended with additional algorithmic building blocks to design composite solvers for coupled systems such as thermoelasticity [50, 14] or Stokes problems [37, 61, 38]. Since our framework is agnostic to the underlying PDE technology, it could also be tailored to design hybrid approaches that combine neural networks with classical mesh-based methods [59, 64, 17], by training the networks themselves using neuroevolution^9 [53, 4]. Furthermore, there remains considerable scope to improve the quality and cost of the solver generation pipeline, for instance, by learning a better initial population using large language models for faster convergence [48], and by limiting the evaluation cost of more sophisticated fitness functions (for example, solving multiple right-hand sides per candidate) through statistical techniques like racing procedures [15, 42]. Since our approach is predominantly an offline procedure exploring a very large search space, future work could also focus on coupling it with online auto-tuning frameworks such as GPTune [40], by identifying and transferring a reduced set of critical parameters to such online frameworks, for highly sensitive and non-stationary simulations.

Acknowledgments. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL release number: LLNL-JRNL-2013976-DRAFT.
The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The research of the 5th author was co-funded by the financial support of the European Union under the REFRESH (Research Excellence For Region Sustainability and High-Tech Industries) Project No. CZ.10.03.01/00/22 003/0000048 via the Operational Programme Just Transition.

^9 An evolutionary technique that evolves both the topology and weights of neural networks (https://neuroevolutionbook.com).

REFERENCES

[1] M. S. Alnæs, A. Logg, K. B. Ølgaard, M. E. Rognes, and G. N. Wells, Unified form language, ACM Transactions on Mathematical Software, 40 (2014), p. 1-37, https://doi.org/10.1145/2566630.
[2] P. Amestoy, A. Buttari, J.-Y. L'Excellent, and T. Mary, Performance and scalability of the block low-rank multifrontal factorization on multicore architectures, ACM Transactions on Mathematical Software, 45 (2019), p. 2:1-2:26.
[3] R. Anderson, J. Andrej, A. Barker, J. Bramwell, J.-S. Camier, J. Cerveny, V. Dobrev, Y. Dudouit, A. Fisher, T. Kolev, W. Pazner, M. Stowell, V. Tomov, I. Akkerman, J. Dahm, D. Medina, and S. Zampini, MFEM: A modular finite element methods library, Computers & Mathematics with Applications, 81 (2021), p. 42-74, https://doi.org/10.1016/j.camwa.2020.06.009.
[4] F. Assunção, N. Lourenço, P. Machado, and B. Ribeiro, DENSER: deep evolutionary network structured representation, Genetic Programming and Evolvable Machines, 20 (2019), p. 5-35, https://doi.org/10.1007/s10710-018-9339-y.
[5] O. Avnat and I. Yavneh, On the recursive structure of multigrid cycles, SIAM J. Sci. Comput., 45 (2023), p. S103-S126.
[6] A. H. Baker, R. D. Falgout, T. V. Kolev, and U. M. Yang, Multigrid smoothers for ultraparallel computing, SIAM Journal on Scientific Computing, 33 (2011), p. 2864-2887, https://doi.org/10.1137/100798806.
[7] W. Bangerth, R. Hartmann, and G. Kanschat, deal.II: a general-purpose object-oriented finite element library, ACM Trans. Math. Softw., 33 (2007), p. 24-es, https://doi.org/10.1145/1268776.1268779.
[8] W. Banzhaf, Artificial Intelligence: Genetic Programming, Elsevier, 2001, p. 789-792, https://doi.org/10.1016/b0-08-043076-7/00557-x.
[9] W. Banzhaf, F. D. Francone, R. E. Keller, and P. Nordin, Genetic programming: an introduction: on the automatic evolution of computer programs and its applications, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998.
[10] P. Bastian, M. Blatt, A. Dedner, N.-A. Dreier, C. Engwer, R. Fritze, C. Gräser, C. Grüninger, D. Kempf, R. Klöfkorn, M. Ohlberger, and O. Sander, The DUNE framework: Basic concepts and recent developments, Computers & Mathematics with Applications, 81 (2021), p. 75-112, https://doi.org/10.1016/j.camwa.2020.06.007.
[11] M. Bauer, S. Eibl, C. Godenschwager, N. Kohl, M. Kuron, C. Rettinger, F. Schornbaum, C. Schwarzmeier, D. Thönnes, H. Köstler, and U. Rüde, waLBerla: A block-structured high-performance framework for multiphysics simulations, Computers & Mathematics with Applications, 81 (2021), p. 478-501, https://doi.org/10.1016/j.camwa.2020.01.007.
[12] M. Bauer, J. Hötzer, D. Ernst, J. Hammer, M. Seiz, H. Hierl, J. Hönig, H. Köstler, G. Wellein, B. Nestler, and U. Rüde, Code generation for massively parallel phase-field simulations, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (2019), p. 1-32, https://doi.org/10.1145/3295500.3356186.
[13] J. D. Bender, O. Schilling, K. S. Raman, R. A. Managan, B. J. Olson, S. R. Copeland, C. L. Ellison, D. J. Erskine, C. M. Huntington, B. E. Morgan, et al., Simulation and flow physics of a shocked and reshocked high-energy-density mixing layer, Journal of Fluid Mechanics, 915 (2021), p. A84, https://doi.org/10.1017/jfm.2020.1122.
[14] T. Bevilacqua, A. Gumenyuk, N. Habibi, P. Hartwig, A. Klawonn, M. Lanser, M. Rethmeier, L. Scheunemann, and J. Schröder, Large-scale thermo-mechanical simulation of laser beam welding using high-performance computing: A qualitative reproduction of experimental results, arXiv, (2025), https://doi.org/10.48550/arXiv.2503.09345.
[15] M. Birattari, T. Stützle, L. Paquete, and K. Varrentrapp, A racing algorithm for configuring metaheuristics, in Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, GECCO '02, San Francisco, CA, USA, 2002, Morgan Kaufmann Publishers Inc., p. 11-18.
[16] F. Böhm, D. Bauer, N. Kohl, C. L. Alappat, D. Thönnes, M. Mohr, H. Köstler, and U. Rüde, Code generation and performance engineering for matrix-free finite element methods on hybrid tetrahedral grids, SIAM Journal on Scientific Computing, 47 (2025), p. B131-B159.
[17] N. Bouziani, D. A. Ham, and A. Farsi, Differentiable programming across the PDE and machine learning barrier, arXiv, (2024), https://doi.org/10.48550/arXiv.2409.06085.
[18] W. L. Briggs, V. E. Henson, and S. F. McCormick, A Multigrid Tutorial (2nd ed.), Society for Industrial and Applied Mathematics, USA, 2000.
[19] M. Caldana, P. F. Antonietti, and L. Dedè, A deep learning algorithm to accelerate algebraic multigrid methods in finite element solvers of 3D elliptic PDEs, Computers & Mathematics with Applications, 167 (2024), p. 217-231, https://doi.org/10.1016/j.camwa.2024.05.013.
[20] A. Cremers and S. Ginsburg, Context-free grammar forms, Journal of Computer and System Sciences, 11 (1975), p. 86-117, https://doi.org/10.1016/s0022-0000(75)80051-1.
[21] R. M. Darlington, T. L. McAbee, and G. Rodrigue, A study of ALE simulations of Rayleigh-Taylor instability, Computer Physics Communications, 135 (2001), p. 58-73, https://doi.org/10.1016/S0010-4655(00)00216-2.
[22] E. D. de Jong, R. A. Watson, and J. B. Pollack, Reducing bloat and promoting diversity using multi-objective methods, in Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, GECCO '01, San Francisco, CA, USA, 2001, Morgan Kaufmann Publishers Inc., p. 11-18.
[23] H. De Sterck, R. D. Falgout, J. W. Nolting, and U. M. Yang, Distance-two interpolation for parallel algebraic multigrid, Numerical Linear Algebra with Applications, 15 (2007), p. 115-139, https://doi.org/10.1002/nla.559.
[24] H. De Sterck, U. M. Yang, and J. J. Heys, Reducing complexity in parallel algebraic multigrid preconditioners, SIAM Journal on Matrix Analysis and Applications, 27 (2006), p. 1019-1039, https://doi.org/10.1137/040615729.
[25] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation, 6 (2002), p. 182-197, https://doi.org/10.1109/4235.996017.
[26] A. Ekárt and S. Z. Németh, Selection based on the Pareto nondomination criterion for controlling code growth in genetic programming, Genetic Programming and Evolvable Machines, 2 (2001), p. 61-73, https://doi.org/10.1023/a:1010070616149.
[27] R. Falgout, An introduction to algebraic multigrid, Computing in Science & Engineering, 8 (2006), p. 24-33, https://doi.org/10.1109/mcse.2006.105.
[28] R. D. Falgout, J. E. Jones, and U. M. Yang, Numerical solution of partial differential equations on parallel computers, Lecture Notes in Computational Science and Engineering, (2006), p. 267-294, https://doi.org/10.1007/3-540-31619-1_8.
[29] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, and C. Gagné, DEAP: Evolutionary algorithms made easy, Journal of Machine Learning Research, 13 (2012), p. 2171-2175.
[30] D. Greenfeld, M. Galun, R. Basri, I. Yavneh, and R. Kimmel, Learning to optimize multigrid PDE solvers, in Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, 2019, p. 2415-2423, https://proceedings.mlr.press/v97/greenfeld19a.html.
[31] X. Han, F. Hou, and H. Qin, UGrid: an efficient-and-rigorous neural multigrid solver for linear PDEs, in Proceedings of the 41st International Conference on Machine Learning, ICML '24, JMLR.org, 2024.
[32] M. A. Heroux and J. M. Willenbring, A new overview of the Trilinos project, Scientific Programming, 20 (2012), p. 83-88, https://doi.org/10.3233/spr-2012-0355.
[33] R. Huang, R. Li, and Y. Xi, Learning optimal multigrid smoothers via neural networks, SIAM Journal on Scientific Computing, (2022), p. S199-S225, https://doi.org/10.1137/21m1430030.
[34] A. Katrutsa, T. Daulbaev, and I. Oseledets, Deep multigrid: learning prolongation and restriction matrices, 2017, https://arxiv.org/abs/1711.03825.
[35] A. Katrutsa, T. Daulbaev, and I. Oseledets, Black-box learning of multigrid parameters, Journal of Computational and Applied Mathematics, 368 (2020), p. 112524, https://doi.org/10.1016/j.cam.2019.112524.
[36] N. Kohl, D. Bauer, F. Böhm, and U. Rüde, Fundamental data structures for matrix-free finite elements on hybrid tetrahedral grids, International Journal of Parallel, Emergent and Distributed Systems, 39 (2024), p. 51-74.
[37] N. Kohl and U. Rüde, Textbook efficiency: Massively parallel matrix-free multigrid for the Stokes system, SIAM Journal on Scientific Computing, 44 (2022), p. C124-C155, https://doi.org/10.1137/20m1376005.
[38] C. Kruse, M. Sosonkina, M. Arioli, N. Tardieu, and U. Rüde, Parallel solution of saddle point systems with nested iterative solvers based on the Golub-Kahan bidiagonalization, Concurrency and Computation: Practice and Experience, 33 (2021), https://doi.org/10.1002/cpe.5914.
[39] C. Lengauer, S. Apel, M. Bolten, S. Chiba, U. Rüde, J. Teich, A. Größlinger, F. Hannig, H. Köstler, L. Claus, A. Grebhahn, S. Groth, S. Kronawitter, S. Kuckuk, H. Rittich, C. Schmitt, and J. Schmitt, ExaStencils: Advanced Multigrid Solver Generation, Springer International Publishing, 2020, p. 405-452, https://doi.org/10.1007/978-3-030-47956-5_14.
[40] Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, and X. S. Li, GPTune, Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (2021), p. 234-246, https://doi.org/10.1145/3437801.3441621.
[41] I. Luz, M. Galun, H. Maron, R. Basri, and I. Yavneh, Learning algebraic multigrid using graph neural networks, in Proceedings of the 37th International Conference on Machine Learning, ICML '20, JMLR.org, 2020.
[42] M. López-Ibáñez, J. Dubois-Lacoste, L. P. Cáceres, M. Birattari, and T. Stützle, The irace package: Iterated racing for automatic algorithm configuration, Operations Research Perspectives, 3 (2016), p. 43-58, https://doi.org/10.1016/j.orp.2016.09.002.
[43] D. Manrique, J. Ríos, and A. Rodríguez-Patón, Grammar-Guided Genetic Programming, IGI Global, 2009, p. 767-773, https://doi.org/10.4018/978-1-59904-849-9.ch114.
[44] R. T. Mills, M. F. Adams, S. Balay, J. Brown, and A. Dener, Toward performance-portable PETSc for GPU-based exascale systems, Parallel Computing, 108 (2021), p. 102831, https://doi.org/10.1016/j.parco.2021.102831.
[45] W. B. Mitchell, R. Strzodka, and R. D. Falgout, Parallel performance of algebraic multigrid domain decomposition, Numerical Linear Algebra with Applications, 28 (2021), https://doi.org/10.1002/nla.2342.
[46] N. S. Moore, E. C. Cyr, P. Ohm, C. M. Siefert, and R. S. Tuminaro, Graph neural networks and applied linear algebra, SIAM Review, 67 (2025), p. 141-175, https://doi.org/10.1137/23m1609786.
[47] B. E. Morgan, Scalar flux transport models for self-similar turbulent mixing, Phys. Rev. E, 112 (2025), p. 025103, https://doi.org/10.1103/vcs5-7ymc.
[48] T. N. Mundhenk, M. Landajuela, R. Glatt, C. P. Santiago, D. M. Faissol, and B. K. Petersen, Symbolic regression via neural-guided genetic programming population seeding, in Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS '21, Red Hook, NY, USA, 2021, Curran Associates Inc.
[49] M. Orlov, M. Sipper, and A. Hauptman, Genetic and Evolutionary Algorithms and Programming: General Introduction and Application to Game Playing, Springer New York, 2009, p. 4133-4145, https://doi.org/10.1007/978-0-387-30440-3_243.
[50] D. Parthasarathy, T. Bevilacqua, M. Lanser, A. Klawonn, and H. Köstler, Towards automated algebraic multigrid preconditioner design using genetic programming for large-scale laser beam welding simulations, in Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '25, New York, NY, USA, 2025, Association for Computing Machinery, p. 1-11, https://doi.org/10.1145/3732775.3733589.
[51] R. Poli, W. B. Langdon, and N. F. McPhee, A Field Guide to Genetic Programming, Lulu Enterprises, UK Ltd, 2008.
[52] F. Rathgeber, D. A. Ham, L. Mitchell, M. Lange, F. Luporini, A. T. T. McRae, G.-T. Bercea, G. R. Markall, and P. H. J. Kelly, Firedrake, ACM Transactions on Mathematical Software, 43 (2016), p. 1-27, https://doi.org/10.1145/2998441.
[53] S. Risi, Y. Tang, D. Ha, and R. Miikkulainen, Neuroevolution: Harnessing Creativity in AI Model Design, MIT Press, Cambridge, MA, 2025, https://neuroevolutionbook.com.
[54] J. Schmitt and H. Köstler, Evolving generalizable multigrid-based Helmholtz preconditioners with grammar-guided genetic programming, in Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '22, ACM, July 2022, https://doi.org/10.1145/3512290.3528688.
[55] J. Schmitt, S. Kuckuk, and H. Köstler, EvoStencils: a grammar-based genetic programming approach for constructing efficient geometric multigrid methods, Genetic Programming and Evolvable Machines, 22 (2021), p. 511-537, https://doi.org/10.1007/s10710-021-09412-w.
[56] K. Stüben, Algebraic multigrid: An introduction with applications, in Multigrid, Academic Press, 2000, p. 413-532, https://ci.nii.ac.jp/naid/10018481746.
[57] A. Taghibakhshi, S. P. MacLachlan, L. Olson, and M. West, Optimization-based algebraic multigrid coarsening using reinforcement learning, arXiv, abs/2106.01854 (2021).
[58] Y.-H. Tsai, Portable mixed precision algebraic multigrid on high performance GPUs, 2024, https://doi.org/10.5445/IR/1000168914.
[59] K. Um, R. Brand, Y. R. Fei, P. Holl, and N.
Thuerey,Solver-in-the-loop: learning from differentiable physics to interact with iterative pde-solvers, in Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY, USA, 2020, Curran Associates Inc. [60]P. Vacek,Multigrid Methods for Large-Scale Problems: Approximate Coarsest-Level Solves and Mixed Precision Computation, PhD thesis, Charles University, 2024, https://dspace. cuni.cz/bitstream/handle/20.500.11956/196035/140125741.pdf. Ph.D. thesis. [61]A. Voronin, G. Harper, S. MacLachlan, L. N. Olson, and R. S. Tuminaro,Monolithic multigrid preconditioners for high-order discretizations of stokes equations, SIAM Journal on Scientific Computing, 0 (2025), p. S207–S231, https://doi.org/10.1137/24m1675588. [62]P. A. Whigham,Grammatically-based genetic programming, May 1995, https://api. semanticscholar.org/CorpusID:14127572. [63]U. M. Yang,On the use of relaxation parameters in hybrid smoothers, Numer. Linear Algebra Appl., 11 (2004), p. 155–172, https://doi.org/10.1002/NLA.375, https://doi.org/10.1002/ nla.375. [64]E. Zhang, A. Kahana, A. Kopani ˇ c ́ akov ́ a, E. Turkel, R. Ranade, J. Pathak, and G. E. Karniadakis,Blending neural operators and relaxation methods in PDE numerical solvers, Nature Machine Intelligence, 6 (2024), p. 1303–1313, https://doi.org/10.1038/ s42256-024-00910-x.