
Paper deep dive

Approximate Subgraph Matching with Neural Graph Representations and Reinforcement Learning

Kaiyang Li, Shihao Ji, Zhipeng Cai, Wei Li

Year: 2026 · Venue: arXiv preprint · Area: cs.LG · Type: Preprint · Embeddings: 63

Abstract

Approximate subgraph matching (ASM) is a task that determines the approximate presence of a given query graph in a large target graph. Being an NP-hard problem, ASM is critical in graph analysis, with a myriad of applications ranging from database systems and network science to biochemistry and privacy. Existing techniques often employ heuristic search strategies, which cannot fully utilize the graph information, leading to sub-optimal solutions. This paper proposes a Reinforcement Learning based Approximate Subgraph Matching (RL-ASM) algorithm that exploits graph transformers to effectively extract graph representations and RL-based policies for ASM. Our model is built upon the branch-and-bound algorithm, which selects one pair of nodes from the two input graphs at a time as a potential match. Instead of using heuristics, we exploit a Graph Transformer architecture to extract feature representations that encode the full graph information. To enhance the training of the RL policy, we use supervised signals to guide our agent in an imitation learning stage. Subsequently, the policy is fine-tuned with Proximal Policy Optimization (PPO), which optimizes the cumulative long-term rewards over episodes. Extensive experiments on both synthetic and real-world datasets demonstrate that our RL-ASM outperforms existing methods in terms of effectiveness and efficiency. Our source code is available at https://github.com/KaiyangLi1992/RL-ASM.

Tags

ai-safety (imported, 100%) · cslg (suggested, 92%) · preprint (suggested, 88%)

Links

Open PDF directly →

Intelligence

Status: not_run | Model: - | Prompt: - | Confidence: 0%

Entities (0)

No extracted entities yet.

Relation Signals (0)

No relation signals yet.

Cypher Suggestions (0)

No Cypher suggestions yet.

Full Text

62,978 characters extracted from source content.


Approximate Subgraph Matching with Neural Graph Representations and Reinforcement Learning

Abstract: Approximate subgraph matching (ASM) is a task that determines the approximate presence of a given query graph in a large target graph. Being an NP-hard problem, ASM is critical in graph analysis, with a myriad of applications ranging from database systems and network science to biochemistry and privacy. Existing techniques often employ heuristic search strategies, which cannot fully utilize the graph information, leading to sub-optimal solutions. This paper proposes a Reinforcement Learning based Approximate Subgraph Matching (RL-ASM) algorithm that exploits graph transformers to effectively extract graph representations and RL-based policies for ASM. Our model is built upon the branch-and-bound algorithm, which selects one pair of nodes from the two input graphs at a time as a potential match. Instead of using heuristics, we exploit a Graph Transformer architecture to extract feature representations that encode the full graph information. To enhance the training of the RL policy, we use supervised signals to guide our agent in an imitation learning stage. Subsequently, the policy is fine-tuned with Proximal Policy Optimization (PPO), which optimizes the cumulative long-term rewards over episodes. Extensive experiments on both synthetic and real-world datasets demonstrate that our RL-ASM outperforms existing methods in terms of effectiveness and efficiency. Our source code is available at https://github.com/KaiyangLi1992/RL-ASM.

I Introduction

Graph analysis, a sub-field of data mining, has gained popularity in recent years as graph-structured data becomes increasingly ubiquitous in a broad range of domains [3, 8].
One of the fundamental problems of graph analysis is subgraph matching, which identifies the presence of a query graph in a large target graph. Subgraph matching has a wide variety of applications, including database retrieval [46], knowledge graph mining [16], biomedical analysis [35], social group finding [12], and privacy protection [26]. In most subgraph matching studies, the basic assumption is that graphs are noise-free and accurate. In order to identify an occurrence of a query graph, all nodes and edges of the query graph must occur in the target graph; in other words, the matching has to be exact. However, noise commonly exists in many real-world subgraph matching scenarios. Therefore, Approximate Subgraph Matching (ASM) has emerged as a more practical task in graph analysis. For example, noise is prevalent in protein-protein interaction (PPI) networks [44] due to errors in data collection and varying experimental thresholds. By discovering and analyzing approximate matches, biologists can still validate their hypotheses based on noisy protein interaction network data. Another example is social network de-anonymization [17], which involves identifying anonymous individuals within a social network by correlating nodes from an auxiliary network with those from a target network, where the auxiliary network is typically a noisy subgraph. Effective matching between these two graphs enables adversaries to launch de-anonymization attacks. Therefore, research into ASM is important to evaluate data vulnerability and develop privacy-preserving solutions.

Figure 1: An example of approximate subgraph matching, where {(u_1,v_1), (u_2,v_2), (u_3,v_4), (u_4,v_3)} achieves the best approximate matching from query graph G^q to target graph G^t with the smallest graph edit distance of 1.

Formally, given a query graph G^q, ASM aims to find a subgraph in target graph G^t that has the smallest graph edit distance (GED) to G^q.
An example is illustrated in Fig. 1, where the mapping {(u_1,v_1), (u_2,v_2), (u_3,v_4), (u_4,v_3)} achieves the best approximate match from G^q to G^t with the smallest GED of 1. The existing ASM algorithms can be mainly categorized into two classes: (1) methods that convert ASM to exact subgraph matching [29, 45], and (2) methods that employ a branch-and-bound algorithm for combinatorial optimization [37]. The first category of methods specifies a threshold k and reduces the ASM problem to identifying all exact matches between any prototype graphs and the target graph, where the prototype graphs are all graphs that can be derived from the query graph with a GED smaller than k. Therefore, the exact matches of the prototype graphs are the approximate matches of the query graph. However, these methods only consider prototypes obtained by adding edges to query graphs, excluding those derived by altering node labels. This limitation renders these methods unsuitable for scenarios where node labels are noisy. One reason these methods exclude prototype graphs generated by altering node labels is that doing so would significantly increase the number of prototype graphs. For example, consider a graph G^q that is a cycle graph C_8, where each node has 16 possible label categories. With a GED threshold k=3, considering only edge-induced distances results in 1,351 prototype graphs. However, if both edges and node labels are considered in the GED, this number escalates to 347,971. On the other hand, the branch-and-bound based ASM methods match one pair of nodes from the query graph and target graph at a time and extend intermediate results iteratively. For example, Tu et al.
[37] calculate the GED lower bounds of the best possible solutions within each branch of the search tree, compare these lower bounds with the GED of the current best match, and prune the branches that cannot yield an optimal solution, i.e., the branches with lower bounds greater than the GED of the current best match. However, these methods select mapping node pairs with a greedy strategy based on heuristics, i.e., selecting the node pair that leads to the branch with the minimum GED lower bound at each step by utilizing nodes' local structural and label information. Beyond its greedy nature, this approach lacks the ability to fully utilize the information in the two input graphs, leading to sub-optimal solutions. To address these limitations of the existing approaches, we introduce a Reinforcement Learning based Approximate Subgraph Matching (RL-ASM) method. RL-ASM employs a Graph Transformer [28], whose performance in graph representation learning has been extensively validated both theoretically and empirically, to extract feature representations from graphs, allowing the use of full graph information for ASM. To mitigate the issues induced by greedy algorithms, we train our neural-network-based agent with RL algorithms to optimize a cumulative long-term reward over episodes. As a result, our RL-ASM achieves solutions that are closer to the optimal ones, outperforming the existing search algorithms by a significant margin. The contributions of our work are summarized as follows:

• To the best of our knowledge, RL-ASM is the first work that leverages reinforcement learning for high-quality approximate subgraph matching.

• Our RL-based Graph Transformer model can fully utilize the graph information and select node pairs by optimizing a long-term reward instead of being greedy.
• Extensive experiments conducted on synthetic and real-world graph datasets demonstrate that RL-ASM outperforms existing methods in terms of effectiveness and efficiency.

II Related Work

Traditional ASM Methods. Being an NP-hard problem [13], ASM has been tackled with a variety of approaches. Tong et al. [36] investigate ASM in social networks and propose a method that samples the matched subgraph via random walk. Tian et al. [35] and Yuan et al. [43] break the query graph into small fragments and assemble the matches of these fragments to generate the entire match of the query graph. Other works [9, 2, 1] utilize chi-square statistics to measure node similarity based on the label distribution of a node's neighbors and apply the similarity measure to search for approximate matches. Sussman et al. [34] propose to calculate a permutation matrix on the adjacency matrix and match the query graph to the target graph via the permutation matrix. Recently, Reza et al. [29] and Zhang et al. [45] identify all prototype graphs whose distances from the query graph are below a specified threshold, then reduce the ASM problem to the exact subgraph matching problem by searching for exact matches between the prototype graphs and the target graph. Tu et al. [37] approach the ASM problem by formulating it as a tree search, employing lower bounds and cutoffs to reduce the search space. Given sufficient running time, this method can find an optimal solution, i.e., the subgraph of the target graph that is the most similar to the query graph. Unfortunately, most of the existing methods utilize hand-crafted features to search for mapped node pairs and cannot fully exploit the graph information for matching. Moreover, they all employ greedy algorithms that overlook potentially better matches in future steps, leading to sub-optimal solutions.

Reinforcement Learning for NP-hard Graph Problems.
There are many prior works focusing on RL algorithms to solve NP-hard problems on graphs, including Minimum Vertex Cover [18], Network Dismantling [11], and Maximum Common Subgraph detection [24, 5]. As for exact subgraph matching, Wang et al. [40] utilize RL to determine the node mapping order on query graphs, while Bai et al. [6] utilize RL to determine the node mapping order on target graphs. Compared with exact subgraph matching, ASM represents an even more challenging task, since ASM allows discrepancies between the query graph G^q and the matched subgraph of G^t. Therefore, hard constraints that typically can be used to reduce the search space (e.g., requiring mapped nodes to have the same label) are not applicable. To address this, our RL-ASM employs a branch-and-bound algorithm to reduce the search space and leverages a Graph Transformer to extract graph features and select actions from a large action space.

III Preliminary

III-A Problem Definition

Let G = (V, E) denote a graph with a set of nodes V connected by a set of edges E. The approximate subgraph matching problem can be defined as follows.

Definition 1 (Approximate Subgraph Matching, ASM): Given a query graph G^q = (V^q, E^q) and a target graph G^t = (V^t, E^t), ASM aims to identify a one-to-one mapping M: V^q → V^t such that the graph edit distance C(M; G^q, G^t) is minimized, where C(M; G^q, G^t) measures the discrepancies between the query graph G^q and the subgraph of G^t induced by M(V^q).

Following [37, 15], we adopt the graph edit distance C(M; G^q, G^t) over nodes and edges as

C(M; G^q, G^t) = Σ_{u ∈ V^q} D_V(u, M(u)) + Σ_{e ∈ E^q} D_E(e, M(e)),  (1)

where the node distance is

D_V(u, M(u)) = { 0 if A(u) = A(M(u)); 1 if A(u) ≠ A(M(u)) },  (2)

and the edge distance is

D_E(e, M(e)) = { 0 if e ∈ E^q and M(e) ∈ E^t; 1 if e ∈ E^q and M(e) ∉ E^t }.  (3)

Here A(u) denotes the label of node u. Let e be the edge between nodes u_1 and u_2 from G^q; M(e) refers to the edge between M(u_1) and M(u_2) in G^t.

Figure 2: An illustration of the search process of ASM on (G^q, G^t). The branch-and-bound search algorithm (Algorithm 1) produces a tree structure, where each node represents a state (s_t), the node ID reflects the order in which the state is visited, and each directed edge represents an action (a_t), which adds a node pair to the current node-node mapping M_t. The search is essentially depth-first with pruning through the lower bound check. The policy (Line 12 of Algorithm 1) refers to a node-pair selection strategy, i.e., which state to visit next. The visiting order affects the search performance: for example, if state 6 can be visited before state 1, a better solution can be found in fewer iterations. This means the GED of the current best match, currentMiniDist, will be smaller, allowing the algorithm to prune more branches in subsequent search steps; hence the search efficiency will be higher. When the search completes or a pre-defined search iteration budget is exhausted, the best solution identified by then is returned. For clarity of visualization, some nodes and edges in the search tree are omitted.

III-B The Search Algorithm for ASM

Since the branch-and-bound search algorithm serves as the foundation of our proposed method, we first review a typical branch-and-bound search algorithm for ASM [37]. We then discuss its limitations and constraints, which lead to our proposed RL-ASM. As shown in Algorithm 1 and Fig.
2, the branch-and-bound search algorithm, as proposed in [37], starts from an initial state with an empty mapping M_0 and adds one pair of nodes to the mapping at a time, while maintaining the best solution found so far. At each search step, denote the current search state as s_t, which consists of G^q, G^t, and the current node-node mapping M_t. The algorithm attempts to select a node pair (u, v), where u ∈ G^q and v ∈ G^t, as action a_t, and adds the pair to M_t. As shown in Fig. 2, each action (represented by a directed edge in the search tree) updates one state to another. For example, action (u_2, v_5) updates state 0 to state 6, starting a branch that includes states 7-11. In [37], the authors propose a method to calculate the lower bound of C(M; G^q, G^t) for action selection. At Line 7 of Algorithm 1, the algorithm calculates the lower bound for each branch resulting from an action in action space A_t. For example, as shown in Fig. 2, when the current state is state 0, the algorithm calculates the lower bounds corresponding to the branches starting from states 1, 6, and 12. Then the algorithm compares these lower bounds with currentMiniDist, the GED of the current best match. Any actions leading to branches with lower bounds greater than currentMiniDist are eliminated from action space A_t (Line 8). If all the actions in A_t are eliminated, the algorithm backtracks to the parent search state (Lines 9-11), i.e., the current branch is cut off. When all of the nodes in G^q have been mapped, the algorithm compares its C(M_t; G^q, G^t) with currentMiniDist.
If C(M_t; G^q, G^t) is smaller, indicating that a better solution than the current best mapping has been identified, the algorithm sets C(M_t; G^q, G^t) as the new currentMiniDist and updates M* to M_t (Lines 17-20). Note that if the method can find good matches early, currentMiniDist will be small, which means many branches can be pruned and the search space can be reduced substantially.

Algorithm 1 Branch-and-Bound for Approximate Subgraph Matching [37]
Input: query graph G^q and target graph G^t
Output: optimal node-node mapping M*
1: Initialize s_0 ← (G^q, G^t, M_0 = ∅)
2: Initialize stack ← new Stack(s_0)
3: Initialize currentMiniDist ← ∞
4: while stack ≠ ∅ do
5:   s_t ← stack.pop()
6:   A_t ← s_t.get_actionspace()
7:   Lbounds ← s_t.get_lowerbounds(A_t)
8:   A_t ← prune_actionspace(A_t, Lbounds, currentMiniDist)
9:   if |A_t| = 0 then
10:      continue
11:  end if
12:  a_t ← policy(s_t, A_t)
13:  A_t ← A_t − {a_t}
14:  M_t ← s_t.get_mapping()
15:  M_t ← M_t + a_t
16:  if |M_t| = |V^q| then
17:      if C(M_t; G^q, G^t) < currentMiniDist then
18:          currentMiniDist ← C(M_t; G^q, G^t)
19:          M* ← M_t
20:      end if
21:      continue
22:  end if
23:  stack.push(s_t)
24:  s_{t+1} ← environment.update(s_t, a_t)
25:  stack.push(s_{t+1})
26: end while

The above method leverages a heuristic for action selection, denoted as "policy" at Line 12. Typically, a greedy policy is adopted, where the action leading to the branch with the minimum lower bound of GED is selected. A significant limitation of this method is that the lower-bound-based heuristic is not adaptive to complex real-world graph structures, because the method cannot fully exploit the information embedded in the graphs.
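The search in Algorithm 1, together with the GED cost of Eqs. (1)-(3), can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: graphs are given as label dicts and undirected edge sets, the lower bound of a branch is simply the cost of its partial mapping (admissible, since Eqs. (1)-(3) only ever add non-negative terms), and the policy is the greedy smallest-lower-bound rule described above.

```python
def mapping_cost(M, q_labels, t_labels, q_edges, t_edges):
    """GED of a (partial) mapping M, per Eqs. (1)-(3): label mismatches
    plus query edges between mapped nodes whose image is absent in G^t."""
    cost = sum(q_labels[u] != t_labels[v] for u, v in M.items())
    for u1, u2 in q_edges:
        if u1 in M and u2 in M:
            cost += (M[u1], M[u2]) not in t_edges and (M[u2], M[u1]) not in t_edges
    return cost

def branch_and_bound(q_labels, t_labels, q_edges, t_edges, order):
    """Depth-first branch-and-bound over node pairs (Algorithm 1).
    `order` is the query-node mapping order phi."""
    best_dist, best_M = float("inf"), None
    stack = [dict()]                       # each state is a partial mapping M_t
    while stack:
        M = stack.pop()
        if len(M) == len(order):           # all query nodes mapped: leaf state
            c = mapping_cost(M, q_labels, t_labels, q_edges, t_edges)
            if c < best_dist:
                best_dist, best_M = c, M   # Lines 17-20: record a better match
            continue
        u = order[len(M)]                  # next query node according to phi
        children = []
        for v in t_labels:
            if v in M.values():
                continue                   # enforce a one-to-one mapping
            child = {**M, u: v}
            lb = mapping_cost(child, q_labels, t_labels, q_edges, t_edges)
            if lb < best_dist:             # prune branches that cannot improve
                children.append((lb, child))
        # greedy policy: push so the smallest-lower-bound branch is expanded first
        for lb, child in sorted(children, key=lambda x: x[0], reverse=True):
            stack.append(child)
    return best_dist, best_M
```

The final `sorted` call is exactly the "policy" of Line 12; RL-ASM replaces this greedy ordering with scores from a learned network.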
More importantly, the greedy algorithm focuses on a locally optimal objective for action selection at the current step without considering potentially better matches in future steps, leading to sub-optimal solutions.

IV Proposed Method

In this section, we introduce our Reinforcement Learning based Approximate Subgraph Matching (RL-ASM). We provide a high-level overview of RL-ASM in Section IV-A, including the definitions of state, action, and reward. Section IV-B describes the details of our framework, focusing on the design of representation learning with the Graph Transformer for ASM. This is followed by Sections IV-C and IV-D, where the model training is elaborated. In Section IV-E, we discuss why structural encoding features and the Graph Transformer are required for ASM.

IV-A Overview

We formulate ASM as a Markov Decision Process (MDP). The state s_t consists of two components: (1) the query graph G^q and target graph G^t, and (2) the current node-node mapping M_t from V^q to V^t. Action a_t: u → v is defined as adding a node pair (u, v) to M_t, where u ∈ V^q and v ∈ V^t. Our policy assigns a score to each action in action space A_t and is modeled as a neural network P_θ(a_t|s_t), parameterized by θ, that computes a probability distribution over A_t given the current state s_t. For subgraph matching, the immediate reward r_t(s_t, a_t) = r_t^node + r_t^edge consists of two components: the node-matching reward r_t^node and the edge-matching reward r_t^edge. The node-matching reward r_t^node is determined by the label compatibility of the nodes added by a_t: u → v, yielding +1 for identical labels and -1 otherwise. The edge-matching reward r_t^edge is defined as |E_q^+| − |E_q^−|. Here, let E_q be the set of existing edges between node u and all the mapped nodes of G^q at state s_t. E_q^+ consists of all edges e ∈ E_q whose mapped edge M_t(e) exists in E^t.
Conversely, E_q^− includes those edges e ∈ E_q whose mapped edge does not exist in E^t. For instance, at state 10 of Fig. 2, with M_10 = {(u_1,v_1), (u_2,v_5), (u_4,v_2)} and a_10: u_3 → v_4, E_q^+ = {(u_2,u_3)} since the mapped edge (v_4,v_5) is in E^t, while E_q^− = {(u_3,u_4)} since the mapped edge (v_2,v_4) is not in E^t. Thus, r_10^edge = 0. Essentially, r_t(s_t, a_t) measures the change of GED after adding node pair (u, v) (i.e., taking action a_t) to the current mapped subgraphs of the query graph and target graph at state s_t.

Figure 3: Overview of RL-ASM. RL-ASM consists of two major components: an encoder and a decoder. The encoder processes node labels, mapping information, and positional and/or structural encodings by alternating intra- and inter-components L times (with L layers) to extract powerful node representations. The decoder applies self-attention to the node embeddings of G^q to generate a global state representation x_{s_t}. The action representations are then generated by the product of the embedding of node u_next ∈ G^q (according to φ), the learnable weight tensor W_3, and the embeddings of the unmapped candidate nodes v_1, v_2, v_3, and v_4 from G^t. Subsequently, the representations of the state and actions are concatenated and fed to an MLP and a softmax classifier to calculate the probability distribution over the actions, P_θ(a_t|s_t).

If we do not impose any constraints on the action space, its cardinality will be O(|V^q| × |V^t|), which can be prohibitively large. To reduce the action space size, we generate φ, the mapping order of the nodes in G^q, before searching for subgraph matches. At each step, we take one node from G^q according to the order φ and map this node to one of the nodes in G^t.
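The reward r_t = r_t^node + r_t^edge can be computed locally from the action and the current mapping. A sketch with graphs given as plain label dicts and undirected edge sets (the function name and data layout are illustrative, not the paper's code):

```python
def step_reward(u, v, M, q_labels, t_labels, q_edges, t_edges):
    """Immediate reward r_t = r^node + r^edge for action a_t: u -> v.
    r^node is +1/-1 for a label (mis)match; r^edge adds +1 for each query
    edge between u and an already-mapped node whose image exists in G^t
    (E_q^+), and -1 for each whose image is missing (E_q^-)."""
    r_node = 1 if q_labels[u] == t_labels[v] else -1
    r_edge = 0
    for a, b in q_edges:
        if u not in (a, b):
            continue
        other = b if a == u else a
        if other not in M:
            continue                      # only edges to already-mapped nodes
        image = (v, M[other])             # mapped edge M_t(e) in G^t
        present = image in t_edges or image[::-1] in t_edges
        r_edge += 1 if present else -1
    return r_node + r_edge
```

With a mapping and edge sets mirroring the state-10 example above, E_q^+ and E_q^− each contain one edge, so r^edge = 0 and the reward reduces to the node-label term.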
We generate the order φ with the method proposed in [7], which was designed for exact subgraph matching [33]. The algorithm starts by identifying the node u ∈ V^q with the maximum degree and adds it to φ. In each subsequent step, the algorithm prefers to add the nodes that have a larger number of edges to the nodes already present in φ. This is an effective way to create φ, because the node with the highest degree and the nodes with more mapped neighbors typically encode substantial structural information and thereby facilitate accurate pattern matching early in the search. To enhance the efficiency of the tree search, we further design a mechanism that caches the lower bounds of branches and the policy scores of actions evaluated in previous steps. This allows our method to reuse cached results when backtracking to earlier states. Because of the strong correlation between the lower bounds and the GED of the current best solution currentMiniDist [37], we delete the cached results whenever currentMiniDist changes, and recalculate the lower bounds and the policy scores when backtracking to these states.

IV-B The RL-ASM Framework

As shown in Fig. 3, our RL-ASM framework consists of an encoder that produces the node embeddings for V^q and V^t, and a decoder that transforms the embeddings into a probability distribution over actions P_θ(a_t|s_t) for action selection. To design this model, we face several challenges: (i) many existing graph neural networks (GNNs), which primarily follow a message passing paradigm, are inherently local, and their expressiveness cannot exceed the 1-Weisfeiler-Lehman test [41].
Hence, these models lack the capability to tackle the complex approximate subgraph matching problem (see Section IV-E for justification); (ii) as the task involves both query and target graphs, effectively sharing information between the two graphs presents a significant challenge; and (iii) each state s_t contains G^q, G^t, and the node mapping M_t, and it is challenging to effectively encapsulate the entire state's information into a representation that can be utilized to predict P_θ(a_t|s_t). These challenges have guided the design of our model.

IV-B1 Node Features

We employ three distinct features for each node: (1) Label encoding: a one-hot vector that specifies the label of each node. (2) Mapping status: a binary number indicating the node's mapping status, 1 if mapped and 0 otherwise; this feature helps alleviate Challenge (iii). (3) Positional and structural encodings: for a node's positional information, we utilize the Laplacian Positional Encoding (LapPE), which involves the eigenvectors associated with the k smallest non-zero eigenvalues of the graph Laplacian [20]. For structural information, we apply the Random Walk Structural Encoding (RWSE), derived from the diagonals of the m-step random walk matrices [10]. Rampášek et al. [28] and Kreuzer et al. [20] have demonstrated theoretically and empirically that using positional and structural encodings in GNNs can enhance the model's expressiveness. Depending on the characteristics of the graph, we choose positional encoding, structural encoding, or both as node features; this alleviates Challenge (i). Positional encoding is particularly effective for image graphs [14], while structural encoding is well suited for molecular graphs [32]. Each of the three features is processed by a distinct linear layer, and the results are concatenated as the feature representations, which are fed to the encoder.
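Both encodings are simple functions of the adjacency matrix. A minimal NumPy sketch of one common variant (conventions such as Laplacian normalization and eigenvector sign fixing differ across implementations; isolated nodes are assumed absent):

```python
import numpy as np

def lap_pe(A, k):
    """LapPE: eigenvectors of the symmetrically normalized graph Laplacian
    for the k smallest non-zero eigenvalues (assumes no isolated nodes)."""
    d = A.sum(1)
    L = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))   # L = I - D^-1/2 A D^-1/2
    w, V = np.linalg.eigh(L)                           # ascending eigenvalues
    nz = w > 1e-8                                      # drop (near-)zero ones
    return V[:, nz][:, :k]

def rwse(A, m):
    """RWSE: diagonals of the 1..m step random-walk matrices (RW = D^-1 A),
    i.e., each node's k-step return probabilities."""
    P = A / A.sum(1, keepdims=True)
    feats, Pk = [], np.eye(len(A))
    for _ in range(m):
        Pk = Pk @ P
        feats.append(np.diag(Pk))
    return np.stack(feats, axis=1)                     # shape (N, m)
```

On a triangle, for instance, every node has a 1-step return probability of 0 and a 2-step return probability of 1/2, so RWSE already distinguishes cycle lengths that 1-hop aggregation cannot.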
IV-B2 Encoder

The encoder consists of an intra-component module and an inter-component module, as shown in Fig. 3. The intra-component module processes and aggregates messages within the query graph G^q and the target graph G^t separately. The inter-component module shares messages across the graphs based on the relationships of nodes in G^q and G^t. To alleviate Challenge (i) and effectively utilize both local and global graph information, we employ the GraphGPS layer [28] as the intra-component. This hybrid layer combines a Message Passing Neural Network (MPNN) and a global attention mechanism, and offers greater expressiveness than traditional MPNNs such as GCN [19], GAT [39], and GraphSAGE [41]. The MPNN propagates messages along the edges, whereas the global attention spreads information throughout the entire graph, as illustrated by the solid and dotted blue lines in Fig. 3. The global attention mechanism enhances this capability by passing messages across all nodes, and therefore overcomes the expressiveness bottlenecks associated with over-smoothing and over-squashing [28, 4]. In each GraphGPS layer, node features are updated by integrating outputs from both the MPNN and the global attention instances. Specifically, the GraphGPS layer is formulated as

X^{ℓ+1}_intra = GPS^ℓ(X^ℓ, A),  (4)

which is implemented as

X_M^{ℓ+1} = MPNN^ℓ(X^ℓ, A),  (5)
X_H^{ℓ+1} = GlobalAttn^ℓ(X^ℓ),  (6)
X^{ℓ+1}_intra = MLP^ℓ(X_M^{ℓ+1} + X_H^{ℓ+1}),  (7)

where A ∈ ℝ^{N×N} is the adjacency matrix of a graph of N nodes; X^ℓ ∈ ℝ^{N×d} denotes the d-dimensional node features of the N nodes at the ℓ-th layer; MLP^ℓ is a 2-layer multilayer perceptron (MLP) block; and X^{ℓ+1}_intra ∈ ℝ^{N×d} represents the output of the intra-component at the ℓ-th layer. MPNN^ℓ and GlobalAttn^ℓ are modular components, implemented as GatedGCN [23] and Self-Attention [38], respectively.
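The parallel structure of Eqs. (5)-(7) can be sketched as below; the MPNN and attention instances here are simplified stand-ins (a normalized-adjacency convolution and single-head attention) for the GatedGCN and multi-head self-attention the paper actually uses:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gps_layer(X, A, Wm, Wq, Wk, Wv, W1, W2):
    """One intra-component update, Eqs. (5)-(7), with placeholder modules."""
    # Eq. (5): local message passing along edges (mean-neighbor aggregation)
    A_hat = A / np.maximum(A.sum(1, keepdims=True), 1)
    X_M = np.maximum(A_hat @ X @ Wm, 0)                 # ReLU activation
    # Eq. (6): global attention across all node pairs
    att = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(X.shape[1]))
    X_H = att @ (X @ Wv)
    # Eq. (7): fuse both branches with a 2-layer MLP
    return np.maximum((X_M + X_H) @ W1, 0) @ W2
```

The key design point survives the simplification: the two branches run in parallel on the same input X^ℓ and are summed before the MLP, so local edge-wise messages and all-pairs attention contribute jointly to each node's next representation.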
As for the inter-component module of the encoder, we employ a cross-attention mechanism that enables the model to learn how to share messages between G^q and G^t. This mechanism is established based on the node mapping relationships. For state s_t, the current mapping is M_t = {u → v | u ∈ V^q_t, v ∈ V^t}, where V^q_t represents the set of nodes of G^q that have been mapped at step t. Our method allows the nodes of G^q to be mapped to nodes in G^t even if their labels differ. Consequently, for any node u′ ∈ G^q that remains unmapped, we define its candidate set C_{u′} = {v′ ∈ V^t | v′ is unmapped in G^t}. Therefore, the mapping describing the candidate relationships between nodes can be formulated as M′_t = {u′ → C_{u′}, ∀u′ ∈ V^q ∖ V^q_t}. Let M̃_t denote the union of M_t and M′_t, and M̃_t^{−1} denote the reverse mapping of M̃_t. The messages passing along M_t and M′_t are shown as solid and dotted red lines in Fig. 3, respectively. Specifically, the message passing from G^q to G^t is implemented as follows:

X^ℓ_inter[u, :] = W_1^ℓ X^ℓ_intra[u, :] + Σ_{v ∈ M̃_t(u)} α_uv W_2^ℓ X^ℓ_intra[v, :],  (8)

where X^ℓ_inter[u, :] and X^ℓ_intra[u, :] ∈ ℝ^d represent the inter- and intra-embeddings of node u ∈ V^q at the ℓ-th layer, respectively; the attention coefficients α_uv are derived using multi-head dot-product attention [38]; and W_1^ℓ and W_2^ℓ are the learnable matrices associated with the ℓ-th layer. Additionally, we implement the message passing from G^t to G^q following the reverse mapping M̃_t^{−1}. The inter-component module is crucial for information exchange across the graphs, which addresses Challenge (ii). The output of the inter-component module is then fed to an activation function, X^{ℓ+1} = f_activate(X^{ℓ+1}_inter), yielding the output of the ℓ-th encoder layer.
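Eq. (8) is a cross-attention update along the (candidate) mapping. A single-head NumPy sketch (the paper uses multi-head dot-product attention; here `M_tilde` is assumed to map each query-node index to a set of target-node indices):

```python
import numpy as np

def inter_component(Xq, Xt, M_tilde, W1, W2):
    """Eq. (8): for each query node u, combine W1 x_u with an attention-
    weighted sum of W2 x_v over its mapped/candidate target nodes v."""
    d = Xq.shape[1]
    out = Xq @ W1                                     # the W1 x_u term
    for u, cands in M_tilde.items():
        cands = sorted(cands)
        scores = Xt[cands] @ Xq[u] / np.sqrt(d)       # dot-product attention
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                          # softmax over candidates
        out[u] += alpha @ (Xt[cands] @ W2)            # sum_v alpha_uv W2 x_v
    return out
```

A mapped query node attends to exactly one target node (α = 1), while an unmapped node attends softly over all unmapped candidates, which is how the candidate mapping M′_t lets information flow before a hard match is committed.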
We construct the encoder by stacking four such layers and employ jumping knowledge [42] to aggregate the outputs of these layers.

IV-B3 Decoder

The node embeddings of the query graph, X^q ∈ ℝ^{|V^q|×d}, produced by the encoder, capture the information from G^q and from G^t through the cross-graph message passing manifested by M̃_t and M̃_t^{−1}. This allows X^q to gather the comprehensive information required to represent the entire state. To alleviate Challenge (iii), we introduce an attention-based mechanism that calculates the state embedding x_{s_t} from the embeddings of the query graph nodes as follows:

x_{s_t} = Pooling(SelfAttention(X^q)).  (9)

Here, we apply a self-attention mechanism [31] on X^q, followed by aggregation of the resulting representations using a pooling operation; in our experiments, average pooling is employed. Given that the policy is formulated as P_θ(a_t|s_t), it is essential to learn the representations of state s_t and action a_t. Consider action a_t: u → v, with x_u and x_v ∈ ℝ^d denoting the node representations learned by the encoder (here, u is the next node selected from G^q according to the order φ). We employ a bilinear tensor product defined by a learnable tensor W_3 ∈ ℝ^{d×d×F} to ensure sufficient interaction between the node embeddings involved in the action. Here, F is a hyperparameter that enhances the model's capacity by adding dimensions to learn complex relationships. Action a_t: u → v can then be expressed as x_u^T W_3 x_v, which is concatenated with the state embedding x_{s_t}. The combined vector is fed into a multi-layer perceptron (MLP), followed by a softmax layer, to generate

P_θ(a_t|s_t) = SoftMax(MLP(CONCAT(x_{s_t}, x_u^T W_3 x_v))).
(10) IV-C Policy Training The goal of policy training is to maximize the accumulative long-term reward: Rt=∑i=tTγi−t​riR_t= _i=t^Tγ^i-tr_i, where T is the total number of time steps of an episode, γ is the discount factor, and rir_i is the immediate reward incurred by action aia_i at the i-th step. To enhance training efficiency, we disable the backtracking mechanism during the policy training. That is, once our algorithm reaches to a leaf node in the search tree, the episode ends. We employ the Proximal Policy Optimization (PPO) [30], one of the most effective policy gradient methods, to train our model. PPO mitigates excessively large policy changes by clipping policy probability ratios, which helps maintain stability in the learning process. The core term of the PPO loss function is defined as Lc​l​i​p​(θ)=−^t​[min⁡(ρ​(θ)​R^t,clip⁡(ρ​(θ),1−ε,1+ε)​R^t)],L^clip(θ)\!=\!- E_t\!\! [ \! (\!ρ(θ)\! R_t,clip\! (ρ(θ),\!1\!-\! ,\!1\!+\! )\! R_t\! )\! ], (11) where ^t​[⋅] E_t[·] indicates the empirical average over a batch of samples, ρ​(θ)=Pθ​(at|st)Pθo​l​d​(at|st)ρ(θ)= P_θ(a_t|s_t)P_ _old(a_t|s_t) is the ratio of the action probabilities under current and old policy, and R^t R_t denotes the accumulated reward, normalized across the batch. The operation clip⁡(ρ​(θ),1−ε,1+ε)​R^tclip(ρ(θ),1- ,1+ ) R_t modifies the objective by clipping the probability ratio, thereby moderating the influence of extreme policy changes during the training. To ensure sufficient exploration and prevent premature convergence to a local optimal policy, we also incorporate an entropy bonus term to the loss function: Le​n​t​r​o​p(θ)=−^t[H(Pθ(⋅|st))],L^entrop(θ)=- E_t[H(P_θ(·|s_t))], (12) where H​(⋅)H(·) is the entropy of the probability distribution over action space AtA_t given state sts_t. The overall objective function of PPO is LP​P​O​(θ)=Lc​l​i​p​(θ)+c​Le​n​t​r​o​p​(θ),L^PPO(θ)=L^clip(θ)+cL^entrop(θ), (13) where c is a hyperparameter that balances the contributions from the two loss terms. 
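The clipped objective of Eqs. (11)-(13) can be sketched in NumPy on a batch of log-probabilities, normalized returns, and per-state entropies. The batch values below are illustrative placeholders, not outputs of the actual model:

```python
import numpy as np

def ppo_loss(logp_new, logp_old, returns, entropies, eps=0.2, c=0.01):
    """L^PPO = L^clip + c * L^entrop over a batch (Eqs. 11-13)."""
    # R̂_t: accumulated rewards normalized across the batch
    R_hat = (returns - returns.mean()) / (returns.std() + 1e-8)
    # rho(theta): ratio of action probabilities under current and old policy
    rho = np.exp(logp_new - logp_old)
    l_clip = -np.mean(np.minimum(rho * R_hat,
                                 np.clip(rho, 1 - eps, 1 + eps) * R_hat))
    l_entrop = -np.mean(entropies)          # entropy bonus (Eq. 12)
    return l_clip + c * l_entrop            # Eq. 13

# batch of 4 illustrative transitions
logp_old = np.log(np.array([0.5, 0.25, 0.1, 0.4]))
logp_new = np.log(np.array([0.6, 0.2, 0.15, 0.4]))
returns = np.array([1.0, 0.5, 2.0, -0.5])
entropies = np.array([1.2, 0.9, 1.1, 1.0])
loss = ppo_loss(logp_new, logp_old, returns, entropies)
assert np.isfinite(loss)
```

When the current and old policies coincide, ρ(θ) = 1 everywhere, the clip is inactive, and (because the normalized returns average to zero over the batch) only the entropy bonus contributes to the loss.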
IV-D Pre-training

For graph pairs of large graphs, the action space can be huge. To expedite the learning process and enhance sample efficiency, we pre-train our model with imitation learning before the PPO fine-tuning. This pre-training involves sampling subgraphs from the target graph and recording the correspondence between nodes of the sampled and target graphs to define expert actions. Noise is added to these sampled graphs to generate query graphs. In this pre-training stage, the model's predictions are compared to the expert's choices, with training conducted using a supervised cross-entropy loss:

L^Imit(θ) = −∑_{a ∈ A_t} y_a log(ŷ_a(θ)),   (14)

where y_a indicates whether the action a aligns with the expert's choice (1 for yes, 0 for no), and ŷ_a represents the predicted probability corresponding to action a. Note that during imitation learning, we enable Dropout and Batch Normalization in the model, whereas in the PPO fine-tuning, we disable these components to improve training stability.

IV-E Why are LapPE and Graph Transformer Needed for ASM?

The encoder of RL-ASM consists of an intra-component module and an inter-component module. In the inter-component, messages are passed along M̃ and M̃^{−1}, which denote the edges between the nodes in G^q and G^t. This implies that the inter-component can be considered an MPNN. Therefore, if we employ traditional MPNN architectures (e.g., GCN, GAT, or GatedGCN) as the intra-component of our encoder, the entire architecture will be an MPNN. It has been proven that the expressiveness of MPNNs cannot exceed the 1-Weisfeiler-Lehman test [41]; that is, they cannot differentiate non-isomorphic graphs by using only 1-hop neighborhood aggregation. In Fig. 4, when only the labels of nodes (which correspond to node colors in the figure) are used as input features, the MPNN fails to distinguish between G^t_1 and G^t_2, because the nodes with identical IDs in G^t_1 and G^t_2 have the same 1-hop neighborhood structure. Consequently, when attempting to match query graph G^q with G^t_1 and G^t_2, an MPNN-based model will yield identical matching results. However, the optimal matching solutions for these cases are different: one is (1,1), (2,2), (3,3), (4,4) and the other is (1,5), (2,2), (3,3), (4,4). This example illustrates the expressiveness limitation of MPNNs when addressing the ASM problem. Kreuzer et al. [20] demonstrate that with the full set of Laplacian eigenvectors, a Graph Transformer model can be a universal function approximator on graphs and can provide an approximate solution to the graph isomorphism problem. This suggests that by integrating LapPE and GraphGPS (a type of Graph Transformer) into the model, our RL-ASM has capability beyond traditional MPNN models in solving the ASM problem.

Figure 4: Example graphs that are indistinguishable by MPNN.

V Experiments

In this section, we compare our RL-ASM with state-of-the-art approaches for ASM, and demonstrate its effectiveness and efficiency. All our experiments were conducted on a server equipped with Intel Xeon Gold 6242 CPUs and NVIDIA Tesla V100 GPUs. For reproducible research, we provide our source code at https://github.com/KaiyangLi1992/RL-ASM.

V-A Datasets

Our experiments utilize one synthetic dataset and three real-world datasets from diverse domains: biology, computer vision, and social networks. These datasets are described as follows:

• SYNTHETIC [27]: This dataset comprises 300 synthetic graphs generated by a statistical model. Each graph consists of 100 nodes and 196 edges, and each node is endowed with a normally distributed scalar label.
• AIDS [27]: This dataset contains 2,000 graphs, each representing a molecular compound. Nodes in these graphs correspond to atoms, while edges represent the chemical bonds between them.

• MSRC_21 [27]: As a semantic image processing benchmark, this dataset includes 563 graphs, each representing the graphical model of an image. Nodes represent objects in an image, and edges illustrate the relationships among the objects.

• EMAIL [22]: Originating from an email communication network at a European research institution, this dataset uses nodes to represent employees and edges to represent the email interactions between them. The node label indicates the department to which each employee belongs.

For the SYNTHETIC, AIDS, and MSRC_21 datasets, which contain multiple graphs, we randomly select graphs from these collections to serve as the target graphs. For the EMAIL dataset, which contains a single graph, we use this graph consistently as the target graph. We employ Breadth-First Search (BFS) on these target graphs to sample nodes, ceasing the sampling when the node count reaches the parameters specified in Table I. From these sampled nodes, induced subgraphs are generated to form the seed query graphs. We then introduce noise to these seed query graphs by adding edges and altering node labels, thereby producing query graphs with noise levels of 0%, 5%, and 10%. Specifically, we calculate the total number of nodes and edges in each query graph and multiply this total by the noise level to determine the noise intensity N_noise. We then randomly select an integer N^node_noise between 0 and N_noise, change the labels of N^node_noise nodes, and add N_noise − N^node_noise edges to the original query graph, thereby generating a noisy query graph. Each pair of a query graph and its corresponding original target graph constitutes a graph pair for ASM.
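The noise-injection step above can be sketched in pure Python. The graph containers, label alphabet, and seed below are illustrative assumptions; only the N_noise / N^node_noise split follows the procedure described in the text:

```python
import random

def make_noisy_query(nodes, edges, labels, noise_level, num_labels, seed=0):
    """Perturb a seed query graph: N_noise = round(noise_level * (|V| + |E|))
    total edits, split at random into node relabelings and edge additions."""
    rng = random.Random(seed)
    labels = dict(labels)
    edges = set(edges)
    n_noise = round(noise_level * (len(nodes) + len(edges)))
    n_node = rng.randint(0, n_noise)            # N^node_noise drawn from [0, N_noise]
    # change the labels of N^node_noise randomly chosen nodes
    for u in rng.sample(sorted(nodes), min(n_node, len(nodes))):
        labels[u] = rng.randrange(num_labels)
    # add N_noise - N^node_noise new edges between random unconnected pairs
    non_edges = [frozenset((u, v)) for u in nodes for v in nodes
                 if u < v and frozenset((u, v)) not in edges]
    rng.shuffle(non_edges)
    for e in non_edges[:n_noise - n_node]:
        edges.add(e)
    return labels, edges

# toy seed query graph: a path on 10 nodes, all labels 0, 10% noise
nodes = set(range(10))
edges = {frozenset((i, i + 1)) for i in range(9)}
labels = {u: 0 for u in nodes}
new_labels, new_edges = make_noisy_query(nodes, edges, labels, 0.10, 5)
assert edges <= new_edges  # original edges are preserved; noise only adds
```

With 10 nodes and 9 edges at a 10% noise level, N_noise = round(1.9) = 2 edits are distributed between relabelings and edge additions.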
After eliminating duplicate graph pairs, we partition the dataset into training, validation, and test sets following an 8:1:1 ratio. The statistics of the query and target graphs are reported in Table I.

TABLE I: Statistics of query graphs and target graphs.

| Dataset | SYNTHETIC | AIDS | MSRC_21 | EMAIL |
|---|---|---|---|---|
| Target size (nodes) | 100 | 17.03 | 77.48 | 1005 |
| Target size (edges) | 196 | 17.58 | 198.18 | 25571 |
| Query size (nodes) | 15.75 | 8.09 | 24.12 | 13.19 |
| Query size (edges) | 18.13 | 8.24 | 57.29 | 60.04 |
| Num. of node labels | 13 | 40 | 27 | 47 |
| Sampling range of \|V^q\| | [10, 20] | [6, 10] | [16, 32] | [8, 16] |

To closely simulate real-world conditions, we operate under the assumption that the noise level in query graphs remains unknown. To ensure our model adapts broadly, the training and validation sets for each dataset encompass query graphs from all three noise levels: 0%, 5%, and 10%. For evaluation, however, we test our model separately on each noise level to report its performance under different conditions.

TABLE II: The hyperparameters of RL-ASM used for different benchmark datasets.

| Training stage | Hyperparameter | AIDS | SYNTHETIC | MSRC_21 | EMAIL |
|---|---|---|---|---|---|
| Shared | Hidden dimension | 32 | 32 | 32 | 32 |
| Shared | # Encoder layers | 4 | 4 | 4 | 4 |
| Shared | Interaction dimension F | 32 | 32 | 32 | 32 |
| Shared | Positional/structural encodings | RWSE | RWSE | LapPE | RWSE |
| Imitation learning | Batch size | 1024 | 1024 | 1024 | 1024 |
| Imitation learning | Learning rate | 0.001 | 0.0005 | 0.001 | 0.001 |
| Imitation learning | # Epochs | 1000 | 1000 | 1000 | 1000 |
| Imitation learning | Weight decay | 0.01 | 0.01 | 0.01 | 0.01 |
| PPO fine-tuning | Learning rate | 0.005 | 0.001 | 0.0005 | 0.0005 |
| PPO fine-tuning | Entropy coefficient | 0.01 | 0.01 | 0.01 | 0.01 |
| PPO fine-tuning | Batch size | 2048 | 512 | 1024 | 64 |
| PPO fine-tuning | # Epochs | 10 | 10 | 10 | 10 |
| PPO fine-tuning | γ | 0.99 | 0.99 | 0.99 | 0.99 |
| PPO fine-tuning | Clipping range | 0.2 | 0.2 | 0.2 | 0.2 |

V-B Baselines

We compare our method with a neural-network-based method, NeuroMatch [25], and two heuristic methods, ISM [37] and APM [29]. As shown in Fig. 2, a good initial matching solution is crucial for the search performance of branch-and-bound methods.
Therefore, we present the first-round matching results (i.e., the results without backtracking) for both ISM and our method in this section. Additionally, we emphasize the role of imitation learning as an important pre-training stage of our method by showcasing the performance of our model when trained exclusively through imitation learning. We benchmark against the following methods:

• NeuroMatch [25] decomposes query and target graphs into small subgraphs and embeds them using graph neural networks. It first predicts the probability of nodes in query graphs matching nodes in target graphs using the embeddings of the decomposed subgraphs. It then utilizes the Hungarian algorithm [21] to map the nodes between the two graphs such that the sum of the probabilities of the matched node pairs is maximized, under the constraint that each node in the query graph is assigned to a different node in the target graph.

• APM [29] reduces ASM to exact subgraph matching. It focuses on identifying exact matches between prototype graphs (i.e., those derived from the query graph with GED below a specific threshold) and target graphs. To avoid an excessively large search space, the method only considers prototypes derived by adding edges to query graphs, excluding those derived by altering node labels.

• ISM [37] formulates ASM as a tree search problem. At each step, it calculates the lower bound of the GED between the query graph and the corresponding subgraph in the target graph for each search branch. Branches whose lower bounds surpass the GED of the best match found so far are pruned. The method employs a greedy strategy, selecting at each step the node pair that leads to the branch with the minimal lower bound.

• RL-ASM-IL showcases the first-round mapping results of our RL-ASM trained exclusively through imitation learning, illustrating the impact of imitation learning on our model.
• RL-ASM-FR and ISM-FR illustrate the first-round matching results of RL-ASM and ISM without backtracking.

For the ISM and RL-ASM experiments, we limit search tree exploration to 600 seconds. If a program fails to traverse the entire tree within this limit, we terminate the execution and treat the best matching result found so far as the final outcome. The hyperparameters of RL-ASM used for different benchmark datasets are provided in Table II. We select the best model according to the metric on the validation set and report the metric on the test set. Specifically, the learning rates for imitation learning and PPO fine-tuning are selected from {5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2}, and the hidden dimension of nodes is searched over {16, 32, 64}.

V-C Results

V-C1 Effectiveness of Approximate Subgraph Matching

The graph edit distance (GED), as reported in Table III, quantifies the discrepancies between the query graph and its corresponding subgraph in the target graph, with smaller values indicating better matches. The results of RL-ASM-IL indicate that even when trained solely via imitation learning, our model's initial matching results significantly outperform those of the heuristic methods APM and ISM. This demonstrates our model's ability to effectively learn graph information via the Graph Transformer for ASM. Compared with NeuroMatch and RL-ASM-IL, RL-ASM-FR shows superior performance on the SYNTHETIC, MSRC_21, and EMAIL datasets. This improvement is attributable to the RL training phase, which maximizes an accumulative long-term reward. The enhancements from RL training on the MSRC_21 and EMAIL datasets are more pronounced than on the AIDS dataset, because the graphs in MSRC_21 and EMAIL are much larger and more challenging, requiring the model to optimize long-term rewards and explore a broader solution space, the areas where RL excels.
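The 600-second budget above corresponds to a standard anytime pattern for branch-and-bound: keep the best complete matching found so far and return it when the deadline expires. A generic sketch follows; the `expand`, `cost`, and `bound` callbacks and the toy bit-string search are illustrative placeholders, not the paper's actual ISM or RL-ASM implementations:

```python
import time

def anytime_branch_and_bound(root, expand, cost, bound, budget_s):
    """Depth-first branch-and-bound that returns the best leaf found within
    `budget_s` seconds; branches whose lower bound already meets or exceeds
    the incumbent cost are pruned."""
    deadline = time.monotonic() + budget_s
    best, best_cost = None, float("inf")
    stack = [root]
    while stack and time.monotonic() < deadline:
        node = stack.pop()
        if bound(node) >= best_cost:
            continue  # prune: this branch cannot beat the incumbent
        children = expand(node)
        if not children:                  # leaf: a complete matching
            c = cost(node)
            if c < best_cost:
                best, best_cost = node, c # new incumbent (best-so-far)
        else:
            stack.extend(children)
    return best, best_cost

# toy search: partial solutions are bit tuples, cost = number of ones,
# ones already placed give a valid lower bound on the final cost
expand = lambda n: [n + (0,), n + (1,)] if len(n) < 4 else []
cost = lambda n: sum(n)
bound = lambda n: sum(n)
best, c = anytime_branch_and_bound((), expand, cost, bound, budget_s=5.0)
assert c == 0 and best == (0, 0, 0, 0)
```

If the deadline fires before the tree is exhausted, the incumbent `best` is returned unchanged, which is exactly the "best matching result found so far" behavior used in the experiments.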
RL-ASM delivers the best results by effectively searching the trees with backtracking to find better subgraph matches beyond the results of RL-ASM-FR. In the following subsection, we will demonstrate that our model, guided by a neural network, can potentially identify optimal solutions within a constrained time limit.

TABLE III: Graph edit distance (GED) between the query graph and its matched subgraph identified from the target graph, at noise ratios 0% / 5% / 10%. Note that the "GroundTruth" row denotes the GED between the "noisy" query graph and its corresponding seed query graph sampled from the target graph.

| Method | SYNTHETIC | AIDS | MSRC_21 | EMAIL |
|---|---|---|---|---|
| GroundTruth | 0.00 / 1.70 / 3.39 | 0.00 / 1.00 / 1.67 | 0.00 / 3.99 / 8.03 | 0.00 / 3.66 / 7.32 |
| NeuroMatch [25] | 1.52 / 4.42 / 7.96 | 0.80 / 3.12 / 7.49 | 3.56 / 11.46 / 22.43 | 3.71 / 17.41 / 26.51 |
| APM [29] | 0.00 / 2.45 / 4.57 | 0.00 / 1.24 / 2.43 | 0.00 / 7.46 / 13.43 | 0.00 / 8.11 / 15.61 |
| ISM-FR | 8.14 / 9.13 / 11.03 | 0.79 / 1.91 / 2.92 | 18.50 / 21.66 / 26.87 | 4.97 / 11.25 / 13.32 |
| ISM [37] | 7.00 / 8.35 / 10.60 | 0.01 / 1.04 / 2.00 | 17.07 / 20.69 / 26.26 | 4.67 / 10.74 / 12.90 |
| RL-ASM-IL | 0.00 / 1.65 / 3.40 | 0.06 / 1.16 / 1.85 | 0.01 / 4.29 / 8.93 | 0.00 / 3.65 / 7.00 |
| RL-ASM-FR | 0.00 / 1.59 / 3.31 | 0.06 / 1.17 / 1.84 | 0.00 / 4.19 / 8.93 | 0.00 / 3.56 / 6.63 |
| RL-ASM (ours) | 0.00 / 1.54 / 3.18 | 0.00 / 1.04 / 1.77 | 0.00 / 4.07 / 8.77 | 0.00 / 3.13 / 6.09 |

When no noise is added to query graphs (i.e., noise ratio 0%), APM can find the optimal solution because it reduces ASM to an exact subgraph matching problem. For each graph pair, it attempts to match G^q and G^t using an exact subgraph mapping method, which consistently identifies the optimal solution when the query graph G^q contains no noise. However, the performance of APM declines on noisy instances of G^q, as it does not allow mappings between node pairs with differing labels. ISM, which selects actions based on heuristics, struggles to find effective solutions in the initial round, as manifested by the results of ISM-FR in Table III.
This limitation hinders its ability to prune branches during tree search. Moreover, the greedy search relies on heuristics to make a decision at each step; consequently, both its initial and final mappings are prone to being sub-optimal. In addition, NeuroMatch is designed for exact subgraph matching; when noise is added to the query graph, it struggles to map nodes accurately based on the learned representations.

Figure 5: The probabilities that ISM and our method find the optimal solutions within 600s. Panels: (a) AIDS, (b) SYNTHETIC, (c) MSRC_21, (d) EMAIL.

V-C2 Efficiency of Approximate Subgraph Matching

Both ISM and RL-ASM utilize a branch-and-bound framework to search the entire tree for the optimal solution. Fig. 5 illustrates the efficiency of ISM and our method by reporting the probability of each method finding the optimal solution within 600 seconds. The results reveal that our method, guided by a neural network for action selection, is significantly more efficient than ISM. While both methods perform similarly on smaller graphs, such as the instances in the AIDS dataset, the efficiency gap is more pronounced on larger graphs. For example, on the SYNTHETIC dataset, our method significantly outperforms ISM, achieving a probability of identifying the optimal solution that is about 5x higher.

TABLE IV: Ablation study of design choices on node-sorting and GraphGPS (SYNTHETIC dataset; GED at noise ratios 0% / 5% / 10%).

| Method | 0% | 5% | 10% |
|---|---|---|---|
| RL-ASM w/o node-sorting | 0 | 1.63 | 3.38 |
| RL-ASM w/o GraphGPS | 0 | 1.67 | 3.47 |
| RL-ASM | 0 | 1.54 | 3.18 |

V-C3 Ablation Study

In Section IV, we introduce a method based on [7] to sort the nodes in G^q for the mapping order φ, and we utilize GraphGPS [28], a type of graph transformer, as the intra-component in our encoder.
This section assesses the impact of these design choices by replacing the sorted node order with a random node order, or by substituting GraphGPS with GatedGCN, which removes the global attention from the intra-component of our encoder. Table IV presents the results of these ablation studies. Both the node sorting φ and GraphGPS enhance the performance of RL-ASM. The node sorting prioritizes nodes with higher degrees and more matched neighbors, leveraging structural information to facilitate accurate pattern matching. Moreover, GraphGPS, which integrates an MPNN with global attention, significantly increases the expressiveness of our model.

V-C4 Generalization

In the experiments above, we utilize BFS to sample subgraphs and produce induced subgraphs (a.k.a. query graphs) from the sampled nodes. In this section, we adopt a Random Walk (RW) approach for query graph sampling. Specifically, we start at a randomly selected node and move to an adjacent node via random walk, repeating this process for a pre-determined number of steps. During this process, we record all visited nodes and edges to construct the query graphs. Note that RW-sampled subgraphs are not necessarily induced subgraphs. Subsequently, we introduce noise to these sampled graphs to create noisy query graphs. This methodology is applied to the SYNTHETIC dataset to create training, validation, and test sets, which are used to assess the performance of ASM involving non-induced subgraphs. The GED between G^q and the matched subgraphs, along with the probability of identifying the optimal solutions within 600 seconds, are reported in Table V and Fig. 6, respectively. The results indicate that our model still outperforms the baseline methods in terms of effectiveness and efficiency, even when the query graphs are not induced, demonstrating the generalization ability of RL-ASM.

TABLE V: Effectiveness on the random walk sampled dataset.
| Method | 0% | 5% | 10% |
|---|---|---|---|
| APM | 0 | 2.03 | 4.04 |
| ISM | 5.58 | 7.69 | 8.51 |
| RL-ASM | 0 | 1.47 | 3.42 |

Figure 6: Efficiency on the random walk sampled dataset.

VI Conclusion

As an NP-hard problem, approximate subgraph matching (ASM) is a fundamental research task in the database and graph mining communities. One significant strand of ASM methods follows the branch-and-bound search framework, where search performance is heavily influenced by the selection of actions (i.e., the mapped node pairs) in the search tree. We observe that existing methods, which utilize heuristics to select actions, lack the capability to fully exploit the structural and label information of graphs and make sub-optimal decisions for ASM. In this paper, we introduce RL-ASM, a reinforcement learning based method for ASM that leverages a Graph Transformer to extract the full graph information and optimizes an accumulative long-term reward over episodes. Extensive experiments demonstrate the effectiveness and efficiency of our approach on four benchmark datasets. ASM is a relatively under-investigated research area, and we open-source our code to facilitate research in this area.

References

[1] S. Agarwal, S. Dutta, and A. Bhattacharya (2020) Chisel: graph similarity search using chi-squared statistics in large probabilistic graphs. Proceedings of the VLDB Endowment 13 (10), p. 1654–1668.
[2] S. Agarwal, S. Dutta, and A. Bhattacharya (2024) VeNoM: approximate subgraph matching with enhanced neighbourhood structural information. In Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), p. 18–26.
[3] C. C. Aggarwal, H. Wang, et al. (2010) Managing and mining graph data. Vol. 40, Springer.
[4] S. Akansha (2023) Over-squashing in graph neural networks: a comprehensive survey. arXiv preprint arXiv:2308.15568.
[5] Y. Bai, D. Xu, Y. Sun, and W.
Wang (2021) GLSearch: maximum common subgraph detection via learning to search. In International Conference on Machine Learning, p. 588–598.
[6] Y. Bai, D. Xu, Y. Sun, and W. Wang (2022) Detecting small query graphs in a large graph via neural subgraph search. arXiv preprint arXiv:2207.10305.
[7] V. Bonnici, R. Giugno, A. Pulvirenti, D. Shasha, and A. Ferro (2013) A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinformatics 14, p. 1–13.
[8] D. Chakrabarti and C. Faloutsos (2022) Graph mining: laws, tools, and case studies. Springer Nature.
[9] S. Dutta, P. Nayek, and A. Bhattacharya (2017) Neighbor-aware search for approximate labeled graph matching using the chi-square statistics. In Proceedings of the 26th International Conference on World Wide Web, p. 1281–1290.
[10] V. P. Dwivedi, A. T. Luu, T. Laurent, Y. Bengio, and X. Bresson (2021) Graph neural networks with learnable structural and positional representations. arXiv preprint arXiv:2110.07875.
[11] C. Fan, L. Zeng, Y. Sun, and Y. Liu (2020) Finding key players in complex networks through deep reinforcement learning. Nature Machine Intelligence 2 (6), p. 317–324.
[12] W. Fan (2012) Graph pattern matching revised for social network analysis. In Proceedings of the 15th International Conference on Database Theory, ICDT '12, New York, NY, USA, p. 8–21. ISBN 9781450307918.
[13] M. R. Garey and D. S. Johnson (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman and Company, New York. ISBN 9780716710455.
[14] K. Han, Y. Wang, J. Guo, Y. Tang, and E. Wu (2022) Vision GNN: an image is worth graph of nodes. Advances in Neural Information Processing Systems 35, p. 8291–8303.
[15] H. He and A. K.
Singh (2006) Closure-tree: an index structure for graph queries. In 22nd International Conference on Data Engineering (ICDE'06), p. 38–38.
[16] S. Hu, L. Zou, J. X. Yu, H. Wang, and D. Zhao (2018) Answering natural language questions by subgraph matching over knowledge graphs. IEEE Transactions on Knowledge and Data Engineering 30 (5), p. 824–837.
[17] S. Ji, P. Mittal, and R. Beyah (2016) Graph data anonymization, de-anonymization attacks, and de-anonymizability quantification: a survey. IEEE Communications Surveys & Tutorials 19 (2), p. 1305–1326.
[18] E. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song (2017) Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems 30.
[19] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
[20] D. Kreuzer, D. Beaini, W. Hamilton, V. Létourneau, and P. Tossou (2021) Rethinking graph transformers with spectral attention. In Advances in Neural Information Processing Systems, Vol. 34, p. 21618–21629.
[21] H. W. Kuhn (1955) The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2 (1-2), p. 83–97.
[22] J. Leskovec, J. Kleinberg, and C. Faloutsos (2007) Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1 (1), p. 2–es.
[23] Y. Li, R. Zemel, M. Brockschmidt, and D. Tarlow (2016) Gated graph sequence neural networks. In Proceedings of ICLR'16.
[24] Y. Liu, C. Li, H. Jiang, and K. He (2020) A learning based branch and bound for maximum common subgraph related problems. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.
34, p. 2392–2399.
[25] Z. Lou, J. You, C. Wen, A. Canedo, J. Leskovec, et al. (2020) Neural subgraph matching. arXiv preprint arXiv:2007.03092.
[26] G. Lu, K. Li, X. Wang, Z. Liu, Z. Cai, and W. Li (2024) Neural-based inexact graph de-anonymization. High-Confidence Computing 4 (1), p. 100186.
[27] C. Morris, N. M. Kriege, F. Bause, K. Kersting, P. Mutzel, and M. Neumann (2020) TUDataset: a collection of benchmark datasets for learning with graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020). arXiv:2007.08663.
[28] L. Rampášek, M. Galkin, V. P. Dwivedi, A. T. Luu, G. Wolf, and D. Beaini (2022) Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems 35, p. 14501–14515.
[29] T. Reza, M. Ripeanu, G. Sanders, and R. Pearce (2020) Approximate pattern matching in massive graphs with precision and recall guarantees. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, p. 1115–1131.
[30] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
[31] P. Shaw, J. Uszkoreit, and A. Vaswani (2018) Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), p. 464–468.
[32] R. Sun, H. Dai, and A. W. Yu (2022) Does GNN pretraining help molecular representation? Advances in Neural Information Processing Systems 35, p. 12096–12109.
[33] S. Sun and Q. Luo (2020) In-memory subgraph matching: an in-depth study.
In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, p. 1083–1098.
[34] D. L. Sussman, Y. Park, C. E. Priebe, and V. Lyzinski (2019) Matched filters for noisy induced subgraph detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (11), p. 2887–2900.
[35] Y. Tian, R. C. Mceachin, C. Santos, D. J. States, and J. M. Patel (2007) SAGA: a subgraph matching tool for biological graphs. Bioinformatics 23 (2), p. 232–239.
[36] H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad (2007) Fast best-effort pattern matching in large attributed graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 737–746.
[37] T. K. Tu, J. D. Moorman, D. Yang, Q. Chen, and A. L. Bertozzi (2020) Inexact attributed subgraph matching. In 2020 IEEE International Conference on Big Data (Big Data), p. 2575–2582.
[38] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. Advances in Neural Information Processing Systems 30.
[39] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903.
[40] H. Wang, Y. Zhang, L. Qin, W. Wang, W. Zhang, and X. Lin (2022) Reinforcement learning based query vertex ordering model for subgraph matching. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), p. 245–258.
[41] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2019) How powerful are graph neural networks? In International Conference on Learning Representations.
[42] K. Xu, C. Li, Y. Tian, T. Sonobe, K. Kawarabayashi, and S.
Jegelka (2018) Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, p. 5453–5462.
[43] Y. Yuan, G. Wang, J. Y. Xu, and L. Chen (2015) Efficient distributed subgraph similarity matching. The VLDB Journal 24 (3), p. 369–394.
[44] M. Zaslavskiy, F. Bach, and J. Vert (2009) Global alignment of protein–protein interaction networks by graph matching methods. Bioinformatics 25 (12), p. i259–i267.
[45] S. Zhang, J. Yang, and W. Jin (2010) SAPPER: subgraph indexing and approximate matching in large graphs. Proceedings of the VLDB Endowment 3 (1-2), p. 1185–1194.
[46] L. Zou, J. Mo, L. Chen, M. T. Özsu, and D. Zhao (2011) gStore: answering SPARQL queries via subgraph matching. Proceedings of the VLDB Endowment 4 (8), p. 482–493.