Paper deep dive
Mixture of Demonstrations for Textual Graph Understanding and Question Answering
Yukun Wu, Lihui Liu
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%
Last extracted: 3/26/2026, 1:34:30 AM
Summary
MixDemo is a novel GraphRAG framework designed to improve textual graph-based question answering by addressing noisy subgraphs and optimizing demonstration selection. It utilizes a Mixture-of-Experts (MoE) mechanism to select contextually relevant demonstrations and a query-specific graph encoder to filter irrelevant information, significantly outperforming existing baselines on the GraphQA benchmark.
Entities (5)
Relation Signals (3)
MixDemo → uses → Mixture-of-Experts
confidence 100% · we propose MixDemo, a novel GraphRAG framework enhanced with a Mixture-of-Experts (MoE) mechanism
MixDemo → evaluated on → GraphQA
confidence 95% · We evaluate our method on the GraphQA benchmark.
MixDemo → outperforms → G-Retriever
confidence 95% · MixDemo consistently achieves superior performance... It outperforms the strongest baseline, G-Retriever
Cypher Suggestions (2)
Find all frameworks that utilize a specific mechanism · confidence 90% · unvalidated
MATCH (f:Framework)-[:USES]->(m:Mechanism {name: 'Mixture-of-Experts'}) RETURN f.name
Compare performance of models on a specific benchmark · confidence 85% · unvalidated
MATCH (m1:Model)-[r:OUTPERFORMS]->(m2:Model), (m1)-[:EVALUATED_ON]->(b:Benchmark {name: 'GraphQA'}) RETURN m1.name, m2.name
Abstract
Textual graph-based retrieval-augmented generation (GraphRAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) in domain-specific question answering. While existing approaches primarily focus on zero-shot GraphRAG, selecting high-quality demonstrations is crucial for improving reasoning and answer accuracy. Furthermore, recent studies have shown that retrieved subgraphs often contain irrelevant information, which can degrade reasoning performance. In this paper, we propose MixDemo, a novel GraphRAG framework enhanced with a Mixture-of-Experts (MoE) mechanism for selecting the most informative demonstrations under diverse question contexts. To further reduce noise in the retrieved subgraphs, we introduce a query-specific graph encoder that selectively attends to information most relevant to the query. Extensive experiments across multiple textual graph benchmarks show that MixDemo significantly outperforms existing methods.
Tags
Links
- Source: https://arxiv.org/abs/2603.23554v1
- Canonical: https://arxiv.org/abs/2603.23554v1
Full Text
30,723 characters extracted from source content.
Mixture of Demonstrations for Textual Graph Understanding and Question Answering

Yukun Wu (yukun.wu.mail@gmail.com), Independent Researcher; Lihui Liu (hw6926@wayne.edu), Wayne State University, Detroit, USA

Abstract. Textual graph-based retrieval-augmented generation (GraphRAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) in domain-specific question answering. While existing approaches primarily focus on zero-shot GraphRAG, selecting high-quality demonstrations is crucial for improving reasoning and answer accuracy. Furthermore, recent studies have shown that retrieved subgraphs often contain irrelevant information, which can degrade reasoning performance. In this paper, we propose MixDemo, a novel GraphRAG framework enhanced with a Mixture-of-Experts (MoE) mechanism for selecting the most informative demonstrations under diverse question contexts. To further reduce noise in the retrieved subgraphs, we introduce a query-specific graph encoder that selectively attends to information most relevant to the query. Extensive experiments across multiple textual graph benchmarks show that MixDemo significantly outperforms existing methods.

Keywords: knowledge graph question answering.

1. Introduction

Large language models (LLMs) have achieved remarkable success in recent years. Yet, most LLMs are trained on open-domain data collected before fixed cut-off dates (Radford et al., 2018), which inevitably limits their performance on domain-specific questions due to outdated or missing knowledge. To address this issue, Retrieval-Augmented Generation (RAG) (Shi et al., 2023; Edge et al., 2025) has emerged as a promising solution, in which relevant knowledge is retrieved and incorporated to help LLMs generate accurate responses.

Most existing GraphRAG methods (Edge et al., 2025; He et al., 2024) rely on a simple yet strong assumption: the generated response should include all relevant facts retrieved from the graph. However, this assumption overlooks a critical problem: the retrieved textual graph may contain unnecessary noise or irrelevant information. Consider Figure 1(a), where a user asks "What is a good source of nutrients for a mushroom?" While the textual graph contains the correct answer, 'a cut peony', it also contains the irrelevant node 'a flying eagle', which may mislead the model. Mitigating low-quality information in the retrieved textual graph so that it effectively guides answer generation remains an open problem. Furthermore, existing GraphRAG methods mainly rely on zero-shot prompting, whereas selecting high-quality demonstrations is crucial for improving reasoning and answer accuracy.

In this paper, we propose a GraphRAG framework that enhances both textual graph understanding and QA performance by learning to select high-quality demonstrations and by learning query-specific information. Instead of assuming that all retrieved subgraph content is useful, our approach adaptively identifies the most informative and context-relevant node-level and edge-level evidence.
Specifically, we design a specialized, query-specific GraphEncoder to model the complex interactions between nodes within the retrieved subgraphs. This encoder generates a dense graph prompt embedding that captures relational patterns and serves as a bridge between structured knowledge and the LLM's input space. Additionally, we incorporate a Mixture-of-Experts (MoE) (Dai et al., 2024) paradigm to select the most informative demonstrations and enhance in-context learning. We evaluate our method on the GraphQA benchmark. Extensive experiments show that our model significantly outperforms existing baselines.

2. Problem Definition

We study the task of answering queries over a textual graph using large language models (LLMs). A textual graph $G=(V,E)$ contains natural-language content on both nodes and edges. Given a query $q$, the goal is to generate an answer $a_{gen}$ by retrieving a relevant subgraph $S \subseteq G$ and using it to guide LLM-based reasoning. We adopt in-context learning (ICL), where a set of demonstration examples $D = \{(q_i, a_i)\}$ is prepended to the query to prompt the LLM. At test time, a subset $\mathrm{Sub}(D)$ is selected from $D$ to construct the prompt. The LLM then predicts:

$$\hat{a} = \arg\max_{a} P_{LM}(a \mid \mathrm{Sub}(D), q).$$

Final Problem Definition. Our goal is to answer a query $q$ on a textual graph $G$ by: (1) retrieving a relevant subgraph $S$, and (2) prompting an LLM via ICL with demonstrations $(S_i, q_i, a_i)$ to generate an accurate answer.

3. Proposed Method

Figure 1. (a) An example of a retrieved subgraph. (b) Overview of MixDemo.

Our method tackles two key challenges in textual graph question answering: noisy subgraphs and limited examples. Given a query $q$ and a textual graph $G=(V,E)$, we retrieve a subgraph $S$ using G-Retriever (He et al., 2024). To reduce noise in $S$, we apply a query-aware graph attention network that emphasizes relevant nodes and edges. To further enhance reasoning, we use few-shot in-context learning with selected $(q_i, S_i, a_i)$ examples, enabling the model to learn from both textual and structural patterns. The framework is shown in Figure 1.

3.1. Subgraph Retrieval

Given a query $q$, we use a pretrained language model (Sentence-BERT (Reimers and Gurevych, 2019)) $LM(\cdot)$ to encode $q$ and the nodes and edges of the textual graph $G=(V,E)$:

(1) $z_q = LM(q), \quad z_{v_i} = LM(v_i), \quad z_{e_{i,j}} = LM(e_{i,j})$

We compute cosine similarities between $z_q$ and all node/edge embeddings and retrieve the top-$k$ nodes $V_k$ and edges $E_k$ with the highest similarity. A connected subgraph is then constructed using the Prize-Collecting Steiner Tree (PCST) algorithm (He et al., 2024). Each retrieved item is assigned a prize based on its rank, and PCST selects a subgraph $S$ that maximizes the total prize while minimizing edge cost:

$$S = \arg\max_{S \subseteq G} \Big( \sum_{v_i \in V_k} \mathrm{prize}(v_i) + \sum_{e_{i,j} \in E_k} \mathrm{prize}(e_{i,j}) - c \cdot |E_S| \Big).$$
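The following is a minimal sketch of the encoding and top-k scoring step in Eq. (1), not the authors' code. The Sentence-BERT checkpoint name and the `top_k` value are illustrative assumptions, and the PCST construction itself (delegated to G-Retriever in the paper) is only indicated by the rank-based prizes it would consume.

```python
# Minimal sketch of Section 3.1's top-k retrieval (Eq. 1); illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

def retrieve_top_k(query, node_texts, edge_texts, top_k=5):
    lm = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for LM(.)
    z_q = lm.encode([query])[0]                   # z_q
    z_v = lm.encode(node_texts)                   # z_{v_i}
    z_e = lm.encode(edge_texts)                   # z_{e_{i,j}}

    def cos(a, B):
        # cosine similarity between the query vector and each row of B
        return B @ a / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-9)

    node_scores, edge_scores = cos(z_q, z_v), cos(z_q, z_e)
    V_k = np.argsort(-node_scores)[:top_k]        # indices of top-k nodes
    E_k = np.argsort(-edge_scores)[:top_k]        # indices of top-k edges
    # Rank-based prizes would feed a PCST solver, which returns the connected
    # subgraph S maximizing total prize minus c * |E_S| (see equation above).
    prizes = {int(v): top_k - r for r, v in enumerate(V_k)}
    return V_k, E_k, prizes
```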
3.2. Demonstration Retrieval

Building on our subgraph retrieval method, we introduce an approach for selecting informative few-shot examples to improve language model reasoning. Given a query $q$, we convert its retrieved subgraph $S$ into text using $\mathrm{textualize}(\cdot)$, which flattens all node and edge attributes. The textualized graph is concatenated with $q$ to form the prompt $x = \mathrm{textualize}(S) \,\|\, q$. While nearest-neighbor retrieval in embedding space is common, it may miss globally relevant examples. Inspired by recent work (Wang et al., 2024a), we instead cluster demonstrations by semantic similarity and select a representative from each cluster, yielding more diverse and complementary examples.

Specifically, we apply K-means clustering to partition the example pool $D = \{(S_i, q_i, a_i)\}_{i=1}^{n}$ into $C$ clusters $C_1, C_2, \ldots, C_C$, treating each cluster as an expert. Clustering is performed on the Sentence-BERT embeddings of the augmented prompts $x_i = \mathrm{textualize}(S_i) \,\|\, q_i$, following prior work (He et al., 2024). To adaptively determine the optimal number of clusters $C$, we minimize a regularized objective that balances within-cluster variance and model complexity:

(2) $C^{*} = \arg\min_{C} \sum_{k=1}^{C} \sum_{x_i \in C_k} \| f(x_i) - \mu_k \|_2^2 + \lambda C,$

where $f(\cdot)$ denotes the embedding function, $\mu_k$ is the centroid of cluster $C_k$, and $\lambda$ controls the regularization strength. At inference time, a test query $q$ is augmented into $x_q = \mathrm{textualize}(S_q) \,\|\, q$, embedded via $f(x_q)$, and assigned to its closest expert based on cosine similarity with the cluster centroids:

(3) $c(q) = \arg\max_{i=1,\ldots,C^{*}} \cos(f(x_q), \mu_i).$

The selected expert then provides a set of representative demonstrations, which are combined with the input to generate the model's final prediction.
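A minimal sketch of this MoE-style demonstration selection (Eqs. 2 and 3) is given below, assuming the rows of `X` are the Sentence-BERT embeddings of the augmented prompts and that `lam` plays the role of the regularizer λ; the cluster-count range and seed are illustrative, not the authors' settings.

```python
# Sketch of Section 3.2: choose C* via the regularized objective (Eq. 2),
# then route a test prompt embedding to its closest expert (Eq. 3).
import numpy as np
from sklearn.cluster import KMeans

def fit_experts(X, max_clusters=10, lam=1.0):
    """Pick C* by minimizing within-cluster variance + lam * C (Eq. 2)."""
    best_c, best_obj, best_km = None, np.inf, None
    for c in range(1, min(max_clusters, len(X)) + 1):
        km = KMeans(n_clusters=c, n_init=10, random_state=0).fit(X)
        obj = km.inertia_ + lam * c          # regularized objective
        if obj < best_obj:
            best_c, best_obj, best_km = c, obj, km
    return best_c, best_km

def route_to_expert(x_q, km):
    """Assign a test prompt embedding to the nearest centroid by cosine (Eq. 3)."""
    centroids = km.cluster_centers_
    sims = centroids @ x_q / (
        np.linalg.norm(centroids, axis=1) * np.linalg.norm(x_q) + 1e-9)
    return int(np.argmax(sims))              # index of the selected expert
```

The members of the selected cluster (for example, those closest to its centroid) would then serve as the few-shot demonstrations prepended to the prompt.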
3.3. Noise Mitigation

Building on our retrieval and demonstration selection framework (Section 3.2), we now describe how the final answer is generated from the retrieved subgraphs. Specifically, we encode each subgraph $S$ with a GraphEncoder that transforms its structural and semantic information into a graph-prompt representation. This encoding serves two purposes: (1) preserving relational patterns critical to the query, and (2) filtering irrelevant information through query-sensitive attention, thereby constructing an optimized input for the language model.

Prior methods such as G-Retriever use GCNs (Kipf and Welling, 2017) or GATs (Veličković et al., 2018), but these suffer from over-smoothing (Chen et al., 2020), which makes node embeddings indistinguishable; this is especially problematic in our setting, where retrieved subgraphs mix relevant and noisy content. Effective encoding therefore requires query-aware representations that selectively highlight important nodes. For instance, given a query, the encoder should emphasize 'a cut peony' and ignore irrelevant nodes like 'a flying eagle', even if they are structurally nearby. To achieve this, we design a query-conditioned GNN in which both message passing and node interactions are modulated by the input query $q$. Specifically, we redefine the edge weight $\zeta_{e_{i,j}}^{(l)}$ using a query-aware attention mechanism, allowing the model to focus on the most informative edges. At each layer $l$, the attention weight $\zeta_{e_{i,j}}^{(l)}$ is computed by:

(4) $\alpha_{v_i}^{(l)} = \mathrm{LINEAR}(\mathrm{CONCAT}(z_{v_i}^{(l)}, q))$
(5) $\beta_{v_j}^{(l)} = \mathrm{LINEAR}(\mathrm{CONCAT}(z_{v_j}^{(l)}, q))$
(6) $\gamma_{e_{i,j}} = \mathrm{LINEAR}(\mathrm{CONCAT}(z_{e_{i,j}}, q))$
(7) $\zeta_{e_{i,j}}^{(l)} = \tanh(\alpha_{v_i}^{(l)} + \gamma_{e_{i,j}} - \beta_{v_j}^{(l)})$

where $\alpha_{v_i}^{(l)}$, $\beta_{v_j}^{(l)}$, and $\gamma_{e_{i,j}}$ are learned query-conditioned node/edge embeddings, and $\zeta_{e_{i,j}}^{(l)}$ serves as the attention weight for message passing along edge $e_{i,j}$. Similarly, messages are generated using query-conditioned features:

(8) $\mathrm{msg}_{e_{i,j}}^{(l)} = \mathrm{LINEAR}(\mathrm{CONCAT}(z_{v_i}^{(l)}, z_{v_j}^{(l)}, z_{e_{i,j}}, q))$

and aggregated via the attention weights:

(9) $z_{v_j}^{(l+1)} = \frac{1}{d_{v_j}} \sum_{v_i \in \mathcal{N}(v_j)} \zeta_{e_{i,j}}^{(l)} \cdot \mathrm{msg}_{e_{i,j}}^{(l)}$

Unlike standard GCNs, which apply static, query-agnostic filters, our GNN conditions node interactions and message passing on the query $q$, allowing the model to emphasize task-relevant nodes and edges. After $L$ layers, each node $v_j \in S$ has an embedding $z_{v_j}^{(L)}$, which we mean-pool to obtain the subgraph representation $z_S = \mathrm{POOL}(z_{v_j}^{(L)})$. This is projected into the LLM embedding space using an MLP, $p_{graph} = \mathrm{MLP}(z_S)$, which serves as the graph prompt. To incorporate multiple demonstration subgraphs $\{z_d\}_{d \in D_R}$, we use query-based relevance weighting:

(10) $\lambda(q, z_d) = \frac{e^{s(q, z_d)}}{\sum_{d'} e^{s(q, z_{d'})}}, \qquad z_{final} = \sum_{d=0}^{N} \lambda(q, z_d)\, z_d,$

where $z_0 \equiv z_{current}$.

Generating responses. We prepend task-specific instructions and tokenize all inputs, $q = \mathrm{tokenize}(q)$, $p_{demo} = \mathrm{tokenize}(p_{demo})$, $p_{text\text{-}graph} = \mathrm{tokenize}(p_{text\text{-}graph})$, then feed the combined sequence into the frozen LLM:

(11) $a_{gen} = \mathrm{LLM}(\mathrm{CONCAT}(p_{demo}, p_{graph}, p_{text\text{-}graph}, q)),$

yielding the final answer $a_{gen}$.
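Below is a minimal PyTorch sketch of one query-conditioned message-passing layer (Eqs. 4 to 9) and of the relevance-weighted fusion in Eq. 10. Tensor shapes, the dense edge-index format, and the choice of a dot product for the score $s(q, z_d)$ are assumptions for illustration, not the paper's implementation.

```python
# Sketch of Section 3.3: query-conditioned attention, messages, aggregation,
# and softmax fusion of graph prompts. Illustrative only.
import torch
import torch.nn as nn

class QueryConditionedLayer(nn.Module):
    def __init__(self, node_dim, edge_dim, query_dim):
        super().__init__()
        self.alpha = nn.Linear(node_dim + query_dim, 1)                       # Eq. 4
        self.beta = nn.Linear(node_dim + query_dim, 1)                        # Eq. 5
        self.gamma = nn.Linear(edge_dim + query_dim, 1)                       # Eq. 6
        self.msg = nn.Linear(2 * node_dim + edge_dim + query_dim, node_dim)   # Eq. 8

    def forward(self, z_v, z_e, edge_index, q):
        # z_v: [N, node_dim], z_e: [M, edge_dim], edge_index: [2, M], q: [query_dim]
        src, dst = edge_index
        q_e = q.expand(z_e.size(0), -1)                      # broadcast query per edge
        a = self.alpha(torch.cat([z_v[src], q_e], dim=-1))   # alpha for source nodes
        b = self.beta(torch.cat([z_v[dst], q_e], dim=-1))    # beta for target nodes
        g = self.gamma(torch.cat([z_e, q_e], dim=-1))        # gamma for edges
        zeta = torch.tanh(a + g - b)                         # Eq. 7: edge attention
        m = self.msg(torch.cat([z_v[src], z_v[dst], z_e, q_e], dim=-1))  # Eq. 8
        # Eq. 9: degree-normalized, attention-weighted aggregation per target node
        out = z_v.new_zeros(z_v.shape).index_add_(0, dst, zeta * m)
        deg = z_v.new_zeros(z_v.size(0), 1).index_add_(
            0, dst, z_v.new_ones(dst.size(0), 1)).clamp(min=1)
        return out / deg

def fuse_graph_prompts(q_vec, z_list):
    """Eq. 10: softmax relevance weighting over z_current and demo subgraphs.
    z_list[0] is z_current; s(q, z_d) is assumed to be a dot product here."""
    Z = torch.stack(z_list)                     # [N+1, dim]
    weights = torch.softmax(Z @ q_vec, dim=0)   # lambda(q, z_d)
    return (weights.unsqueeze(-1) * Z).sum(dim=0)  # z_final
```

After L such layers, the node embeddings would be mean-pooled and projected by an MLP into the LLM's embedding space to form the graph prompt described above.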
4. Experiments

Table 1. Statistics of datasets (FB = Freebase).

| | ExplaGraphs | SceneGraphs | WebQSP |
|---|---|---|---|
| #Graphs | 2,766 | 100,000 | 4,737 |
| Average #Nodes | 5.17 | 19.13 | 1370.89 |
| Average #Edges | 4.25 | 68.44 | 4252.37 |
| Node attribute | concepts | object attributes | entities in FB |
| Edge attribute | relations | spatial relations | relations in FB |
| Task | reasoning | scene graph QA | KGQA |
| Evaluation metric | Accuracy | Accuracy | Hit@1 |

Table 2. Performance comparison for different methods (%).

| Method | ExplaGraphs (ACC) | SceneGraphs (ACC) | WebQSP (Hit@1) |
|---|---|---|---|
| Zero-shot | 56.50 | 39.74 | 41.06 |
| Zero-CoT (Kojima et al., 2022) | 57.04 | 52.60 | 51.30 |
| CoT-BAG (Wang et al., 2024) | 57.94 | 56.80 | 39.60 |
| KAPING (Baek et al., 2023) | 62.27 | 43.75 | 52.64 |
| Graph-based Inference | 33.93 | 42.17 | 47.22 |
| Frozen LLM + Prompt Tuning (PT) | 58.98 | 63.72 | 54.11 |
| GraphToken (Perozzi et al., 2024) | 85.08 | 49.03 | 57.05 |
| G-Retriever | 86.19 | 80.86 | 70.02 |
| MixDemo | 87.31 | 82.32 | 71.36 |

Datasets. We evaluate on the GraphQA benchmark (He et al., 2024), which includes ExplaGraphs, SceneGraphs, and WebQSP. Dataset statistics are given in Table 1.

Metrics. Following G-Retriever, we use accuracy for ExplaGraphs and SceneGraphs, and Hit@1 for WebQSP, which allows multiple correct answers.

Baselines. We compare against inference-only methods (e.g., Zero-shot (Kojima et al., 2023), CoT-BAG (Wang et al., 2024b), KAPING (Baek et al., 2023)) and prompt-tuning methods (e.g., Prompt Tuning, GraphToken (Perozzi et al., 2024), G-Retriever (He et al., 2024)).

4.1. Effectiveness of MixDemo

The results are summarized in Table 2, which compares MixDemo against all baseline methods. Overall, MixDemo consistently achieves superior performance across all datasets. For example, it outperforms the strongest baseline, G-Retriever, by approximately 1.1% on ExplaGraphs and 1.5% on SceneGraphs. These improvements highlight the effectiveness of the proposed approach. Additionally, we note that naively textualizing the retrieved subgraph information and using it as direct input to LLMs often yields poor results; in most cases, performance is significantly degraded. This demonstrates the importance of properly encoding subgraph structural information and integrating it into LLMs.

4.2. Ablation Study

Figure 2. Ablation study: (a) number of demonstrations; (b) number of GNN layers.

We first evaluate how the number of few-shot examples affects MixDemo's performance under zero-shot, 1-shot, 2-shot, and 3-shot settings. Since subgraphs in SceneGraphs and WebQSP exceed the LLM's input limit, we perform this study only on ExplaGraphs. As shown in Figure 2(a), 2-shot learning yields the best performance, with 2-shot and 3-shot results being nearly identical. In the hyperparameter study, we assess how the number of GraphEncoder layers affects performance. As shown in Figure 2(b), using three layers achieves the highest accuracy; adding more layers offers no further improvement and can cause overfitting. These results highlight the importance of tuning encoder depth for optimal reasoning in MixDemo.

5. Related Work

Recent work has highlighted Retrieval-Augmented Generation (RAG) (Gao et al., 2023) as a powerful solution to mitigate key limitations of large language models (LLMs), particularly their tendency toward hallucination in domain-specific or knowledge-intensive tasks. Current RAG methodologies can be broadly grouped into three paradigms. The simplest form, naive RAG (Ma et al., 2023), operates through a basic pipeline of indexing, retrieval, and generation. Building on this foundation, advanced RAG systems incorporate pre-retrieval optimizations such as query transformation, expansion, and rewriting (Peng et al., 2024; Zheng et al., 2024), while post-retrieval enhancements often involve reranking strategies (Qin et al., 2024).

The Mixture of Experts (MoE) framework (Jacobs et al., 1991) has established itself as a fundamental paradigm in machine learning for developing adaptive systems and knowledge graph reasoning (Liu et al., 2019, 2021b, 2021a, 2022a, 2022b, 2022c, 2023; Hill et al., 2024; Liu et al., 2024d, b, c, a, 2025b, 2025c; Wu et al., 2025; Liu, 2025c, b; Liu et al., 2025a; Liu, 2024; Liu and Tong, 2026d, [n. d.], a, b; Liu and Shu, 2025; Liu and Tong, 2026c, e; Liu, 2025a). Initial work focused on traditional machine learning implementations (Jordan et al., 1996), with subsequent breakthroughs emerging through its integration with deep neural networks (Kipf and Welling, 2017). More recently, researchers have explored applying MoE approaches to in-context learning scenarios (Wang et al., 2024a), demonstrating their potential to enhance large language model (LLM) performance.

6. Conclusion

We present MixDemo, a GraphRAG framework that leverages a Mixture-of-Experts demonstration selector and a query-aware graph encoder. By dynamically selecting contextually relevant demonstrations and filtering noisy subgraph information, our approach significantly improves answer accuracy and reasoning robustness across textual graph benchmarks. Experimental results validate that MixDemo outperforms state-of-the-art baselines, demonstrating the importance of adaptive retrieval and noise reduction in GraphRAG systems.

References
Baek et al. (2023) Jinheon Baek, Alham Fikri Aji, and Amir Saffari. 2023. Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering. In Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE), Bhavana Dalvi Mishra, Greg Durrett, Peter Jansen, Danilo Neves Ribeiro, and Jason Wei (Eds.). Association for Computational Linguistics, Toronto, Canada.
Chen et al. (2020) Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. 2020. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 3438–3445.
Dai et al. (2024) Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. 2024. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models. arXiv:2401.06066 [cs.CL]. https://arxiv.org/abs/2401.06066
Edge et al. (2025) Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2025. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv:2404.16130 [cs.CL]. https://arxiv.org/abs/2404.16130
Gao et al. (2023) Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan. 2023. Precise Zero-Shot Dense Retrieval without Relevance Labels. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 1762–1777. doi:10.18653/v1/2023.acl-long.99
He et al. (2024) Xiaoxin He, Yijun Tian, Yifei Sun, Nitesh V. Chawla, Thomas Laurent, Yann LeCun, Xavier Bresson, and Bryan Hooi. 2024. G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. In Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (Eds.), Vol. 37. Curran Associates, Inc., 132876–132907. https://proceedings.neurips.cc/paper_files/paper/2024/file/efaf1c9726648c8ba363a5c927440529-Paper-Conference.pdf
Hill et al. (2024) Blaine Hill, Lihui Liu, and Hanghang Tong. 2024. Ginkgo-P: General Illustrations of Knowledge Graphs for Openness as a Platform. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining. 1066–1069.
Jacobs et al. (1991) Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive Mixtures of Local Experts. Neural Computation 3, 1 (1991), 79–87. doi:10.1162/neco.1991.3.1.79
Jordan et al. (1996) Michael Jordan, Zoubin Ghahramani, and Lawrence Saul. 1996. Hidden Markov Decision Trees. In Advances in Neural Information Processing Systems, M.C. Mozer, M. Jordan, and T. Petsche (Eds.), Vol. 9. MIT Press. https://proceedings.neurips.cc/paper_files/paper/1996/file/6c8dba7d0df1c4a79d07646be9a26c8-Paper.pdf
Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs.LG]. https://arxiv.org/abs/1609.02907
Kojima et al. (2023) Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2023. Large Language Models are Zero-Shot Reasoners. arXiv:2205.11916 [cs.CL]. https://arxiv.org/abs/2205.11916
Liu (2024) Lihui Liu. 2024. Knowledge graph reasoning and its applications: A pathway towards neural symbolic AI. Ph.D. Dissertation. University of Illinois at Urbana-Champaign.
Liu (2025a) Lihui Liu. 2025a. Graph-O1: Monte Carlo Tree Search with Reinforcement Learning for Text-Attributed Graph Reasoning. arXiv preprint arXiv:2512.17912 (2025).
Liu (2025b) Lihui Liu. 2025b. HyperKGR: Knowledge Graph Reasoning in Hyperbolic Space with Graph Neural Network Encoding Symbolic Path. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 25188–25199.
Liu (2025c) Lihui Liu. 2025c. Monte Carlo Tree Search for Graph Reasoning in Large Language Model Agents. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 4966–4970.
Liu et al. (2023) Lihui Liu, Yuzhong Chen, Mahashweta Das, Hao Yang, and Hanghang Tong. 2023. Knowledge Graph Question Answering with Ambiguous Query. In Proceedings of the ACM Web Conference 2023. 2477–2486.
Liu et al. (2025a) Lihui Liu, Jiayuan Ding, Subhabrata Mukherjee, and Carl J. Yang. 2025a. MIXRAG: Mixture-of-Experts Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. arXiv preprint arXiv:2509.21391 (2025).
Liu et al. (2021a) Lihui Liu, Boxin Du, Yi Ren Fung, Heng Ji, Jiejun Xu, and Hanghang Tong. 2021a. KompaRe: A Knowledge Graph Comparative Reasoning System. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3308–3318.
Liu et al. (2021b) Lihui Liu, Boxin Du, Heng Ji, ChengXiang Zhai, and Hanghang Tong. 2021b. Neural-Answering Logical Queries on Knowledge Graphs. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1087–1097.
Liu et al. (2019) Lihui Liu, Boxin Du, Hanghang Tong, et al. 2019. G-Finder: Approximate attributed subgraph matching. In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 513–522.
Liu et al. (2022a) Lihui Liu, Boxin Du, Jiejun Xu, Yinglong Xia, and Hanghang Tong. 2022a. Joint Knowledge Graph Completion and Question Answering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1098–1108.
Liu et al. (2024a) Lihui Liu, Blaine Hill, Boxin Du, Fei Wang, and Hanghang Tong. 2024a. Conversational Question Answering with Language Models Generated Reformulations over Knowledge Graph. In Findings of the Association for Computational Linguistics: ACL 2024. 839–850.
Liu et al. (2022b) Lihui Liu, Houxiang Ji, Jiejun Xu, and Hanghang Tong. 2022b. Comparative Reasoning for Knowledge Graph Fact Checking. In 2022 IEEE International Conference on Big Data (Big Data). IEEE, 2309–2312.
Liu et al. (2024b) Lihui Liu, Jinha Kim, and Vidit Bansal. 2024b. Can Contrastive Learning Refine Embeddings. arXiv preprint arXiv:2404.08701 (2024).
Liu and Shu (2025) Lihui Liu and Kai Shu. 2025. Unifying knowledge in agentic LLMs: Concepts, methods, and recent advancements. ACM SIGKDD Explorations Newsletter 27, 2 (2025), 88–96.
Liu and Tong ([n. d.]) Lihui Liu and Hanghang Tong. [n. d.]. Neural Symbolic Knowledge Graph Reasoning.
Liu and Tong (2026a) Lihui Liu and Hanghang Tong. 2026a. Accurate Query Answering with LLMs Over Incomplete KG. In Neural Symbolic Knowledge Graph Reasoning: A Pathway Towards Neural Symbolic AI. Springer, 73–87.
Liu and Tong (2026b) Lihui Liu and Hanghang Tong. 2026b. Ambiguous Query Answering with Neural Symbolic Reasoning Over Incomplete KG. In Neural Symbolic Knowledge Graph Reasoning: A Pathway Towards Neural Symbolic AI. Springer, 89–106.
Liu and Tong (2026c) Lihui Liu and Hanghang Tong. 2026c. Dynamic Query Answering with Neural Symbolic Reasoning Over Incomplete KG. In Neural Symbolic Knowledge Graph Reasoning: A Pathway Towards Neural Symbolic AI. Springer, 121–136.
Liu and Tong (2026d) Lihui Liu and Hanghang Tong. 2026d. Neural Symbolic Knowledge Graph Reasoning: A Pathway Towards Neural Symbolic AI. Springer Nature.
Liu and Tong (2026e) Lihui Liu and Hanghang Tong. 2026e. Symbolic Reasoning for Inconsistency Detection Over Complete KG. In Neural Symbolic Knowledge Graph Reasoning: A Pathway Towards Neural Symbolic AI. Springer, 37–54.
Liu et al. (2024c) Lihui Liu, Zihao Wang, Jiaxin Bai, Yangqiu Song, and Hanghang Tong. 2024c. New frontiers of knowledge graph reasoning: Recent advances and future trends. In Companion Proceedings of the ACM Web Conference 2024. 1294–1297.
Liu et al. (2024d) Lihui Liu, Zihao Wang, Ruizhong Qiu, Yikun Ban, Eunice Chan, Yangqiu Song, Jingrui He, and Hanghang Tong. 2024d. Logic query of thoughts: Guiding large language models to answer complex logic queries with knowledge graphs. arXiv preprint arXiv:2404.04264 (2024).
Liu et al. (2025b) Lihui Liu, Zihao Wang, and Hanghang Tong. 2025b. Neural-symbolic reasoning over knowledge graphs: A survey from a query perspective. ACM SIGKDD Explorations Newsletter 27, 1 (2025), 124–136.
Liu et al. (2025c) Lihui Liu, Zihao Wang, Dawei Zhou, Ruijie Wang, Yuchen Yan, Bo Xiong, Sihong He, and Hanghang Tong. 2025c. Few-Shot Knowledge Graph Completion via Transfer Knowledge from Similar Tasks. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 4960–4965.
Liu et al. (2022c) Lihui Liu, Ruining Zhao, Boxin Du, Yi Ren Fung, Heng Ji, Jiejun Xu, and Hanghang Tong. 2022c. Knowledge Graph Comparative Reasoning for Fact Checking: Problem Definition and Algorithms. IEEE Data Eng. Bull. 45, 4 (2022), 19–38.
Ma et al. (2023) Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. 2023. Query Rewriting in Retrieval-Augmented Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore.
Peng et al. (2024) Wenjun Peng, Guiyang Li, Yue Jiang, Zilong Wang, Dan Ou, Xiaoyi Zeng, Derong Xu, Tong Xu, and Enhong Chen. 2024. Large Language Model based Long-tail Query Rewriting in Taobao Search. arXiv:2311.03758 [cs.IR]. https://arxiv.org/abs/2311.03758
Perozzi et al. (2024) Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, and Jonathan Halcrow. 2024. Let Your Graph Do the Talking: Encoding Structured Data for LLMs. arXiv:2402.05862 [cs.LG]. https://arxiv.org/abs/2402.05862
Qin et al. (2024) Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, and Michael Bendersky. 2024. Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting. In Findings of the Association for Computational Linguistics: NAACL 2024, Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, Mexico City, Mexico.
Radford et al. (2018) A. Radford, J. Wu, and R. Child. 2018. Language Models are Unsupervised Multitask Learners. (2018).
Reimers and Gurevych (2019) Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, Hong Kong, China.
Shi et al. (2023) Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2023. REPLUG: Retrieval-augmented black-box language models. arXiv preprint arXiv:2301.12652 (2023).
Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. arXiv:1710.10903 [stat.ML]. https://arxiv.org/abs/1710.10903
Wang et al. (2024b) Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, and Yulia Tsvetkov. 2024b. Can Language Models Solve Graph Problems in Natural Language? arXiv:2305.10037 [cs.CL]. https://arxiv.org/abs/2305.10037
Wang et al. (2024a) Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, and Cho-Jui Hsieh. 2024a. One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts. In International Conference on Machine Learning.
Wu et al. (2025) Shanglin Wu, Lihui Liu, Jinho D. Choi, and Kai Shu. 2025. Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction. arXiv preprint arXiv:2509.03540 (2025).
Zheng et al. (2024) Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2024. Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models. arXiv:2310.06117 [cs.LG]. https://arxiv.org/abs/2310.06117