Paper deep dive
ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics
Omar Coser
Abstract
Translating single-cell RNA sequencing (scRNA-seq) data into mechanistic biological hypotheses remains a critical bottleneck, as agentic AI systems lack direct access to transcriptomic representations while expression foundation models remain opaque to natural language. Here we introduce ELISA (Embedding-Linked Interactive Single-cell Agent), an interpretable framework that unifies scGPT expression embeddings with BioBERT-based semantic retrieval and LLM-mediated interpretation for interactive single-cell discovery. An automatic query classifier routes inputs to gene marker scoring, semantic matching, or reciprocal rank fusion pipelines depending on whether the query is a gene signature, a natural language concept, or a mixture of both. Integrated analytical modules perform pathway activity scoring across 60+ gene sets, ligand–receptor interaction prediction using 280+ curated pairs, condition-aware comparative analysis, and cell-type proportion estimation, all operating directly on embedded data without access to the original count matrix. Benchmarked across six diverse scRNA-seq datasets spanning inflammatory lung disease, pediatric and adult cancers, organoid models, healthy tissue, and neurodevelopment, ELISA significantly outperforms CellWhisperer in cell type retrieval (combined permutation test, $p < 0.001$), with particularly large gains on gene-signature queries (Cohen's $d = 5.98$ for MRR). ELISA replicates published biological findings (mean composite score 0.90) with near-perfect pathway alignment and theme coverage (0.98 each), and generates candidate hypotheses through grounded LLM reasoning, bridging the gap between transcriptomic data exploration and biological discovery. Code available at: https://github.com/omaruno/ELISA-An-AI-Agent-for-Expression-Grounded-Discovery-in-Single-Cell-Genomics.git (If you use ELISA in your research, please cite this work).
Links
- Source: https://arxiv.org/abs/2603.11872v1
- Canonical: https://arxiv.org/abs/2603.11872v1
Full Text
ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics

Omar Coser
Department of Engineering, Unit of Artificial Intelligence and Computer Systems, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo
omarcoser10@gmail.com
Corresponding author: Omar Coser. This manuscript has been submitted for peer review.

Abstract

Translating single-cell RNA sequencing (scRNA-seq) data into mechanistic biological hypotheses remains a critical bottleneck, as agentic AI systems lack direct access to transcriptomic representations while expression foundation models remain opaque to natural language. Here we introduce ELISA (Embedding-Linked Interactive Single-cell Agent), an interpretable framework that unifies scGPT expression embeddings with BioBERT-based semantic retrieval and LLM-mediated interpretation for interactive single-cell discovery. An automatic query classifier routes inputs to gene marker scoring, semantic matching, or reciprocal rank fusion pipelines depending on whether the query is a gene signature, a natural language concept, or a mixture of both. Integrated analytical modules perform pathway activity scoring across 60+ gene sets, ligand–receptor interaction prediction using 280+ curated pairs, condition-aware comparative analysis, and cell-type proportion estimation, all operating directly on embedded data without access to the original count matrix. Benchmarked across six diverse scRNA-seq datasets spanning inflammatory lung disease, pediatric and adult cancers, organoid models, healthy tissue, and neurodevelopment, ELISA significantly outperforms CellWhisperer in cell type retrieval (combined permutation test, $p < 0.001$), with particularly large gains on gene-signature queries (Cohen’s $d = 5.98$ for MRR).
ELISA replicates published biological findings (mean composite score 0.90) with near-perfect pathway alignment and theme coverage (0.98 each), and generates candidate hypotheses through grounded LLM reasoning, bridging the gap between transcriptomic data exploration and biological discovery. Code available at: https://github.com/omaruno/ELISA-An-AI-Agent-for-Expression-Grounded-Discovery-in-Single-Cell-Genomics.git (If you use ELISA in your research, please cite this work).

Keywords: AI Agents, Single Cell Genomics, AI Discovery

1 Introduction

Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular heterogeneity by enabling genome-wide transcriptional profiling at single-cell resolution Tang et al. (2009). Standardized analytical pipelines support quality control, normalization, clustering, differential expression, and trajectory inference Luecken and Theis (2019), catalyzing the construction of comprehensive cell atlases across tissues, developmental stages, and disease contexts. However, a critical bottleneck persists: translating statistical outputs (differentially expressed gene lists, enriched pathways, and predicted ligand–receptor interactions) into mechanistic biological hypotheses remains labor-intensive, context-dependent, and difficult to scale or reproduce.

Large language models (LLMs) offer a potential solution to this problem. LLMs encode substantial biomedical knowledge and perform competitively on clinical reasoning benchmarks Singhal et al. (2023), whereas retrieval-augmented generation (RAG) improves factual accuracy by grounding outputs in external knowledge at inference time Lewis et al. (2020). These capabilities have motivated agentic AI architectures capable of autonomous planning, tool usage, and iterative reasoning within closed-loop workflows.

Recent agentic systems span a broad range of biomedical applications (Table 1). Towards an AI Co-Scientist Gottweis et al.
(2025) introduces multi-agent hypothesis generation through structured debate and evolutionary refinement, though it operates over textual knowledge without interfacing with experimental data. Biomni Huang et al. (2025) constructs a unified action space from biomedical tools and databases, enabling dynamic task orchestration including gene prioritization. GeneAgent Wang et al. (2025) and related systems Gao et al. (2024) extend LLM reasoning to gene-set analysis, whereas Virtual Lab Swanson et al. (2025) demonstrates collaborative multi-agent discovery. Within single-cell analysis, CellAgent Xiao et al. (2024) decomposes scRNA-seq workflows into agent-handled subtasks, AutoBA Zhou et al. (2023) generates executable pipelines from natural language, and BRAD Pickard et al. (2025) integrates LLMs with enrichment analysis for biomarker identification. In the retrieval-augmented space, GeneGPT Jin et al. (2025) provides structured access to NCBI databases, and systems for deep phenotyping Garcia et al. (2025) and biomedical data extraction Cinquin (2024); Niyonkuru et al. (2025) have demonstrated the utility of RAG for factual grounding. CRISPR-GPT Qu et al. (2025) further illustrates agentic automation for gene-editing experiment design. However, these systems operate primarily on curated text and structured databases and lack the capacity to operate directly on high-dimensional transcriptomic representations.

Concurrently, foundation models for single-cell biology have achieved remarkable progress in learning expressive latent representations from transcriptomic data. scGPT Cui et al. (2024) employs generative pre-training over millions of single-cell transcriptomes, capturing gene–gene dependencies for cell embedding, annotation transfer, and perturbation prediction. Extensions such as scWGBS-GPT Liang et al. (2025) and Tokensome Zhang et al. (2024) broaden learned representations to methylomics and multimodal settings.
However, these expression embeddings are not designed for semantic querying; they capture transcriptional similarity in latent spaces that lack alignment with the natural language concepts that biologists use to formulate hypotheses. Notably, CellWhisperer Schaefer et al. (2025) addressed part of this gap by learning joint embeddings of transcriptomes and textual annotations via contrastive training, enabling chat-based interrogation of scRNA-seq data within CELLxGENE. While this establishes a compelling proof of concept for natural-language exploration, it does not incorporate built-in analytical modules for pathway scoring, interaction prediction, or condition-aware comparison.

Table 1: Comparison of existing AI systems for biomedical and single-cell analysis. Expr. Emb.: uses expression-derived embeddings from foundation models; Sem. Ret.: semantic retrieval over biological annotations; L–R / Pathway: ligand–receptor interaction and pathway scoring from data; Cond. Comp.: condition-aware comparative analysis; Interp. Report: automated interpretive report generation with LLM.

| System | Expr. Emb. | Sem. Ret. | L–R / Pathway | Cond. Comp. | Interp. Report | Primary Scope |
|---|---|---|---|---|---|---|
| AI Co-Scientist Gottweis et al. (2025) | – | – | – | – | ✓ | Hypothesis generation |
| Biomni Huang et al. (2025) | – | ✓ | – | – | – | General biomedical |
| GeneAgent Wang et al. (2025) | – | ✓ | – | – | – | Gene-set analysis |
| Virtual Lab Swanson et al. (2025) | – | – | – | – | ✓ | Multi-agent discovery |
| CellAgent Xiao et al. (2024) | – | – | – | – | – | scRNA-seq pipelines |
| AutoBA Zhou et al. (2023) | – | – | – | – | – | Pipeline generation |
| BRAD Pickard et al. (2025) | – | ✓ | – | – | – | Biomarker ID |
| GeneGPT Jin et al. (2025) | – | ✓ | – | – | – | Database querying |
| CRISPR-GPT Qu et al. (2025) | – | – | – | – | – | Experiment design |
| scGPT Cui et al. (2024) | ✓ | – | – | – | – | Cell embeddings |
| CellWhisperer Schaefer et al. (2025) | ✓ | ✓ | – | – | – | Multimodal embedding |
| ELISA (ours) | ✓ | ✓ | ✓ | ✓ | ✓ | Interactive sc discovery |

This landscape reveals a fundamental disconnect: agentic systems and LLM-based tools excel at reasoning over text and generating interpretations but lack direct access to transcriptional data structure, while expression foundation models learn rich cellular representations that remain opaque to natural language interfaces. No existing system has unified expression-derived embeddings with semantic language representations within a single interactive framework for single-cell discovery.

ELISA (Embedding-Linked Interactive Single-cell Agent) addresses this gap by integrating scGPT expression embeddings with semantic retrieval (sr) and LLM-based biological interpretation in a unified discovery platform (Fig. 1). Rather than retraining the expression foundation models, ELISA treats scGPT cluster embeddings as an expression-side representation that is explicitly combined with BioBERT-derived semantic embeddings through an automatic hybrid routing mechanism. A query classifier detects whether the input is a gene signature, a natural language concept, or a mixture of both, and routes it to the appropriate retrieval pipeline (gene marker scoring, semantic cosine similarity, or reciprocal rank fusion of both), enabling flexible navigation across the full spectrum of biological queries. Built-in analytical modules for condition-aware comparative analysis, ligand–receptor interaction prediction, pathway activity scoring, and cell-type proportion analysis operate directly on the embedded data, while an LLM reasoning layer translates statistical outputs into structured biological interpretations. Critically, ELISA enforces strict separation between dataset-derived evidence and LLM-generated knowledge, enabling transparent hypothesis generation.
The system produces comprehensive, publication-ready reports with Nature-style visualizations, supporting the full arc from exploratory query to structured scientific output.

Figure 1: Overview of the ELISA architecture. The framework comprises three stages. In data preparation (left), a single-cell dataset undergoes standard preprocessing (normalization, log-transform, highly variable gene selection, PCA, neighbor graph construction, and Leiden clustering), after which per-cluster differential expression statistics are computed, enriched with Gene Ontology (GO) and Reactome terms, and encoded into 768-dimensional semantic embeddings via BioBERT. In parallel, cell-level expression embeddings are generated through scGPT. Both representations are fused into a single serialized embedding file (.pt). In the retrieval and analysis stage (center), a query classifier routes user input (gene signatures, natural language concepts, or mixed queries) to the appropriate pipeline: gene marker scoring, semantic retrieval, or hybrid retrieval via reciprocal rank fusion (RRF). Additional analytical modules perform pathway scoring, ligand–receptor interaction prediction, comparative analysis, and proportion estimation directly on the embedded data. In the interpretation stage (right), all retrieval and analysis outputs are passed to a Groq-hosted LLM (LLaMA 3.1-8B) that generates grounded biological interpretations and structured reports.

We validated ELISA on six diverse scRNA-seq datasets spanning distinct tissues, disease contexts, and experimental designs. Through a systematic comparison with published findings, we demonstrate that ELISA recovers key biological signals (differentially expressed genes, altered cell-type proportions, pathway activities, and cell–cell interaction networks) with high fidelity.
A quantitative evaluation framework comprising five complementary metrics (gene coverage, interaction recovery, pathway alignment, proportion consistency, and qualitative theme coverage) provides a principled assessment of the capacity of the system to replicate established biological conclusions. To the best of our knowledge, scGPT embeddings have not been integrated with semantic language representations in a query-conditioned retrieval framework for single-cell genomics.

In summary, this work makes the following contributions:

• Multimodal discovery agent for single-cell genomics. We introduce ELISA, an interpretable AI framework that integrates transcriptomic embeddings, semantic knowledge retrieval, and large language model reasoning to enable natural-language–driven exploration and biological discovery from single-cell RNA sequencing data.
• Query-adaptive hybrid retrieval architecture. ELISA employs automatic query classification and dynamic pipeline routing to combine complementary retrieval strategies, including gene marker scoring, semantic similarity search, and reciprocal rank fusion, allowing flexible, query-conditioned navigation of complex cellular landscapes.
• Integrated biological analysis modules for expression-grounded reasoning. The system incorporates analytical components for comparative expression analysis, ligand–receptor interaction scoring, pathway activity estimation, and cell-type proportion profiling, enabling automated interpretation and contextualization of discovered signals.
• Benchmarking framework for evaluating AI-assisted biological discovery. We propose a quantitative evaluation strategy that measures the ability of AI agents to recover biologically meaningful findings reported in reference studies, and apply this framework across six diverse scRNA-seq datasets.
• Empirical validation of discovery performance. Across multiple datasets and evaluation metrics, ELISA consistently recovers the majority of key biological signals reported in the corresponding studies, demonstrating its potential to support interpretable and reproducible AI-assisted discovery in single-cell genomics.

2 Materials and Methods

Details on parameters, hyperparameters, and software are specified in the appendix (6, F.8). Details on the datasets are in Appendix E and Table 5. Details on the method are in Appendix F.

2.1 Datasets

ELISA was validated on six publicly available scRNA-seq datasets from CZ CELLxGENE Discover (Table 5), spanning lung (cystic fibrosis) Berg et al. (2025), adrenal tumor (neuroblastoma) Yu et al. (2025), multi-cancer immune checkpoint blockade Gondal et al. (2025), lung organoid Lim et al. (2025), healthy breast tissue Bhat-Nakshatri et al. (2024), and first-trimester brain Mannens et al. (2025). Datasets were downloaded in AnnData format and preprocessed into a standardized embedding format. Cell type annotations from the original publications were retained without modification.

2.2 System architecture

ELISA integrates four modules (a hybrid retrieval engine, an analytical suite, a visualization toolkit, and an LLM chat interface) operating on a shared serialized PyTorch embedding file per dataset. Each embedding file stores cluster identifiers, BioBERT semantic embeddings (768-d), optional scGPT expression embeddings, per-cluster differential expression statistics, Gene Ontology (GO) and Reactome enrichment terms, and metadata. This cluster-level representation eliminates the need for access to the original count matrix at query time.

2.3 Hybrid retrieval

An automatic query classifier routes each input to one of three pipelines based on token-level heuristics. Gene queries (≥ 60% gene-symbol tokens) are scored against per-cluster differential expression (DE) profiles using a weighted function of $|\log_2 \mathrm{FC}|$ and expression specificity ($\mathrm{pct}_{\mathrm{in}} - \mathrm{pct}_{\mathrm{out}}$).
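To make the routing rule and the gene scoring concrete, here is a minimal sketch of how they could be written. Only the 60% gene-token threshold and the two scoring ingredients ($|\log_2 \mathrm{FC}|$ and $\mathrm{pct}_{\mathrm{in}} - \mathrm{pct}_{\mathrm{out}}$) come from the text. The gene vocabulary, the rule that any query with some but fewer than 60% gene tokens counts as mixed, the equal weights, the per-gene averaging, and all function names are assumptions made for illustration and are not taken from the ELISA code.

```python
import re
from dataclasses import dataclass

# Hypothetical gene-symbol vocabulary; a real system would derive it from the dataset.
KNOWN_GENES = {"MARCO", "FABP4", "APOC1", "C1QB", "C1QC", "MSR1", "IFNG", "CD69"}


def classify_query(query: str, gene_threshold: float = 0.60) -> str:
    """Label a query as 'gene', 'mixed', or 'ontology' from its token makeup."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", query)
    if not tokens:
        return "ontology"
    n_genes = sum(tok in KNOWN_GENES for tok in tokens)
    if n_genes / len(tokens) >= gene_threshold:
        return "gene"
    # Assumption: any gene token below the threshold makes the query mixed.
    return "mixed" if n_genes > 0 else "ontology"


@dataclass
class DEStat:
    log2fc: float   # log2 fold change of the gene in the cluster vs. the rest
    pct_in: float   # fraction of cells expressing the gene inside the cluster
    pct_out: float  # fraction of cells expressing the gene outside the cluster


def score_cluster(query_genes, de_profile, w_fc=1.0, w_spec=1.0):
    """Average a weighted sum of |log2FC| and specificity over the query genes."""
    total = 0.0
    for gene in query_genes:
        stat = de_profile.get(gene)
        if stat is None:
            continue  # genes absent from the cluster's DE profile add nothing
        total += w_fc * abs(stat.log2fc) + w_spec * (stat.pct_in - stat.pct_out)
    return total / max(len(query_genes), 1)


def rank_clusters(query_genes, de_by_cluster):
    """Return (cluster, score) pairs sorted from best to worst match."""
    scored = {c: score_cluster(query_genes, p) for c, p in de_by_cluster.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Toy DE profiles (illustrative numbers, not from any dataset in the paper).
de_by_cluster = {
    "macrophage": {"MARCO": DEStat(3.0, 0.90, 0.10), "FABP4": DEStat(2.5, 0.80, 0.20)},
    "T cell": {"MARCO": DEStat(-1.0, 0.05, 0.30)},
}
ranking = rank_clusters(["MARCO", "FABP4"], de_by_cluster)
```

In this toy setup the macrophage cluster ranks first for the query `MARCO FABP4`. Because the fold-change term uses an absolute value, the specificity term is what penalizes clusters in which a query gene is depleted rather than enriched.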
Ontology queries are encoded with BioBERT Lee et al. (2020) and matched to precomputed cluster description embeddings via cosine similarity, augmented by Cell Ontology name boosting ($\alpha = 0.15$) and synonym expansion ($\beta = 0.10$). Mixed queries are resolved through reciprocal rank fusion (RRF) of both pipelines ($k = 60$). For benchmarking, an additive union strategy selects the higher-recall modality as primary and appends unique results from the secondary pipeline.

2.4 Analytical modules

The four built-in modules operate directly on the embedded data. Ligand–receptor interaction prediction scores source–target cluster pairs using a curated database of 280+ pairs compiled from CellChat Jin et al. (2025), CellPhoneDB Efremova et al. (2020), and NicheNet Browaeys et al. (2020). Pathway activity scoring quantifies 60+ curated gene sets across five categories (immune signaling, cell biology, neuroscience, metabolism, and tissue-specific). Comparative analysis stratifies clusters by condition metadata and identifies condition-biased gene expression. Proportion analysis computes per-cluster cell fractions and condition-specific fold changes. A detailed description is given in F.3.

2.5 LLM interpretation

Retrieval and analysis outputs are interpreted by LLaMA-3.1-8B-Instant Grattafiori et al. (2024) via the Groq API (temperature 0.2). The Groq API is free to use within a token limit; the APIs of ChatGPT Achiam et al. (2023), Gemini Team et al. (2023), and Claude Anthropic (2024) are also integrated and ready to use. Prompts enforce strict grounding in dataset evidence, with explicit instructions to avoid hallucination and causal claims. A discovery mode generates structured outputs comprising dataset evidence, established biology, consistency analysis, and candidate hypotheses.

2.6 Benchmarking

Retrieval was evaluated using 100 queries (50 ontology, 50 expression) with curated expected clusters, assessed using Cluster Recall@k and Mean Reciprocal Rank (MRR).
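As a reference point for these two retrieval metrics, the sketch below uses their standard information-retrieval definitions. The paper's formal definitions are in its supplementary material and may differ in detail, so treat this as an assumption. The function names and the toy rankings are hypothetical.

```python
def recall_at_k(ranked, expected, k):
    """Fraction of expected clusters that appear among the top-k retrieved clusters."""
    expected = set(expected)
    if not expected:
        return 0.0
    return len(set(ranked[:k]) & expected) / len(expected)


def reciprocal_rank(ranked, expected):
    """1 / rank of the first expected cluster in the ranking, or 0 if none is retrieved."""
    expected = set(expected)
    for rank, cluster in enumerate(ranked, start=1):
        if cluster in expected:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, expected) pairs, one pair per query."""
    return sum(reciprocal_rank(r, e) for r, e in runs) / len(runs)


# Toy example with two queries (cluster names are illustrative only).
ranked_q1 = ["macrophage", "monocyte", "T cell", "basal"]
expected_q1 = {"monocyte", "basal"}
ranked_q2 = ["ciliated", "secretory"]
expected_q2 = {"ciliated"}

r_at_2 = recall_at_k(ranked_q1, expected_q1, k=2)   # one of two expected clusters in the top 2
mrr = mean_reciprocal_rank([(ranked_q1, expected_q1), (ranked_q2, expected_q2)])
```

Recall@k rewards finding all expected clusters within the cutoff, while MRR only looks at how early the first correct cluster appears, which is why the two can diverge on the same ranking.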
ELISA was compared against CellWhisperer Schaefer et al. (2025). Analytical modules were evaluated against ground truth from source publications using interaction recovery rate, pathway alignment, proportion consistency, and gene recall. A combined permutation test (50,000 permutations) assessed overall significance across all metrics simultaneously.

3 Results

3.1 ELISA’s hybrid retrieval outperforms CellWhisperer across datasets and query types

To evaluate the ability of ELISA to retrieve biologically relevant cell types from single-cell atlases, we benchmarked its retrieval performance against CellWhisperer Schaefer et al. (2025), a state-of-the-art multimodal framework for natural-language interrogation of scRNA-seq data. For each of the six datasets (Table 5), we designed paired sets of ontology queries (concept-level, e.g., “macrophage infiltration in CF (Cystic Fibrosis) airways”) and expression queries (gene-signature-based, e.g., “MARCO FABP4 APOC1 C1QB C1QC MSR1”), with curated expected cluster sets derived from the corresponding reference publications. We evaluated four retrieval modes: CellWhisperer, Semantic ELISA, scGPT ELISA (gene marker scoring pipeline), and ELISA Union (additive fusion of semantic and gene pipelines via adaptive routing). Performance was assessed using Cluster Recall@k and Mean Reciprocal Rank (MRR) across both query categories (Fig. 2; formal definitions of all retrieval and analytical evaluation metrics are provided in Supplementary Section C).

Figure 2: ELISA outperforms CellWhisperer across six datasets and both query types. Radar plots showing retrieval performance on ontology (Ont) and expression (Exp) queries for each dataset. Each plot displays six axes: Cluster Recall@k at two dataset-adapted cutoffs and Mean Reciprocal Rank (MRR), evaluated separately on ontology and expression queries (see Supplementary Section C for metric definitions). Higher values (further from center) indicate better performance.
Four retrieval modes are compared: CellWhisperer (pink dashed), ELISA Semantic (blue), ELISA scGPT (orange), and ELISA Union (green). The Union mode consistently achieves the largest radar footprint, matching or exceeding CellWhisperer on ontology metrics while substantially outperforming it on expression metrics.

ELISA Union significantly outperformed CellWhisperer across all datasets and metrics (combined permutation test, $p < 0.001$; see Table 2). Across all six datasets, ELISA Union consistently achieved the highest or near-highest performance on every metric, enveloping or matching the CellWhisperer profile on all axes of the radar plots (Fig. 2). To quantify this advantage, we performed paired statistical tests across the six datasets for each retrieval metric (Table 2). A combined permutation test aggregating all 12 metrics simultaneously confirmed that ELISA Union significantly outperformed CellWhisperer ($p < 0.001$; 50,000 permutations). This overall advantage was driven by large improvements on expression queries (mean ΔMRR = +0.41, paired t-test $p < 0.001$, Cohen’s d = 5.98; mean ΔRecall@5 = +0.29, $p = 0.006$, d = 1.57) and consistent gains on ontology queries (mean ΔMRR = +0.15, $p = 0.028$, d = 1.02; mean ΔRecall@5 = +0.08, $p = 0.047$, d = 0.84). Across all six datasets, ELISA Union won 46 of 54 individual metric comparisons against CellWhisperer, with no dataset in which CellWhisperer held an overall advantage. The Semantic ELISA pipeline alone also significantly outperformed CellWhisperer (combined permutation test, $p = 0.003$), as did the scGPT pipeline ($p = 0.023$), confirming that both modalities independently contribute retrieval value beyond the CellWhisperer baseline.

Table 2: Statistical comparison of ELISA Union vs. CellWhisperer retrieval performance. For each metric, Δ mean reports the average improvement of Union over CellWhisperer across datasets. Cohen’s d is the paired effect size.
p-values are from one-sided paired t-tests ($H_1$: Union > CellWhisperer). Sign indicates datasets where Union outperformed CellWhisperer. Metrics with fewer than 6 datasets reflect different Recall@k cutoffs used per dataset (see Supplementary Section B). The combined permutation test ($p < 0.001$) aggregates all metrics simultaneously.

| Category | Metric | Δ mean | Cohen’s d | p (paired t) | Sign (W/L) | n |
|---|---|---|---|---|---|---|
| Expression | MRR | +0.409 | 5.98 | <0.001 | 6/6 | 6 |
| Expression | Recall@5 | +0.287 | 1.57 | 0.006 | 5/5 | 5 |
| Expression | Recall@3 | +0.428 | 5.38 | 0.006 | 3/3 | 3 |
| Expression | Recall@2 | +0.492 | 3.43 | 0.014 | 3/3 | 3 |
| Expression | Recall@1 | +0.442 | 1.84 | 0.043 | 3/3 | 3 |
| Expression | Recall@10 | +0.284 | 1.43 | 0.065 | 3/3 | 3 |
| Ontology | MRR | +0.152 | 1.02 | 0.028 | 5/6 | 6 |
| Ontology | Recall@5 | +0.078 | 0.84 | 0.047 | 4/5 | 5 |
| Ontology | Recall@10 | +0.113 | 2.46 | 0.025 | 3/3 | 3 |
| Ontology | Recall@1 | +0.086 | 0.61 | 0.199 | 2/3 | 3 |
| Ontology | Recall@2 | +0.046 | 0.73 | 0.166 | 2/3 | 3 |
| Ontology | Recall@3 | +0.032 | 0.80 | 0.150 | 2/3 | 3 |
| Combined | (all 12 metrics) | +0.237 | – | <0.001† | 46/54‡ | 6 |

† permutation test (50,000 permutations). ‡ metric-level wins across all datasets.

A key observation is that no single retrieval modality dominated across both query types. The Semantic pipeline consistently excelled on ontology queries, where biological concept matching benefits from BioBERT’s language understanding, synonym expansion, and Cell Ontology name boosting. In contrast, the gene marker scoring pipeline showed its strongest performance on expression queries, where matching transcriptomic signatures to cluster DE profiles is essential. This complementarity was particularly pronounced in the CF Airways dataset, where the Semantic pipeline achieved high ontology Recall@10 (approximately 0.95) but lower expression recall, while the gene pipeline showed the inverse pattern.
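For readers who want to see how the paired statistics in Table 2 are typically obtained, the following sketch computes a paired Cohen's d and a one-sided sign-flip permutation p-value for a single metric. It is an illustration under stated assumptions: the per-dataset scores are made-up numbers, not values from the paper; the function names are hypothetical; and the paper's combined test across all 12 metrics with 50,000 random permutations is not specified here, so this single-metric version enumerates all $2^6$ sign flips exactly instead.

```python
from itertools import product
from statistics import mean, stdev


def paired_cohens_d(a, b):
    """Paired effect size: mean of the per-dataset differences over their sample SD."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)


def sign_flip_p_value(a, b):
    """One-sided exact sign-flip permutation test on the summed paired differences.

    Under the null hypothesis each difference is equally likely to be positive
    or negative, so every sign pattern is enumerated and the p-value is the share
    of patterns whose sum is at least as large as the observed sum.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs)
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-dataset MRR values for two systems (six datasets).
union_mrr = [0.8, 0.9, 0.7, 0.8, 0.9, 0.7]
baseline_mrr = [0.4, 0.4, 0.4, 0.4, 0.4, 0.4]

d = paired_cohens_d(union_mrr, baseline_mrr)
p = sign_flip_p_value(union_mrr, baseline_mrr)
```

With only six datasets, the smallest p-value an exact one-sided sign-flip test can return is 1/64, reached when one system wins on every dataset, as in this toy example.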
Similar modality-specific advantages were visible across all datasets: in the Breast Tissue Atlas, Semantic and Union nearly overlapped on ontology metrics while the gene pipeline lagged; in Immune Checkpoint Blockade (ICB) Multi-Cancer, the gene pipeline outperformed Semantic on expression MRR while underperforming on ontology axes.

CellWhisperer showed competitive performance on ontology queries in several datasets, particularly CF Airways and High-Risk Neuroblastoma, where its ontology MRR approached that of the ELISA Semantic pipeline. However, CellWhisperer’s performance dropped substantially on expression queries across all six datasets, with a mean MRR of 0.397 ± 0.049 compared to 0.806 ± 0.061 for ELISA Union, a twofold difference (Table 2) Cohen (2013); Casella and Berger (2024). This gap was most severe in the ICB Multi-Cancer and First-Trimester Brain datasets, where CellWhisperer’s expression recall fell well below both ELISA pipelines. The expression query deficit reflects a fundamental architectural difference: CellWhisperer’s contrastive text–transcriptome alignment is optimized for natural-language cell type descriptions but does not incorporate a dedicated gene marker scoring mechanism for queries formulated as gene signatures, a query type that is common in exploratory single-cell analysis.

The ELISA Union mode resolves the tension between ontology and expression retrieval through its adaptive routing mechanism. For each query, the automatic classifier identifies whether the input is a gene list, a natural-language concept, or a mixture, and routes it to the appropriate pipeline. The additive union strategy then combines the full ranked output of the primary pipeline with unique clusters from the secondary pipeline, ensuring that relevant cell types captured by either modality are not lost.
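The two fusion strategies mentioned in the paper, reciprocal rank fusion for mixed queries ($k = 60$) and the additive union used for benchmarking, can be sketched as follows. The RRF formula shown is the standard one (each list contributes $1/(k + \text{rank})$ per item); the function names, the alphabetical tie-break, and the toy rankings are assumptions for illustration, not the ELISA implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each item scores the sum of 1 / (k + rank) over lists."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; ties are broken alphabetically (an assumption).
    return sorted(scores, key=lambda item: (-scores[item], item))


def additive_union(primary, secondary):
    """Keep the primary ranking intact and append clusters only the secondary found."""
    seen = set(primary)
    merged = list(primary)
    for cluster in secondary:
        if cluster not in seen:
            merged.append(cluster)
            seen.add(cluster)
    return merged


# Toy rankings from the two pipelines (cluster names are illustrative only).
semantic_ranking = ["macrophage", "monocyte", "basal"]
gene_ranking = ["macrophage", "T cell", "monocyte"]

fused = reciprocal_rank_fusion([semantic_ranking, gene_ranking], k=60)
union = additive_union(semantic_ranking, gene_ranking)
```

RRF reorders items by agreement between the two lists, whereas the additive union never changes the order of the primary pipeline. That second property is what guarantees the union's recall at a sufficiently large cutoff is at least that of either pipeline alone.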
This yielded consistent gains: in the CF Airways dataset, Union achieved a larger and more balanced radar footprint than any single modality; in the Breast Tissue Atlas, Union matched the near-perfect ontology performance of Semantic while substantially improving expression recall; and in the First-Trimester Brain, Union compensated for Semantic’s lower expression scores by incorporating the gene pipeline’s matching strength.

Notably, the performance advantage of ELISA was robust across datasets with very different structural properties. The CF Airways dataset (30 cell types, case–control design) and the First-Trimester Brain atlas (160 clusters, developmental trajectory without disease contrast) represent opposite ends of the complexity spectrum, yet ELISA Union outperformed CellWhisperer in both settings. Similarly, the ICB Multi-Cancer dataset, which integrates nine cancer types across 223 patients, poses a challenging retrieval scenario owing to its heterogeneous cell type nomenclature, yet ELISA maintained its performance advantage.

In summary, ELISA’s hybrid retrieval architecture, combining semantic language matching, gene marker scoring, and adaptive fusion, provides a significantly superior retrieval framework compared to text-only multimodal approaches (combined permutation test, $p < 0.001$). The systematic advantage on expression queries, where dedicated gene scoring compensates for the limitations of language-only embeddings (Cohen’s d = 5.98 for MRR), establishes that both retrieval modalities contribute essential and non-redundant information for comprehensive single-cell atlas interrogation.

3.2 ELISA replicates key biological findings across six diverse datasets

To evaluate whether ELISA could recover published biological conclusions through automated analysis alone, we compared ELISA-generated reports with the main-text results of six reference publications (Table 5).
For each dataset, ELISA was provided only with the preprocessed embedding file and no prior knowledge of the expected findings. We assessed replication across five quantitative metrics (gene coverage, pathway alignment, interaction recovery, proportion consistency, and theme coverage) and obtained an independent domain expert evaluation score (Table 3).

Across all six datasets, ELISA achieved a mean composite score of 0.90 (range 0.82–0.96). Pathway alignment and theme coverage were near-perfect (mean 0.98 each), while gene coverage averaged 0.85 and interaction recovery 0.77. Independent biological evaluation scores (mean 0.88) confirmed strong agreement with published findings. The computation of these metrics is presented in Appendix B.

Airways with cystic fibrosis. ELISA recovered the major epithelial and immune cell populations described by Berg et al. (2025), including correct proportion shifts and IFN-γ/type I interferon programs (pathway alignment: 1.0). Gene coverage reached 0.80, capturing markers such as IFNG, CD69, and HLA-E. Interaction recovery was 0.20, reflecting partial detection of the HLA-E/NKG2A and CALR–LRP1 axes (composite: 0.82).

High-risk neuroblastoma. ELISA identified all major cellular compartments and correctly detected the HB-EGF/ERBB4 paracrine axis (interaction recovery: 1.00) described by Yu et al. (2025). Pathway alignment was perfect, with mTOR, MAPK, and ErbB programs identified. Gene coverage was 0.84, with partial recovery of therapy-induced markers (composite: 0.95).

Immune checkpoint blockade across cancers. Using the ICB dataset of Gondal et al. (2025), ELISA captured checkpoint molecules (CD274, PDCD1, CTLA4), exhaustion markers, and all major ligand–receptor axes including PD-L1/PD-1 and TIGIT/NECTIN2 (gene coverage: 0.77; pathway and interaction recovery: 1.00; composite: 0.93).

Healthy breast tissue atlas.
ELISA achieved its highest composite score (0.96) on the dataset of Bhat-Nakshatri et al. (2024), accurately resolving the epithelial hierarchy with a gene coverage of 0.96, perfect pathway alignment, and interaction recovery of 0.80. Ancestry-related transcriptional programs were not captured, reflecting a limitation of ELISA’s pathway-centric framework.

Fetal lung alveolar type II (AT2) organoids. ELISA achieved perfect gene coverage (1.00) on the dataset of Lim et al. (2025), detecting all canonical surfactant genes and correctly identifying surfactant metabolism, Wnt, and Fibroblast Growth Factor (FGF) programs. Interaction recovery was lower (0.40), as SFTPC trafficking mechanisms were outside transcriptomic scope (composite: 0.91).

First-trimester human brain. Despite operating solely on the transcriptomic component of this multimodal atlas Mannens et al. (2025), ELISA identified major neuronal populations with gene coverage of 0.85 and perfect pathway and interaction recovery. Chromatin accessibility analyses were correctly identified as outside scope (composite: 0.95).

Summary. ELISA demonstrated robust replication across all six datasets (mean composite 0.90), with the strongest performance for pathway-level and thematic interpretation (≥ 0.98 mean). Gene coverage was high but not exhaustive (0.85), with missed genes primarily in rare cell states and non-transcriptomic modalities.

Table 3: Quantitative comparison between ELISA reports and reference single-cell studies. Scores reflect agreement between ELISA-generated biological interpretations and findings described in the main text of the corresponding publications. Gene coverage, pathway alignment, interaction recovery, and proportion consistency were computed programmatically; theme coverage was assessed independently by a domain expert as described in Section D.

| Dataset | Gene Cov. | Path. Align. | Int. Rec. | Prop. Cons. | Theme Cov. | Comp. score |
|---|---|---|---|---|---|---|
| CF airway | 0.80 | 1.0 | 0.20 | Yes | 0.85 | 0.82 |
| Neuroblastoma | 0.84 | 1.00 | 1.00 | Yes | 0.88 | 0.95 |
| ICB Multi-Cancer | 0.77 | 1.00 | 1.00 | Yes | 0.91 | 0.93 |
| Breast Atlas | 0.96 | 1.00 | 0.80 | Yes | 0.89 | 0.96 |
| Fetal Lung AT2 | 1.00 | 1.00 | 0.40 | Yes | 0.88 | 0.91 |
| Brain Atlas | 0.85 | 1.00 | 1.00 | Yes | 0.90 | 0.95 |
| Mean | 0.85 | 1.00 | 0.77 | 6/6 | 0.88 | 0.90 |

Gene Cov.: gene coverage; Path. Align.: pathway alignment; Int. Rec.: interaction recovery; Prop. Cons.: proportion consistency; Theme Cov.: theme coverage; Biol. Eval.: independent domain expert evaluation score (0–1). Comp. score: unweighted mean of all preceding metrics (Prop. Cons. coded as 1.0 when consistent).

3.3 Discovery of candidate regulatory signals across tissue atlases

Beyond reproducing the key biological signals described in the original studies, ELISA’s discovery mode highlighted several candidate regulatory signals that were not explicitly emphasized in the reference publications (Table 4). These signals represent transcriptome-derived hypotheses emerging from systematic cross-cell-type analysis of single-cell atlases.

In the cystic fibrosis airway dataset, ELISA identified enrichment of the CALR–LRP1 phagocytic signaling axis within the macrophage populations. Calreticulin–LRP1 signaling has previously been implicated in apoptotic cell recognition and clearance, suggesting that altered macrophage-mediated phagocytosis may contribute to the inflammatory microenvironment characteristic of the CF lung.

Within the fetal lung atlas, ELISA detected increased expression of the ubiquitin-associated regulators TRIM21 and TRIM65 in alveolar type II (AT2) cells alongside the known E3 ubiquitin ligase ITCH. Although ITCH has been implicated in regulating surfactant protein C (SFTPC) maturation, the enrichment of these additional TRIM-family ligases suggests that cooperative ubiquitin-dependent pathways may participate in surfactant protein processing and AT2 cell proteostasis.
In the healthy breast tissue atlas, ELISA highlighted strong enrichment of the Kelch-family gene KLHL29 within basal–myoepithelial cell populations. Although not emphasized in the original study, this pattern suggests that KLHL29 may represent a previously unrecognized marker or structural regulator of basal epithelial identity.

Analysis of the immune checkpoint blockade dataset revealed elevated expression of macrophage markers CD163 and MRC1 within tumor-associated macrophage populations following therapy. This expression pattern is consistent with an M2-like macrophage polarization state, potentially reflecting remodeling of the immune microenvironment in response to checkpoint blockade treatment.

In the neuroblastoma dataset, ELISA identified differential usage of AP-1 transcription factors across treatment states. Specifically, JUND expression was enriched at diagnosis, whereas JUNB and FOS were more strongly expressed after therapy. This shift suggests dynamic remodeling of AP-1–mediated stress-response programs during therapy-induced tumor state transitions.

Finally, analysis of the developing brain atlas revealed a shared transcription factor module composed of TFAP2B, LHX5, and LHX1 across Purkinje neurons and midbrain GABAergic neuronal populations. This co-occurring regulatory signature suggests the existence of a conserved transcriptional program underlying inhibitory neuron specification in anatomically distinct brain regions.

Taken together, these findings illustrate how ELISA can surface candidate regulatory programs across diverse single-cell atlases. While these signals should be interpreted as transcriptome-derived hypotheses, they provide potential starting points for targeted experimental validation.

Table 4: Candidate regulatory signals identified by ELISA across six reference single-cell atlases. These signals were not explicitly highlighted in the original publications and represent transcriptome-derived hypotheses generated through ELISA’s discovery mode.

| Dataset | Primary finding in reference study | ELISA candidate discovery / hypothesis |
|---|---|---|
| CF airway | Altered immune–structural cell crosstalk and inflammatory signaling in cystic fibrosis airway tissue | Detection of the macrophage CALR–LRP1 signaling axis, suggesting altered apoptotic cell recognition or phagocytic clearance pathways contributing to the CF lung inflammatory microenvironment |
| Breast Atlas | Ancestry-associated epithelial lineage variation and luminal progenitor states in healthy breast tissue | Enrichment of the Kelch-family gene KLHL29 in basal–myoepithelial cells, suggesting a potential additional marker or regulator of basal epithelial structural identity |
| Fetal Lung AT2 | ITCH-mediated ubiquitin-dependent regulation of surfactant protein C (SFTPC) maturation in alveolar type II cells | Upregulation of TRIM21 and TRIM65 in mature AT2 cells, suggesting additional TRIM-family ubiquitin ligases may participate in surfactant protein processing and proteostasis |
| ICB Multi-Cancer | Tumor and immune transcriptional responses associated with immune checkpoint blockade therapy | Elevated CD163 and MRC1 expression in tumor-associated macrophages, consistent with an M2-like polarization state potentially associated with therapy-induced immune remodeling |
| Neuroblastoma | Therapy-induced transcriptional rewiring of tumor cell states and microenvironment interactions | Differential AP-1 transcription factor usage, with JUND enriched at diagnosis and JUNB/FOS enriched post-treatment, suggesting stress-response remodeling during therapy-induced state transitions |
| Brain Development Atlas | Chromatin accessibility programs defining early neuronal lineage specification | Shared transcription factor module (TFAP2B, LHX5, LHX1) across Purkinje neurons and midbrain GABAergic populations, suggesting a conserved regulatory program for inhibitory neuron specification |

4 Discussion

In this study we introduced ELISA, an agent-based framework that unifies semantic language retrieval, gene marker scoring, and LLM-mediated biological interpretation for interactive single-cell atlas interrogation. Systematic evaluation across six diverse datasets demonstrated that ELISA significantly outperforms CellWhisperer in cell type retrieval (combined permutation test, $p < 0.001$) and faithfully replicates published biological findings with a mean composite score of 0.90. Here we discuss the implications of these results for the design of retrieval systems in single-cell genomics, the limitations of contrastive multimodal alignment, and the broader role of agentic AI in biological discovery.

Contrastive alignment produces text-dominated embeddings. A central finding of this study is the striking asymmetry in CellWhisperer performance across query types. On ontology queries (natural language descriptions of cell types and biological processes), CellWhisperer performed competitively with ELISA’s Semantic pipeline, achieving mean ontology MRR values within 0.15 of ELISA Union across most datasets (Table 2, Fig. 2). This is expected: CellWhisperer’s CLIP-style contrastive training aligns transcriptome embeddings with textual descriptions, and ontology queries directly exploit this text-side alignment. However, on expression queries (where users provide gene signatures rather than natural language), CellWhisperer’s performance collapsed, with expression MRR averaging 0.397 compared to 0.806 for ELISA Union, a twofold deficit (Cohen’s $d = 5.98$). This asymmetry reveals a fundamental limitation of contrastive multimodal alignment for single-cell retrieval. CLIP-style training optimizes for text–transcriptome correspondence by learning a shared embedding space where matching text–cell pairs are close and mismatched pairs are distant.
The resulting embeddings are, by construction, shaped primarily by the textual supervision signal: the model learns to position transcriptomes near their text descriptions, but the fine-grained transcriptomic structure (which genes are differentially expressed, at what fold changes, and in what fraction of cells) is compressed into a representation optimized for text matching rather than gene-level querying. When a user submits a gene signature such as “MARCO FABP4 APOC1 C1QB C1QC MSR1”, these gene names are processed as text tokens rather than matched against differential expression statistics, resulting in a retrieval signal that is weaker and less specific than direct marker scoring.

This observation has implications beyond ELISA and CellWhisperer. As foundation models for single-cell biology increasingly adopt contrastive or multimodal pretraining objectives, our results caution that text-supervised alignment may inadvertently sacrifice expression-level specificity. The dual-query evaluation framework introduced here, which requires systems to perform well on both ontology and expression queries, provides a principled diagnostic for detecting such modality imbalances.

Explicit routing outperformed implicit fusion. ELISA’s architectural response to this challenge was to avoid implicit embedding fusion altogether. Rather than learning a single shared space that must simultaneously serve text and expression queries, ELISA maintains two separate representation spaces (BioBERT semantic embeddings and gene-level DE statistics) and routes queries to the appropriate pipeline through explicit classification. The query classifier, operating on simple token-level heuristics (gene name patterns, known vocabulary membership, natural language indicators), achieved reliable routing across all six datasets without requiring any training data.
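To make the routing idea concrete, the following is a minimal sketch of a training-free, token-level query classifier. It is an illustration under stated assumptions rather than ELISA's actual implementation: the function name `classify_query`, the regular expression, the indicator word list, and the 0.8 threshold are all hypothetical choices. The three labels mirror the three routes named in the abstract (gene marker scoring, semantic matching, and a fused pipeline for mixed queries).

```python
import re

# Illustrative heuristics only: the pattern, word list, and 0.8 threshold
# are assumptions, not values taken from the ELISA source code.
GENE_PATTERN = re.compile(r"[A-Z][A-Z0-9-]{1,9}")  # e.g. MARCO, C1QB, HLA-E
NL_INDICATORS = frozenset({
    "cell", "cells", "in", "of", "the", "with", "and", "that",
    "which", "involved", "expressing", "signaling", "response",
})


def classify_query(query, gene_vocab=frozenset()):
    """Route a query to 'gene', 'semantic', or 'mixed' with token-level rules."""
    tokens = [t.strip(".,;:()\"'") for t in query.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return "semantic"

    # A token counts as a gene if it is in the known vocabulary
    # or looks like an upper-case gene symbol.
    is_gene = [t in gene_vocab or GENE_PATTERN.fullmatch(t) is not None
               for t in tokens]
    gene_frac = sum(is_gene) / len(tokens)

    # Natural-language indicators among the non-gene tokens.
    has_nl = any(t.lower() in NL_INDICATORS
                 for t, g in zip(tokens, is_gene) if not g)

    if gene_frac == 0.0:
        return "semantic"
    if gene_frac == 1.0 or (gene_frac >= 0.8 and not has_nl):
        return "gene"
    return "mixed"
```

Under these assumed rules, the signature quoted above, `classify_query("MARCO FABP4 APOC1 C1QB C1QC MSR1")`, is routed to the gene pipeline, while a query such as "macrophages expressing MARCO and FABP4" is labeled mixed. A pattern-only rule will also treat upper-case acronyms such as DNA as genes, which is one reason a vocabulary lookup is useful alongside it.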
This design choice is supported empirically by complementarity analysis: the semantic pipeline won on ontology queries, while the gene marker scoring pipeline won on expression queries in every dataset, with minimal overlap in their error profiles. The additive union strategy, which selects the better-performing modality as the primary and appends unique results from the secondary, captures the strengths of both pipelines without the compression artifacts inherent in learned fusion. The result was a system that matched or exceeded the best single modality on every metric across every dataset, a property that no implicit fusion method could guarantee.

Analytical modules bridge retrieval and interpretation. A distinguishing feature of ELISA relative to prior retrieval-focused systems is the integration of downstream analytical modules (pathway scoring, ligand–receptor interaction prediction, comparative analysis, and proportion estimation) that operate directly on the same embedded data representation used for retrieval. This design enables a seamless transition from “which cell types are relevant?” (retrieval) to “what biological programs are active in these cell types?” (analysis) to “what does this mean biologically?” (LLM interpretation), all within a single interactive session. The near-perfect pathway alignment (mean 0.98) and theme coverage (mean 0.88) scores across all six datasets demonstrated that this integrated architecture effectively connects gene-level evidence to biological programs. In contrast, systems that perform retrieval alone, including CellWhisperer, require users to manually extract gene lists from retrieved clusters and perform separate pathway and interaction analyses using external tools, introducing friction and potential inconsistencies.
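The additive union strategy described in the preceding paragraph can be sketched in a few lines. This is a hedged illustration, not the published code: `additive_union` and `union_for_query` are hypothetical names, and choosing the primary pipeline from the query type is an assumed stand-in for "the better-performing modality".

```python
def additive_union(primary, secondary, k=None):
    """Keep the primary ranking intact, then append unseen clusters
    from the secondary ranking (order-preserving de-duplication)."""
    merged, seen = [], set()
    for cluster in list(primary) + list(secondary):
        if cluster not in seen:
            seen.add(cluster)
            merged.append(cluster)
    return merged if k is None else merged[:k]


def union_for_query(query_type, semantic_ranked, marker_ranked, k=None):
    """Pick the primary pipeline from the query type (an assumed rule):
    gene-signature queries lead with marker scoring, all others with
    the semantic ranking."""
    if query_type == "gene":
        return additive_union(marker_ranked, semantic_ranked, k)
    return additive_union(semantic_ranked, marker_ranked, k)
```

Because the primary ranking is left untouched at the head of the merged list, Recall@k and reciprocal rank of the merged list cannot fall below those of the primary pipeline. Whether the union also matches the best single modality then depends on the primary being chosen correctly for each query.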
The interaction recovery metric (mean 0.77) was the most variable across datasets, with perfect recovery in the neuroblastoma, ICB, and brain datasets but lower recovery in cystic fibrosis (0.40) and fetal lung (0.40). These lower scores primarily reflect the inherent difficulty of predicting specific ligand–receptor pairs from expression data when the ligand or receptor is expressed at moderate levels across multiple cell types, making the interaction statistically detectable but not highly ranked. Future work could address this by incorporating spatial proximity information or protein-level data to improve interaction specificity.

LLM grounding and the discovery–hallucination boundary. ELISA’s discovery mode, which prompts the LLM to separate dataset evidence from established biology and to propose hypotheses with probabilistic language, generated biologically plausible candidate signals in all six datasets (Table 4). These include the CALR–LRP1 phagocytic axis in cystic fibrosis macrophages, differential AP-1 family member usage in neuroblastoma therapy response, and a shared TFAP2B/LHX5/LHX1 regulatory module across inhibitory neuron subtypes in the developing brain. While these hypotheses require experimental validation, they illustrate the potential of grounded LLM reasoning to surface non-obvious patterns in complex datasets. However, a strict separation between data-derived evidence and LLM-generated interpretation is essential. Without it, the LLM would inevitably introduce plausible-sounding but unsupported claims, a risk that is particularly acute in biology, where prior knowledge is vast and contextual. ELISA’s prompt architecture addresses this by providing the LLM only with retrieved cluster data, gene statistics, and pathway results as context, with explicit instructions to avoid external literature and causal claims.

Future directions. Several extensions could strengthen and broaden ELISA’s capabilities. 
Integration with spatial transcriptomics data would enable spatially resolved interaction prediction, addressing the current limitation of expression-only interaction scoring. Incorporation of trajectory inference methods would allow ELISA to reason about dynamic processes such as differentiation and therapy response. Expansion of the retrieval engine to support cross-dataset queries (comparing cell types across tissues or disease states) would enable the kind of meta-analytical reasoning that was outside ELISA’s scope in the ICB dataset evaluation. Finally, replacing the fixed LLM with a fine-tuned model trained on single-cell biological reasoning could improve the specificity and depth of automated interpretations.

5 Conclusion

ELISA demonstrates that explicit modality routing, rather than implicit contrastive fusion, provides a more robust foundation for multimodal single-cell retrieval. By maintaining separate semantic and expression pipelines and combining them through adaptive query classification, ELISA achieves consistently superior performance across both natural language and gene-signature queries. The integration of analytical modules and grounded LLM interpretation within a single interactive framework bridges the gap between data exploration and biological discovery, enabling researchers to move from raw atlas data to structured biological hypotheses within a single session. As single-cell datasets continue to grow in scale and complexity, systems that combine the complementary strengths of language models and expression-aware retrieval will be essential for translating transcriptomic data into biological understanding.

6 Conflicts of interest

The authors declare that they have no competing interests.

7 Funding

Computational resources were provided by Dr. Antonio Orvieto, PI at the Max Planck Institute for Intelligent Systems. 
The rest of the work was self-financed.

8 Data availability

All six single-cell RNA sequencing datasets used in this study are publicly available through CZ CELLxGENE Discover (https://cellxgene.cziscience.com): cystic fibrosis airways (Berg et al., 2025), high-risk neuroblastoma (Yu et al., 2025), immune checkpoint blockade multi-cancer (Gondal et al., 2025), fetal lung AT2 organoids (Lim et al., 2025), healthy breast tissue (Bhat-Nakshatri et al., 2024), and first-trimester brain (Mannens et al., 2025). Datasets were downloaded in AnnData (.h5ad) format. Source code is available at https://github.com/omaruno/ELISA-An-AI-Agent-for-Expression-Grounded-Discovery-in-Single-Cell-Genomics.

9 Author contributions statement

Omar Coser performed all of the work presented in this manuscript. A preliminary version of this work appeared at the ICLR 2026 Workshop on Generative AI for Genomics and at MLGenX (Coser, 2026a, b). If you use the ELISA code, please cite this work.

10 Acknowledgments

The authors acknowledge Dr. Antonio Orvieto for providing access to the computational resources of his lab.

References

J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. (2023) GPT-4 technical report. arXiv preprint arXiv:2303.08774.

Anthropic (2024) The Claude 3 model family: Opus, Sonnet, Haiku.

M. Berg, L. Krabbendam, E. K. van der Ploeg, M. van Nimwegen, T. van der Veer, M. Banchero, O. A. Carpaij, R. Hoogenboezem, M. van den Berge, E. Bindels, et al. (2025) Evidence for altered immune-structural cell crosstalk in cystic fibrosis revealed by single cell transcriptomics. Journal of Cystic Fibrosis.

P. Bhat-Nakshatri, H. Gao, A. S. Khatpe, A. K. Adebayo, P. C. McGuire, C. Erdogan, D. Chen, G. Jiang, F. New, R. German, et al. (2024) Single-nucleus chromatin accessibility and transcriptomic map of breast tissues of women of diverse genetic ancestry. Nature Medicine 30 (12), p. 3482–3494.

R. Browaeys, W. Saelens, and Y. Saeys (2020) NicheNet: modeling intercellular communication by linking ligands to target genes. Nature Methods 17 (2), p. 159–162.

G. Casella and R. Berger (2024) Statistical inference. Chapman and Hall/CRC.

O. Cinquin (2024) ChIP-GPT: a managed large language model for robust data extraction from biomedical database records. Briefings in Bioinformatics 25 (2), p. bbad535.

J. Cohen (2013) Statistical power analysis for the behavioral sciences. Routledge.

O. Coser (2026a) ELISA: a generative AI agent for expression-grounded discovery in single-cell genomics. In ICLR 2026 Workshop on Generative AI for Genomics.

O. Coser (2026b) ELISA: an interpretable hybrid agent for expression-grounded discovery in single-cell genomics. In ICLR 2026 Workshop on Machine Learning for Genomics Explorations.

H. Cui, C. Wang, H. Maan, K. Pang, F. Luo, N. Duan, and B. Wang (2024) scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods 21 (8), p. 1470–1480.

M. Efremova, M. Vento-Tormo, S. A. Teichmann, and R. Vento-Tormo (2020) CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nature Protocols 15 (4), p. 1484–1506.

S. Gao, A. Fang, Y. Huang, V. Giunchiglia, A. Noori, J. R. Schwarz, Y. Ektefaie, J. Kondic, and M. Zitnik (2024) Empowering biomedical discovery with AI agents. Cell 187 (22), p. 6125–6151.

B. T. Garcia, L. Westerfield, P. Yelemali, N. Gogate, E. A. Rivera-Munoz, H. Du, M. Dawood, A. Jolly, J. R. Lupski, and J. E. Posey (2025) Improving automated deep phenotyping through large language models using retrieval-augmented generation. Genome Medicine 17 (1), p. 91.

M. N. Gondal, M. Cieslik, and A. M. Chinnaiyan (2025) Integrated cancer cell-specific single-cell RNA-seq datasets of immune checkpoint blockade-treated patients. Scientific Data 12 (1), p. 139.

J. Gottweis, W. Weng, A. Daryin, T. Tu, A. Palepu, P. Sirkovic, A. Myaskovsky, F. Weissenberger, K. Rong, R. Tanno, et al. (2025) Towards an AI co-scientist. arXiv preprint arXiv:2502.18864.

A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan, et al. (2024) The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.

K. Huang, S. Zhang, H. Wang, Y. Qu, Y. Lu, Y. Roohani, R. Li, L. Qiu, G. Li, J. Zhang, et al. (2025) Biomni: a general-purpose biomedical AI agent. bioRxiv.

S. Jin, M. V. Plikus, and Q. Nie (2025) CellChat for systematic analysis of cell–cell communication from single-cell transcriptomics. Nature Protocols 20 (1), p. 180–219.

J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36 (4), p. 1234–1240.

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, et al. (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, p. 9459–9474.

C. Liang, P. Ye, H. Yan, P. Zheng, J. Sun, Y. Wang, Y. Li, Y. Ren, Y. Jiang, J. Xiang, et al. (2025) scWGBS-GPT: a foundation model for capturing long-range CpG dependencies in single-cell whole-genome bisulfite sequencing to enhance epigenetic analysis. bioRxiv, p. 2025–02.

K. Lim, E. N. Rutherford, L. Delpiano, P. He, W. Lin, D. Sun, D. J. Van den Boomen, J. R. Edgar, J. H. Bang, A. Predeus, et al. (2025) A novel human fetal lung-derived alveolar organoid model reveals mechanisms of surfactant protein C maturation relevant to interstitial lung disease. The EMBO Journal 44 (3), p. 639.

M. D. Luecken and F. J. Theis (2019) Current best practices in single-cell RNA-seq analysis: a tutorial. Molecular Systems Biology 15 (6), p. e8746.

L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), p. 2579–2605.

C. C. Mannens, L. Hu, P. Lönnerberg, M. Schipper, C. C. Reagor, X. Li, X. He, R. A. Barker, E. Sundström, D. Posthuma, et al. (2025) Chromatin accessibility during human first-trimester neurodevelopment. Nature 647 (8088), p. 179–186.

L. McInnes, J. Healy, and J. Melville (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.

E. Niyonkuru, J. H. Caufield, L. C. Carmody, M. A. Gargano, S. Toro, P. L. Whetzel, H. Blau, M. Soto Gomez, E. Casiraghi, L. Chimirri, et al. (2025) Leveraging generative AI to assist biocuration of medical actions for rare disease. Bioinformatics Advances 5 (1), p. vbaf141.

J. Pickard, R. Prakash, M. A. Choi, N. Oliven, C. Stansbury, J. Cwycyshyn, N. Galioto, A. Gorodetsky, A. Velasquez, and I. Rajapakse (2025) Automatic biomarker discovery and enrichment with BRAD. Bioinformatics 41 (5), p. btaf159.

Y. Qu, K. Huang, M. Yin, K. Zhan, D. Liu, D. Yin, H. C. Cousins, W. A. Johnson, X. Wang, M. Shah, et al. (2025) CRISPR-GPT for agentic automation of gene-editing experiments. Nature Biomedical Engineering, p. 1–14.

N. Reimers and I. Gurevych (2019) Sentence-BERT: sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), p. 3982–3992.

M. Schaefer, P. Peneder, D. Malzl, S. D. Lombardo, M. Peycheva, J. Burton, A. Hakobyan, V. Sharma, T. Krausgruber, C. Sin, et al. (2025) Multimodal learning enables chat-based exploration of single-cell data. Nature Biotechnology, p. 1–11.

K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, A. Tanwani, H. Cole-Lewis, S. Pfohl, et al. (2023) Large language models encode clinical knowledge. Nature 620 (7972), p. 172–180.

K. Swanson, W. Wu, N. L. Bulaong, J. E. Pak, and J. Zou (2025) The virtual lab of AI agents designs new SARS-CoV-2 nanobodies. Nature 646 (8085), p. 716–723.

F. Tang, C. Barbacioru, Y. Wang, E. Nordman, C. Lee, N. Xu, X. Wang, J. Bodeau, B. B. Tuch, A. Siddiqui, et al. (2009) mRNA-seq whole-transcriptome analysis of a single cell. Nature Methods 6 (5), p. 377–382.

G. Team, R. Anil, S. Borgeaud, J. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican, et al. (2023) Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.

Z. Wang, Q. Jin, C. Wei, S. Tian, P. Lai, Q. Zhu, C. Day, C. Ross, R. Leaman, and Z. Lu (2025) GeneAgent: self-verification language agent for gene-set analysis using domain databases. Nature Methods 22 (8), p. 1677–1685.

F. A. Wolf, P. Angerer, and F. J. Theis (2018) SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 19 (1), p. 15.

Y. Xiao, J. Liu, Y. Zheng, X. Xie, J. Hao, M. Li, R. Wang, F. Ni, Y. Li, J. Luo, et al. (2024) CellAgent: an LLM-driven multi-agent framework for automated single-cell data analysis. arXiv preprint arXiv:2407.09811.

W. Yu, R. Biyik-Sit, Y. Uzun, C. Chen, A. Thadi, J. H. Sussman, M. Pang, C. Wu, L. D. Grossmann, P. Gao, et al. (2025) Longitudinal single-cell multiomic atlas of high-risk neuroblastoma reveals chemotherapy-induced tumor microenvironment rewiring. Nature Genetics 57 (5), p. 1142–1154.

H. Zhang, X. Zhang, Y. Lin, M. Wang, Y. Lai, Y. Wang, L. Yu, Y. Xu, R. Cheng, and E. Szczerbicki (2024) Tokensome: towards a genetic vision-language GPT for explainable and cognitive karyotyping. arXiv preprint arXiv:2403.11073.

J. Zhou, B. Zhang, X. Chen, et al. (2023) Automated bioinformatics analysis via AutoBA. arXiv.

Appendix A Software and reproducibility

ELISA was implemented in Python 3.10+ using PyTorch, sentence-transformers (Reimers and Gurevych, 2019), scanpy (Wolf et al., 2018), scikit-learn, and UMAP-learn (McInnes et al., 2018). All analyses were performed on a standard workstation without GPU requirements for retrieval and analysis. Source code, benchmark queries, and evaluation scripts are available at [repository URL]. Use of an LLM (LLaMA-3.1-8B) for automated interpretation is documented in accordance with journal policy. All experiments were performed on an A100 GPU with 80 GB of memory.

Appendix B Replication evaluation metrics

Table 3 reports six metrics quantifying the agreement between ELISA-generated reports and the findings of the corresponding reference publications. Each metric is defined below.

Gene coverage. 
Gene coverage measures the fraction of key genes highlighted in the reference publication that ELISA identified in the correct cell type context. For each dataset, the evaluator compiled a set of key genes from the paper’s main text, figures, and supplementary tables (e.g., differentially expressed genes, cell type markers, and signaling molecules). A gene was scored as “recovered” if it appeared in ELISA’s output for a biologically appropriate cluster. Gene coverage is computed as:

$$\text{Gene coverage} = \frac{|\text{key genes recovered by ELISA}|}{|\text{key genes reported in reference}|} \tag{1}$$

Pathway alignment. Pathway alignment quantifies whether ELISA’s pathway scoring module detects the biological programs reported in the reference study. For each dataset, the evaluator identified the pathways discussed in the reference paper (e.g., IFN-γ signaling, mTOR, and ErbB). A pathway was scored as “aligned” if ELISA’s module returned it with a positive score in at least one biologically appropriate cluster. Pathway alignment is computed as:

$$\text{Pathway alignment} = \frac{|\text{pathways found by ELISA}|}{|\text{pathways reported in reference}|} \tag{2}$$

Interaction recovery. Interaction recovery assesses whether ELISA’s ligand–receptor prediction module detected the cell–cell communication axes described in the reference publication. For each dataset, the evaluator compiled ground truth interactions from the paper (e.g., HB-EGF/ERBB4 between macrophages and neuroblasts, HLA-E/NKG2A between epithelial and CD8+ T cells). 
Recovery was scored at the pair level: a ligand–receptor pair was counted as “recovered” if ELISA detected it with a non-zero score, regardless of whether the source–target cell type assignment exactly matched:

$$\text{Interaction recovery} = \frac{|\text{LR pairs detected by ELISA}|}{|\text{LR pairs reported in reference}|} \tag{3}$$

Proportion consistency. Proportion consistency is a binary (Yes/No) criterion that evaluates whether ELISA’s proportion analysis correctly identified the direction of cell type composition changes for datasets with condition contrasts. For each cell type reported in the reference as increased or decreased in the disease or treatment condition, the evaluator checked whether ELISA’s fold change pointed in the same direction. A dataset received “Yes” if the majority of reported changes were directionally consistent.

Theme coverage. Theme coverage captures whether ELISA’s interpretive summary reproduced the major biological conclusions of the reference study. Unlike the gene- and pathway-level metrics, which assess individual molecular entities, theme coverage evaluates high-level biological narratives. For each dataset, the evaluator identified the main themes from the paper’s abstract and results (e.g., “aberrant adaptive immunity with upregulated IFN-γ signaling” for the CF dataset; “therapy-induced macrophage polarization toward immunosuppressive phenotypes” for the neuroblastoma dataset). A theme was scored as “covered” if ELISA’s LLM-generated interpretation mentioned and correctly described the corresponding biological finding:

$$\text{Theme coverage} = \frac{|\text{themes captured by ELISA}|}{|\text{major themes in reference}|} \tag{4}$$

Biological evaluation score. The biological evaluation score provides an independent assessment of overall report quality.

Composite score. 
The composite score summarizes overall replication performance as the unweighted mean of the four continuous metrics:

$$\text{Composite} = \frac{\text{Gene cov.} + \text{Path. align.} + \text{Int. rec.} + \text{Theme cov.}}{4} \tag{5}$$

Proportion consistency is excluded from the composite average because it is binary rather than continuous, but is reported separately as a quality check.

Appendix C Retrieval and analytical evaluation metrics

To ensure reproducible and interpretable evaluation of ELISA’s retrieval and analytical modules, we defined the full set of metrics used throughout the benchmark (see also the benchmark scripts in the supplementary code repository for complete implementations). Retrieval metrics quantify how effectively each mode recovers the expected cell types for a given query, whereas analytical metrics assess the accuracy of ELISA’s downstream biological interpretation modules: interaction discovery, pathway enrichment, proportion analysis, and comparative differential expression. An overview of the six evaluation datasets and their properties is provided in Table 5.

C.1 Retrieval metrics

Each radar plot in Fig. 2 displays six axes corresponding to three retrieval metrics (Recall@k at two cutoffs and MRR) evaluated separately on the two query categories (ontology and expression). The metrics are:

1. Cluster Recall@k (two axes per plot: Ont R@k, Exp R@k). This metric measures the fraction of expected cell types that appear within the top-k positions of the ranked retrieval list. The value of k is adapted to each dataset’s number of clusters: R@5 and R@10 for large-cluster datasets (CF Airways with 30 clusters, ICB Multi-Cancer with 31, First-Trimester Brain with 28), R@1 and R@2 for small-cluster datasets (Breast Tissue Atlas with 8 clusters, fdAT2 Organoids with 5, High-Risk Neuroblastoma with 11). A Recall@k of 1.0 indicates that all expected clusters were retrieved within the top-k; a value of 0.0 indicates that none were found. 
Two Recall cutoffs are shown per plot to capture both stringent (lower k) and permissive (higher k) retrieval accuracy.

2. Mean Reciprocal Rank (two axes: Ont MRR, Exp MRR). MRR quantifies the rank position of the first correctly retrieved cluster. An MRR of 1.0 means the top-ranked result is relevant; 0.5 means the first relevant result appears at rank 2; 0.33 at rank 3, and so on. MRR captures top-of-list precision, which is critical for interactive use where researchers typically inspect only the first few results.

Together, the six axes capture complementary aspects of retrieval quality: Recall@k measures coverage (how many expected clusters are found), whereas MRR measures precision at rank 1 (how quickly the first relevant cluster appears). Evaluating both metrics on ontology queries (natural-language, concept-level) and expression queries (gene-signature-based) separately reveals modality-specific strengths: a system may excel at one query type while underperforming on the other. The radar footprint thus provides an at-a-glance summary of each retrieval mode’s overall coverage, precision, and balance across query types. A larger, more symmetric footprint indicates stronger and more balanced retrieval performance.

The four retrieval modes compared are: CellWhisperer (pink dashed line), which uses contrastive text–transcriptome CLIP embeddings; ELISA Semantic (blue), which performs BioBERT-based cosine similarity matching against cluster descriptions enriched with GO and Reactome terms; ELISA scGPT (orange), which scores clusters by matching query genes against per-cluster differential expression profiles; and ELISA Union (green), which adaptively fuses both ELISA pipelines by routing each query to the better-performing modality and appending unique results from the secondary pipeline.
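As a minimal sketch of the two retrieval metrics described above, the following Python computes Recall@k and the reciprocal rank for a single query, and averages reciprocal ranks over queries to obtain MRR. The function names and the example cluster labels are illustrative assumptions, and exact string matching is used here in place of the fuzzy cluster-name matching applied in the benchmark.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[str], expected: Sequence[str], k: int) -> float:
    """Fraction of expected clusters that appear in the top-k of the ranked list."""
    if not expected:
        return 0.0
    top_k = set(ranked[:k])
    return sum(1 for cluster in expected if cluster in top_k) / len(expected)


def reciprocal_rank(ranked: Sequence[str], expected: Sequence[str]) -> float:
    """1 / (1-based position of the first relevant cluster); 0.0 if none is found."""
    relevant = set(expected)
    for position, cluster in enumerate(ranked, start=1):
        if cluster in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Sequence[tuple]) -> float:
    """Average reciprocal rank over (ranked, expected) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, expected) for ranked, expected in runs) / len(runs)


# Hypothetical ranked output and expected clusters for one query.
ranked = ["macrophage", "monocyte", "basal cell", "ciliated cell"]
expected = ["monocyte", "ciliated cell"]
print(recall_at_k(ranked, expected, k=2))   # stringent cutoff
print(recall_at_k(ranked, expected, k=4))   # permissive cutoff
print(reciprocal_rank(ranked, expected))
```

In this example the first relevant cluster sits at rank 2, so the reciprocal rank is 0.5, while recall rises from 0.5 to 1.0 as the cutoff is relaxed from k=2 to k=4.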
C.1.1 Statistical testing

To assess whether performance differences between retrieval modes are statistically significant across datasets, we employed one-sided paired t-tests (with the alternative hypothesis that ELISA Union outperforms CellWhisperer) and reported Cohen’s d as the paired effect size. Because different datasets use different Recall@k cutoffs, individual metric comparisons have varying sample sizes ($n=3$ to $n=6$ datasets). To obtain a single omnibus test, we performed a combined permutation test: the sign of the difference (Union minus CellWhisperer) was computed for every metric–dataset pair simultaneously, and dataset labels were permuted 50,000 times to construct the null distribution of the aggregate advantage. All p-values and effect sizes are reported in Table 2.

Appendix D Human evaluation protocol

To obtain the biological evaluation scores shown in Table 3, a domain expert with training in molecular biology and single-cell genomics independently reviewed each ELISA-generated report against the corresponding reference publication. The evaluation followed a structured five-step protocol:

1. Gene verification. Each gene reported by ELISA as differentially expressed or as a marker of a specific cell type was cross-checked against the main text, figures, and supplementary tables of the reference publication. A gene was scored as “recovered” if it appeared in the paper’s reported DE gene lists, marker panels, or figure annotations for the corresponding cell type. The gene coverage score was computed as the fraction of paper-reported key genes that ELISA identified in the correct cluster context.

2. Pathway assessment. Each pathway identified by ELISA’s pathway scoring module (e.g., “IFN-gamma signaling,” “mTOR signaling”) was compared against pathway-level findings described in the reference study.
A pathway was scored as “aligned” if the reference publication reported activation or enrichment of that pathway in a consistent cell type context. Pathway alignment was computed as the fraction of paper-reported pathways that ELISA correctly detected as active (score $>0$) in at least one biologically appropriate cluster.

3. Interaction validation. Each ligand–receptor interaction predicted by ELISA was verified against the cell–cell communication analyses reported in the reference publication. Validation was performed at two levels: (i) whether the ligand–receptor pair itself was reported in the paper, regardless of the cell type context (LR recovery rate), and (ii) whether both the pair and the source–target cell type assignment matched the paper’s findings (full match rate).

4. Proportion and condition consistency. For datasets with condition contrasts (e.g., CF vs. healthy), the evaluator verified whether ELISA’s proportion analysis correctly identified the direction of cell type composition changes reported in the reference study. Each cell type with a known expected change (increased or decreased in the disease/treatment condition) was checked for directional agreement.

5. Theme coverage and hypothesis assessment. The evaluator assessed whether ELISA’s interpretive summaries captured the major biological themes and conclusions of the reference study (e.g., “aberrant adaptive immunity with upregulated IFN-γ signaling” for the CF dataset). Additionally, candidate hypotheses generated by ELISA’s discovery mode were evaluated for biological plausibility through targeted literature review: the evaluator searched PubMed for prior evidence supporting or contradicting each proposed mechanism (e.g., CALR–LRP1 in macrophage phagocytosis, TRIM-family ligases in surfactant processing).
Hypotheses were classified as “plausible” if supporting literature existed, “novel” if no prior reports were found but the mechanism was biologically coherent, or “unsupported” if contradicted by existing evidence. The composite score for each dataset was computed as the unweighted mean of gene coverage, pathway alignment, interaction recovery, and theme coverage, with proportion consistency treated as a binary (pass/fail) criterion.

Appendix E Materials

E.1 Datasets

ELISA was validated on six publicly available scRNA-seq datasets deposited in the CZ CELLxGENE Discover portal, spanning five distinct tissues, four disease contexts, and both case–control and longitudinal experimental designs (Table 5). Datasets were selected to cover a broad range of biological complexity, cell type diversity, and analytical challenges, including inflammatory lung disease, pediatric and adult cancers, drug-resistant epilepsy, immune checkpoint therapy response, and normal tissue homeostasis.

Dataset 1 (D1): Cystic fibrosis bronchial epithelium. Berg et al. [2025] generated the first single-cell transcriptome atlas of the cystic fibrosis (CF) lung comprising both structural and immune cells. Droplet-based scRNA-seq was performed on bronchial wall biopsies from patients with CF ($n=8$) and healthy controls ($n=19$) and integrated using the fastMNN batch correction framework with the Human Lung Cell Atlas as reference. The dataset encompasses approximately 96,000 cells across 30 annotated cell types, including epithelial (basal, ciliated, secretory, goblet, ionocyte), immune (CD8+ T cells, CD4+ T cells, B cells, plasma cells, macrophages, monocytes, NK cells, dendritic cells, mast cells), stromal (fibroblasts, pericytes), and endothelial populations.
Key findings include dysregulated basal cell function, aberrant adaptive immunity with upregulated IFN-γ signaling, a novel HLA-E/NKG2A immune checkpoint axis, and altered structural–immune cell crosstalk persisting despite CFTR modulator therapy.

Dataset 2 (D2): High-risk neuroblastoma. Yu et al. [2025] longitudinally profiled 22 patients with high-risk neuroblastoma before and after induction chemotherapy using single-nucleus RNA and ATAC sequencing combined with whole-genome sequencing. The dataset captures profound therapy-induced shifts in tumor and immune cell subpopulations, identifying enhancer-driven transcriptional regulators of neoplastic states (adrenergic, mesenchymal, proliferative) and macrophage polarization toward pro-angiogenic, immunosuppressive phenotypes. A central finding was the validation of the HB-EGF/ERBB4 paracrine signaling axis between macrophages and neoplastic cells promoting tumor growth through ERK signaling induction.

Dataset 3 (D3): Immune checkpoint blockade across cancers. Gondal et al. [2025] compiled and standardized eight scRNA-seq studies from nine cancer types encompassing 223 patients and over 350,000 cancer cells treated with immune checkpoint blockade (ICB). Cancer types include melanoma, basal cell carcinoma, melanoma brain metastases, triple-negative/HER2-positive/ER-positive breast cancer, clear cell renal carcinoma, hepatocellular carcinoma, and intrahepatic cholangiocarcinoma. The integrated resource enables cross-cancer investigation of cancer cell-specific ICB responses, with annotations of treatment status, response outcome, and malignant vs. non-malignant cell identity.

Dataset 4 (D4): Fetal lung AT2 organoids. Lim et al. [2025] developed expandable alveolar type 2 (AT2) organoids derived from human fetal lungs at 16–22 post-conception weeks (pcw).
Single-cell RNA sequencing of four independent organoid lines (passage 11–16) yielded approximately 9.6k cells across eight annotated cell types, including AT2-like, cycling AT2-like, CXCL+ AT2-like, differentiating basal-like, differentiating pulmonary neuroendocrine, intermediate, neuroendocrine progenitor, and ciliated-like populations. The organoids express mature surfactant proteins (SFTPC, SFTPB, SFTPA1) and markers of surfactant processing (LAMP3, ABCA3, NAPSA), and can differentiate into AT1-like cells. A forward genetic screen identified the E3 ligase ITCH as a key effector of SFTPC maturation, with its depletion phenocopying the pathological SFTPC-I73T variant associated with interstitial lung disease.

Dataset 5 (D5): Healthy breast tissue. Bhat-Nakshatri et al. [2024] constructed a single-cell atlas of healthy breast tissues collected from volunteer donors from the Komen Normal Tissue Bank. Using a rapid procurement and processing protocol, the study profiled breast epithelial and stromal cells, identifying 13 epithelial cell clusters with 23 subclusters exhibiting distinct gene expression signatures. Overlap analysis of subcluster-enriched signatures with breast tumor transcriptomes revealed dominant representation of differentiated luminal subcluster signatures in breast cancers, providing insights into putative cells of origin.

Dataset 6 (D6): First-trimester human brain neurodevelopment. Mannens et al. [2025] generated a high-resolution multiomic atlas of chromatin accessibility and gene expression across the entire developing human brain during the first trimester (6–13 weeks post-conception). Using scATAC-seq and paired multiome (scATAC-seq + scRNA-seq) sequencing, the study profiled 166k nuclei from 76 biological samples dissected into five antero-posterior segments (telencephalon, diencephalon, mesencephalon, metencephalon, and cerebellum), of which 166,785 nuclei included paired gene expression.
The atlas defines 135 clusters spanning neurons (GABAergic, glutamatergic, Purkinje, granule), radial glia, glioblasts, oligodendrocyte progenitor cells, fibroblasts, vascular, and immune cell types. Key findings include over 100 cell-type- and region-specific candidate cis-regulatory elements, CNN-predicted enhancer syntax for neuronal specification, elucidation of the ESRRB activation mechanism in the Purkinje cell lineage, and linkage of disease-associated GWAS SNPs to specific neuronal subtypes, identifying midbrain-derived GABAergic neurons as particularly vulnerable to major depressive disorder-related mutations.

Table 5: Summary of scRNA-seq datasets used for ELISA validation. Approx. cells: approximate number of cells or nuclei profiled after quality control. Cell types: number of annotated major cell types. Conditions: experimental groups or treatment arms.

| ID | Tissue | Disease context | Reference | Approx. cells | Cell types | Conditions |
|---|---|---|---|---|---|---|
| D1 | Lung (bronchial) | Cystic fibrosis | Berg et al. [2025] | ∼96k | 30 | CF vs. Ctrl |
| D2 | Adrenal / tumor | Neuroblastoma | Yu et al. [2025] | ∼372k | 20+ | Pre- vs. post-chemo |
| D3 | Multi-cancer | ICB response | Gondal et al. [2025] | ∼356k | 25+ | R vs. NR; 9 cancers |
| D4 | Lung (fetal) | AT2 organoid model | Lim et al. [2025] | ∼9.6k | 8 | fdAT2 organoid lines |
| D5 | Breast | Healthy tissue atlas | Bhat-Nakshatri et al. [2024] | ∼51k | 13 | Healthy only |
| D6 | Brain (whole) | Neurodevelopment | Mannens et al. [2025] | ∼166k | 160 | 6–13 PCW; 5 regions |

All datasets were downloaded from CZ CELLxGENE Discover (https://cellxgene.cziscience.com) in AnnData (.h5ad) format and preprocessed into ELISA’s standardized embedding format (.pt files) as described in the Data Representation section. Cell type annotations from the original publications were retained without modification. For datasets with condition metadata (D1, D2, D3, D4), condition columns were mapped to ELISA’s comparative analysis framework.
Dataset D5 was used to evaluate ELISA’s performance on a single-condition atlas without disease contrast, testing the system’s capacity for cell type identification and pathway characterization in the absence of differential signals.

Appendix F Methods

F.1 ELISA: architecture and design principles

ELISA (Embedding-Linked Interactive Single-cell Agent) is an agent-based computational framework for interactive interrogation of single-cell RNA-seq atlases. The system integrates four core modules: a hybrid retrieval engine, an analytical suite, a visualization toolkit, and a large language model (LLM) chat interface. Together, these enable biologists to query scRNA-seq datasets using natural language, gene signatures, or a combination of both. The architecture follows a modular design in which each component operates on a shared data representation (a serialized PyTorch embedding file) and communicates through standardized data structures, enabling extensibility to new datasets without retraining. The system was implemented in Python 3.10+ and evaluated on six datasets obtained from CZ CELLxGENE Discover. All source code, benchmark queries, and evaluation scripts are provided in the accompanying repository.

F.2 Hybrid retrieval engine

F.2.1 Query classification and routing

A central design challenge in single-cell atlas retrieval is that user queries span a spectrum from pure natural language (“macrophage infiltration in CF airways”) to pure gene signatures (“MARCO FABP4 APOC1 C1QB”) and mixed queries combining both. ELISA addresses this through an explicit query classification module that routes each query to the optimal retrieval pipeline.
The classifier operates by tokenizing the input and scoring each token against three criteria: (i) whether it matches a gene name pattern (uppercase alphanumeric, 2–15 characters, with optional hyphenated suffix), (ii) whether it appears in the dataset’s known gene vocabulary, and (iii) whether it belongs to a curated set of natural language indicator terms (e.g., “cell”, “activation”, “signaling”). Queries where ≥ 60% of tokens are classified as gene symbols are routed to the gene pipeline; queries where ≥ 20% of tokens are genes and ≥ 20% are natural language terms are routed to the mixed pipeline; all other queries are routed to the semantic (ontology) pipeline.

F.2.2 Gene marker scoring pipeline

For gene-list queries, ELISA scores each cluster by evaluating how well its differential expression (DE) profile matches the query genes. For each query gene $g$ found in cluster $c$’s DE statistics, a per-gene score is computed as:

$$\text{score}(g,c) = \left(0.5 + |\log_2\text{FC}|\right) \times \left(0.3 + \max(\text{pct}_{\text{in}} - \text{pct}_{\text{out}},\, 0)\right) \tag{6}$$

where $\log_2\text{FC}$ is the log-fold change of gene $g$ in cluster $c$, and $\text{pct}_{\text{in}}$ and $\text{pct}_{\text{out}}$ represent the fraction of cells expressing the gene inside and outside the cluster, respectively. The specificity term $(\text{pct}_{\text{in}} - \text{pct}_{\text{out}})$ rewards genes that are selectively enriched in the cluster rather than ubiquitously expressed. A multiplicative bonus of $1.3\times$ is applied when $\text{pct}_{\text{in}} > 0.5$. The aggregate cluster score is the sum of per-gene scores, modulated by a coverage factor $(0.5 + 0.5 \times n_{\text{found}}/n_{\text{query}})$ that rewards clusters matching more query genes. Three scoring modes are available: ‘simple’ (binary hit counting), ‘weighted’ (described above), and ‘full’ (incorporating adjusted p-value significance via $-\log_{10}(p_{\text{adj}})$, capped at 10).

F.2.3 Semantic matching pipeline

For ontology and text-based queries, ELISA employs BioBERT Lee et al.
[2020] (pritamdeka/BioBERT-mnli-snli-scinli-scitail-mednli-stsb) to encode both query text and precomputed cluster descriptions into a shared embedding space. Each cluster’s description is constructed during dataset preparation by concatenating its Cell Ontology name, top marker genes (ranked by $|\log_2\text{FC}|$), enriched Gene Ontology terms, and Reactome pathway annotations, producing a dual-representation embedding that captures both identity and functional context. At query time, the input text is encoded with BioBERT and cosine similarity is computed against all cluster embeddings. Two augmentation strategies improve retrieval accuracy. First, a name-boosting mechanism adds a score bonus ($\alpha = 0.15$, scaled by word-overlap ratio) when significant substrings (≥ 4 characters) of a cluster’s name appear in the query. Second, a synonym expansion module maps common cell type aliases (e.g., “endothelial” → “endocardial cell”; “NK” → “natural killer cell”) to their Cell Ontology equivalents and applies a score boost ($\beta = 0.10$) to matching clusters, addressing vocabulary gaps between colloquial and formal ontology terminology.

F.2.4 Reciprocal rank fusion for mixed queries

Mixed queries containing both gene names and biological text are handled through reciprocal rank fusion (RRF). Both the gene and semantic pipelines are executed independently, and their ranked outputs are combined using:

$$\text{RRF}(d) = \sum_{r} \frac{w_r}{k + \text{rank}_r(d) + 1} \tag{7}$$

where $k = 60$ is the RRF constant, $w_r$ are per-pipeline weights (default: 1.0 for both), and $\text{rank}_r(d)$ is the 0-indexed rank of cluster $d$ in pipeline $r$. For gene-dominated queries routed through the gene pipeline, a light fusion with the semantic pipeline at a 3:1 weight ratio is applied as a safety mechanism to capture semantically related clusters that lack direct marker gene overlap.
F.2.5 Additive union evaluation strategy

For benchmarking, we introduce an additive union strategy that maximizes complementarity between modalities. For each query, the modality achieving higher recall@5 against expected clusters is designated as the primary pipeline. The union output begins with the primary pipeline’s full ranked list, followed by unique clusters from the secondary pipeline appended in their original rank order. This produces an untruncated result list (up to $2\times$ top-k), evaluated at recall@5, @10, @15, and @20. Ties at recall@5 are broken by mean reciprocal rank (MRR).

F.3 Analytical modules

F.3.1 Cell–cell interaction prediction

ELISA predicts ligand–receptor (LR) interactions between cell types using a curated database of 280+ LR pairs spanning 25 signaling pathway categories. The database was compiled from established resources (CellChat Jin et al. [2025], CellPhoneDB Efremova et al. [2020], NicheNet Browaeys et al. [2020]) and augmented with context-specific pairs for cystic fibrosis, neurodegeneration, neuroblastoma, and immune checkpoint biology. Each interaction is represented as a (ligand, receptor, pathway) tuple. For each source–target cluster pair, the interaction score is computed as:

$$s_{ij} = \text{pct}_{\text{in}}(\text{ligand},\, c_i) \times \text{pct}_{\text{in}}(\text{receptor},\, c_j) \tag{8}$$

where $\text{pct}_{\text{in}}$ denotes the fraction of cells expressing the gene above detection threshold. Interactions are filtered by minimum expression thresholds (ligand ≥ 10%, receptor ≥ 5% by default) and ranked by score. The module outputs per-interaction statistics, pathway-level summaries, and directional pair summaries.
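A minimal sketch of the interaction score in Eq. (8) is given below, assuming per-cluster expression fractions are available as nested dictionaries. The data layout, function name, and example values are hypothetical; the default thresholds and the exclusion of self-interactions follow the values stated in the text and in Table 8.

```python
from typing import Dict, List, Tuple


def score_interactions(
    pct_in: Dict[str, Dict[str, float]],
    lr_pairs: List[Tuple[str, str, str]],
    min_ligand_pct: float = 0.10,
    min_receptor_pct: float = 0.05,
) -> List[dict]:
    """Score source -> target interactions as pct_in(ligand) * pct_in(receptor).

    pct_in maps cluster name -> {gene: fraction of cells expressing the gene}.
    lr_pairs holds (ligand, receptor, pathway) tuples.
    """
    results = []
    for source, source_pct in pct_in.items():
        for target, target_pct in pct_in.items():
            if source == target:
                continue  # self-interactions are excluded
            for ligand, receptor, pathway in lr_pairs:
                ligand_pct = source_pct.get(ligand, 0.0)
                receptor_pct = target_pct.get(receptor, 0.0)
                if ligand_pct < min_ligand_pct or receptor_pct < min_receptor_pct:
                    continue  # below the minimum expression thresholds
                results.append({
                    "source": source,
                    "target": target,
                    "ligand": ligand,
                    "receptor": receptor,
                    "pathway": pathway,
                    "score": ligand_pct * receptor_pct,
                })
    results.sort(key=lambda row: row["score"], reverse=True)
    return results


# Hypothetical expression fractions for two clusters and one LR pair.
pct_in = {
    "macrophage": {"HBEGF": 0.40, "ERBB4": 0.02},
    "neoplastic": {"HBEGF": 0.05, "ERBB4": 0.50},
}
lr_pairs = [("HBEGF", "ERBB4", "EGF")]
for row in score_interactions(pct_in, lr_pairs):
    print(row)
```

Only the macrophage-to-neoplastic direction survives filtering in this toy example, because the reverse direction fails the ligand threshold. This shows how the two thresholds make the scores directional.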
F.3.2 Pathway activity scoring

Pathway activity across clusters is quantified using curated gene sets encompassing 60+ pathways organized into five categories: immune signaling (IFN-γ, Type I IFN, TNF/NF-κB, JAK-STAT, complement, TLR, chemokine), cell biology (mTOR, PI3K-Akt, Wnt, Notch, Hippo, Hedgehog, cell cycle, apoptosis), neuroscience (glutamatergic/GABAergic synapse, neurodegeneration, FCD progenitor markers), metabolism (oxidative phosphorylation, glycolysis, lipid metabolism, fatty acid metabolism), and tissue-specific programs (surfactant metabolism, epithelial defense, fibrosis, angiogenesis). For each pathway–cluster combination, the score is computed as the mean $\text{pct}_{\text{in}}$ (or alternative metric: $\log_2\text{FC}$, $\text{pct}_{\text{out}}$) across pathway genes detected in the cluster’s DE profile, requiring a minimum of 3 genes for a non-zero score. Coverage (fraction of pathway genes detected) is reported alongside scores. Pathway query matching uses word-overlap fuzzy matching to accommodate variant pathway names.

F.3.3 Comparative analysis

When dataset metadata includes a condition column (e.g., “patient_group” with values “CF” and “Ctrl”), ELISA enables condition-stratified analysis. The module detects condition columns through keyword matching against a curated list (patient_group, condition, disease, treatment, genotype, etc.) and validates that the column contains 2–10 distinct values. For each cluster, the condition distribution is estimated from metadata field weights, and a condition bias label is assigned ($>60\%$ of cells from one condition). Per-gene statistics ($\log_2\text{FC}$, $\text{pct}_{\text{in}}$, $\text{pct}_{\text{out}}$) are reported within condition-biased clusters, and condition-enriched gene lists are compiled across all clusters.

F.3.4 Proportion analysis

The cell type proportion analysis computes the per-cluster cell counts and fractions relative to the total dataset size.
When a condition column is available, the module additionally computes the condition-specific proportions and fold changes. For binary conditions (e.g., CF vs. Control), fold changes are calculated as the fraction of condition A cells in a cluster divided by the fraction of condition B cells, enabling the identification of cell types enriched or depleted in disease states.

F.3.5 Additional analytical functions

Supplementary analytical functions include: (i) marker specificity scoring, which ranks genes by a weighted score combining specificity $\left(\text{pct}_{\text{in}}/(\text{pct}_{\text{in}}+\text{pct}_{\text{out}})\right)$ and effect size ($|\log_2\text{FC}|$); (ii) co-expression analysis, computing Pearson correlations of $\text{pct}_{\text{in}}$ profiles across clusters; (iii) cell cycle scoring using established S-phase and G2M-phase gene signatures (43 and 54 genes, respectively); and (iv) gene set enrichment against 10 MSigDB Hallmark gene sets.

F.4 Visualization module

ELISA includes a comprehensive visualization module that generates publication-quality figures in two categories. Retrieval-level visualizations include: embedding landscape projections (UMAP McInnes et al. [2018], t-SNE Maaten and Hinton [2008], or PCA fallback) of cluster-level semantic and expression embeddings, with optional highlighting of retrieved clusters; inter-cluster cosine similarity heatmaps; retrieval score waterfall plots; gene evidence bar charts ($\log_2\text{FC}$ or $\text{pct}_{\text{in}}$); gene-by-cluster heatmaps; radar charts for multi-metric cluster profiles; semantic vs. expression similarity scatter plots for hybrid retrieval diagnostics; and lambda sweep curves for fusion weight optimization.
When an AnnData (.h5ad) file is provided, cell-level visualizations are generated in a style consistent with Nature and Cell journals: cell-level UMAP plots with Cell Ontology labels placed using a centroid-offset algorithm with iterative repulsion to minimize label overlap; single-gene expression UMAPs with non-expressing cells shown in grey and expression on a purple gradient (capped at the 98th percentile); multi-gene expression grids; and dot plots showing percentage expression (dot size) and z-scored mean expression (dot color) across clusters. All plots use a 40-color colorblind-friendly palette and rasterized cell-level rendering for efficient file sizes.

F.5 LLM-mediated chat interface

The interactive chat interface wraps all modules behind a command-driven interface that routes user queries to the appropriate pipeline and generates LLM-interpreted summaries. The interface supports seven retrieval and analysis modes (semantic, hybrid, discovery, compare, interactions, pathway, proportions) and 15 visualization commands. Each analysis result is automatically accumulated into a session-level report builder. LLM interpretation is performed via the Groq API using the LLaMA-3.1-8B-Instant model Grattafiori et al. [2024] at temperature 0.2. Prompts are constructed with mode-specific templates that enforce strict grounding in dataset evidence: the LLM receives only the retrieved cluster data, gene statistics, and pathway/interaction results as context, with explicit instructions to avoid hallucination, external literature, and causal claims. Context payloads are trimmed to fit within the model’s token limits (∼4,500 tokens for user content), with priority given to top-ranked clusters and highest-effect-size genes.
A discovery mode extends standard retrieval by prompting the LLM to produce four structured sections: (i) dataset evidence, (ii) established biology, (iii) consistency analysis identifying matches and mismatches with known biology, and (iv) candidate novel hypotheses stated with probabilistic language. This mode is designed to surface unexpected findings that may represent context-shifted gene functions or novel cell–cell interactions.

F.6 Benchmarking framework

F.6.1 Query design

The benchmark comprises 100 queries divided into two categories: 50 ontology queries (concept-level, testing semantic understanding) and 50 expression queries (gene-signature-based, testing transcriptomic matching). Queries were derived from the findings of Berg et al. [2025], covering all major cell types identified in the study (macrophages, monocytes, CD8+ T cells, CD4+ T cells, B cells, basal cells, ciliated cells, NK cells, ionocytes, endothelial cells, dendritic cells, mast cells, secretory/goblet cells, fibroblasts, and neuroendocrine cells). Each query has a curated set of expected clusters and expected genes, enabling evaluation at both the cluster retrieval and gene delivery levels.

F.6.2 Baseline comparisons

ELISA’s retrieval performance was evaluated across the following progression of retrieval modes: CellWhisperer, ELISA Semantic, ELISA scGPT, and the additive union (ELISA Union).

F.6.3 Metrics

Retrieval performance was assessed using three metrics. Cluster Recall@k measures the fraction of expected clusters appearing in the top-k retrieved results, using fuzzy matching (substring containment or word-overlap Jaccard similarity $\geq 0.5$) to accommodate Cell Ontology naming variations. Mean Reciprocal Rank (MRR) captures the rank position of the first relevant cluster. Gene Recall measures the fraction of expected genes recoverable from the DE profiles of the top-5 retrieved clusters, assessing whether retrieved clusters collectively provide the gene evidence needed for biological interpretation.
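The fuzzy matching rule used for Cluster Recall@k (substring containment, or word-overlap Jaccard similarity of at least 0.5) can be sketched as follows. Lowercasing and whitespace tokenization are assumptions of this sketch, as are the function names and example labels; the actual normalization in the benchmark scripts may differ.

```python
from typing import Sequence


def fuzzy_match(expected: str, retrieved: str, threshold: float = 0.5) -> bool:
    """True if one name contains the other, or word-level Jaccard >= threshold."""
    a, b = expected.lower().strip(), retrieved.lower().strip()
    if not a or not b:
        return False
    if a in b or b in a:
        return True  # substring containment
    words_a, words_b = set(a.split()), set(b.split())
    jaccard = len(words_a & words_b) / len(words_a | words_b)
    return jaccard >= threshold


def fuzzy_recall_at_k(ranked: Sequence[str], expected: Sequence[str], k: int) -> float:
    """Fraction of expected clusters fuzzily matched by any of the top-k results."""
    if not expected:
        return 0.0
    top_k = ranked[:k]
    hits = sum(1 for e in expected if any(fuzzy_match(e, r) for r in top_k))
    return hits / len(expected)


# Hypothetical cluster names illustrating each branch of the rule.
print(fuzzy_match("natural killer cell", "mature natural killer cell"))  # containment
print(fuzzy_match("ciliated epithelial cell", "ciliated columnar cell"))  # Jaccard 2/4
print(fuzzy_match("lung macrophage", "alveolar macrophage"))              # Jaccard 1/3
```

The threshold of 0.5 accepts names that share at least half of their combined vocabulary, while rejecting pairs that share only a generic head noun.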
F.6.4 Analytical module evaluation

Analytical modules were evaluated against ground truth derived from the source publication. Interaction prediction was assessed by ligand–receptor pair recovery rate (whether the correct LR pair was detected regardless of cell type) and full match rate (correct LR pair between the correct source and target cell types, using fuzzy cell type matching). Pathway scoring was evaluated by alignment: the fraction of pathway activities reported in the paper that ELISA correctly identified as active (score $>0$) in at least one cluster. The proportion analysis was evaluated by the consistency rate: whether cell types reported as increased or decreased in CF show fold changes in the expected direction. Comparative analysis was evaluated by gene recall: the fraction of differentially expressed genes reported in the paper that can be recovered from ELISA’s condition-stratified analysis.

F.7 Data representation and preprocessing

Each dataset is preprocessed into a single serialized PyTorch file (.pt) containing: cluster identifiers, precomputed BioBERT semantic embeddings (768-dimensional, L2-normalized), optional scGPT expression embeddings, per-cluster DE gene statistics ($\log_2\text{FC}$, $\text{pct}_{\text{in}}$, $\text{pct}_{\text{out}}$, adjusted p-value), per-cluster GO and Reactome enrichment terms, per-cluster metadata (cell counts, condition distributions, categorical field frequencies), cluster text descriptions, and the complete gene vocabulary. This representation enables ELISA to operate entirely at the cluster level without requiring access to the original count matrix, substantially reducing memory requirements and enabling deployment on standard hardware.

F.8 Software dependencies and reproducibility

ELISA depends on: PyTorch (≥ 1.12) for tensor operations and data serialization, NumPy for numerical computation, sentence-transformers Reimers and Gurevych [2019] for BioBERT encoding, scikit-learn for t-SNE projections, UMAP-learn McInnes et al.
[2018] for UMAP projections, matplotlib for visualization, scanpy Wolf et al. [2018] for AnnData-backed cell-level plots, SciPy for hierarchical clustering and sparse matrix operations, and the Groq Python SDK for LLM access. All analyses were performed on a standard workstation without GPU requirements for the retrieval and analytical modules; BioBERT Lee et al. [2020] encoding benefits from but does not require GPU acceleration.

F.9 ELISA parameters and hyperparameters

Tables 6–9 report all parameters and hyperparameters used in the ELISA framework. Default values were used throughout all experiments; no dataset-specific tuning was performed.

Table 6: Data preprocessing and embedding generation parameters.

| Parameter | Value | Description |
|---|---|---|
| Preprocessing (Scanpy) | | |
| target_sum | 10,000 | Library-size normalization target |
| n_top_genes | 3,000 | HVGs selected (Seurat v3) |
| max_value | 10 | Z-score clipping threshold |
| n_comps | 50 | PCA components |
| Leiden resolution | 1.0 | Used only if no annotations exist |
| Differential expression | | |
| Method | Wilcoxon | Via scanpy.tl.rank_genes_groups |
| DE_PVAL | 0.10 | Adjusted p-value cutoff |
| TOP_K_MARKERS_STATS | 10,000 | Max genes stored per cluster |
| TOP_K_MARKERS_TEXT | 400 | Genes in cluster text summaries |
| Enrichment (gseapy) | | |
| Gene sets | GO_Biological_Process_2023, Reactome_2022 | |
| TOP_K_GO | 15 | GO terms retained per cluster |
| TOP_K_REACTOME | 15 | Reactome terms per cluster |
| Enrichment cutoff | 0.05 | Adjusted p-value threshold |
| Input genes | 200 | Top DE genes per enrichment call |
| Semantic embedding (BioBERT) | | |
| Model | pritamdeka/BioBERT-mnli-snli-scinli-scitail-mednli-stsb | |
| Embedding dim | 768 | Output dimensionality |
| α (IDENTITY_ALPHA) | 0.6 | Identity vs. context weight |
| Normalization | L2 | Final combined embeddings |
| Batch size | 16 | Sentences per encoding batch |
| scGPT expression embedding | | |
| Model | scGPT whole-human | Pre-trained foundation model |
| Embedding dim | 512 | CLS token dimensionality |
| N_BINS | 51 | Expression binning resolution |
| MAX_TOKENS | 3,000 | Max gene tokens per cell |
| Batch size | 64 | Cells per inference batch |
| Aggregation | Mean pooling | Cell → cluster centroids |
| Normalization | L2 | Cluster-level centroids |

Table 7: Hybrid retrieval engine parameters.

| Parameter | Value | Description |
|---|---|---|
| Query classification | | |
| Gene threshold | ≥ 60% | Token fraction to route as gene query |
| Mixed threshold | ≥ 20% each | Gene + NL tokens for mixed routing |
| Gene pattern | A–Z, 2–15 chars | Regex for gene symbol detection |
| Gene marker scoring | | |
| Score function | $(0.5 + \lvert\log_2\mathrm{FC}\rvert) \times (0.3 + \max(\mathrm{pct}_{\mathrm{in}} - \mathrm{pct}_{\mathrm{out}},\, 0))$ | |
| High-expr bonus | ×1.3 | When $\mathrm{pct}_{\mathrm{in}} > 0.5$ |
| Coverage factor | $0.5 + 0.5 \times n_{\mathrm{found}} / n_{\mathrm{query}}$ | |
| Semantic matching | | |
| Similarity | Cosine | Query vs. cluster embeddings |
| Name boost (α) | 0.15 | Bonus for ontology name overlap |
| Min substring | 4 chars | For name boost activation |
| Synonym boost (β) | 0.10 | Bonus for synonym match |
| Reciprocal rank fusion | | |
| RRF constant (k) | 60 | Smoothing constant |
| Weights | 1.0 : 1.0 | Gene : semantic |
| Additive union (benchmarking) | | |
| Primary selection | Recall@5 | Higher-recall modality is primary |
| Tiebreaker | MRR | When Recall@5 is tied |
| Default settings | | |
| top_k | 5 | Clusters returned per query |
| pre_k | 40 | Candidates before reranking |
| γ | 2.5 | Reranking sharpness |
| $\lambda_{\mathrm{sem}}$ (scGPT) | 0.0 | Pure gene scoring mode |
| $\lambda_{\mathrm{sem}}$ (discovery) | 0.5 | Balanced mode |

Table 8: Analytical module parameters.

| Parameter | Value | Description |
|---|---|---|
| Ligand–receptor interactions | | |
| Database size | 280+ pairs | From CellChat, CellPhoneDB, NicheNet |
| Pathway categories | 25 | Signaling annotations |
| min_ligand_pct | 0.10 | Min ligand expr. in source |
| min_receptor_pct | 0.05 | Min receptor expr. in target |
| Score | $\mathrm{pct}_{\mathrm{in}}(L) \times \mathrm{pct}_{\mathrm{in}}(R)$ | Expression fraction product |
| Self-interactions | Excluded | Source ≠ target |
| Pathway activity scoring | | |
| Number of pathways | 60+ | Across 5 categories |
| Metric | Mean $\mathrm{pct}_{\mathrm{in}}$ | Avg. expression of pathway genes |
| min_genes | 3 | Min for non-zero score |
| Categories | Immune, Cell biology, Neuroscience, Metabolism, Tissue-specific | |
| Comparative analysis | | |
| Condition bias | > 60% | Fraction to assign bias label |
| min_pct | 0.05 | Min expr. for gene inclusion |
| top_n | 20 | Genes per cluster |
| Enriched genes | 30 | Per-condition summary limit |
| Proportion analysis | | |
| Fold change | $\mathrm{frac}_A / \mathrm{frac}_B$ | Condition ratio |
| Min denominator | 0.001 | Below: reported as ∞ |
| Cell cycle scoring | | |
| S-phase genes | 43 | Seurat S-phase markers |
| G2M-phase genes | 54 | Seurat G2M markers |
| Cycling threshold | $S > 0.3$ and $\mathrm{G2M} > 0.3$ | Both above threshold |
| Gene set enrichment | | |
| Default gene sets | 10 MSigDB Hallmark | Curated pathways |
| min_genes | 3 | Min for non-zero score |

Table 9: LLM interpretation parameters.

| Parameter | Value | Description |
|---|---|---|
| LLM configuration | | |
| Default provider | Groq | Free tier, 500K tokens/day |
| Default model | LLaMA-3.1-8B | Via Groq Cloud API |
| Supported | 4 providers | Groq, Gemini, OpenAI, Claude |
| Temperature | 0.2 | Low for reproducibility |
| Prompt limit | 18,000 chars | ≈ 4,500 tokens |
| Context limit | 12,000 chars | ≈ 3,000 tokens |
| Safety and rate limiting | | |
| Spending cap | €1.00 | Hard cap, configurable |
| Max retries | 5 | On rate-limit errors |
| Initial wait | 10 s | Backoff start |
| Backoff | Exponential | Max 120 s |
| Context trimming | | |
| Clusters | Top 10 | In compare mode |
| Gene evidence | Top 5 | Per cluster |
| Pathway scores | Top 10 | Entries to LLM |
| Interactions | Top 20 | Entries to LLM |
| Discovery sections | 4 | Evidence, Biology, Consistency, Hypotheses |

Appendix G D1: Cystic Fibrosis Airways (Berg et al. [2025])

G.1 Ontology Queries

1. Macrophage and monocyte infiltration in cystic fibrosis airways
2. Recruited monocytes and pro-inflammatory macrophages in CF lung tissue
3. Macrophage scavenging receptor expression and phagocytosis in CF
4. 
Non-classical monocyte patrol function in CF bronchial wall
5. CD8 T cell activation and cytotoxicity in CF lung inflammation
6. CD8 T cell inflammatory cytokine production and IFNG signaling in CF
7. HLA-E CD94 NKG2A immune checkpoint inhibiting CD8 T cell activity
8. Dysfunctional CD8 T cell response to chronic Pseudomonas infection in CF
9. CALR LRP1 interaction between T cells and macrophages promoting inflammation
10. CD4 helper T cell immune activation in cystic fibrosis
11. CD4 T cell VEGF receptor signaling and hypoxia response in CF
12. Aberrant Th2 and Th17 T cell responses in Pseudomonas-infected CF lungs
13. Chronic adaptive immune activation of T lymphocytes in CF despite modulator therapy
14. B cell activation and immunoglobulin response in CF airways
15. B cell receptor downregulation and reduced plasma cell markers in CF
16. Interferon gamma signaling and HLA-DP expression in B cells of CF patients
17. PDGFRB signaling pathway activated in B cells from CF lungs
18. Basal cell dysfunction and reduced stemness in cystic fibrosis epithelium
19. Impaired basal cell differentiation and pathogenic basal cell variants in CF
20. Basal cell DNA damage repair and chromatin remodeling in CF airways
21. Reduced keratinization gene expression CSTA HSPB1 in CF basal cells
22. Basal cell altered cell–cell communication and increased interactions in CF
23. Ciliated cell ciliogenesis and increased abundance in CF bronchial epithelium
24. Ciliated cell HLA class I expression and immune-linked transcriptional changes in CF
25. Skewed basal cell differentiation towards ciliated cells in CF epithelium
26. Natural killer cell cytotoxicity and NKG2A immune checkpoint in CF
27. NKG2A blockade to restore NK and CD8 T cell function in CF lung
28. Innate lymphoid cell dysfunction and impaired antimicrobial defense in CF
29. Pulmonary ionocyte CFTR expression in cystic fibrosis
30. Ionocyte unique cell–cell interactions with adaptive lymphocytes in CF
31. Endothelial cell remodeling and VEGF signaling in CF lung
32. Reduced endothelial cell proportions and altered differentiation in CF airways
33. Hypoxia-induced VEGF upregulation and vascular remodeling in CF lungs
34. Dendritic cell antigen presentation in CF airways
35. IFNG IFNGR2 interaction between CD8 T cells and dendritic cells in CF
36. Mast cell degranulation and allergic inflammation in CF
37. Secretory cell mucus overproduction and inflammatory signaling in CF epithelium
38. Goblet cell hyperplasia and mucin gene expression in cystic fibrosis
39. Submucosal gland epithelial cell changes in cystic fibrosis
40. Reduced submucosal gland cell proportions and gland development dysfunction in CF
41. Type I interferon response and inflammatory signaling in CF epithelial cells
42. Interferon responsive gene upregulation across epithelial subsets in CF
43. VEGF receptor signaling and hypoxia response across cell types in CF
44. TXNIP-mediated NLRP3 inflammasome activation in CF lymphocytes and epithelial cells
45. GNAI2 immunomodulatory signaling in CD8 T cells and B cells in CF
46. GNAI2 adenylate cyclase regulation and CFTR function in lymphocytes
47. Stromal cell and fibroblast remodeling in CF airway tissue
48. Pericyte and stromal cell contribution to airway fibrosis in CF
49. IFNG–IFNGR1 interaction between CD8 T cells and basal cells, macrophages, and endothelial cells in CF
50. Altered structural–immune cell crosstalk in CF involving lymphocytes, ionocytes, and macrophages

G.2 Expression Queries

1. MARCO FABP4 APOC1 C1QB C1QC MSR1
2. CD68 CD14 CSF1R CSF2RA LGALS2
3. GOS2 FABP4 PPARG APOC1 C1QB
4. FCGR3A CX3CR1 CD14 CDKN1C LILRB2
5. CD8A CD8B GZMB PRF1 IFNG NKG7
6. IFNG GNAI2 CD69 CD81 CD3G FOS JUND
7. GZMB PRF1 NKG7 GNLY KLRD1 CD8A
8. TXNIP MAP2K2 IFNG CD81 CD3G CD69
9. KLF2 IL7R CD48 TXNIP ETS1
10. CD3D CD4 IL7R CD3E CD3G
11. TRAJ52 TRBV22-1 TRDJ2 CD3E CD3G
12. CD3G CD3E CD69 IL7R CD81 FOS
13. IGLJ3 IGKJ1 IGHJ5 JCHAIN MZB1 XBP1
14. CD79A IGHG3 IGLC2 SYK CD81 JCHAIN
15. SYK CSK CD9 CD81 JUND LTB HLA-DPA1
16. IGHG3 IGLC2 IGHD IGHA1 IGLC1 IGLC3
17. KRT5 KRT14 KRT15 TP63 IL33 CSTA
18. CSTA HSPB1 KRT5 KRT14 TP63
19. KRT5 IL33 TP63 KRT15 LAMB3 COL17A1
20. FOXJ1 DNAH5 CAPS PIFO RSPH1 DNAI1
21. DNAH5 SYNE1 SYNE2 CAPS PIFO
22. GNLY KLRD1 KLRK1 NKG7 PRF1 GZMB
23. GNLY NKG7 KLRD1 KLRK1 KLRC1
24. ATP6V1G3 FOXI1 BSND CLCNKB ASCL3
25. FOXI1 CFTR ATP6V1G3 BSND RARRES2
26. PLVAP ACKR1 ERG VWF PECAM1 CDH5
27. VIM PLVAP ACKR1 MGP PTGDS CXCL14
28. CPA3 TPSAB1 TPSB2 MS4A2 HDC GATA2
29. TPSAB1 TPSB2 KIT CPA3 MS4A2
30. HLA-DPA1 HLA-DRB1 CD74 GPR183 LGALS2
31. HLA-DPA1 HLA-DPB1 HLA-DRB1 CD80 CD86 CD74
32. SCGB1A1 SCGB3A1 MUC5AC MUC5B LYPD2 PRR4
33. SCGB1A1 MUC5AC SCGB3A1 LYPD2
34. MUC5AC MUC5B LYZ SCGB1A1 SCGB3A1
35. COL1A2 LUM DCN SFRP2 COL3A1 PDGFRA
36. PDGFRA COL1A2 COL3A1 VCAN DCN LUM
37. PDGFRB VIM COL1A2 MGP CXCL14
38. SST CHGA ASCL1 GRP CALCA SYP
39. GRP ASCL1 SYT1 CHGA SYP CALCA
40. HLA-E KLRC1 KLRD1 KLRC2 KLRC3 KLRK1
41. HLA-E KLRC1 KLRD1 CD8A CD8B
42. CALR LRP1 GNAI2 FOS JUND MAP2K2
43. GNAI2 CXCR3 F2R S1PR4 CD69
44. IFIT1 MX1 OAS2 ISG15 IFITM3 IFIT3
45. IFIT1 MX1 OAS2 IFIT3 IFI6
46. KDM1A KMT5A RAD50 ERCC6 ERCC8
47. TXNIP MAP2K2 ETS1 VEGFA KLF2
48. IFNG IFNGR1 IFNGR2 CALR LRP1
49. CCL5 CCR5 CXCL10 CXCR3 F2R
50. CFTR FOXI1 SCGB1A1 KRT5 FOXJ1 MUC5AC

Appendix H D5: Healthy Breast Tissue Atlas (Bhat-Nakshatri et al. [2024])

H.1 Ontology Queries

1. Luminal hormone sensing cells with estrogen receptor expression in the healthy breast
2. FOXA1 pioneer transcription factor activity in luminal hormone responsive breast epithelial cells
3. ERα–FOXA1–GATA3 transcription factor network in hormone responsive breast cells
4. Mature luminal cells with hormone receptor positive identity in breast tissue
5. Hormone sensing alpha versus beta cell states in breast epithelium
6. LHS cell-enriched fate factor DACH1 and PI3K pathway regulator INPP4B in breast
7. 
Lobular epithelial cells expressing APOD and immunoglobulin genes in breast
8. Luminal adaptive secretory precursor cells and progenitor identity in breast
9. ELF5 and EHF transcription factor expression in luminal progenitor breast cells
10. Alveolar progenitor cell state enriched in Indigenous American breast tissue
11. BRCA1 associated breast cancer originating from luminal progenitor cells
12. KIT receptor expression and chromatin accessibility in luminal progenitor cells
13. MFGE8 and SHANK2 expression in luminal progenitor cells of the breast
14. LASP basal–luminal intermediate progenitor cell identity in the breast
15. Basal-myoepithelial cells with TP63 and KRT14 expression in breast
16. Basal cell chromatin accessibility and TP63 binding site enrichment
17. Basal alpha and basal beta cell states in breast myoepithelium
18. SOX10 motif enrichment in basal-myoepithelial cells of the breast
19. KRT14 KRT17 expression in ductal epithelial and basal cells of breast tissue
20. Fibroblast heterogeneity and cell states in healthy breast stroma
21. Genetic ancestry-dependent variability in breast fibroblast cell states
22. Fibro-prematrix state enrichment in African ancestry breast tissue fibroblasts
23. PROCR ZEB1 PDGFRα multipotent stromal cells enriched in African ancestry breast
24. Myofibroblast and inflammatory fibroblast subtypes in breast cancer stroma
25. SFRP4 and Wnt pathway modulation in breast fibroblasts
26. Endothelial cell subtypes and vascular markers in breast tissue
27. Lymphatic endothelial cells expressing LYVE1 in breast stroma
28. ACKR1 stalk-like endothelial cell subtype in breast vasculature
29. Vascular endothelial cell heterogeneity in mammary gland microvasculature
30. Breast tissue angiogenesis and endothelial cell MECOM expression
31. T lymphocyte markers and immune cell identity in breast tissue
32. CD4 T cell IL7R expression and chromatin accessibility in breast
33. CD8 T cell GZMK cytotoxic activity and IFNG signaling in breast tissue
34. Tissue-resident memory T lymphocyte populations in healthy breast
35. Adaptive immune surveillance by T cells in mammary gland stroma
36. Macrophage identity and FCGR3A expression in breast tissue stroma
37. Macrophage subtypes and tissue-resident immune cells in healthy breast
38. Breast tissue-resident macrophage phagocytic function and complement expression
39. Myeloid lineage immune cells and monocyte-derived macrophages in mammary gland
40. Adipocyte subtypes and lipid metabolism in breast tissue
41. Adipocyte PLIN1 and FABP4 expression in healthy breast stroma
42. PLIN1 lipid droplet biology and adipocyte identity in mammary fat pad
43. Mammary gland adipose tissue and fatty acid binding protein expression
44. Epithelial cell hierarchy from basal to luminal hormone sensing in breast
45. CXCL12 chemokine expression in endothelial cells and fibroblasts of breast
46. VEGFA angiogenic signaling from luminal cells to endothelium in breast
47. IGF1 paracrine signaling from fibroblasts to luminal cells in breast stroma
48. Breast tissue microenvironment with stromal and immune cell interactions
49. Ancestry differences in breast tissue cellular composition and cancer risk
50. Gene expression differences between ductal and lobular epithelial cells of the breast

H.1.1 Expression Queries

1. FOXA1 ESR1 GATA3 ERBB4 ANKRD30A AFF3 TTC6
2. MYBPC1 THSD4 CTNND2 DACH1 INPP4B NEK10
3. ESR1 FOXA1 GATA3 ELOVL5 ANKRD30A
4. AFF3 TTC6 ERBB4 MYBPC1 THSD4
5. DACH1 NEK10 CTNND2 INPP4B ELOVL5
6. APOD IGHA1 IGKC ESR1 FOXA1 GATA3
7. DUSP1 DPM3 RPL36 IGHA1 IGKC APOD
8. ELF5 EHF KIT CCL28 KRT15 BARX2 NCALD
9. MFGE8 SHANK2 SORBS2 AGAP1 ELF5
10. KRT15 CCL28 KIT INPP4B ELF5
11. RBMS3 EHF BARX2 NCALD ELF5
12. ESR1 ELF5 EHF KIT CCL28
13. ELF5 KIT CCL28 EHF KRT15 BARX2
14. NCALD BARX2 SHANK2 SORBS2 MFGE8 ELF5
15. TP63 KRT14 KLHL29 FHOD3 SEMA5A
16. KLHL13 KLHL29 TP63 KRT14 PTPRT
17. TP63 KRT14 KRT17 FHOD3 ABLIM3
18. ST6GALNAC3 PTPRM SEMA5A KLHL29
19. KRT14 KRT17 TP63 KLHL29 KLHL13 FHOD3
20. LAMA2 SLIT2 RUNX1T1 COL1A1 COL3A1
21. COL3A1 POSTN COL1A1 IGF1 ADAM12
22. CFD MGST1 MFAP5 COL3A1 POSTN
23. PROCR ZEB1 PDGFRA COL1A1 LAMA2
24. SFRP4 COL1A1 POSTN LAMA2 SLIT2
25. COL1A1 PDPN CD34 CXCL12 LAMA2
26. MECOM LDB2 MMRN1 CXCL12 ACKR1
27. LYVE1 MECOM LDB2 MMRN1
28. ACKR1 CXCL12 MECOM LDB2
29. MECOM LDB2 MMRN1 LYVE1 ACKR1
30. CXCL12 MECOM LDB2 ACKR1 MMRN1
31. PTPRC SKAP1 ARHGAP15 THEMIS IL7R
32. IL7R GZMK PTPRC SKAP1
33. IFNG GZMK IL7R THEMIS PTPRC
34. THEMIS ARHGAP15 SKAP1 PTPRC IL7R
35. PTPRC SKAP1 GZMK IFNG THEMIS ARHGAP15
36. FCGR3A ALCAM LYVE1 CD163
37. ALCAM FCGR3A LYVE1 CD14
38. FCGR3A ALCAM CD163 MERTK
39. ALCAM LYVE1 FCGR3A CD163 MARCO
40. PLIN1 FABP4 KIT ADIPOQ LEP
41. FABP4 PLIN1 ADIPOQ LEP LPL
42. PLIN1 FABP4 LPL PPARG ADIPOQ
43. FABP4 PLIN1 KIT ADIPOQ
44. FOXA1 ELF5 TP63 KRT14 GATA3 ESR1
45. GATA3 EHF ELF5 FOXA1 KRT15 KRT14 TP63
46. MECOM PTPRC FCGR3A PLIN1 LAMA2 TP63 FOXA1
47. CXCL12 LAMA2 MECOM LDB2 COL1A1
48. ESR1 FOXA1 ELF5 EHF KIT TP63 KRT14
49. PTPRC FCGR3A FABP4 PLIN1 MECOM
50. VEGFA LDB2 IGF1 LAMA2 FOXA1 ELF5

H.2 D3: Fetal Lung AT2 Organoids (Lim et al. [2025])

H.2.1 Ontology Queries

1. Alveolar type 2 cell identity and surfactant protein production in fetal lung organoids
2. Mature AT2 cell markers and lamellar body formation in fdAT2 organoids
3. Surfactant protein C maturation and intracellular trafficking in alveolar epithelium
4. SFTPC processing through endosomal compartments and multivesicular bodies
5. Surfactant secretion and lamellar body exocytosis in human AT2 cells
6. ITCH E3 ubiquitin ligase role in SFTPC trafficking and ubiquitination
7. K63 ubiquitination of surfactant protein C for ESCRT recognition and MVB entry
8. HECT domain E3 ligase ITCH depletion phenocopying SFTPC-I73T pathogenic variant
9. Ubiquitome forward genetic screen for SFTPC trafficking effectors
10. 
SFTPC relocalisation to plasma membrane and recycling endosomes upon ITCH loss
11. AT2 stem cell self-renewal and proliferation in fetal lung organoids
12. FGF7-driven AT2 cell proliferation and surfactant processing balance
13. Expandable fetal-derived AT2 organoids maintaining identity over passaging
14. Alveolar type 1 cell differentiation from AT2 organoids via YAP activation
15. AT2 to AT1 lineage transition through Wnt withdrawal and LATS inhibition
16. AT1 cell fate markers AQP5 CAV1 AGER in differentiated fdAT2 organoids
17. CXCL chemokine expressing AT2 subpopulation in fetal lung organoids
18. Immune response gene expression in alveolar type 2 cells
19. Chemokine-mediated innate immune signaling in AT2 organoid subsets
20. Aberrant basal cell differentiation from AT2 cells in organoid culture
21. Hypoxia-induced airway differentiation of alveolar type 2 cells
22. Pulmonary neuroendocrine cell differentiation in AT2 organoids
23. Neuroendocrine progenitor cells co-expressing SFTPC and NE markers
24. Ciliated cell-like differentiation in fetal AT2 organoid culture
25. Intermediate transitional cell state between AT2 and differentiated lineages
26. Surfactant metabolism and lipid transport in fetal alveolar epithelium
27. Vesicle-mediated transport and lysosome localization in AT2 surfactant processing
28. Lipid storage membrane transport and vesicle cytoskeleton trafficking in AT2 cells
29. Wnt signaling pathway maintaining AT2 identity and inhibiting AT1 differentiation
30. SFTPC-I73T pathogenic variant causing interstitial lung disease and AT2 dysfunction
31. Toxic gain-of-function effect of misfolded surfactant protein C variants
32. Transcriptional maturity of fdAT2 organoids compared to adult AT2 and PSC-iAT2
33. Missing immune response MHC class I genes in fetal versus adult AT2 cells
34. CRISPRi-mediated depletion of ITCH and UBE2N in fdAT2 organoids
35. Reversible SFTPC mislocalization after CRISPRi recovery in AT2 organoids
36. ESCRT complex components HRS VPS28 required for SFTPC MVB entry
37. Endosomal recycling of SFTPC to plasma membrane upon ubiquitination failure
38. SUMOylation pathway components UBE2I UBA2 PIAS1 and SFTPC expression regulation
39. Fetal lung tip progenitor differentiation into mature AT2 cells
40. EpCAM positive tip epithelial cell isolation and AT2 organoid derivation
41. SFTPC C-terminal cleavage and proprotein processing in endosomal compartments
42. proSFTPC plasma membrane transit before endocytosis and maturation
43. Interstitial lung disease caused by SFTPC variants and AT2 cell dysfunction
44. Heritable pulmonary fibrosis from SFTPC mistrafficking and toxic accumulation
45. AT2 medium components dexamethasone cAMP IBMX DAPT for alveolar differentiation
46. fdAT2 organoid engraftment in mouse precision-cut lung slices and AT1 differentiation
47. NEDD4-2 HECT domain ligase role in SFTPC ubiquitination and maturation
48. Cell type heterogeneity and proportions across fdAT2 organoid lines
49. fdAT2 organoid stability over long-term passaging and cryopreservation
50. Genetic manipulation of fetal AT2 organoids using lentiviral CRISPRi system

H.3 Expression Queries

1. SFTPC SFTPB SFTPA1 SFTPA2 NAPSA LAMP3
2. SFTPC SFTPB ABCA3 LAMP3 HOPX NKX2-1
3. NKX2-1 SLC34A2 LPCAT1 HOPX CEACAM6
4. SFTPC SFTPD SFTA3 CD36 CAV1 SLC34A2
5. SFTPA1 SFTPA2 SFTPB SFTPC SFTPD
6. ITCH UBE2N HRS VPS28 RABGEF1 EEA1
7. ITCH NEDD4 NEDD4L UBE2N UBE2I
8. EEA1 MICALL1 LAMP3 HRS VPS28
9. UBE2I UBA2 PIAS1 ITCH RABGEF1
10. ABCA3 LAMP3 NAPSA CKAP4 ZDHHC2 CTSH
11. ABCA3 SFTPB SFTPC LAMP3 P2RY2 LMCD1
12. MKI67 PCNA TOP2A SFTPC NKX2-1
13. MKI67 PCNA CDK1 CCNB1 SFTPC
14. CXCL1 CXCL2 CXCL3 CCL2 SFTPC
15. CXCL1 CXCL3 CCL2 CCL4 CCL4L1
16. CXCL1 CXCL2 HLA-DPA1 HLA-DPB1 CCL2
17. HLA-DQB1 HLA-DMA HLA-DMB HLA-DRA HLA-DOA
18. HLA-DPA1 HLA-DPB1 HLA-DRA CD86 TNF
19. AQP5 CAV1 AGER HOPX
20. CAV1 AGER AQP5 PDPN
21. TP63 KRT5 KRT14 KRT15 SOX2
22. KRT5 KRT14 TP63 LAMB3 COL17A1
23. ASCL1 NEUROD1 GRP CHGA SYP CALCA
24. GRP ASCL1 SYT1 CHGA SYP
25. ASCL1 GRP SFTPC NKX2-1
26. FOXJ1 DNAH5 CAPS PIFO RSPH1
27. FOXJ1 DNAH5 DNAI1 RSPH1 CAPS
28. SOX2 SOX9 NKX2-1 SFTPC TP63
29. SOX2 NKX2-1 HOPX CAV1
30. CTNNB1 TCF7L2 AXIN2 WNT3A LGR5
31. SFTPC NKX2-1 HOPX SFTPB ABCA3 MKI67
32. NAPSA ABCA3 SFTA3 SFTPD LAMP3 HOPX
33. SFTPC ITCH EEA1 LAMP3 MICALL1 ABCA3
34. SFTPC NAPSA CTSH LAMP3 ITCH UBE2N
35. SFTPC CXCL1 CXCL2 NKX2-1 LAMP3
36. CDH1 TJP1 EPCAM SFTPC NKX2-1
37. ITCH HRS VPS28 UBE2N RABGEF1 PIAS1 UBE2I UBA2
38. ITCH NEDD4 NEDD4L HRS UBAP1 USP8
39. MKI67 TOP2A PCNA CDK1 CCNB1 CCNA2
40. SFTPC TP63 ASCL1 FOXJ1 NKX2-1
41. SFTPC SFTPB ASCL1 GRP TP63 KRT5
42. SFTPC CAV1 AGER AQP5 HOPX NKX2-1
43. LAMP3 ABCA3 SFTPB SFTPC NAPSA CD36
44. CKAP4 ZDHHC2 SLC34A2 CTSH SFTPC
45. CXCL1 CXCL2 CXCL3 CCL2 CCL4 TNF
46. SOX9 NKX2-1 SFTPC SFTPB LAMP3
47. SFTPC NKX2-1 ASCL1 NEUROD1 GRP MKI67
48. SFTA3 SFTPD NAPSA NKX2-1 CKAP4 ZDHHC2 SLC34A2 CTSH SFTPA1 SFTPA2 SFTPC SFTPB
49. ITCH SFTPC LAMP3 ABCA3 UBE2N NAPSA
50. SFTPC CXCL1 MKI67 TP63 ASCL1 FOXJ1 SOX2 CAV1

Appendix I D2: High-Risk Neuroblastoma (Yu et al. [2025])

I.1 Ontology Queries

1. Neuroblast neoplastic cell of sympathetic nervous system expressing PHOX2B and ISL1
2. Neuroblastoma tumor cell with MYCN amplification and proliferative phenotype
3. Adrenergic neuroblast expressing catecholamine biosynthesis enzymes tyrosine hydroxylase
4. Neuroblastoma cell with calcium and synaptic signaling pathway enrichment
5. Dopaminergic neuroblast expressing dopamine transporter and metabolic genes
6. Proliferating neuroblastoma cell with cell cycle and DNA replication markers
7. Mesenchymal neuroblastoma cell state expressing extracellular matrix genes and YAP1
8. Intermediate OXPHOS neuroblast with ribosomal gene expression and oxidative phosphorylation
9. EZH2 expressing neuroblastoma cell PRC2 polycomb repressive complex chromatin regulation
10. Neuroblastoma cell ERBB4 receptor expressing epidermal growth factor signaling
11. 
Neuroblast with adrenergic transcription factor PHOX2A PHOX2B GATA3 expression
12. Neural crest derived neoplastic cell in pediatric tumor expressing chromogranin
13. Neuroblastoma cell immune evasion NECTIN2 and checkpoint ligand expression
14. Mesenchymal transition state in neuroblastoma with AP-1 transcription factors
15. Tumor associated macrophage in neuroblastoma microenvironment CD68 CD163 expressing
16. Pro-inflammatory macrophage IL18 expressing anti-tumor immune response
17. Pro-angiogenic macrophage VCAN expressing promoting tumor vascularization
18. Immunosuppressive macrophage C1QC SPP1 complement expressing in tumor
19. Tissue resident macrophage F13A1 expressing phagocytic function in neuroblastoma
20. Lipid associated macrophage HS3ST2 with metabolic phenotype in tumor
21. Macrophage secreting HB-EGF ligand for ERBB4 receptor activation on neuroblasts
22. CCL4 expressing pro-angiogenic macrophage chemokine signaling in tumor
23. Proliferating macrophage MKI67 TOP2A expanding after chemotherapy
24. THY1 positive macrophage undefined myeloid phenotype in neuroblastoma
25. T cell lymphocyte infiltrating neuroblastoma tumor expressing CD247 CD96
26. Cytotoxic T cell with granzyme perforin mediated tumor cell killing
27. Tumor infiltrating T lymphocyte immune response to neuroblastoma
28. B cell lymphocyte PAX5 MS4A1 in neuroblastoma tumor immune microenvironment
29. B lymphocyte humoral immunity and antigen presentation in pediatric tumor
30. Dendritic cell IRF8 FLT3 antigen presentation priming T cell responses in tumor
31. Professional antigen presenting dendritic cell MHC class I expression
32. Fibroblast stromal cell PDGFRB DCN extracellular matrix production in neuroblastoma
33. Cancer associated fibroblast FAP ACTA2 expressing in tumor stroma
34. Neural crest derived endoneurial fibroblast in neuroblastoma tissue
35. Schwann cell PLP1 CDH19 myelinating glial cell in neuroblastoma microenvironment
36. Schwann cell precursor neural crest lineage expanding after therapy
37. Endothelial cell PECAM1 PTPRB vascular marker in neuroblastoma tumor vasculature
38. Tumor endothelium blood vessel lining cell expressing vascular endothelial markers
39. Adrenal cortex cell steroidogenesis CYP11A1 CYP11B1 adjacent normal tissue
40. Cortical cell of adrenal gland steroid hormone biosynthesis normal adjacent tissue
41. Hepatocyte ALB expressing liver cell from adjacent normal tissue in neuroblastoma biopsy
42. Kidney cell renal tissue PKHD1 from adjacent normal tissue in neuroblastoma specimen
43. Chemotherapy induced tumor microenvironment rewiring macrophage expansion after therapy
44. HB-EGF ERBB4 paracrine signaling axis between macrophage and neuroblast promoting ERK
45. Tumor immune evasion and antigen presentation in neuroblastoma
46. VEGFA angiogenesis signaling in neuroblastoma tumor microenvironment
47. Immune cell infiltration in high-risk neuroblastoma T cell B cell macrophage
48. THBS1 CD47 don’t eat me signal between macrophage and neuroblastoma cell
49. Neuroblastoma cell expressing ALK receptor tyrosine kinase oncogenic driver
50. Tumor microenvironment cell diversity neuroblasts fibroblasts Schwann endothelial macrophages

I.2 Expression Queries

Q51. PHOX2B ISL1 HAND2 TH DBH DDC CHGA
Q52. MYCN MKI67 TOP2A EZH2 SMC4 BIRC5
Q53. PHOX2A PHOX2B GATA3 ASCL1 ISL1 HAND2
Q54. CACNA1B SYN2 KCNMA1 KCNQ3 GPC5 CREB5
Q55. SLC18A2 TH DDC AGTR2 ATP2A2 PHOX2B
Q56. MKI67 TOP2A EZH2 SMC4 BIRC5 BUB1B ASPM KIF11
Q57. YAP1 FN1 VIM COL1A1 SERPINE1 SPARC THBS2
Q58. ERBB4 EGFR HBEGF TGFA EREG AREG
Q59. NECTIN2 CD274 B2M HLA-A HLA-B PHOX2B
Q60. JUN FOS JUNB JUND FOSL2 BACH1 BACH2
Q61. CHGA CHGB PHOX2B ISL1 NTRK1 RET
Q62. ETS1 ETV6 ELF1 KLF6 KLF7 RUNX1 ZNF148
Q63. ALK MYCN NTRK2 PHOX2B TH
Q64. CD68 CD163 CD86 CSF1R MRC1 SPP1
Q65. IL18 CD68 CD163 CD86 HLA-DRA CSF1R
Q66. VCAN VEGFA CD68 CD163 SPP1 EGFR
Q67. C1QC SPP1 CD68 CD163 APOE TREM2
Q68. F13A1 CD68 CD163 MRC1 LYVE1 CSF1R
Q69. HS3ST2 CYP27A1 CD68 CD163 APOE LPL
Q70. HBEGF TGFA EREG AREG CD68 CD163
Q71. CCL4 CD68 CD163 VEGFA CSF1R CCL3
Q72. THY1 CD68 CD163 MRC1 CSF1R CD86
Q73. CD247 CD96 CD3D CD3E CD8A CD4
Q74. GZMA GZMB PRF1 IFNG CD8A CD3D
Q75. PAX5 MS4A1 CD19 CD79A HLA-DRA HLA-DRB1
Q76. IRF8 FLT3 CLEC9A CD1C CD80 HLA-DRA
Q77. PDGFRB DCN LUM COL1A1 COL1A2 VIM
Q78. FAP ACTA2 COL1A1 PDGFRA DCN LUM
Q79. PLP1 CDH19 SOX10 MPZ MBP S100B
Q80. PECAM1 PTPRB CDH5 VWF KDR FLT1
Q81. CYP11A1 CYP11B1 CYP17A1 STAR NR5A1
Q82. ALB DCDC2 HNF4A APOB
Q83. PKHD1 PAX2 WT1 SLC12A1
Q84. PHOX2B CD68 CD3D MS4A1 PECAM1 DCN PLP1
Q85. HBEGF ERBB4 CD68 PHOX2B MAPK1
Q86. VCAN THBS1 CD47 ITGB1 CD68 PHOX2B
Q87. HLA-A HLA-B HLA-C B2M HLA-DRA HLA-DRB1
Q88. VEGFA KDR FLT1 NRP1 GPC1 PECAM1
Q89. CD68 IL18 VCAN C1QC SPP1 F13A1 HS3ST2 CCL4 THY1
Q90. PHOX2B MKI67 TOP2A YAP1 CACNA1B SLC18A2
Q91. APOE LDLR VLDLR LPL HS3ST2 CD68
Q92. THBS1 ITGB1 ITGA3 LRP5 CD47 FN1
Q93. COL1A1 COL1A2 COL4A1 COL4A2 FN1 VIM SPARC
Q94. MAPK1 MAPK3 AKT1 ERBB4 EGFR HBEGF
Q95. CD274 PDCD1 CTLA4 TIGIT LAG3 NECTIN2
Q96. PHOX2B CD68 PLP1 PECAM1 DCN IRF8 PAX5 CD247
Q97. CYP11A1 ALB PKHD1 PHOX2B CD68
Q98. PHOX2B HBEGF ERBB4 VCAN SPP1 CD163 VEGFA
Q99. MKI67 TOP2A PCNA CDK1 CCNB1 EZH2 MELK
Q100. PHOX2B ISL1 CD68 CD163 CD3D MS4A1 PLP1 PECAM1 DCN CYP11A1 ALB

Appendix J D3: Immune Checkpoint Blockade Multi-Cancer (Gondal et al. [2025])

J.1 Ontology Queries

1. Malignant cancer cell expressing immune checkpoint ligand PD-L1 for immune evasion
2. Tumor cell immune evasion through HLA downregulation and B2M loss
3. Melanoma cancer cell expressing MITF MLANA PMEL lineage markers
4. Breast cancer epithelial cell markers EPCAM KRT8 KRT18 KRT19 in ICB treated tumors
5. Tumor cell proliferation and cell cycle markers in malignant cells
6. Cancer cell VEGFA and TGFB1 immunosuppressive signaling in tumor microenvironment
7. Epithelial mesenchymal transition EMT markers in cancer cells during ICB treatment
8. 
Effector CD8 T cell cytotoxic function with granzyme and perforin expression
9. Activated CD8 T cell expressing IFNG and TNF anti-tumor cytokines
10. CD8 T cell exhaustion with PD-1 LAG3 TIM3 TIGIT checkpoint receptor co-expression
11. TOX transcription factor driving T cell exhaustion program in chronic antigen stimulation
12. Central memory CD8 T cell with TCF7 and IL7R expression for long-lived immunity
13. Naive CD8 T cell expressing CCR7 SELL before antigen encounter
14. CD8-positive T cell co-stimulatory receptor 4-1B ICOS upon activation
15. CD4 positive helper T cell TCR signaling and cytokine production
16. Regulatory T cell FOXP3 expressing immunosuppressive function in tumor
17. T follicular helper cell CXCR5 BCL6 supporting B cell responses in tertiary lymphoid structures
18. Th17 helper T cell IL17A RORC inflammatory response in tumor microenvironment
19. CD8-positive CD28-negative regulatory T cell with suppressive function
20. Natural killer T cell NKT innate cytotoxicity with KLRD1 and NKG7 expression
21. NK cell mediated tumor killing through NCR1 and KLRB1 receptor activation
22. B cell CD19 MS4A1 CD79A antigen presentation and humoral immunity in tumor
23. Plasma cell antibody secreting immunoglobulin production SDC1 MZB1
24. Tertiary lymphoid structure B cell and plasma cell formation in ICB-responsive tumors
25. Tumor associated macrophage M2 polarization CD163 MRC1 immunosuppressive function
26. Macrophage complement expression C1QA C1QB and TREM2 in tumor microenvironment
27. Classical monocyte CD14 LYZ infiltration into tumor during checkpoint blockade
28. Dendritic cell antigen presentation CD80 CD86 priming T cell responses
29. Plasmacytoid dendritic cell IRF7 LILRA4 type I interferon production
30. Myeloid cell general CSF1R ITGAM expressing innate immune population
31. Mast cell KIT TPSB2 CPA3 in allergic and inflammatory tumor responses
32. Microglial cell brain resident macrophage in melanoma brain metastasis
33. Cancer associated fibroblast FAP ACTA2 COL1A1 producing extracellular matrix
34. Myofibroblast ACTA2 TAGLN contractile smooth muscle actin expression in tumor stroma
35. Tumor endothelial cell PECAM1 CDH5 VWF vascular marker expression
36. Melanocyte pigmentation pathway MITF TYR TYRP1 DCT lineage genes
37. Hematopoietic multipotent progenitor cell stem cell marker expression
38. PD-1 blockade restoring effector CD8 T cell anti-tumor cytotoxicity
39. CTLA-4 blockade enhancing CD4 helper T cell and reducing Treg suppression
40. T cell clonal replacement and expansion following PD-1 checkpoint inhibition
41. TCF4 dependent resistance program in mesenchymal-like melanoma cells
42. T cell exclusion program in tumor cells resisting checkpoint blockade therapy
43. Antigen processing and MHC class I presentation in tumor cells
44. MHC class I antigen presentation by professional antigen presenting cells
45. Interferon gamma response driving PD-L1 upregulation on tumor cells
46. Tumor infiltrating lymphocyte diversity including T B and NK cells
47. Liver cancer hepatocellular carcinoma markers ALB AFP GPC3 in ICB dataset
48. Clear cell renal carcinoma CA9 PAX8 markers in kidney cancer patients
49. Basal cell carcinoma Hedgehog pathway PTCH1 GLI1 GLI2 SHH signaling
50. Lymphocyte general population in tumor immune microenvironment

J.2 Expression Queries

1. CD274 PDCD1LG2 B2M HLA-A CD47 IDO1 VEGFA
2. MITF MLANA PMEL TYR DCT SOX10 TYRP1
3. EPCAM KRT8 KRT18 KRT19 MUC1 CDH1 ESR1
4. MKI67 TOP2A PCNA CD274 B2M TGFB1
5. PRF1 GZMA GZMB GZMK GNLY NKG7 IFNG
6. GZMB PRF1 IFNG TNF FASLG NKG7 CD8A
7. CD69 ICOS TNFRSF9 IFNG GZMB CD8A
8. PDCD1 LAG3 HAVCR2 TIGIT TOX ENTPD1
9. TOX TOX2 PDCD1 HAVCR2 LAG3 TIGIT BTLA
10. TCF7 LEF1 CCR7 SELL IL7R CD8A CD8B
11. CCR7 SELL TCF7 LEF1 IL7R CD3D
12. CD4 CD3D CD3E IL7R CD28 ICOS TCF7
13. FOXP3 IL2RA CTLA4 IKZF2 TNFRSF18 TIGIT
14. CXCR5 BCL6 ICOS PDCD1 CD4 CD3D
15. RORC IL17A IL23R CCR6 CD4 CD3E
16. CD8A GZMB PRF1 LAG3 CTLA4 PDCD1
17. KLRD1 KLRK1 NKG7 GNLY PRF1 GZMB NCAM1
18. NCAM1 NCR1 KLRB1 KLRC1 GZMB IFNG
19. CD19 MS4A1 CD79A CD79B HLA-DRA HLA-DRB1
20. SDC1 MZB1 JCHAIN IGHG1 IGKC CD79A
21. CD163 MRC1 MSR1 MARCO CD68 APOE TREM2
22. C1QA C1QB APOE TREM2 CD68 SPP1
23. CD14 FCGR3A S100A8 S100A9 LYZ CSF1R
24. CD80 CD86 CD83 CCR7 HLA-DRA CLEC9A
25. LILRA4 IRF7 IRF8 IL3RA NRP1
26. ITGAM CSF1R CD68 LYZ S100A8 S100A9
27. KIT TPSB2 TPSAB1 CPA3 HPGDS HDC
28. P2RY12 TMEM119 CX3CR1 CSF1R AIF1
29. FAP ACTA2 COL1A1 COL1A2 PDGFRA DCN LUM
30. ACTA2 TAGLN MYH11 COL1A1 PDGFRB VIM
31. PECAM1 CDH5 VWF KDR FLT1 ENG
32. MITF TYR TYRP1 DCT MLANA PMEL SOX10
33. CD34 KIT FLT3 PROM1 THY1 PTPRC
34. CD3D CD3E CD8A CD4 TRAC TRBC1
35. HLA-DRA HLA-DRB1 HLA-DPA1 HLA-DPB1 CD74 CIITA
36. HLA-A HLA-B HLA-C B2M TAP1 TAP2
37. PDCD1 CD274 CTLA4 CD80 CD86 LAG3 HAVCR2
38. CD274 CD47 IDO1 GZMB PRF1 IFNG
39. CD8A CD4 MS4A1 CD68 PECAM1 FAP EPCAM NCAM1
40. GZMB IFNG FOXP3 CD163 CD274 MS4A1 PECAM1
41. ALB AFP GPC3 EPCAM KRT19
42. CA9 PAX8 MME EPCAM VEGFA
43. PTCH1 GLI1 GLI2 EPCAM KRT14
44. ERBB2 ESR1 EPCAM KRT8 KRT18 MUC1
45. CCR7 SELL TCF7 PDCD1 TOX GZMB PRF1
46. IFNG CD274 STAT1 IRF1 B2M HLA-A
47. CD8A CD4 FOXP3 CXCR5 RORC CCR7 KLRD1 CD3D
48. CD68 CD163 CD14 S100A8 CD80 KIT LILRA4 ITGAM
49. FAP ACTA2 PECAM1 CDH5 COL1A1 PDGFRA VWF
50. CD274 GZMB CD68 MS4A1 FAP PECAM1 MITF FOXP3 CD8A KIT LILRA4

Appendix K D6: First-Trimester Human Brain (Mannens et al. [2025])

K.1 Ontology Queries

1. GABAergic inhibitory neuron differentiation in developing human midbrain
2. Midbrain GABAergic neuron OTX2 GATA2 TAL2 transcription factor expression
3. Cortical interneuron derived from medial ganglionic eminence LHX6 DLX2
4. Interneuron diversity parvalbumin somatostatin VIP subtypes developing cortex
5. TAL2 expressing midbrain GABAergic neurons linked to major depressive disorder
6. Lateral and caudal ganglionic eminence interneuron migration in telencephalon
7. Medial ganglionic eminence derived parvalbumin somatostatin interneuron
8. 
SOX14 expressing midbrain GABAergic neuron thalamic migration
9. Glutamatergic excitatory neuron in developing human telencephalon cortex
10. Telencephalic glutamatergic neuron LHX2 BHLHE22 cortical layer specification
11. Hindbrain glutamatergic neuron ATOH1 MEIS1 cerebellar granule cell
12. Deep layer cortical neuron FEZF2 BCL11B corticospinal projection
13. SATB2 expressing telencephalic excitatory neuron callosal projection
14. Upper layer cortical neuron CUX1 CUX2 RORB intracortical connectivity
15. EMX2 transcription factor dorsal telencephalon glutamatergic identity
16. Purkinje cell differentiation in developing cerebellum PTF1A ESRRB lineage
17. Purkinje neuron ESRRB oestrogen-related nuclear receptor cerebellum specific
18. Cerebellar Purkinje progenitor PTF1A ASCL1 NEUROG2 ventricular zone
19. TFAP2B LHX5 activation of ESRRB enhancer in Purkinje neuroblast
20. RORA FOXP2 EBF3 late Purkinje maturation gene regulatory network
21. Cerebellar granule neuron ATOH1 MEIS1 external granular layer
22. Radial glial cell neural stem cell SOX2 PAX6 NES in developing brain
23. Radial glia to glioblast transition NFI factor maturation NFIA NFIB NFIX
24. Neural progenitor cell proliferation and neurogenesis in ventricular zone
25. Loss of stemness and glial fate restriction by NFI transcription factors
26. Progenitor cell dividing in developing human brain VIM HES1 proliferating
27. Notch signaling DLL1 JAG1 NOTCH1 lateral inhibition neurogenesis
28. Glioblast astrocyte precursor GFAP S100B AQP4 BCAN TNC fetal brain
29. Astrocyte maturation and glial scar markers in developing brain
30. Oligodendrocyte precursor cell OLIG2 PDGFRA SOX10 specification
31. Oligodendrocyte differentiation MBP MOG PLP1 myelination fetal brain
32. Committed oligodendrocyte precursor SOX10 lineage commitment
33. Dopaminergic neuron midbrain TH NR4A2 substantia nigra ventral tegmental area
34. Serotonergic neuron raphe nucleus TPH2 SLC6A4 FEV brainstem
35. FOXA2 LMX1A floor plate derived dopaminergic neuron specification
36. Endothelial cell blood–brain barrier CLDN5 PECAM1 CDH5 fetal brain
37. Pericyte PDGFRB RGS5 FOXF2 cerebral vasculature developing brain
38. Vascular leptomeningeal cell FOXC1 meningeal fibroblast DCN COL1A1
39. Vascular smooth muscle cell ACTA2 MYH11 cerebral artery
40. Microglial cell CX3CR1 P2RY12 TMEM119 brain resident macrophage
41. Border-associated macrophage RUNX1 haematopoietic origin fetal brain
42. Immature T cell and leukocyte infiltration in developing fetal brain
43. Schwann cell MPZ CDH19 SOX10 neural crest derived myelinating peripheral glial
44. Sensory neuron dorsal root ganglion NTRK1 ISL1 peripheral nervous system
45. Glycinergic neuron SLC6A5 GLRA1 inhibitory spinal cord hindbrain
46. Neuroblast immature migrating neuron fetal cortex RBFOX3 NEFM
47. Major depressive disorder MDD midbrain GABAergic neuron NEGR1 LRFN5
48. Schizophrenia cortical interneuron medial ganglionic eminence SATB2
49. Attention deficit hyperactivity disorder ADHD cerebellar Purkinje
50. Autism spectrum disorder hindbrain neuroblast brainstem involvement

K.2 Expression Queries

1. GAD1 GAD2 SLC32A1 DLX2 DLX5 LHX6
2. OTX2 GATA2 TAL2 SOX14 GAD2 SLC32A1
3. PVALB SST VIP LAMP5 SNCG ADARB2
4. DLX1 DLX2 DLX5 DLX6 MEIS2 LHX6
5. GAD1 GAD2 SLC32A1 TFAP2B OTX2
6. TAL2 SOX14 GAD2 OTX2 GATA2
7. SLC17A7 SLC17A6 SATB2 TBR1 FEZF2 BCL11B
8. EMX2 LHX2 BHLHE22 CUX1 CUX2 RORB
9. ATOH1 MEIS1 MEIS2 SLC17A6 RBFOX3
10. FEZF2 BCL11B TBR1 SATB2 SLC17A7
11. CUX1 CUX2 RORB LHX2 BHLHE22 EMX2
12. PTF1A ASCL1 NEUROG2 NHLH1 NHLH2 TFAP2B
13. ESRRB RORA PCP4 FOXP2 EBF3 LHX5
14. LHX5 LHX1 PAX2 TFAP2B DMBX1 NHLH2
15. ESRRB PCP4 RORA EBF1 EBF3 FOXP2 LHX1
16. SOX2 PAX6 NES VIM HES1 HES5 FABP7
17. NFIA NFIB NFIX SOX9 FABP7
18. SOX2 HES1 HES5 PAX6 NES VIM
19. NOTCH1 NOTCH2 DLL1 JAG1 HES1 HES5
20. GFAP S100B AQP4 ALDH1L1 BCAN TNC
21. OLIG1 OLIG2 SOX10 PDGFRA CSPG4
22. MBP MOG PLP1 MAG SOX10
23. OLIG2 SOX10 PDGFRA NKX2-2 OLIG1
24. TH DDC SLC6A3 SLC18A2 NR4A2 LMX1A FOXA2
25. FOXA2 LMX1A NR4A2 TH DDC SLC18A2
26. TPH2 SLC6A4 FEV DDC SLC18A2
27. SLC6A5 GLRA1 SLC32A1 GAD1
28. RBFOX3 SNAP25 SYT1 NEFM NEFL TUBB3
29. NEFM NEFL MAP2 TUBB3 SYT1
30. CLDN5 PECAM1 CDH5 ERG FLT1 VWF
31. PDGFRB RGS5 ACTA2 MYH11 COL1A2
32. ACTA2 MYH11 PDGFRB TAGLN
33. DCN LUM COL1A1 COL1A2 FOXC1 COL3A1
34. FOXC1 FOXF2 DCN COL1A2 LUM
35. AIF1 CX3CR1 P2RY12 TMEM119 HEXB CSF1R
36. RUNX1 SPI1 CSF1R AIF1 CD68
37. AIF1 HEXB P2RY12 TMEM119 CX3CR1
38. CD3D CD3E CD3G PTPRC CD2
39. MPZ CDH19 SOX10 MBP PLP1
40. NTRK1 NTRK2 ISL1 PRPH SNAP25
41. RBFOX3 SLC17A6 GAD2 NEFM SNAP25
42. NEFM NEFL RBFOX3 TUBB3 DCX
43. NEGR1 BTN3A2 LRFN5 SCN8A RGS6 MYCN
44. OTX2 GATA2 MEIS2 PRDM10 MYCN
45. CTCF MECP2 Y1 RAD21 SMC3
46. SHH PTCH1 GLI1 GLI2 FOXA2 NKX2-1
47. WNT5A CTNNB1 LEF1 TCF7L2 AXIN2
48. BMP4 BMPR1A SMAD1 ID1 ID3
49. VEGFA KDR FLT1 PDGFB PDGFRB CLDN5
50. SOX2 PAX6 OLIG2 GFAP RBFOX3 GAD2 SLC17A7

Appendix L Example of plot on Cystic Fibrosis Dataset

Figure 3: Cell-level UMAP of the cystic fibrosis airway dataset (D1) colored by Cell Ontology annotation. Approximately 96,000 cells are shown across 30 annotated cell types spanning immune (T cells, B cells, NK cells, macrophages, monocytes, dendritic cells, mast cells), epithelial (basal, suprabasal, multiciliated, secretory, goblet, club, ionocyte, neuroendocrine), and stromal (fibroblasts, pericytes, endocardial cells) compartments. Labels are placed at cluster centroids with iterative repulsion to minimize overlap.

Figure 4: Expression of HLA-E projected onto the cell-level UMAP of the cystic fibrosis airway dataset (D1). Color intensity (purple gradient) indicates normalized expression level, with non-expressing cells shown in grey. HLA-E is most highly expressed in immune cell clusters, particularly CD8+ T cells and NK cells, consistent with its role as a ligand for the NKG2A inhibitory receptor.
Moderate expression is observed across epithelial populations including basal cells, supporting the HLA-E/NKG2A immune checkpoint axis identified by Berg et al.
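
The Figure 3 caption mentions that labels are placed at cluster centroids and then separated by iterative repulsion. The paper does not specify the repulsion scheme, so the following is only a minimal sketch of one plausible approach, assuming 2-D UMAP coordinates and a simple pairwise push. The function names `cluster_centroids` and `repel` and the parameters `min_dist` and `n_iter` are hypothetical and are not taken from the ELISA code.

```python
import numpy as np


def cluster_centroids(coords, labels):
    """Return (names, centroids): the mean 2-D position of each label."""
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    cents = np.array([coords[labels == n].mean(axis=0) for n in names])
    return names, cents


def repel(positions, min_dist=1.0, n_iter=100):
    """Push apart label positions that are closer than min_dist.

    Assumed scheme (not from the paper): every overlapping pair is moved
    symmetrically along the line joining them until it is min_dist apart;
    passes repeat until no pair overlaps or n_iter is reached.
    """
    pos = np.array(positions, dtype=float)  # work on a copy
    for _ in range(n_iter):
        moved = False
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                delta = pos[i] - pos[j]
                dist = np.linalg.norm(delta)
                if dist >= min_dist:
                    continue
                # Coincident labels get an arbitrary fixed direction.
                direction = delta / dist if dist > 0 else np.array([1.0, 0.0])
                shift = (min_dist - dist) / 2
                pos[i] += direction * shift
                pos[j] -= direction * shift
                moved = True
        if not moved:
            break
    return pos
```

In use, `coords` would be the UMAP embedding and `labels` the per-cell annotations; the adjusted positions returned by `repel` would then be passed to the plotting library's text annotation call. A full implementation would probably also account for label text extents, which this sketch ignores.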