← Back to papers

Paper deep dive

HypeMed: Enhancing Medication Recommendations with Hypergraph-Based Patient Relationships

Xiangxu Zhang, Xiao Zhou, Hongteng Xu, Jianxun Lian

Year: 2026 · Venue: arXiv preprint · Area: cs.IR · Type: Preprint · Embeddings: 106

Abstract

Medication recommendation aims to generate safe and effective medication sets from health records. However, accurately recommending medications hinges on inferring a patient's latent clinical condition from sparse and noisy observations, which requires both (i) preserving the visit-level combinatorial semantics of co-occurring entities and (ii) leveraging informative historical references through effective, visit-conditioned retrieval. Most existing methods fall short in one of these aspects: graph-based modeling often fragments higher-order intra-visit patterns into pairwise relations, while inter-visit augmentation methods commonly exhibit an imbalance between learning a globally stable representation space and performing dynamic retrieval within it. To address these limitations, this paper proposes HypeMed, a two-stage hypergraph-based framework unifying intra-visit coherence modeling and inter-visit augmentation. HypeMed consists of two core modules: MedRep for representation pre-training and SimMR for similarity-enhanced recommendation. In the first stage, MedRep encodes clinical visits as hyperedges via knowledge-aware contrastive pre-training, creating a globally consistent, retrieval-friendly embedding space. In the second stage, SimMR performs dynamic retrieval within this space, fusing retrieved references with the patient's longitudinal data to refine medication prediction. Evaluation on real-world benchmarks shows that HypeMed outperforms state-of-the-art baselines in both recommendation precision and DDI reduction, simultaneously enhancing the effectiveness and safety of clinical decision support.

Tags

ai-safety (imported, 100%) · cs.IR (suggested, 92%) · preprint (suggested, 88%)

Links


Intelligence

Status: not_run | Model: - | Prompt: - | Confidence: 0%

Entities (0)

No extracted entities yet.

Relation Signals (0)

No relation signals yet.

Cypher Suggestions (0)

No Cypher suggestions yet.

Full Text

105,649 characters extracted from source content.


HypeMed: Enhancing Medication Recommendations with Hypergraph-Based Patient Relationships

Xiangxu Zhang (Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China, 100872; xansar@ruc.edu.cn), Xiao Zhou (Gaoling School of Artificial Intelligence, Renmin University of China; Beijing Key Laboratory of Research on Large Models and Intelligent Governance; Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE, Beijing, China; xiaozhou@ruc.edu.cn), Hongteng Xu (Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China, 100872; hongtengxu313@gmail.com), and Jianxun Lian (Microsoft Research Asia, Beijing, China, 100080; jianxun.lian@outlook.com)

Abstract. Medication recommendation aims to generate safe and effective medication sets from health records. However, accurately recommending medications hinges on inferring a patient's latent clinical condition from sparse and noisy observations, which requires both (i) preserving the visit-level combinatorial semantics of co-occurring diagnoses/procedures and (ii) leveraging informative historical references through effective, visit-conditioned retrieval. Most existing methods fall short in one of these aspects: graph-based modeling often fragments higher-order intra-visit patterns into pairwise relations, while inter-visit augmentation methods commonly exhibit an imbalance between learning a globally stable representation space and performing dynamic retrieval within it. To address these limitations, this paper proposes HypeMed, a two-stage hypergraph-based framework unifying intra-visit coherence modeling and inter-visit augmentation. HypeMed consists of two components: MedRep for representation pre-training and SimMR for similarity-enhanced recommendation. In the first stage, MedRep encodes clinical visits as hyperedges via knowledge-aware contrastive pre-training, creating a globally consistent, retrieval-friendly embedding space. In the second stage, SimMR performs dynamic retrieval within this space, fusing retrieved references with the patient's longitudinal data to refine medication prediction. Evaluation on real-world benchmarks shows that HypeMed outperforms state-of-the-art baselines in both recommendation precision and DDI reduction, simultaneously enhancing the effectiveness and safety of clinical decision support. The implementation is publicly available at https://github.com/xansar/HypeMed.

Keywords: Medication Recommendation, Electronic Health Records, Hypergraph

Journal: TOIS (2024). CCS Concepts: Information systems → Data mining; Applied computing → Health informatics.

Figure 1. Overview of Bob's EHR spanning four medical visits. Each visit includes diagnosis, procedure, and medication codes, illustrating the progression of related respiratory conditions over time. Note that the medication section in Visit-4 is left unfilled, indicating it as the target for medication recommendation.

1. Introduction

Healthcare remains a fundamental societal challenge (Braveman and Gottlieb, 2014; Blumenthal et al., 2020; Li et al., 2025b), intensified by rapid urbanization (He and Zhou, 2025; Li et al., 2025a; Zhang et al., 2026; Yong et al., 2026; Li et al., 2025c; Hong et al., 2026) and recurring public health crises (Hong et al., 2025; Zhu et al., 2025a). As a key application of Artificial Intelligence (AI) (Zhu et al., 2025b; Guo et al., 2025b; Yong et al., 2025) in healthcare (Jiang et al., 2017; Saraswat et al., 2022), medication recommendation systems (Yang et al., 2021b; Wu et al., 2022; Shang et al., 2019; Zhang et al., 2017; Choi et al., 2016) aim to generate safe and effective medication combinations from patients' Electronic Health Records (EHRs) (Menachemi and Collum, 2011; Evans, 2016). EHRs provide a longitudinal trace of diagnoses, procedures, and medications across visits (Fig. 1).
While EHRs offer a wealth of information, the inherent complexity and high dimensionality of such data make it difficult to capture the complete clinical picture. Despite steady progress (Yang et al., 2021b; Wu et al., 2022; Yang et al., 2023; Kim et al., 2024; Wu et al., 2024, 2023), the central difficulty remains unchanged: medication recommendations hinge on accurately inferring a patient's latent clinical condition from sparse, noisy observations. The observed medical codes provide only indirect, discrete evidence of an underlying physiological state, making condition reconstruction inherently ambiguous. To approximate the true patient status, a model must integrate evidence not only from the current observations but also from their relationships in EHRs. In this paper, we use patient relationships to refer to two complementary forms of clinical evidence: (i) intra-visit set-level co-occurrence among diagnoses/procedures/medications, which reflects coherent syndromic patterns, and (ii) inter-visit connections that link the current visit to relevant visits in the patient trajectory, including prior visits of the same patient and clinically similar visits from other patients. However, existing methods often fail to unify these two forms of evidence, yielding a fragmented and superficial estimate of the underlying condition. From the intra-visit perspective, a dominant bottleneck is semantic fragmentation. Most approaches (Shang et al., 2019; Yang et al., 2021b; Wu et al., 2022; Li et al., 2023) model co-occurrence with simple graphs, breaking inherently high-order clinical patterns into pairwise edges and thus diluting visit-level combinatorial semantics. For instance, Charcot's triad—fever, right-upper-quadrant pain, and jaundice—is individually non-specific, but their co-occurrence is highly suggestive of acute cholangitis and warrants urgent care (Frossard and Bonvin, 2011; Rumsey et al., 2017).
This motivates modeling a visit as a set-level interaction, rather than an accumulation of pairwise relations. From the inter-visit perspective, the key is to retrieve the right historical references that can effectively complement the current visit for decision making (Brooks et al., 1991; Solomon, 2006; Brown, 2016). Existing methods often exhibit an imbalance between static representation learning and dynamic retrieval. Representation-centric approaches (e.g., DGCL (Li et al., 2023), PROMISE (Wu et al., 2024)) mainly focus on learning a globally stable and static embedding space by aggregating historical/external signals, yet they typically lack an explicit mechanism to perform visit-conditioned retrieval tailored to the current context. In contrast, retrieval-centric approaches (e.g., GAMENet (Shang et al., 2019), COGNet (Wu et al., 2022), VITA (Kim et al., 2024), DAPSNet (Wu et al., 2023)) emphasize on-the-fly retrieval in the embedding space to augment the current input, but often rely on representations that are not explicitly optimized for retrieval, making the retrieved neighbors less semantically aligned with the current clinical context. This imbalance motivates a unified design that jointly shapes a retrieval-friendly representation space and performs context-aware retrieval within it. To address these limitations, we propose HypeMed, a hypergraph-based framework that reconstructs latent clinical conditions by unifying intra-visit combinatorial interaction modeling with inter-visit reference augmentation, thereby capturing patient relationships both within and across visits. To mitigate semantic fragmentation, we model each clinical visit as a hyperedge, so that the representation is learned from the set-level co-occurrence of diagnoses and procedures rather than decomposed pairwise links. 
Based on this structure, we introduce Medical Entity Relevance Representation (MedRep), a knowledge-aware hypergraph contrastive pre-training stage that encodes high-order interactions into a globally consistent embedding space. On top of the MedRep space, to alleviate the representation–retrieval imbalance, we further propose Similar Visit Enhanced Medication Recommendation (SimMR), which performs visit-conditioned dynamic retrieval directly in this hyperedge-aware metric space. This coupled design makes the retrieved neighbors better aligned with the current visit context, enabling HypeMed to incorporate informative historical references and refine the estimated condition for medication recommendation. Extensive experiments on real-world benchmarks (MIMIC-III/IV and eICU) demonstrate that HypeMed consistently outperforms state-of-the-art baselines in recommendation accuracy while reducing drug–drug interaction (DDI) rates, thereby achieving a superior balance of clinical effectiveness and medication safety. Our main contributions are summarized as follows:

• We address semantic fragmentation in intra-visit modeling by representing each visit as a hyperedge and proposing a hypergraph-based contrastive pre-training module (MedRep) to encode knowledge-aware entity representations that capture combinatorial semantics.
• We alleviate the representation–retrieval imbalance in inter-visit augmentation by designing a similar visit enhanced medication recommendation module (SimMR) that performs visit-conditioned dynamic retrieval directly in the MedRep-optimized embedding space for context-aligned reference aggregation.
• Extensive experiments on public MIMIC-III/IV and eICU datasets demonstrate that HypeMed consistently improves recommendation accuracy while reducing the drug–drug interaction (DDI) rate, validating both effectiveness and safety.

2. Related Work 2.1.
Medication Recommendation. Medication recommendation is an important subdomain of recommender systems (Ma et al., 2024; Guo et al., 2025a; Zhou et al., 2025; Li and Zhou, 2025; Guo et al., 2026). Improving medication recommendation from EHRs hinges on two complementary capabilities: modeling intra-visit clinical context from the current observations, and leveraging inter-visit references from the patient trajectory and clinically similar visits from other patients.

Intra-visit reasoning with relational structures. A representative line of work leverages structured relational modeling—often via graph neural networks (GNNs) (Scarselli et al., 2008)—to capture concurrence among diagnoses, procedures, and medications, and to inject domain knowledge such as DDIs for safety-aware prediction (Shang et al., 2019; Yang et al., 2021b; Wu et al., 2022; Li et al., 2023; Chen et al., 2023; Bhoi et al., 2021; Zhang et al., 2023). While effective, these methods predominantly operate on pairwise edges, which can fragment the visit-level context when the clinical meaning arises from the joint presence of multiple entities.

Inter-visit augmentation via similarity and retrieval. Early studies mainly adopted instance-based models that recommend medications from a single visit, without explicitly leveraging inter-visit context (Zhang et al., 2017; Gong et al., 2021). Later, longitudinal models incorporated a patient's historical visits to capture temporal dependencies and improve prediction (Choi et al., 2016; Le et al., 2018; Sun et al., 2022; Chen et al., 2023; Li et al., 2023; Wu et al., 2022). More recently, external augmentation extends the context beyond an individual patient by retrieving similar visits from the broader cohort, yet existing designs often emphasize only one side of the problem.
PROMISE (Wu et al., 2024) mainly focuses on learning a globally stable, trajectory-level representation (e.g., via DTW-based aggregation), but provides limited visit-conditioned, on-the-fly retrieval for the current context. In contrast, DAPSNet (Wu et al., 2023) emphasizes dynamic retrieval at prediction time to augment the current visit, but relies on representations that are not explicitly optimized for retrieval, which may weaken the quality of retrieved references. In general, these retrieval-based approaches often decouple similarity estimation from representation learning, which can induce a mismatch between the embedding space and the retrieval objective, ultimately limiting the utility of retrieved references.

Our approach. To fill these gaps, HypeMed unifies intra-visit coherence modeling with inter-visit reference augmentation in a single hypergraph framework. MedRep represents each visit as a hyperedge and performs knowledge-aware hypergraph contrastive pre-training to encode high-order interactions into a globally consistent, retrieval-friendly embedding space, mitigating semantic fragmentation. Building upon this space, SimMR performs visit-conditioned dynamic retrieval and integrates the retrieved references with the patient trajectory to refine latent condition estimation and improve medication recommendation.

2.2. Hypergraph Contrastive Learning in Recommendation

Hypergraph contrastive learning (Lee and Shin, 2023) has recently attracted increasing attention as an effective strategy to mitigate data sparsity and enhance representation robustness in recommender systems. For instance, in the context of session-based recommendation, DHCN (Xia et al., 2021) applies contrastive learning to align session representations derived from hypergraphs and their corresponding line graphs.
MHCN (Yu et al., 2021) extends this idea to social recommendation by contrasting local and global node representations, while HCCF (Xia et al., 2022) integrates collaborative filtering with contrastive learning to alleviate over-smoothing and sparsity issues. Collectively, these methods demonstrate the potential of combining hypergraph structures with contrastive objectives to capture richer relational semantics. However, their applications remain largely confined to general recommendation scenarios. This paper adapts and extends hypergraph contrastive learning to medication recommendation, enabling set-level intra-visit modeling and inter-visit reference augmentation in a unified framework.

3. Preliminaries

This section introduces the notations and definitions used in this paper and outlines the process of constructing the hypergraph. The main symbols are summarized in Tab. 1.

Table 1. Summary of main notations used in this paper.

R — EHR database containing all patients' visit sequences
S_t^(i) — the t-th visit of patient i (patient index i is omitted when clear)
D, P, M — sets of all diagnosis, procedure, and medication codes
d, p, m — multi-hot vectors for diagnoses, procedures, and medications
v^D, v^P, v^M — visit representations derived from diagnoses, procedures, and medications
H_X = (N_X, E_X) — hypergraph for domain X ∈ {D, P, M}
Z^X, U^X — node and hyperedge embeddings learned from H_X
h_t — health status representation of visit t (derived from diagnoses and procedures)
v_hist, v_sim — representations from the historical-visit and similar-visit channels
v_t — final fused visit representation
y_t — predicted probability vector of medications for visit t
A — drug–drug interaction (DDI) adjacency matrix

3.1. Notations and Definitions

3.1.1. Definition of EHR Data. The fundamental unit of EHR data is the patient, and each patient's EHR consists of multiple visits. Let R = {S^(i)}_{i=1}^{N} represent the EHR data of N patients. Each patient i has a patient trajectory, represented as an ordered sequence of visits S^(i) = ⟨S_1, S_2, …, S_{|S^(i)|}⟩, where each visit S_t = ⟨d_t, p_t, m_t⟩ contains diagnosis, procedure, and medication multi-hot vectors, with d_t ∈ {0,1}^{|D|}, p_t ∈ {0,1}^{|P|}, and m_t ∈ {0,1}^{|M|}. Given the current diagnosis d_t, procedure p_t, and the EHR history ⟨S_1, S_2, …, S_{t−1}⟩, HypeMed aims to predict the medication set m_t for the current visit. Patient indices (i) and domain subscripts are omitted when clear from context.

3.1.2. Definition of Hypergraph. A hypergraph is denoted as H = (N, E), where N and E are the sets of nodes and hyperedges, respectively. Fig. 2 illustrates an example hypergraph with four nodes and two hyperedges, where N = {n_i}_{i=1}^{4} and E = {e_i}_{i=1}^{2}. Each hyperedge can connect more than two nodes; for example, e_1 connects n_1, n_2, and n_3. E(n_i) denotes the set of hyperedges incident to node n_i, and N(e_i) represents the set of nodes contained in hyperedge e_i.

Figure 2. An example of a hypergraph with four nodes and two hyperedges: e_1 connects nodes n_1, n_2, and n_3, while e_2 connects nodes n_3 and n_4.
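As a concrete illustration, the toy hypergraph of Fig. 2 can be encoded as a 4×2 node–hyperedge incidence matrix; the helper functions below (our names, not the paper's) recover E(n_i) and N(e_j) from it:

```python
import numpy as np

# Incidence matrix for Fig. 2: rows = nodes n1..n4, columns = hyperedges e1, e2.
# e1 connects {n1, n2, n3}; e2 connects {n3, n4}.
H = np.array([
    [1, 0],  # n1
    [1, 0],  # n2
    [1, 1],  # n3 (shared by both hyperedges)
    [0, 1],  # n4
])

def edges_of_node(H, i):
    """E(n_i): indices of hyperedges incident to node i."""
    return set(np.flatnonzero(H[i]))

def nodes_of_edge(H, j):
    """N(e_j): indices of nodes contained in hyperedge j."""
    return set(np.flatnonzero(H[:, j]))

print(edges_of_node(H, 2))  # n3 lies in both hyperedges -> {0, 1}
print(nodes_of_edge(H, 0))  # e1 contains n1, n2, n3 -> {0, 1, 2}
```

Because hyperedges are columns rather than pairwise links, the joint membership of a whole visit survives in a single column, which is exactly the set-level semantics the paper argues pairwise graphs lose.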
Algorithm 1: Hypergraph Construction for Diagnosis, Procedure, and Medication

1: function ConstructHypergraphs({H_X}_{X ∈ {D,P,M}}, R)
2:   N_D, N_P, N_M ← ∅    ▷ initialize node sets
3:   E_D, E_P, E_M ← ∅    ▷ initialize hyperedge sets
4:   AddEntitiesToHypergraph(D, N_D, E_D, R)
5:   AddEntitiesToHypergraph(P, N_P, E_P, R)
6:   AddEntitiesToHypergraph(M, N_M, E_M, R)
7:   return (N_D, E_D), (N_P, E_P), (N_M, E_M)
8: end function
9: procedure AddEntitiesToHypergraph(X, N_X, E_X, R)
10:   for all entities x ∈ X do
11:     add x to N_X
12:   end for
13:   for all visits S_t^(i) ∈ R do
14:     e^X ← hyperedge for the entities of type X in S_t^(i)
15:     add e^X to E_X
16:   end for
17: end procedure

3.2. EHR Hypergraph Construction

When constructing the EHR hypergraph, we do not merge all medical entities into a single heterogeneous hypergraph. Instead, Alg. 1 outlines the process for building separate hypergraphs for each domain: diagnosis, procedure, and medication. This design is motivated by three factors: context specificity, complexity management, and scalability, as detailed below.

• Context Specificity: Diagnoses, procedures, and medications each carry distinct medical information and contextual meanings. By constructing separate hypergraphs, we can capture the unique contextual relationships inherent to each type of medical entity more precisely.
• Complexity Management: A single visit may contain as many as hundreds of medical entities. Incorporating all of these entities into a single heterogeneous hyperedge could lead to excessively high complexity and introduce additional noise, making it more difficult to capture the relationships among nodes. Constructing separate hypergraphs reduces system complexity, making the model easier to train.
• Flexibility and Scalability: The structure of independently constructed hypergraphs offers greater flexibility for future model iterations. This design also facilitates extension to new entity types or updated datasets.
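A minimal Python sketch of Alg. 1 under assumed inputs: each visit is a dict of code lists keyed by domain, and nodes are collected from the observed visits (whereas Alg. 1 adds every entity in X upfront); the function name and record layout are illustrative, not the paper's API:

```python
def construct_hypergraphs(records):
    """Build a separate (node_set, hyperedge_list) pair per domain,
    mirroring Alg. 1: one hyperedge per visit and domain."""
    graphs = {X: (set(), []) for X in ("D", "P", "M")}
    for visit in records:                      # traverse all visits S_t^(i)
        for X, (nodes, edges) in graphs.items():
            codes = visit.get(X, [])
            if not codes:
                continue
            nodes.update(codes)                # add entities to N_X
            edges.append(frozenset(codes))     # hyperedge e^X for this visit
    return graphs

# Two toy visits: each contributes one hyperedge per non-empty domain.
ehr = [
    {"D": ["d1", "d2"], "P": ["p1"], "M": ["m1", "m2"]},
    {"D": ["d2", "d3"], "P": ["p1", "p2"], "M": ["m2"]},
]
graphs = construct_hypergraphs(ehr)
print(len(graphs["D"][0]), len(graphs["D"][1]))  # 3 diagnosis nodes, 2 hyperedges
```

Keeping the three domains in separate structures is exactly the context-specificity and complexity-management argument above: no cross-domain hyperedge ever mixes hundreds of heterogeneous codes.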
Taking the diagnosis hypergraph H_D = (N_D, E_D) as an example, it is constructed in two steps:

• Node Set Construction: Create nodes for all diagnostic entities in D and add them to the hypergraph node set N_D.
• Hyperedge Set Construction: Traverse all visits in the EHR training data. For each diagnostic vector d_t^(i) in visit S_t^(i), create a hyperedge e connecting all diagnostic entities contained in that visit, and add e to the hypergraph hyperedge set E_D.

The construction procedures for H_P and H_M follow the same logic.

4. Framework

This section outlines the HypeMed framework, as illustrated in Fig. 3. The architecture primarily comprises the Medical Entity Relevance Representation Stage (MedRep) and the Similar Visit Enhanced Medication Recommendation Stage (SimMR). We also explored an end-to-end variant in early experiments, which resulted in an approximately 3% decrease in Jaccard score and lower training efficiency; we therefore adopt a two-stage training paradigm consisting of pre-training followed by fine-tuning. First, we pre-train MedRep on the clinical hypergraph with a contrastive objective to learn medical entity (node) and visit (hyperedge) representations. At the start of SimMR, we instantiate two trainable embedding tables for entities and visit hyperedges, initialized with the pre-trained MedRep outputs. During fine-tuning, these embedding tables and all recommender parameters are optimized end-to-end under the recommendation loss. This preserves the semantic structure learned in pre-training while allowing task-specific adaptation at the embedding level, yielding stable optimization and improved downstream performance. The overall training pipeline is illustrated in Alg. 2. Figure 3. Overall architecture of HypeMed.
HypeMed comprises two stages: the Medical Entity Relevance Representation Stage (MedRep) and the Similar Visit Enhanced Medication Recommendation Stage (SimMR). MedRep focuses on encoding intra-visit set-level combinatorial semantics into a globally consistent, retrieval-friendly embedding space. SimMR integrates longitudinal history and visit-conditioned retrieved similar visits to refine latent condition estimation.

Algorithm 2: Two-Stage Training Pipeline for HypeMed

1: function TrainModel(H, D, P, M)
2:   Initialize KHGE parameters    ▷ Stage 1: MedRep training
3:   Construct augmented subgraphs H_1, H_2 by dropout on H
4:   for epoch = 1 to e_rep do
5:     (Z_1, U_1) ← EncodeHypergraph(H_1)    ▷ encode augmented hypergraph via the hypergraph encoder (Sec. 4.1.1)
6:     (Z_2, U_2) ← EncodeHypergraph(H_2)    ▷ encode augmented hypergraph via the hypergraph encoder (Sec. 4.1.1)
7:     Optimize Z_1, U_1, Z_2, U_2 using contrastive loss L_cl    ▷ Eq. (5)
8:   end for
9:   return Z, U    ▷ final entity and hyperedge embeddings
10:   Initialize SimMR with Z, U    ▷ Stage 2: SimMR training (embeddings initialized from MedRep and trainable)
11:   for epoch = 1 to e_rec do
12:     for all visits S_t^(i) do
13:       Retrieve k similar visits for v_sim
14:       v_hist ← ComputeHistoricalRepresentation(S_t^(i))    ▷ aggregate past visits via temporal attention (Sec. 4.2.2)
15:       v_sim ← ComputeSimilarVisitRepresentation(S_t^(i))    ▷ encode retrieved similar visits (Sec. 4.2.3)
16:       Fuse v_hist and v_sim to obtain v_t
17:       Optimize with total loss L
18:     end for
19:   end for
20:   return trained SimMR model
21: end function

4.1. MedRep: Medical Entity Relevance Representation

In the MedRep stage, we obtain node and hyperedge representations through hypergraph-based contrastive learning.
We first generate two augmented subgraphs of the original hypergraph and then apply a knowledge-aware hypergraph encoder (KHGE) to compute node and hyperedge embeddings for both views. Finally, we perform contrastive learning between these two views to enhance embedding quality. By modeling each visit as a hyperedge that jointly connects diagnoses, procedures, and medications, MedRep mitigates the intra-visit semantic fragmentation discussed in the Introduction and yields context-aware entity and visit embeddings. Below we first introduce the encoder, followed by the augmentation and contrastive objectives.

4.1.1. Knowledge-aware Hypergraph Encoder

We design a knowledge-aware hypergraph encoder (Fig. 4) to capture contextual information and incorporate prior medical category knowledge. This encoder consists of two complementary modules: (1) a Local Message Passing Network (LMPN) that models neighborhood-level dependencies on the medical hypergraph, and (2) a Knowledge-aware Global Attention Network (KGAN) that incorporates structured medical knowledge (e.g., ICD/ATC hierarchies) to capture global correlations. The outputs of both components are fused via a two-layer feed-forward network with residual connections and layer normalization. The detailed formulations for each module are presented as follows.

Figure 4. The detailed architecture of the Knowledge-aware Hypergraph Encoder (KHGE). The encoder consists of a Local Message Passing Network (LMPN) and a Knowledge-aware Global Attention Network (KGAN). The LMPN focuses on message propagation over the medical hypergraph, while the KGAN integrates medical knowledge into a global self-attention mechanism. The two outputs are combined through a feed-forward fusion layer with residual connections.

(a) Local Message Passing Network (LMPN).
Research on graph neural networks (Scarselli et al., 2008) and hypergraph neural networks (Feng et al., 2019) suggests that node representations derived from local message passing effectively model neighborhood relationships, making nodes connected by edges more similar. Thus, applying local message passing on a medical hypergraph enhances the similarity of medical entity representations within the same visit. The LMPN is implemented via a hypergraph attention layer followed by layer normalization and residual connections, as illustrated in Fig. 4. Local message passing involves two processes: hyperedge representation computation and node representation computation:

(1)  u_j^(k+1) = Σ_{n_i ∈ N(e_j)} α_ij^(k) z_i^(k),    z_i^(k+1) = Σ_{e_j ∈ E(n_i)} α_ij^(k) u_j^(k),

where the attention coefficient α_ij^(k) controls the message weight between node n_i and hyperedge e_j. Before aggregation, a learnable linear transformation is applied to refine both node and hyperedge embeddings.

(b) Knowledge-aware Global Attention Network (KGAN). To capture long-range dependencies and inject domain knowledge, we introduce the Knowledge-aware Global Attention Network (KGAN). Medical knowledge from ICD or ATC code hierarchies provides a structured prior indicating semantic closeness among entities. We first transform the hierarchical tree into a knowledge bias matrix B, where each element encodes the path distance between two entities and is mapped to a learnable bias parameter. This bias is then incorporated into the multi-head self-attention mechanism as:

(2)  Z_g^(k+1) = Softmax( (W_q Z^(k)) (W_k Z^(k))^T / √d + B ) W_v Z^(k).

Here, W_q, W_k, and W_v are learnable projection matrices, and B provides a prior bias that allows the attention to favor medically related entities.

(c) Fusion and Multi-layer Aggregation.
We integrate the node representations from LMPN and KGAN through a two-layer feed-forward network (FFN) with residual connections and layer normalization, where Z_ℓ^(k+1) and Z_g^(k+1) denote the outputs of LMPN and KGAN, respectively. To mitigate over-smoothing, we perform a layer-wise average of intermediate representations:

(3)  Z^(k+1) = FFN( Z_ℓ^(k+1) + Z_g^(k+1) ),    Z = (1/L) Σ_{k=1}^{L} Z^(k),    U = (1/L) Σ_{k=1}^{L} U^(k).

This fusion mechanism effectively combines local relational learning with global knowledge reasoning.

4.1.2. Hypergraph Contrastive Learning

Existing research (Lee and Shin, 2023) indicates that hypergraph contrastive learning can enhance the representational capabilities of embeddings. It brings the representations of similar nodes and hyperedges closer together while distancing those of dissimilar ones, which aligns with our expectations for medical entity representation. Hence, we employ a hypergraph contrastive learning approach based on nodes and hyperedges for the training phase of MedRep. Initially, we construct two augmented subgraphs based on the original hypergraph. Subsequently, we compute node and hyperedge representations based on these two subgraphs. Finally, contrastive learning is executed between these two sets of representations.

Augmented Subgraph Construction. EHR data often contain a considerable amount of noise, which can negatively impact the learning of medical entity representations. To address this issue, constructing augmented subgraphs through random dropout of elements in the original visits has become crucial for mitigating the effects of noise (Lee and Shin, 2023; Wu et al., 2021). We independently perform two stochastic augmentations on the original hypergraph to obtain subgraphs H_1 and H_2 by randomly dropping nodes, incidences, and features with predefined probabilities. Dropped elements are excluded when computing the contrastive objectives.

Contrastive Learning. After obtaining the two augmented subgraphs H_1 and H_2, we use the knowledge-aware hypergraph encoder to generate node and hyperedge embeddings for both views:

(4)  Z_1, U_1 = KHGE(H_1),    Z_2, U_2 = KHGE(H_2).

To enhance the consistency of multi-level representations, we adopt a three-level contrastive learning strategy encompassing node-, hyperedge-, and membership-level objectives (as illustrated in the upper-left of Fig. 3). This design ensures that (1) nodes sharing similar contexts are mapped closely in the embedding space, (2) hyperedges containing related nodes exhibit similar representations, and (3) membership contrast bridges the alignment between node and hyperedge embeddings. The overall contrastive objective is formulated as:

(5)  L_cl = InfoNCE(Z_1, Z_2) + λ_1 InfoNCE(U_1, U_2) + λ_2 InfoNCE(Z_1, U_2),

where λ_1 and λ_2 balance the contributions of the hyperedge-level and membership-level contrastive terms. The InfoNCE loss (Oord et al., 2018) is defined as:

(6)  InfoNCE(U, V) = −(1/N) Σ_{i=1}^{N} log [ exp(u_i · v_i / τ) / Σ_{j=1}^{N} exp(u_i · v_j / τ) ],

where τ is the temperature parameter controlling the smoothness of the contrastive distribution. During the MedRep stage, we optimize L_cl to jointly learn node embeddings Z (medical entities) and hyperedge embeddings U (visits), thereby establishing a unified and semantically consistent representation space. Upon completing MedRep, we export the learned embedding matrices for medical entities and visit embeddings (factorized into diagnosis, procedure, and medication subspaces). These matrices are then used in SimMR to initialize two trainable embedding tables (entities and visits), which are updated jointly with the recommender parameters.
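The three-level objective of Eq. (5), with InfoNCE as in Eq. (6), can be sketched in NumPy as follows. This is a sketch under stated assumptions: L2-normalizing embeddings before the dot product (not specified in the paper) and placeholder values for τ, λ_1, λ_2:

```python
import numpy as np

def info_nce(U, V, tau=0.2):
    """Eq. (6): -(1/N) * sum_i log[ exp(u_i.v_i/tau) / sum_j exp(u_i.v_j/tau) ].
    L2 normalization is our assumption, not stated in the paper."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    sim = (U @ V.T) / tau                       # pairwise similarity logits
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))       # positives sit on the diagonal

def contrastive_loss(Z1, Z2, U1, U2, lam1=0.5, lam2=0.5):
    """Eq. (5): node-level + hyperedge-level + membership-level terms."""
    return (info_nce(Z1, Z2)
            + lam1 * info_nce(U1, U2)
            + lam2 * info_nce(Z1, U2))

# Toy check: two lightly perturbed "views" of the same embeddings.
rng = np.random.default_rng(0)
Z1 = rng.normal(size=(8, 16)); Z2 = Z1 + 0.01 * rng.normal(size=(8, 16))
U1 = rng.normal(size=(8, 16)); U2 = U1 + 0.01 * rng.normal(size=(8, 16))
loss = contrastive_loss(Z1, Z2, U1, U2)
```

Aligned views (rows describing the same node or visit) produce a much smaller node-level term than mismatched random views, which is the behavior the three-level objective relies on.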
Specifically, the task-specific representations are obtained as:

(7) $Z_X, U_X = \mathrm{KHGE}(\mathcal{H}_X), \qquad X \in \{\mathcal{D}, \mathcal{P}, \mathcal{M}\}.$

In this way, the pre-trained embeddings serve as a high-quality initialization for the recommendation task, enabling the model to leverage the rich medical correlations captured during the contrastive learning phase.

4.2. SimMR: Similar Visit Enhanced Medication Recommendation

In the second stage, we train SimMR on the representations obtained from MedRep, optimizing specifically for the medication recommendation task. In clinical practice, similar visits are often consulted as external evidence (Brooks et al., 1991; Solomon, 2006; Brown, 2016). Accordingly, SimMR refines the latent condition estimate for the current visit by augmenting it with two types of inter-visit references: (i) information from the patient trajectory (temporal causality) and (ii) visit-conditioned retrieved similar visits (analogical evidence). We first compute the representation of the current visit and then feed it into two channels, a historical channel and a similar-visit channel, to obtain complementary visit representations. Finally, we fuse these two representations and apply a dot-product scorer to produce the medication probabilities. The pipeline is shown in the right panel of Fig. 3.

4.2.1. Visit Representation

A single visit may involve multiple domains of medical entities (diagnoses, procedures, medications) that exhibit semantic gaps. We therefore compute domain-specific visit representations for each visit. We first mean-pool the entity embeddings in a given domain to form an initial visit representation $z_t^X$, then refine it via attention between this query and the corresponding entity embeddings (analogous to the hyperedge computation in MedRep). This yields the domain-specific visit representation and facilitates subsequent similar-visit retrieval. Concretely, for domain $X \in \{\mathcal{D}, \mathcal{P}, \mathcal{M}\}$, we compute

(8) $v_t^X = \mathrm{MHA}(z_t^X, Z^X, Z^X),$

where $\mathrm{MHA}(\cdot)$ denotes multi-head attention and $z_t^X$ is the mean-pooled embedding of the entities in domain $X$ for the $t$-th visit. We obtain $v_t^{\mathcal{D}}$, $v_t^{\mathcal{P}}$, and $v_t^{\mathcal{M}}$ accordingly, and denote the medication label as $m_t \in \{0,1\}^{|\mathcal{M}|}$.

4.2.2. Historical Information Representation

The success of longitudinal medication recommendation methods (Shang et al., 2019; Yang et al., 2021b; Wu et al., 2022) underscores the significance of historical information for this task. This subsection describes how we compute visit representations from historical information. Two types of information are particularly crucial in a patient trajectory. The first is the patient's health status, comprising diagnosis and procedure conditions; these directly reflect the patient's health condition. The second is medication information. Existing studies (Yang et al., 2021a; Wu et al., 2022) indicate that patients exhibit continuity in medication usage: medications previously administered are likely to be used again in the future. Modeling medication information therefore captures the patient's historical medication trends. However, attending over all historical visits in the patient trajectory is impractical, as not all of them are relevant to the current one; visits from the distant past may offer little guidance for current medication. We therefore employ a window-based visit attention mechanism over the patient trajectory:

(9) $v_{hist} = \mathrm{MHA}(h_t, V_{t-k:t-1}, V_{t-k:t-1}),$

where $h_t$ is the current health-status embedding (from diagnoses and procedures), and $V_{t-k:t-1} \in \mathbb{R}^{k \times d}$ stacks the $k$ most recent visit embeddings. The resulting $v_{hist}$ captures recent diagnostic, procedural, and medication trends within the temporal window.

4.2.3.
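The attention computations in Eqs. (8)-(9) can be illustrated with a single-head stand-in for MHA. This is a simplified numpy sketch (one head, no learned query/key/value projections; all names and shapes are illustrative), not the paper's trained module.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(q, K, V):
    # Single-head scaled dot-product attention: a stand-in for MHA(q, K, V).
    w = softmax(q @ K.T / np.sqrt(q.shape[-1]))
    return w @ V

rng = np.random.default_rng(0)
d = 8

# Eq. (8): domain-specific visit representation. The query is the
# mean-pooled embedding of the visit's entities in one domain.
E = rng.normal(size=(5, d))        # entity embeddings for that domain
z_t = E.mean(axis=0)               # initial visit representation
v_t = attend(z_t, E, E)            # refined visit representation

# Eq. (9): historical channel. h_t attends over the k most recent visits.
k = 3
V_hist = rng.normal(size=(k, d))   # stacked visit embeddings V_{t-k:t-1}
h_t = rng.normal(size=d)           # current health-status embedding
v_hist = attend(h_t, V_hist, V_hist)
```

Both outputs are convex combinations of the value rows, so they live in the same embedding space as the entities and visits they summarize.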
Similar Visit Information Representation

In the MedRep stage, we obtain embeddings for all training visits (hyperedges) and factorize them into domain-specific subspaces. Given $h_t$, we retrieve the top-$k$ similar visits in the health-status subspace to ensure consistency with the pre-trained geometry. This design is clinically motivated: we retrieve similar past visits as external evidence, and condition the retrieval on the current visit's health-status embedding $h_t$ to ensure relevance. Let $U^H$ and $U^{\mathcal{M}}$ denote the hyperedge embeddings in the health-status (diagnosis/procedure) and medication subspaces, respectively. We aggregate the retrieved evidence via attention:

(10) $v_{sim} = \mathrm{MHA}(h_t, U^H_{\text{top-}k}, U^{\mathcal{M}}_{\text{top-}k}),$

where $U^H_{\text{top-}k}$ and $U^{\mathcal{M}}_{\text{top-}k}$ are taken from the retrieved $k$ nearest visits in the corresponding subspaces. To further improve retrieval alignment, we adopt an in-batch contrastive objective that aligns the current visit's embeddings with those of the retrieved visits in the same representation spaces:

(11) $\mathcal{L}_{cl}^{sim} = \mathrm{InfoNCE}(h_t, U^H) + \mathrm{InfoNCE}(v_t^{\mathcal{M}}, U^{\mathcal{M}}),$

where both terms use temperature-scaled inner-product similarity, consistent with the pre-trained geometry, and help mitigate representation–retrieval mismatch during fine-tuning by maintaining geometric consistency between the current visit and the retrieved visits.

4.2.4. Channels Fusion

After obtaining the visit representations from the two channels (historical information and similar-visit information), we fuse them. We stack the two channel representations for the same visit and compute their weights using a two-layer MLP followed by a Softmax layer:

(12) $v_t = \alpha_{hist}\,v_{hist} + \alpha_{sim}\,v_{sim}, \qquad [\alpha_{hist}, \alpha_{sim}] = \mathrm{Softmax}\left(\mathrm{MLP}([v_{hist} \,\|\, v_{sim}])\right).$
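The retrieval and gating steps above can be sketched in numpy. The names are ours, and the two-layer MLP of Eq. (12) is replaced by a single linear map for illustration; only the structure (inner-product top-$k$ retrieval, softmax-gated convex combination) mirrors the text.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieve_topk(h_t, U_H, k):
    # Indices of the k training visits most similar to h_t under
    # inner-product similarity in the health-status subspace,
    # matching the pre-trained geometry.
    scores = U_H @ h_t
    return np.argsort(-scores)[:k]

def fuse_channels(v_hist, v_sim, W):
    # Eq. (12) with the two-layer MLP replaced by a linear map W of
    # shape (2, 2d); alpha = [alpha_hist, alpha_sim] sums to one.
    alpha = softmax(W @ np.concatenate([v_hist, v_sim]))
    return alpha[0] * v_hist + alpha[1] * v_sim, alpha

rng = np.random.default_rng(0)
U_H = rng.normal(size=(10, 4))       # health-status hyperedge embeddings
h_t = rng.normal(size=4)             # current health-status query
idx = retrieve_topk(h_t, U_H, k=3)   # retrieved visit indices
```

The retrieved indices would select rows of both $U^H$ and $U^{\mathcal{M}}$ for the attention step of Eq. (10).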
To capture information from different perspectives, we apply a regularization constraint that encourages the two channel representations to be decorrelated (approximately orthogonal). Concretely, we minimize their absolute cosine similarity:

(13) $\mathcal{L}_{orth} = \left|\frac{v_{hist} \cdot v_{sim}}{\|v_{hist}\|\,\|v_{sim}\|}\right|.$

4.2.5. Prediction and Objectives

We calculate the probability distribution for recommending medications by assessing the dot-product similarity between the visit representation and the medication embeddings:

(14) $y_t = \sigma(v_t^{\top} Z^{\mathcal{M}}),$

where $\sigma(\cdot)$ is the logistic function applied element-wise. Given a threshold $\eta$, the predicted medication set is

(15) $\hat{m}_t = \{\, i \mid (y_t)_i \geq \eta \,\},$

where $(y_t)_i$ denotes the $i$-th element of $y_t$; that is, $\hat{m}_t$ is the set of indices whose predicted probabilities exceed $\eta$.

We adopt the binary cross-entropy loss, multi-label loss, and DDI loss used in SafeDrug (Yang et al., 2021b), where the binary cross-entropy loss and multi-label loss target prediction accuracy, and the DDI loss targets the DDI rate. These loss functions are computed as follows:

• Binary Cross-Entropy Loss is a standard loss function for multi-label classification. It measures the divergence between the model's predicted probabilities and the binary labels, penalizing each label independently, which suits scenarios where each instance can belong to multiple classes simultaneously.

(16) $\mathcal{L}_{bce} = -\sum_{i=1}^{|\mathcal{M}|}\left[ m_{t,i}\log y_{t,i} + (1 - m_{t,i})\log(1 - y_{t,i}) \right].$

• Multi-label Loss encourages the predicted probability of each present label to exceed that of each absent label by a specified margin, effectively learning both label occurrence and label relationships in multi-label classification.
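Eqs. (13)-(15) translate directly into code. The following numpy sketch (function names are ours) shows the orthogonality penalty and the thresholded prediction step; rows of `Z_M` play the role of medication embeddings.

```python
import numpy as np

def orth_loss(v_hist, v_sim):
    # Eq. (13): absolute cosine similarity between the two channel outputs.
    return abs(v_hist @ v_sim) / (np.linalg.norm(v_hist) * np.linalg.norm(v_sim))

def predict_medications(v_t, Z_M, eta=0.5):
    # Eqs. (14)-(15): element-wise sigmoid over dot-product scores against
    # the medication embedding table Z_M (one row per medication), then
    # a threshold eta selects the recommended set.
    y_t = 1.0 / (1.0 + np.exp(-(Z_M @ v_t)))
    return y_t, np.flatnonzero(y_t >= eta)
```

Orthogonal channel vectors incur zero penalty; parallel ones incur the maximum penalty of one.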
(17) $\mathcal{L}_{multi} = \frac{1}{|\mathcal{M}|}\sum_{i,j:\, m_{t,i}=1,\, m_{t,j}=0} \max\left(0,\; 1 - (y_{t,i} - y_{t,j})\right).$

• DDI Loss is designed to minimize the risk of adverse drug–drug interactions. It penalizes predicted probability mass assigned to medication pairs with known DDIs, reducing the likelihood of recommending harmful medication combinations:

(18) $\mathcal{L}_{ddi} = \sum_{i,j} A_{ij}\, y_{t,i}\, y_{t,j}.$

Here, $|\mathcal{M}|$ is the number of medications, $m_t$ is the ground-truth multi-hot label, $y_t$ is the predicted probability vector, and $A \in \mathbb{R}^{|\mathcal{M}| \times |\mathcal{M}|}$ is the DDI adjacency matrix. The total loss is

(19) $\mathcal{L} = \mathcal{L}_{bce} + \lambda_{multi}\,\mathcal{L}_{multi} + \lambda_{ddi}\,\mathcal{L}_{ddi} + \lambda_{aux}\,(\mathcal{L}_{cl}^{sim} + \mathcal{L}_{orth}),$

where $\lambda_{multi}$, $\lambda_{ddi}$, and $\lambda_{aux}$ are balancing hyperparameters.

5. Experiments

To evaluate the performance of HypeMed against state-of-the-art (SOTA) models, we conduct extensive experiments on three widely used and publicly available benchmark datasets: MIMIC-III (Johnson et al., 2016), MIMIC-IV (Johnson et al., 2023), and eICU (Pollard et al., 2018).

5.1. Experimental Settings

5.1.1. Datasets.

MIMIC-III (https://physionet.org/content/mimiciii/1.4/) and MIMIC-IV (https://physionet.org/content/mimiciv/2.0/) are large-scale de-identified EHR databases from the Beth Israel Deaconess Medical Center, comprising data from over 40,000 ICU patients across diverse clinical conditions. eICU (https://physionet.org/content/eicu-crd/2.0/) is a multi-center critical care database covering more than 200 hospitals across the United States and is used to assess model generalizability under cross-institutional settings.
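The three training losses of Eqs. (16)-(18) can be sketched as follows. This is a numpy sketch for a single visit (function names are ours); in training these are computed on differentiable tensors.

```python
import numpy as np

def bce_loss(m_t, y_t, eps=1e-8):
    # Eq. (16): binary cross-entropy summed over the medication vocabulary.
    return -np.sum(m_t * np.log(y_t + eps) + (1 - m_t) * np.log(1 - y_t + eps))

def multi_label_loss(m_t, y_t):
    # Eq. (17): hinge margin between every (present, absent) label pair,
    # normalized by the vocabulary size |M|.
    pos, neg = y_t[m_t == 1], y_t[m_t == 0]
    return np.maximum(0.0, 1.0 - (pos[:, None] - neg[None, :])).sum() / len(m_t)

def ddi_loss(y_t, A):
    # Eq. (18): soft DDI penalty, sum_ij A_ij * y_i * y_j = y^T A y.
    return y_t @ A @ y_t
```

For example, with labels [1, 0] and predictions [0.9, 0.1], the margin term is max(0, 1 - 0.8)/2 = 0.1, and a single interacting pair contributes 2 * 0.9 * 0.1 = 0.18 to the DDI penalty.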
Following prior work (Wu et al., 2022; Chen et al., 2023; Li et al., 2023; Yang et al., 2023), we adopt the data preprocessing pipeline from SafeDrug (Yang et al., 2021b), which includes filtering low-frequency entities, removing patients with fewer than two visits, and converting codes into standardized ICD and ATC formats. For eICU, medications are recorded using GTC codes. Since procedure codes are unavailable, we use only diagnosis and medication information for this dataset. Moreover, because eICU lacks a DDI graph, we exclude DDI-related losses and metrics during both training and evaluation. The statistics of the processed datasets are summarized in Tab. 2. Following common practice in contrastive representation learning (e.g., TriCL (Lee and Shin, 2023) and SGL (Wu et al., 2021)), we construct two independently augmented views of each hypergraph to form positive pairs for contrastive supervision. Empirically, this setup achieves a good balance between representation diversity and computational efficiency.

Table 2. Statistics of the processed MIMIC-III, MIMIC-IV, and eICU datasets.

| Item | MIMIC-III | MIMIC-IV | eICU |
| # of patients | 6,350 | 9,036 | 7,855 |
| # of visits | 15,032 | 20,616 | 16,869 |
| avg. # of visits | 2.37 | 2.28 | 2.15 |
| # of unique diag. codes | 1,958 | 1,892 | 692 |
| # of unique proc. codes | 1,430 | 4,939 | – |
| # of unique med. codes | 131 | 131 | 46 |
| avg. # of diag. per visit | 10.51 | 13.62 | 4.32 |
| avg. # of proc. per visit | 3.84 | 3.55 | – |
| avg. # of med. per visit | 11.44 | 10.29 | 10.59 |

5.1.2. Evaluation.

To comprehensively evaluate model performance, we employ five standard metrics: Jaccard similarity ↑ (%), F1-score ↑ (%), and precision–recall AUC (PRAUC) ↑ (%) for accuracy; DDI rate ↓ (%) for safety; and the average number of prescribed medications (# Med.) ↓ for prescription compactness.
The ground-truth DDI rates are 8.68% and 7.24% for the MIMIC-III and MIMIC-IV datasets, respectively. (Due to discrepancies across prior reports (Wu et al., 2022; Li et al., 2023; Sun et al., 2022), we recalculated the DDI ground truth following the standard procedure and provide reference code for reproducibility.) Given that the ground-truth DDI rate is non-zero, our objective is not to eliminate DDIs entirely, but to ensure that the model's recommended medication combinations exhibit a DDI rate below the empirical human benchmark. The symbols ↑ and ↓ respectively indicate that higher and lower values are better. To ensure a fair and consistent comparison, we adopt the conventional evaluation metrics used in medication recommendation (Shang et al., 2019; Yang et al., 2021b; Wu et al., 2022). The calculation methods for each metric are as follows:

• Jaccard Similarity between the predicted and ground-truth medication sets for an individual patient is computed as:

(20) $\mathrm{Jaccard} = \frac{1}{T}\sum_{t=1}^{T}\frac{|m_t \cap \hat{m}_t|}{|m_t \cup \hat{m}_t|},$

where $m_t$ denotes the ground-truth medication set and $\hat{m}_t$ the predicted set at visit $t$.

• F1-score is defined as:

(21) $F1 = \frac{1}{T}\sum_{t=1}^{T}\frac{2 P_t R_t}{P_t + R_t}, \qquad P_t = \frac{|m_t \cap \hat{m}_t|}{|\hat{m}_t|}, \qquad R_t = \frac{|m_t \cap \hat{m}_t|}{|m_t|}.$

Here, $P_t$ and $R_t$ denote the precision and recall for visit $t$, respectively, and $T$ is the total number of visits.

• PRAUC represents the area under the precision–recall curve. Following prior work (Wu et al., 2022; Yang et al., 2021b), we compute it as the mean of per-visit average precision scores:

(22) $\mathrm{PRAUC} = \frac{1}{T}\sum_{t=1}^{T} AP_t,$

where $AP_t$ is the average precision of visit $t$.
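The per-visit terms of Eqs. (20)-(21) reduce to simple set arithmetic. A minimal sketch (function names are ours; visits are represented as collections of medication codes):

```python
def visit_jaccard(m_t, m_hat):
    # Per-visit term of Eq. (20): |intersection| / |union|.
    t, p = set(m_t), set(m_hat)
    return len(t & p) / len(t | p) if (t | p) else 0.0

def visit_f1(m_t, m_hat):
    # Per-visit term of Eq. (21): harmonic mean of precision and recall.
    t, p = set(m_t), set(m_hat)
    inter = len(t & p)
    if not t or not p or inter == 0:
        return 0.0
    prec, rec = inter / len(p), inter / len(t)
    return 2 * prec * rec / (prec + rec)
```

Averaging these per-visit values over a patient's visits, and then over patients, yields the reported scores.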
• DDI Rate measures the proportion of predicted medication pairs that have known drug–drug interactions:

(23) $\mathrm{DDI} = \frac{1}{T}\sum_{t=1}^{T}\frac{\sum_{i=1}^{|\hat{m}_t|}\sum_{j=i+1}^{|\hat{m}_t|} \mathbb{1}\{A_{\hat{m}_t^{(i)},\, \hat{m}_t^{(j)}} = 1\}}{\sum_{i=1}^{|\hat{m}_t|}\sum_{j=i+1}^{|\hat{m}_t|} 1},$

where $\mathbb{1}\{\cdot\}$ denotes the indicator function and $A$ is the adjacency matrix of the DDI graph.

• # Med. quantifies the average number of medications prescribed per visit:

(24) $\#\mathrm{Med.} = \frac{1}{T}\sum_{t=1}^{T}|\hat{m}_t|.$

We first calculate these metrics for each patient and then average across all patients to determine the final performance of the method. Following prior work (Yang et al., 2021b; Wu et al., 2022; Chen et al., 2023; Yang et al., 2023), we evaluate all models using a bootstrapping protocol rather than cross-validation. Each model is trained on a fixed training set with hyperparameters tuned on a validation set. During evaluation, we perform ten rounds of bootstrapped testing by randomly sampling 80% of the test set with replacement, reporting the mean and standard deviation across rounds.

5.1.3. Baselines.

To demonstrate the effectiveness of HypeMed, we compare it against representative medication recommendation methods, grouped into three categories: machine-learning, instance-based, and longitudinal approaches.

• Machine Learning Methods.

– LR is a logistic regression baseline that performs an independent binary classification for each medication label.

– ECC (Read et al., 2011) (Ensembled Classifier Chain) encodes diagnosis and procedure sets into multi-hot vectors and applies a chain of SVM classifiers to perform multi-label prediction. The chaining mechanism enables ECC to capture label dependencies and improve overall prediction accuracy.

• Instance-based Methods.

– LEAP (Zhang et al., 2017) formulates medication recommendation as a sequential decision-making process.
It employs a recurrent decoder with content-based attention to model label dependencies across visits.

• Longitudinal Methods.

– RETAIN (Choi et al., 2016) is an interpretable predictive framework that processes EHR data using a reverse-time attention mechanism, enhancing both prediction accuracy and clinical interpretability.

– GAMENet (Shang et al., 2019) integrates a DDI knowledge graph with a memory-augmented GCN to jointly model patient history and medication safety.

– SafeDrug (Yang et al., 2021b) recommends effective and safe medication combinations by jointly modeling global and local molecular structures of medications. It also incorporates a controllable loss term to explicitly penalize DDIs.

– COGNet (Wu et al., 2022) introduces a "copy-or-predict" mechanism within an encoder–decoder framework. By leveraging historical medication usage, COGNet determines whether to replicate past prescriptions or predict new ones.

– Carmen (Chen et al., 2023) is a context-aware medication recommendation framework based on graph neural networks. It fuses molecular structure, contextual information, and DDI graph encoding to enhance both performance and safety. (We were unable to successfully execute Carmen's official implementation; we therefore report the results provided in their paper (Chen et al., 2023), obtained under the same data preprocessing pipeline.)

– DGCL (Li et al., 2023) employs graph contrastive learning to jointly train medication knowledge graphs and EHR graphs, effectively controlling DDI levels and improving recommendation robustness.

– VITA (Kim et al., 2024) improves accuracy through relevant-visit selection and target-aware attention mechanisms, which better capture relationships between current and historical visits.

– MoleRec (Yang et al., 2023) utilizes molecular substructure-aware encoding and attention mechanisms to generate personalized and safer medication combinations.
– DAPSNet (Wu et al., 2023) employs dual attention mechanisms at both the code and visit levels to construct comprehensive patient representations. It retrieves information from similar trajectories through a patient memory module and applies an information bottleneck to enhance robustness and safety.

– PROMISE (Wu et al., 2024) introduces a pre-trained multimodal framework that integrates structured EHRs and clinical text via hypergraph and language encoders. By combining multimodal pre-training with controllable DDI optimization, it effectively balances accuracy and medication safety.

We adopted the hyperparameter configurations recommended in the original papers of all baseline models to ensure fair comparison. For baselines without publicly available implementations, we carefully re-implemented them according to the methodological details described in their papers. Our method was implemented in Python 3.8.16 using PyTorch 1.13.1. Experiments were conducted on a workstation equipped with dual Intel Xeon Gold 5318Y CPUs, 251 GB of RAM, and three NVIDIA A40 GPUs. The hyperparameter search space is summarized in Tab. 3.

Table 3. Hyperparameter search space.

| Hyperparameter | Range |
| Learning rate | [5×10^-3, 1×10^-4] |
| Weight decay | [1×10^-4, 1×10^-5] |
| Dropout ratio | [0, 0.5] |
| Epochs | 50 |
| Batch size | 16 |
| Pretraining epochs | {150, 300, 500, 1500, 3000} |
| Pretraining learning rate | {1×10^-3, 5×10^-4} |
| Pretraining weight decay | {1×10^-4, 1×10^-5} |
| KHGE layers | {1, 2, 4} |
| Embedding dimension | 64 |
| λ_multi | [1, 10^-6] |
| λ_ddi | [1, 10^-6] |
| λ_aux | [1, 10^-6] |

Table 4. Performance comparison on MIMIC-III in terms of Jaccard (%), F1 (%), PRAUC (%), DDI (%), and # Med. Numbers in bold and underlined indicate the best and the second-best performance, respectively, according to t-tests at the 95% confidence level.
The ground-truth DDI rate in the MIMIC-III dataset is 8.68%. In the DDI column, values marked with * have a mean+std exceeding the ground truth (inferior), while unmarked values have a mean-std falling below it (superior).

| Model | Jaccard↑ | F1↑ | PRAUC↑ | DDI↓ | # Med.↓ |
| LR | 49.24±0.27 | 64.96±0.25 | 75.48±0.31 | 8.30±0.07 | 16.05±0.14 |
| ECC | 48.89±0.26 | 64.47±0.23 | 76.19±0.20 | 8.57±0.08 | 15.74±0.07 |
| LEAP | 45.50±0.18 | 61.68±0.17 | 65.46±0.42 | 7.82±0.08 | 18.57±0.08 |
| RETAIN | 48.60±0.34 | 64.66±0.34 | 75.98±0.43 | 8.83±0.10* | 18.92±0.13 |
| GAMENet | 50.18±0.20 | 65.84±0.19 | 76.48±0.23 | 8.88±0.09* | 27.39±0.19 |
| SafeDrug | 51.14±0.25 | 66.81±0.23 | 76.43±0.23 | 5.87±0.04 | 18.94±0.13 |
| DAPSNet | 51.68±0.30 | 67.22±0.26 | 76.64±0.40 | 5.96±0.05 | 21.14±0.15 |
| PROMISE | 52.33±0.25 | 67.91±0.21 | **80.58±0.17** | 9.33±0.15* | 15.27±0.13 |
| COGNet | 52.94±0.39 | 68.30±0.36 | 76.79±0.25 | 8.62±0.09* | 28.08±0.18 |
| Carmen | 52.67±0.21 | 68.12±0.19 | 76.52±0.36 | – | – |
| DGCL | 52.54±0.15 | 67.98±0.14 | 77.28±0.15 | 8.31±0.03 | 21.92±0.05 |
| VITA | 52.15±0.25 | 67.42±0.21 | 76.23±0.25 | 8.11±0.09 | 29.53±0.15 |
| MoleRec | _53.12±0.39_ | _68.50±0.33_ | 77.51±0.25 | 7.16±0.07 | 20.54±0.16 |
| HypeMed | **53.73±0.30** | **69.06±0.25** | _78.34±0.31_ | 8.19±0.07 | 20.01±0.14 |

Table 5. Performance comparison on MIMIC-IV. The ground-truth DDI rate is 7.24%. Other notations are consistent with Tab. 4.

| Model | Jaccard↑ | F1↑ | PRAUC↑ | DDI↓ | # Med.↓ |
| LR | 47.65±0.20 | 63.31±0.20 | 74.08±0.22 | 7.07±0.09 | 13.85±0.09 |
| ECC | 45.82±0.16 | 61.39±0.15 | 73.58±0.14 | 7.11±0.14 | 13.08±0.14 |
| LEAP | 43.73±0.26 | 59.84±0.25 | 63.73±0.26 | 6.57±0.06 | 17.34±0.07 |
| RETAIN | 44.10±0.49 | 60.23±0.50 | 71.60±0.40 | 7.29±0.14* | 14.43±0.17 |
| GAMENet | 47.80±0.26 | 63.56±0.28 | 75.09±0.20 | 7.85±0.05* | 23.36±0.17 |
| SafeDrug | 48.47±0.17 | 64.24±0.17 | 74.27±0.22 | 6.11±0.11 | 18.15±0.11 |
| DAPSNet | 49.14±0.22 | 64.85±0.19 | 74.07±0.14 | 5.95±0.07 | 18.62±0.15 |
| PROMISE | 47.29±0.30 | 63.29±0.28 | **76.76±0.28** | 7.08±0.13 | 11.34±0.08 |
| COGNet | 50.38±0.27 | 65.87±0.23 | 74.99±0.15 | 7.79±0.15* | 25.43±0.19 |
| Carmen | 50.06±0.12 | 65.69±0.07 | 74.62±0.30 | – | – |
| DGCL | 49.98±0.13 | 65.62±0.12 | 75.52±0.14 | 7.50±0.04* | 18.41±0.07 |
| VITA | 50.31±0.29 | 65.75±0.26 | 74.80±0.20 | 7.45±0.11* | 26.37±0.18 |
| MoleRec | _50.64±0.23_ | _66.23±0.21_ | 75.14±0.32 | 6.78±0.07 | 17.74±0.11 |
| HypeMed | **51.05±0.28** | **66.50±0.25** | _76.23±0.28_ | 6.65±0.11 | 16.53±0.16 |

Table 6. Performance comparison on eICU. No DDI information is available in the eICU dataset. Other notations are consistent with Tab. 4.

| Model | Jaccard↑ | F1↑ | PRAUC↑ | # Med.↓ |
| LR | 39.96±0.22 | 55.33±0.21 | _73.24±0.36_ | 7.55±0.07 |
| ECC | 33.41±0.26 | 47.66±0.29 | 72.66±0.28 | 5.62±0.05 |
| LEAP | 41.25±0.18 | 56.43±0.21 | 62.67±0.29 | 13.36±0.04 |
| RETAIN | 41.29±0.35 | 56.53±0.34 | 71.02±0.35 | 9.81±0.06 |
| GAMENet | 40.80±0.17 | 56.00±0.18 | 68.86±0.18 | 12.45±0.03 |
| SafeDrug | 41.19±0.23 | 56.42±0.21 | 69.79±0.30 | 10.68±0.06 |
| DAPSNet | 41.14±0.28 | 56.44±0.30 | 69.99±0.31 | 10.87±0.06 |
| PROMISE | 40.89±0.25 | 56.41±0.26 | **73.54±0.32** | 7.16±0.05 |
| COGNet | 41.19±0.42 | 56.20±0.43 | 67.68±0.41 | 20.14±0.10 |
| DGCL | 38.23±0.14 | 53.38±0.15 | 69.63±0.13 | 8.70±0.02 |
| VITA | _41.46±0.26_ | 56.38±0.29 | 67.39±0.37 | 18.47±0.03 |
| MoleRec | 41.32±0.20 | _56.63±0.24_ | 70.14±0.27 | 10.44±0.05 |
| HypeMed | **41.95±0.23** | **56.97±0.23** | 68.71±0.27 | 16.32±0.02 |

5.2. Performance Comparison

As shown in Tab. 4, Tab. 5, and Tab. 6, HypeMed achieves the best or highly competitive overall performance across the three benchmark datasets (MIMIC-III, MIMIC-IV, and eICU), particularly on Jaccard and F1, while maintaining favorable safety/compactness trade-offs, demonstrating strong generalization ability across diverse clinical settings.
Machine learning (ML) methods (LR and ECC) achieve relatively low performance, as they rely on simple multi-hot encodings and fail to capture temporal or relational dependencies among medical entities. Instance-based methods (e.g., LEAP) make predictions solely based on the current visit, ignoring both patient history and inter-visit references from similar visits, which further limits their accuracy. Longitudinal methods (e.g., RETAIN, GAMENet, SafeDrug, COGNet, DGCL, and MoleRec) perform substantially better by modeling temporal dependencies within patient histories. Among them, SafeDrug effectively reduces DDI rates by incorporating drug–drug interaction information, though at the cost of slightly lower accuracy. COGNet employs sequence generation with beam search, achieving modest accuracy gains but increasing the number of recommended medications and thus DDI risk. VITA retrieves relevant past visits via target-aware attention, yet its performance varies with dataset temporal structures. Carmen, DGCL, and MoleRec further integrate co-occurrence relationships, contrastive learning, and molecular structures, respectively, to enhance medication representations. However, these models mainly focus on intra-patient historical modeling and overlook inter-visit reference augmentation via visit-conditioned retrieval. Compared with DAPSNet and PROMISE, the performance improvements of HypeMed primarily stem from its representation-consistent design that couples representation learning and visit-conditioned inter-visit retrieval. Although DAPSNet and PROMISE also incorporate similarity information to improve recommendation accuracy, their retrieval mechanisms are decoupled from representation learning: DAPSNet relies on embedding-level similarity and answer-side matching, while PROMISE computes trajectory similarity via DTW followed by hierarchical attention. This separation can lead to a mismatch between the learned embedding space and the retrieval criterion. 
In contrast, HypeMed performs retrieval and aggregation within the same hypergraph representation space, enabling coherent use of inter-visit references to refine latent condition estimation and medication prediction. This representation–retrieval consistency ensures semantically aligned visit matching and contributes to the observed performance gains across multiple evaluation metrics. Consequently, HypeMed surpasses the strongest baselines on the primary recommendation metrics across all three datasets, achieving Jaccard score improvements of +0.61%, +0.41%, and +0.49% on MIMIC-III, MIMIC-IV, and eICU, respectively. Moreover, HypeMed maintains DDI rates below or comparable to real-world prescriptions (8.68% for MIMIC-III and 7.24% for MIMIC-IV), indicating that its superior accuracy does not compromise safety (on MIMIC-III/IV, where DDI graphs are available). Notably, the results on the eICU dataset further demonstrate HypeMed's robustness and generalization capability on cross-domain, cross-institution, and cross-population data. Overall, HypeMed achieves a favorable balance between accuracy and safety, underscoring its potential as a reliable clinical decision-support system.

Table 7. Ablation study on MIMIC-III and MIMIC-IV. Results are reported for both MedRep and SimMR ablations. Jaccard, F1, PRAUC, and DDI are expressed as percentages (%), while #Med. retains its original scale. Values denote mean ± standard deviation over ten bootstrap samples.

MIMIC-III

| Model | Jaccard↑ | F1↑ | PRAUC↑ | DDI↓ | #Med.↓ |
| HypeMed | 53.10±0.43 | 68.49±0.37 | 77.32±0.37 | 6.67±0.06 | 22.77±0.13 |
| – SimMR–w/o Sim. | 52.91±0.42 | 68.33±0.37 | 77.44±0.34 | 6.40±0.06 | 22.50±0.13 |
| – SimMR–w/o Hist. | 51.90±0.41 | 67.40±0.37 | 76.40±0.44 | 6.12±0.06 | 23.16±0.17 |
| – MedRep–None | 51.70±0.46 | 67.23±0.42 | 76.28±0.36 | 6.52±0.06 | 23.51±0.14 |
| – MedRep–Fixed | 50.49±0.23 | 66.24±0.21 | 75.44±0.33 | 6.45±0.04 | 22.74±0.13 |
| – MedRep–HGCN | 52.17±0.45 | 67.67±0.40 | 75.66±0.35 | 6.01±0.05 | 22.37±0.13 |
| – MedRep–GCN | 51.40±0.40 | 66.96±0.36 | 76.31±0.38 | 6.05±0.05 | 23.50±0.12 |

MIMIC-IV

| Model | Jaccard↑ | F1↑ | PRAUC↑ | DDI↓ | #Med.↓ |
| HypeMed | 50.92±0.25 | 66.35±0.24 | 75.96±0.26 | 6.60±0.13 | 16.68±0.11 |
| – SimMR–w/o Sim. | 50.52±0.26 | 66.03±0.23 | 75.91±0.24 | 6.79±0.15 | 16.61±0.12 |
| – SimMR–w/o Hist. | 49.98±0.23 | 65.52±0.21 | 75.19±0.25 | 6.47±0.11 | 16.26±0.11 |
| – MedRep–None | 50.15±0.25 | 65.68±0.23 | 75.47±0.25 | 6.65±0.11 | 16.61±0.11 |
| – MedRep–Fixed | 46.68±0.21 | 62.58±0.22 | 72.99±0.27 | 6.01±0.07 | 21.11±0.12 |
| – MedRep–HGCN | 50.21±0.23 | 65.76±0.22 | 75.54±0.30 | 6.48±0.12 | 16.63±0.12 |
| – MedRep–GCN | 50.25±0.20 | 65.77±0.19 | 75.61±0.23 | 6.48±0.12 | 16.95±0.11 |

5.3. Ablation Study

To evaluate the contribution of each component in HypeMed, we conduct comprehensive ablation studies on the MIMIC-III (Johnson et al., 2016) and MIMIC-IV (Johnson et al., 2023) datasets.
We construct several model variants as follows: (1) HypeMed–MedRep–None initializes entity and visit representations randomly, removing the pretrained entity and visit representations from MedRep; (2) HypeMed–MedRep–Fixed keeps the embeddings generated by MedRep frozen during SimMR training, preventing task-specific adaptation; (3) HypeMed–MedRep–HGCN replaces the KHGE encoder in MedRep with a standard HGCN (Feng et al., 2019); (4) HypeMed–MedRep–GCN replaces the KHGE encoder with a conventional GCN (Scarselli et al., 2008); (5) HypeMed–SimMR–w/o Hist. removes the historical information encoder, relying solely on the similarity channel; and (6) HypeMed–SimMR–w/o Sim. disables the similarity channel, preserving only the historical encoder.

Several key observations can be drawn from the results. First, all ablated variants show degraded performance compared to the full HypeMed, confirming the overall effectiveness and complementarity of its components. Second, removing MedRep (HypeMed–MedRep–None) leads to a notable decline (e.g., Jaccard decreases from 53.10 to 51.70 on MIMIC-III), highlighting the essential role of pretrained representations in encoding medical knowledge. Notably, the HypeMed–MedRep–Fixed variant, which freezes the pretrained embeddings without further optimization, exhibits an even more pronounced decrease than the jointly optimized model. This confirms that fine-tuning the pretrained representations during SimMR training is crucial for aligning them with the downstream recommendation objective. Furthermore, substituting the KHGE encoder with HGCN or GCN (HypeMed–MedRep–HGCN / HypeMed–MedRep–GCN) produces only marginal gains over the random baseline (Jaccard: 50.15 → 50.25 on MIMIC-IV), indicating that conventional graph encoders are less capable of capturing intra-visit combinatorial semantics among medications. Finally, removing the historical channel (HypeMed–SimMR–w/o Hist.)
results in the largest degradation (e.g., F1: 68.49 → 67.40 on MIMIC-III and 66.35 → 65.52 on MIMIC-IV), underscoring the critical importance of longitudinal information. Nevertheless, the two channels are complementary, and their integration yields the best overall performance.

5.4. Hyperparameter Analysis

In HypeMed, three hyperparameters potentially influence the final performance: (1) the number of pretraining epochs in the MedRep stage, (2) the number of retrieved similar visits (Top-n) in SimMR, and (3) the number of augmented hypergraph views used for contrastive learning. We conduct a comprehensive sensitivity analysis of their effects, with results summarized in Tab. 8, Tab. 9, and Tab. 10. All experiments evaluate five metrics—Jaccard, F1, PRAUC, DDI rate, and average number of medications—while keeping all other settings fixed.

Table 8. Pretraining-epoch sensitivity analysis on MIMIC-III and MIMIC-IV. Jaccard, F1, PRAUC, and DDI are reported in percentage (%), while #Med. retains its original scale. Values denote mean ± standard deviation over ten bootstrap samples.
Dataset    Epoch  Jaccard↑    F1↑         PRAUC↑      DDI↓       #Med.↓
MIMIC-III  0      52.27±0.25  67.75±0.22  77.08±0.32  7.93±0.10  21.90±0.17
           100    52.48±0.23  67.97±0.20  76.10±0.29  6.00±0.04  21.93±0.10
           300    52.26±0.25  67.78±0.22  76.27±0.39  6.06±0.04  22.74±0.11
           1000   51.91±0.22  67.41±0.19  75.95±0.37  6.28±0.04  22.95±0.14
MIMIC-IV   0      49.87±0.23  65.48±0.23  75.60±0.27  6.30±0.10  16.50±0.14
           100    50.03±0.23  65.65±0.22  75.87±0.27  6.34±0.07  16.81±0.17
           300    50.17±0.33  65.75±0.32  75.92±0.27  6.39±0.09  16.32±0.15
           1000   49.66±0.26  65.30±0.26  75.67±0.25  6.46±0.09  16.20±0.16

(1) Number of Pretraining Epochs. We vary the number of pretraining epochs over {0, 100, 300, 1000} to examine the effect of contrastive pretraining duration. As shown in Tab. 8, performance on both datasets first improves and then slightly declines as the number of epochs increases, indicating that moderate pretraining enhances representation quality, whereas excessive training may lead to overfitting or representation collapse.

Table 9. Top-n sensitivity analysis on MIMIC-III and MIMIC-IV. Jaccard, F1, PRAUC, and DDI are expressed as percentages (%), while #Med. retains its original scale. Values denote mean ± standard deviation over ten bootstrap samples.
Dataset    Top-n  Jaccard↑    F1↑         PRAUC↑      DDI↓       #Med.↓
MIMIC-III  0      53.10±0.43  68.49±0.37  77.32±0.37  6.67±0.06  22.77±0.13
           1      53.23±0.40  68.59±0.35  77.50±0.35  6.52±0.04  22.58±0.13
           10     52.93±0.37  68.33±0.33  77.43±0.38  6.25±0.05  22.63±0.13
           100    52.88±0.40  68.27±0.35  77.20±0.36  6.44±0.06  23.31±0.13
           500    52.80±0.38  68.22±0.34  77.13±0.36  6.39±0.06  23.20±0.12
           1000   52.86±0.37  68.26±0.33  77.16±0.39  6.40±0.05  22.97±0.12
MIMIC-IV   0      50.52±0.26  66.03±0.23  75.91±0.24  6.79±0.15  16.61±0.12
           1      50.51±0.25  66.01±0.24  75.79±0.21  6.69±0.12  16.93±0.11
           10     50.39±0.23  65.90±0.21  75.98±0.23  6.49±0.12  16.69±0.11
           100    50.70±0.18  66.23±0.17  76.18±0.23  6.49±0.13  16.80±0.12
           500    50.77±0.21  66.29±0.19  76.30±0.23  6.52±0.13  16.51±0.12
           1000   50.72±0.15  66.22±0.14  76.25±0.22  6.54±0.10  16.42±0.12

(2) Number of Retrieved Similar Visits. To assess the impact of inter-visit reference retrieval, we vary the number of retrieved similar visits (Top-n) over {0, 1, 10, 100, 500, 1000}. As shown in Tab. 9, incorporating a small number of similar visits significantly enhances predictive performance, whereas a large n introduces noise or less relevant samples, leading to marginal declines in Jaccard and F1.
Nevertheless, the DDI rate and average medication count remain stable, suggesting that HypeMed achieves a robust balance between accuracy and safety.

Table 10. Performance and GPU memory usage under different numbers of views on MIMIC-III and MIMIC-IV. Jaccard, F1, PRAUC, and DDI are reported in percentage (%); #Med. and GPU memory (GiB) keep their original scales.

Dataset    #Views  Jaccard↑    F1↑         PRAUC↑      DDI↓       #Med.↓      GPU Mem.
MIMIC-III  2       51.13±0.17  66.77±0.16  75.73±0.34  6.50±0.05  23.80±0.14  9.76
           3       51.26±0.22  66.85±0.18  75.86±0.40  6.23±0.05  24.60±0.12  14.09
           4       51.05±0.20  66.67±0.19  75.89±0.43  6.33±0.04  23.49±0.12  24.53
MIMIC-IV   2       48.67±0.23  64.40±0.22  73.57±0.32  5.90±0.07  20.93±0.17  14.46
           3       48.61±0.21  64.34±0.20  73.44±0.27  5.99±0.06  21.19±0.18  23.75
           4       48.66±0.25  64.39±0.25  73.31±0.29  6.03±0.07  20.87±0.17  38.61

(3) Number of Augmented Views. In lines 5–6 of Alg. 2, HypeMed generates two augmented hypergraph views for contrastive pretraining in the MedRep stage. This design follows standard contrastive frameworks (Lee and Shin, 2023; Wu et al., 2021), in which two correlated views are typically sufficient to learn discriminative yet invariant representations. To validate this choice, we vary the number of augmented views over {2, 3, 4}. As shown in Tab. 10, adding more views provides negligible improvements or even slight degradation, while GPU memory usage increases sharply (from 9.8 to 24.5 GiB on MIMIC-III and from 14.5 to 38.6 GiB on MIMIC-IV). These findings suggest that two views strike the best balance between performance and computational efficiency, supporting the design adopted in Alg. 2.
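The two-view contrastive objective discussed above can be sketched in a few lines. This is a generic InfoNCE illustration under assumed conditions, not the paper's KHGE encoder or augmentation scheme: each row of `view1` and `view2` stands for the embedding of the same visit under two random augmentations, and the loss pulls the paired views together while pushing apart views of different visits.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.2) -> float:
    """InfoNCE loss between two view matrices of shape (n_visits, dim).

    Rows are L2-normalized; positives sit on the diagonal of the
    pairwise-similarity matrix, all other entries act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                          # pairwise cosine / tau
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))         # -log p(positive)

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                       # 8 toy visit embeddings
view1 = base + 0.05 * rng.normal(size=base.shape)     # light augmentation noise
view2 = base + 0.05 * rng.normal(size=base.shape)
shuffled = rng.permutation(view2)                     # misaligned "positives"

# Aligned views of the same visits incur a much lower loss than mismatched ones.
print(info_nce(view1, view2) < info_nce(view1, shuffled))  # True
```

With more than two views, every extra view adds another full similarity matrix (and its activations) per batch, which is consistent with the sharp GPU-memory growth reported in Tab. 10.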
(a) MIMIC-III (b) MIMIC-IV
Figure 5. Visualization of adaptive channel weighting and its association with model performance across different visit lengths. Blue and orange dots represent per-visit gate weights for the history and similarity channels, respectively. Black and purple lines indicate the Spearman correlations (ρ) between Jaccard performance and each channel's weight, computed within visit-length groups. To ensure statistical reliability, visit-length groups with fewer than 20 samples were excluded.

5.5. Interpretability Analysis

To better understand how HypeMed balances information from different sources, we visualize the per-visit gate weights of the two channels learned by the fusion module (blue: history channel; orange: similarity channel) in Fig. 5, along with their Spearman correlations (ρ) with predictive performance (Jaccard). Each dot represents a single visit. In both datasets, the similarity channel generally receives smaller gate weights than the history channel, since the latter always incorporates the current visit's diagnosis and procedure information—even for single-visit patients—whereas the similarity channel depends on retrieved visits and provides supplementary context when longitudinal data are sparse. This finding aligns with our ablation results, where removing the history channel led to a larger performance degradation. However, the strength of association between each channel and predictive accuracy varies across datasets. In MIMIC-III, where patient trajectories are shorter and sparser, the similarity-channel weight correlates positively with performance during early visits (1–4), suggesting that retrieved visits compensate for limited historical context. Once sufficient history accumulates (≥ 5 visits), the correlation shifts toward the history channel, indicating increasing reliance on personal longitudinal information.
In contrast, MIMIC-IV contains richer and longer patient trajectories, and its performance correlates more strongly with the history channel across all visit lengths. Overall, the correlation between predictive accuracy and the similarity-channel weight exhibits a general downward trend as visit length increases, indicating that HypeMed gradually shifts its reliance from retrieved similar visits to accumulated patient-specific history. This visualization thus provides an intrinsic, model-level diagnostic perspective on how the relative contributions of different information sources evolve with increasing data completeness.

Table 11. Performance comparison under the cold-start (first-visit) setting on MIMIC-III and MIMIC-IV. All metrics are reported as percentages (%), except for the average number of medications (#Med.). Values denote mean ± standard deviation over ten bootstrapped runs. A total of 192 and 307 patients have only one visit in MIMIC-III and MIMIC-IV, respectively.

Dataset    Model                    Jaccard↑    F1↑         PRAUC↑      DDI↓       #Med.↓
MIMIC-III  SafeDrug                 49.47±1.24  65.22±1.03  74.92±0.91  6.22±0.20  18.01±0.26
           DAPSNet                  50.68±0.95  66.20±0.79  74.90±1.00  6.12±0.11  19.92±0.33
           COGNet                   51.19±1.20  66.64±1.11  75.62±0.69  8.62±0.15  24.90±0.27
           HypeMed                  51.60±0.86  67.14±0.73  75.83±0.98  8.65±0.29  20.04±0.31
           HypeMed–SimMR–w/o Hist.  49.93±1.00  65.46±0.92  74.78±1.04  6.19±0.25  23.44±0.32
           HypeMed–SimMR–w/o Sim.   49.52±1.08  65.24±0.98  74.62±0.96  6.35±0.25  22.43±0.28
MIMIC-IV   SafeDrug                 44.89±1.16  60.61±1.14  66.83±1.34  6.41±0.24  17.58±0.31
           DAPSNet                  49.49±1.10  65.23±1.01  73.71±1.05  6.07±0.22  17.49±0.25
           COGNet                   46.91±0.48  62.64±0.46  72.82±0.70  8.90±0.17  18.67±0.39
           HypeMed                  50.30±0.94  65.78±0.86  75.19±1.11  6.81±0.37  16.31±0.27
           HypeMed–SimMR–w/o Hist.  49.84±0.98  65.39±0.89  74.94±1.13  6.61±0.34  16.30±0.25
           HypeMed–SimMR–w/o Sim.   49.62±1.00  65.15±0.94  74.89±1.12  7.18±0.26  16.86±0.36

5.6. Cold-Start Performance Analysis

To evaluate whether the proposed similar-visit retrieval mechanism effectively alleviates the cold-start problem, we perform a targeted assessment under the no-history (first-visit) setting, as shown in Tab. 11. Across both MIMIC-III and MIMIC-IV, HypeMed achieves the best performance among all baselines (SafeDrug, DAPSNet, and COGNet) in terms of Jaccard, F1, and PRAUC, demonstrating strong generalization even when patient history is unavailable. Compared with its ablated variants, the complete model exhibits clear advantages. Removing the similarity channel (w/o Sim.) leads to consistent declines in all major metrics—on MIMIC-III, Jaccard decreases by approximately 2% and F1 by 1.5%—indicating that retrieved visits provide valuable external priors that compensate for the absence of personal history. In contrast, removing the historical channel (w/o Hist.)
results in slightly lower performance, suggesting that under the single-visit condition the model mainly relies on current-visit information while still benefiting from the structural representation of the historical channel. Together with the observations from Fig. 5, these results highlight a complementary effect between the two channels: the similarity channel is crucial in true cold-start cases, whereas the historical channel becomes increasingly influential as longitudinal information accumulates. These findings quantitatively verify that the dual-channel design enables HypeMed to adaptively balance retrieved population-level knowledge and patient-specific information, thereby achieving superior robustness and predictive accuracy in cold-start scenarios.

5.7. Retrieval-Quality Correlation Analysis

To quantify the advantage of joint representation–retrieval learning over a decoupled retrieval paradigm, we report the retrieval-quality correlation analysis in Fig. 6.

Figure 6. Retrieval-quality correlation analysis. We compare HypeMed with HypeMed-MedRep-None, a controlled variant that keeps the same prediction architecture but removes MedRep pre-training, so that retrieval is performed in a non-pretrained embedding space. Test visits are grouped into bins by retrieval quality, measured as the F1 score between the medications in the retrieved visits and the ground-truth medications. The x-axis denotes retrieval-quality bins (higher is better), and the y-axis reports the final prediction performance.

From Fig. 6, we observe two key trends. First, for both HypeMed and HypeMed-MedRep-None, higher retrieval quality generally leads to better prediction performance, confirming that relevant references provide useful complementary signals (Wu et al., 2022; Kim et al., 2024; Wu et al., 2023).
Second, within the same retrieval-quality bin, HypeMed consistently achieves higher accuracy than HypeMed-MedRep-None, indicating that the performance gain is not solely explained by retrieving "more correct" cases. This result supports our hypothesis that coupling representation learning with retrieval improves how retrieved context is represented and used. In HypeMed-MedRep-None, retrieval is performed in a randomly initialized or weakly regularized embedding space; even when the retrieved visits exhibit comparable label overlap, their embeddings may be poorly organized with respect to visit semantics, making the fused context less predictable and less exploitable by the downstream predictor. In contrast, MedRep pre-trains a hyperedge-aware metric space in which visit embeddings are explicitly optimized to encode high-order clinical interactions and preserve set-level semantics. Retrieved neighbors therefore become not only label-relevant but also geometrically compatible with the query visit, providing a more structured and informative signal for fusion, which ultimately translates into consistently better prediction performance.

Figure 7. Comparative time complexity across models (FLOPs vs. parameters).

5.8. Time Complexity Comparison

We perform a time complexity analysis by comparing our approach with state-of-the-art methods, using floating point operations (FLOPs) as the metric. Specifically, we measure the FLOPs of each model during the inference phase; the results are illustrated in Fig. 7. Thanks to its two-stage design, HypeMed reuses MedRep embeddings and avoids additional GNN computations during inference. This significantly reduces its computational load, making its FLOPs count the second lowest among the evaluated models, behind only SafeDrug. In contrast, MoleRec executes a 4-layer GNN across multiple graphs, resulting in the highest inference FLOPs among the evaluated models.

5.9. Case Study

To illustrate the effectiveness of HypeMed's recommendations more vividly, we conduct a case study of Patient-908. Patient-908 from the MIMIC-III test set has two medical visits. During the first visit, the patient was diagnosed with diseases such as cirrhosis, liver cancer, and chronic hepatitis. In the second visit, new conditions were added, including complications from liver transplantation, renal failure, and confirmed sepsis. Tab. 12 lists the details. Due to space constraints, we use ICD codes to represent diagnoses and ATC codes to represent medications. Fig. 8 visualizes the diagnosis entities encoded by HGCN and KHGE in MedRep. Points of the same color correspond to the same ICD category. The KHGE embeddings show clear clustering by ICD category, with nodes of the same category lying closer together. In contrast, the HGCN embeddings are scattered without clear category distinction. To further demonstrate the effectiveness of KHGE, we select three groups of diagnosis entities from Visit-2, each belonging to the same category in the ICD classification system:
• ⋆ Persons With Potential Health Hazards Related To Personal And Family History (V1083, V168, and V1582);
• ▶ Symptoms (78060, 78552);
• + Varicose veins of other sites (4568, 4561).
Different markers highlight entities belonging to different groups. In the HGCN embedding, the three entity groups are dispersed, while in the KHGE embedding, entities of the same category cluster together. This suggests that, by incorporating the ICD classification system into the modeling process, KHGE translates the relative distances of entities in ICD into distances in the embedding space, granting similar representations to entities of the same category.
In contrast, HGCN, lacking ICD information, fails to capture this category structure.

Figure 8. Comparative visualization of diagnosis entity representations in MedRep using HGCN and KHGE encoders.

We then compare the effectiveness of medication recommendations for Patient-908 under two scenarios: with and without similar-visit information (see Tab. 12). During the first visit, where the patient has no prior medical history, direct recommendations are prone to omissions and errors. To address this, HypeMed assigns a higher weight (90.43%) to similar-visit information, thereby mitigating these inaccuracies. In the second visit, owing to the continuity of the patient's condition, HypeMed can recommend the most suitable medications by relying on historical information, which is assigned a weight of 67.55%. However, some erroneous recommendations persist, and here information from similar visits helps eliminate inappropriate medications. Ultimately, using similar-visit information increases the Jaccard similarity of the recommendations from 46.15% / 55.17% to 77.27% / 64.00% in Visit-1 / Visit-2. This indicates that similar-visit information can significantly improve recommendation accuracy, especially when there is no prior medical history.

Table 12. The example of Patient-908 with two visits for the case study. Medications recommended correctly without using similar-visit information are highlighted in bold. Blue and orange denote, respectively, additional correct medications introduced and erroneous medications eliminated after incorporating similar-visit information.
Visit-1
  Diagnoses: V8741, V153, 79029, V1083, 5715, 1550, 07054
  Medications: N02B, A01A, A02B, A06A, B05C, J05A, A12C, A07A, J01E, N01A, C03C, J01C, N02A, B01A, N05C, D01A, J01D, L04A, D11A, A04A, A07E, D07A, D04A, C03D
Visit-2
  Diagnoses: 99682, 78552, 0389, 99592, 5848, 570, 28419, 4561, 5589, 78060, 07070, 5715, V1083, E8780, 4568, V168, V1582
  Medications: N02B, A01A, A02B, A06A, B05C, A12A, A12C, C01C, A07A, J02A, J05A, V03A, J01E, J01C, N02A, A02A, J01M, B01A, N05C, J01D, B02B, C08C, D11A, A04A, A07E

6. Conclusion and Future Work

In conclusion, we propose HypeMed, a hypergraph-based medication recommendation framework that reconstructs a patient's latent clinical condition by unifying intra-visit set-level coherence modeling with inter-visit reference augmentation. Through a two-stage design, MedRep encodes high-order clinical co-occurrence into a globally consistent, retrieval-friendly embedding space, while SimMR performs visit-conditioned retrieval of similar visits and integrates them with longitudinal history for robust medication prediction. Extensive experiments on three real-world benchmarks (MIMIC-III, MIMIC-IV, and eICU) show that HypeMed consistently achieves strong overall performance and outperforms state-of-the-art baselines on key accuracy metrics while maintaining competitive safety. Ablation and case studies further verify the complementary contributions of MedRep and SimMR and highlight the benefit of representation–retrieval consistency in utilizing retrieved references. Nevertheless, similar to prior methods (Wu et al., 2022; Yang et al., 2023), HypeMed may recommend slightly more medications than the ground truth, leading to suboptimal performance on the #Med. metric. In addition, embedding-based retrieval can occasionally introduce less relevant visits, especially when the current visit is under-specified.
In future work, we plan to improve retrieval robustness (e.g., by incorporating stronger relevance filtering or uncertainty-aware retrieval) and enhance prescription compactness to further strengthen the clinical applicability of HypeMed.

Acknowledgments

This research was supported by the Public Computing Cloud of Renmin University of China and by the Fund for Building World-Class Universities (Disciplines) at Renmin University of China.

References

S. Bhoi, M. L. Lee, W. Hsu, H. S. A. Fang, and N. C. Tan (2021) Personalizing medication recommendation with a graph-based approach. ACM Transactions on Information Systems (TOIS) 40(3), pp. 1–23.
D. Blumenthal, E. J. Fowler, M. Abrams, and S. R. Collins (2020) Covid-19—implications for the health care system. New England Journal of Medicine 383.
P. Braveman and L. Gottlieb (2014) The social determinants of health: it's time to consider the causes of the causes. Public Health Reports 129(1_suppl2), pp. 19–31.
L. R. Brooks, G. R. Norman, and S. W. Allen (1991) Role of specific similarity in a medical diagnostic task. Journal of Experimental Psychology: General 120(3), p. 278.
S. Brown (2016) Patient similarity: emerging concepts in systems and precision medicine. Frontiers in Physiology 7, p. 561.
Q. Chen, X. Li, K. Geng, and M. Wang (2023) Context-aware safe medication recommendations with molecular graph and DDI graph embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, pp. 7053–7060.
E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart (2016) RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. Advances in Neural Information Processing Systems 29.
R. S. Evans (2016) Electronic health records: then, now, and in the future.
Yearbook of Medical Informatics 25(S01), pp. S48–S61.
Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao (2019) Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3558–3565.
J. L. Frossard and F. Bonvin (2011) Charcot's triad. International Journal of Emergency Medicine 4(1), p. 18.
F. Gong, M. Wang, H. Wang, S. Wang, and M. Liu (2021) SMR: medical knowledge graph embedding for safe medicine recommendation. Big Data Research 23, p. 100174.
H. Guo, J. Lian, and X. Zhou (2026) Why not collaborative filtering in dual view? Bridging sparse and dense models. ACM Transactions on Information Systems.
H. Guo, Y. Ma, and X. Zhou (2025a) Sorex: towards self-explainable social recommendation with relevant ego-path extraction. ACM Transactions on Information Systems.
H. Guo, J. Yao, X. Zhou, X. Yi, and X. Xie (2025b) Counterfactual reasoning for steerable pluralistic value alignment of large language models. arXiv preprint arXiv:2510.18526.
T. He and X. Zhou (2025) MotifGPL: motif-enhanced graph prototype learning for deciphering urban social segregation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 28079–28087.
Q. Hong, C. Bian, X. Zhou, X. Li, Y. Li, and Z. Zeng (2025) Lost in time? A meta-learning framework for time-shift-tolerant physiological signal transformation. arXiv preprint arXiv:2511.21500.
Q. Hong, S. Chang, and X. Zhou (2026) WED-net: a weather-effect disentanglement network with causal augmentation for urban flow prediction. arXiv preprint arXiv:2601.22586.
F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, and Y. Wang (2017) Artificial intelligence in healthcare: past, present and future. Stroke and Vascular Neurology 2(4).
A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A.
Shammout, S. Horng, T. J. Pollard, S. Hao, B. Moody, B. Gow, et al. (2023) MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data 10(1), p. 1.
A. E. Johnson, T. J. Pollard, L. Shen, L. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. Anthony Celi, and R. G. Mark (2016) MIMIC-III, a freely accessible critical care database. Scientific Data 3(1), pp. 1–9.
T. Kim, J. Heo, H. Kim, K. Shin, and S. Kim (2024) VITA: 'carefully chosen and weighted less' is better in medication recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, pp. 8600–8607.
H. Le, T. Tran, and S. Venkatesh (2018) Dual memory neural computer for asynchronous two-view sequential learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1637–1645.
D. Lee and K. Shin (2023) I'm me, we're us, and I'm us: tri-directional contrastive learning on hypergraphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, pp. 8456–8464.
J. Li, X. Li, and X. Zhou (2025a) FAP-CD: fairness-driven age-friendly community planning via conditional diffusion generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 28168–28176.
L. Li, X. Zhou, and Z. Liu (2025b) R2med: a benchmark for reasoning-driven medical retrieval. arXiv preprint arXiv:2505.14558.
L. Li and X. Zhou (2025) Leave no one behind: enhancing diversity while maintaining accuracy in social recommendation. In International Conference on Database Systems for Advanced Applications, pp. 51–67.
X. Li, Y. Zhang, X. Li, H. Wei, and M. Lu (2023) DGCL: distance-wise and graph contrastive learning for medication recommendation. Journal of Biomedical Informatics 139, p. 104301.
X. Li, H. Zhang, and X. Zhou (2025c) Spatio-temporal hierarchical causal models. arXiv preprint arXiv:2511.20558.
Y. Ma, C. Li, and X. Zhou (2024) Tail-STEAK: improve friend recommendation for tail users via self-training enhanced knowledge distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, pp. 8895–8903.
N. Menachemi and T. H. Collum (2011) Benefits and drawbacks of electronic health record systems. Risk Management and Healthcare Policy, pp. 47–55.
A. van den Oord, Y. Li, and O. Vinyals (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.
T. J. Pollard, A. E. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, and O. Badawi (2018) The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data 5(1), pp. 1–13.
J. Read, B. Pfahringer, G. Holmes, and E. Frank (2011) Classifier chains for multi-label classification. Machine Learning 85, pp. 333–359.
S. Rumsey, J. Winders, and A. D. MacCormick (2017) Diagnostic accuracy of Charcot's triad: a systematic review. ANZ Journal of Surgery 87(4), pp. 232–238.
D. Saraswat, P. Bhattacharya, A. Verma, V. K. Prasad, S. Tanwar, G. Sharma, P. N. Bokoro, and R. Sharma (2022) Explainable AI for healthcare 5.0: opportunities and challenges. IEEE Access 10, pp. 84486–84517.
F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2008) The graph neural network model. IEEE Transactions on Neural Networks 20(1), pp. 61–80.
J. Shang, C. Xiao, T. Ma, H. Li, and J. Sun (2019) GAMENet: graph augmented memory networks for recommending medication combination. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1126–1133.
J. Solomon (2006) Case studies: why are they important? Nature Clinical Practice Cardiovascular Medicine 3(11), p. 579.
H. Sun, S. Xie, S. Li, Y. Chen, J. Wen, and R. Yan (2022) Debiased, longitudinal and coordinated drug recommendation through multi-visit clinic records. Advances in Neural Information Processing Systems 35, pp. 27837–27849.
J. Wu, Y. Dong, Z. Gao, T. Gong, and C. Li (2023) Dual attention and patient similarity network for drug recommendation. Bioinformatics 39(1), btad003.
J. Wu, X. Yu, K. He, Z. Gao, and T. Gong (2024) PROMISE: a pre-trained knowledge-infused multimodal representation learning framework for medication recommendation. Information Processing & Management 61(4), p. 103758.
J. Wu, X. Wang, F. Feng, X. He, L. Chen, J. Lian, and X. Xie (2021) Self-supervised graph learning for recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 726–735.
R. Wu, Z. Qiu, J. Jiang, G. Qi, and X. Wu (2022) Conditional generation net for medication recommendation. In Proceedings of the ACM Web Conference 2022, pp. 935–945.
L. Xia, C. Huang, Y. Xu, J. Zhao, D. Yin, and J. Huang (2022) Hypergraph contrastive collaborative filtering. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 70–79.
X. Xia, H. Yin, J. Yu, Q. Wang, L. Cui, and X. Zhang (2021) Self-supervised hypergraph convolutional networks for session-based recommendation.
In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 4503–4511.
C. Yang, C. Xiao, L. Glass, and J. Sun (2021a) Change matters: medication change prediction with recurrent residual networks. arXiv preprint arXiv:2105.01876.
C. Yang, C. Xiao, F. Ma, L. Glass, and J. Sun (2021b) SafeDrug: dual molecular graph encoders for safe drug recommendations. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021.
N. Yang, K. Zeng, Q. Wu, and J. Yan (2023) MoleRec: combinatorial drug recommendation with substructure-aware molecular representation learning. In Proceedings of the ACM Web Conference 2023, pp. 4075–4085.
X. Yong, J. Lian, X. Yi, X. Zhou, and X. Xie (2025) MOTIVEBENCH: how far are we from human-like motivational reasoning in large language models? In Findings of the Association for Computational Linguistics: ACL 2025, pp. 20059–20089.
X. Yong, P. Sun, Z. Wang, and X. Zhou (2026) Intelli-planner: towards customized urban planning via large language model empowered reinforcement learning. arXiv preprint arXiv:2601.21212.
X. Yong, X. Zhou, Y. Zhang, J. Li, Y. Zheng, and X. Wu. Think or not? Exploring thinking efficiency in large reasoning models via an information-theoretic lens. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
J. Yu, H. Yin, J. Li, Q. Wang, N. Q. V. Hung, and X. Zhang (2021) Self-supervised multi-channel hypergraph convolutional network for social recommendation. In Proceedings of the Web Conference 2021, pp. 413–424.
H. Zhang, Y. Wang, Y. Duan, R. Fu, D. Zhao, S. Fan, S. Cao, W. Guo, and X. Zhou (2026) Social-JEPA: emergent geometric isomorphism. arXiv preprint arXiv:2603.02263.
Y. Zhang, X. Wu, Q. Fang, S.
Qian, and C. Xu (2023) Knowledge-enhanced attributed multi-task learning for medicine recommendation. ACM Transactions on Information Systems 41(1), pp. 1–24.
Y. Zhang, R. Chen, J. Tang, W. F. Stewart, and J. Sun (2017) LEAP: learning to prescribe effective and safe treatment combinations for multimorbidity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1315–1324.
X. Zhou, Z. Zhao, and H. Guo (2025) Tricolore: multi-behavior user profiling for enhanced candidate generation in recommender systems. IEEE Transactions on Knowledge and Data Engineering.
G. Zhu, X. Li, X. Sui, J. Hawes, R. Hussein, Q. Hong, X. Zhou, Z. Zeng, and Y. Li (2025a) Hypertension risk screening using long-term photoplethysmogram and ballistocardiograph measurements from a smartwatch. In 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1–6.
Y. Zhu, S. Duan, X. Zhang, J. Sang, P. Zhang, T. Lu, X. Zhou, J. Yao, X. Yi, and X. Xie (2025b) MoHoBench: assessing honesty of multimodal large language models via unanswerable visual questions. arXiv preprint arXiv:2507.21503.