Paper deep dive
Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records
Jacopo Vitale, David Della Morte, Luca Bacco, Mario Merone, Mark de Groot, Saskia Haitjema, Leandro Pecchia, Bram van Es
Abstract
To overcome the limitations of manual administrative coding in geriatric Cardiovascular Risk Management, this study introduces an automated classification framework leveraging unstructured Electronic Health Records (EHRs). Using a dataset of 3,482 patients, we benchmarked three distinct modeling paradigms on longitudinal Dutch clinical narratives: classical machine learning baselines, specialized deep learning architectures optimized for large-context sequences, and general-purpose generative Large Language Models (LLMs) in a zero-shot setting. Additionally, we evaluated a late fusion strategy to integrate unstructured text with structured medication embeddings and anthropometric data. Our analysis reveals that the custom Transformer architecture outperforms both traditional methods and generative LLMs, achieving the highest F1-scores and Matthews Correlation Coefficients. These findings underscore the critical role of specialized hierarchical attention mechanisms in capturing long-range dependencies within medical texts, presenting a robust, automated alternative to manual workflows for clinical risk stratification.
Tags
Links
- Source: https://arxiv.org/abs/2603.09685v1
- Canonical: https://arxiv.org/abs/2603.09685v1
PDF not stored locally. Use the link above to view on the source site.
Intelligence
Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 97%
Last extracted: 3/13/2026, 1:04:23 AM
Summary
This study introduces an automated classification framework for geriatric Cardiovascular Risk Management (CVRM) using unstructured Electronic Health Records (EHRs). By benchmarking classical machine learning, deep learning (ResNet-1D), and generative LLMs against a custom hierarchical Transformer architecture, the authors demonstrate that the Transformer model, particularly with a classification token pooling strategy, achieves superior performance in clinical risk stratification. The study also evaluates a late fusion strategy integrating unstructured clinical narratives with structured medication and anthropometric data.
Entities (6)
Relation Signals (3)
University Medical Center Utrecht → provides dataset → Electronic Health Records
confidence 100% · This study leverages the extensive EHR database of the UMCU
Hierarchical Transformer → outperforms → Linear Support Vector Classifier
confidence 95% · hierarchical Transformer models consistently outperform traditional machine learning baselines
Hierarchical Transformer → outperforms → GPT-4o
confidence 95% · hierarchical Transformer models consistently outperform... state-of-the-art zero-shot LLMs
Cypher Suggestions (2)
Identify which models outperform others. · confidence 95% · unvalidated
MATCH (a:ModelArchitecture)-[:OUTPERFORMS]->(b:ModelArchitecture) RETURN a.name, b.name
Find all model architectures evaluated in the study. · confidence 90% · unvalidated
MATCH (m:ModelArchitecture) RETURN m.name
Full Text
39,264 characters extracted from source content.
Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records
Jacopo Vitale 1, David Della Morte 1, Luca Bacco 1, Mario Merone 1, Mark C.H. de Groot 2, Saskia Haitjema 2, Leandro Pecchia 1, Bram van Es 2
1 Research Unit of Intelligent Health Technologies, Università Campus Bio-Medico di Roma, Rome, Italy. 2 Central Diagnostic Laboratory, University Medical Center Utrecht, Utrecht, The Netherlands.
Contributing authors: jacopo.vitale@unicampus.it; bes3@umcutrecht.nl
Abstract
To overcome the limitations of manual administrative coding in geriatric Cardiovascular Risk Management, this study introduces an automated classification framework leveraging unstructured Electronic Health Records (EHRs). Using a dataset of 3,482 patients, we benchmarked three distinct modeling paradigms on longitudinal Dutch clinical narratives: classical machine learning baselines, specialized deep learning architectures optimized for large-context sequences, and general-purpose generative Large Language Models (LLMs) in a zero-shot setting. Additionally, we evaluated a late fusion strategy to integrate unstructured text with structured medication embeddings and anthropometric data. Our analysis reveals that the custom Transformer architecture outperforms both traditional methods and generative LLMs, achieving the highest F1-scores and Matthews Correlation Coefficients. These findings underscore the critical role of specialized hierarchical attention mechanisms in capturing long-range dependencies within medical texts, presenting a robust, automated alternative to manual workflows for clinical risk stratification.
arXiv:2603.09685v1 [cs.CL] 10 Mar 2026
Keywords: Cardiovascular Risk Management, Electronic Health Records, Hierarchical Transformer, Learning Healthcare System, Large Language Models, Zero-shot Classification
1 Introduction
In the geriatric population, cardiovascular risk represents a significant challenge due to the increased prevalence of cardiovascular disease (CVD) with advancing age [27, 3]. Older adults often exhibit a complex clinical profile, where multiple factors including comorbidities, age-related physiological changes, frailty, and polypharmacy synergistically contribute to an elevated cardiovascular risk [12]. Although clinical guidelines exist to support healthcare professionals in managing cardiovascular risk, they are not always consistently implemented in routine practice and may become outdated as new evidence emerges [7, 5]. This results in suboptimal screening and management of cardiovascular risk factors, ultimately contributing to avoidable morbidity and mortality. A Learning Healthcare System (LHS) provides a promising framework to address these limitations by continuously analyzing data from routine clinical care, identifying gaps and inefficiencies, and translating new evidence into actionable improvements in practice. First introduced in 2007, the LHS was conceptualized as a system in which care delivery and evidence generation are tightly integrated in a continuous cycle of learning and improvement. Formally, an LHS is defined as a healthcare system that systematically learns from clinical experience by generating and applying evidence to support medical decision-making, foster innovation, and enhance the quality, safety, and efficiency of care delivery [13, 23]. Within the LHS framework, Electronic Health Records (EHRs) serve as the fundamental infrastructure that enables the transformation of routine clinical practice into systematic knowledge.
EHRs function as a centralized access point for patient data and include both structured information, such as laboratory values, and unstructured content, such as prescribed medications, medical notes, and information related to appointments and hospitalizations. By leveraging the data routinely captured in EHRs, it becomes possible to automatically accumulate longitudinal patient information without the need for dedicated follow-up. This results in the creation of a dynamic and scalable database that supports predictive modeling and real-time risk stratification. Furthermore, the integration of EHR data facilitates the early identification and continuous monitoring of high-risk individuals, thereby enabling the implementation of personalized preventive and therapeutic strategies [9, 10]. In 2016, the Center for Circulatory Health at the University Medical Center Utrecht (UMCU), in the Netherlands, launched the Utrecht Cardiovascular Cohort – Cardiovascular Risk Management (UCC-CVRM) initiative, which constitutes one of the first operationalizations of a cardiovascular LHS. Following the guidelines set by the Utrecht Cardiovascular Cohort, the cardiovascular LHS aims to uniformly assess and record risk indicators for all patients included in Cardiovascular Risk Management (CVRM) [1]. This approach allows data generated from routine check-ups to be easily analyzed, enabling continuous improvement in the delivery of healthcare services [1]. At UMCU, patients are considered eligible for CVRM via codes in their EHR, namely agenda codes, which denote specific types of appointments, consultations, or healthcare services. This eligibility assignment system has significant limitations: it does not account for the patient's physiological parameters or clinical history, and the codes can change over time and may not accurately reflect that history.
Moreover, the agenda code is assigned manually by a nurse, leading to selection errors. This study leverages the extensive EHR database of the UMCU, which includes a wide range of patient information such as demographics, medical history, diagnoses, treatments, and laboratory results, representing a valuable resource for enhancing CVRM. Specifically, the contributions of this work are threefold: (i) to collect and preprocess highly unstructured data from the UMCU EHR system, transforming it into a structured dataset suitable for downstream analysis; (ii) to develop an automated strategy for assessing patient eligibility for the CVRM program, with particular emphasis on the integration of clinical history data; (iii) to explore and evaluate various model architectures and techniques capable of handling long-context information, in order to improve the identification of high-risk patients.
2 Dataset
From the UMCU database we retrieved all records from the EHRs of 3,482 patients collected at the geriatric outpatient clinic. Each patient's medical record is assigned an identification code based on the type of appointment or visit, without considering any physiological parameters. Among the twenty-six possible codes that can be assigned, four designate automatic enrollment in the CVR management programme; patients with these codes are identified as eligible. A detailed summary of the dataset characteristics is presented in Table 1. Patients were categorized into two classes based on their eligibility for the CVRM program: class 0 denotes non-eligible patients, while class 1 denotes eligible patients. As shown in Table 1, the dataset is strongly imbalanced with respect to the class distribution.
Table 1. Demographic and class distribution of the dataset.
| Characteristic | Value |
| --- | --- |
| Male samples | 1871 (53.73%) |
| Female samples | 1611 (46.27%) |
| Age (mean ± SD) | M: 74.52 ± 9.05, F: 74.40 ± 9.51 |
| Class 0 | 2808 (80.61%) |
| Class 1 | 674 (19.39%) |
| Total number of samples | 3482 |

Each patient has an undetermined number of medical texts or medications, depending on his/her own medical history.
Fig. 1. Graphical representation of an EHR (unstructured text: consultancies, GP letters, discharge letters; other medical info: anthropometrics, medications). Highlighted items are the ones used in this study; the non-highlighted ones are present but discarded.
2.1 Consult texts
In a geriatric outpatient clinic, a consultation report (consultverslag) serves as a structured medical document summarizing the specialist's assessment and recommendations following a patient visit. These reports include a multidimensional evaluation, addressing the patient's primary complaint, comorbidities, cognitive status, functional ability, and psychosocial factors. The document follows a standardized format, beginning with the specialty and consultation type, followed by patient demographics, medical history, and presenting symptoms. A clinical examination includes cognitive screening (e.g., MMSE), fall risk assessment, and relevant diagnostic findings. The conclusion section presents a differential diagnosis, considering age-related conditions such as dementia, Parkinson disease, or polypharmacy-related effects. The report outlines a treatment plan, including referrals to other specialists, medication adjustments, and recommendations for follow-up care. The structured format ensures continuity of care and facilitates interdisciplinary collaboration in managing the complex needs of older adults. In our dataset, we have multiple consultation reports per patient, each associated with a specific date, allowing for longitudinal tracking of patient progress and adjustments in care.
2.2 Anthropometrics
The EHR contains a variety of anthropometric data, including age, gender, and potentially other measurements such as BMI. However, due to the unstructured nature of the EHR, as highlighted in the literature, extracting and organizing these data points into a usable format presents significant challenges. In our dataset, we focused on using age and gender because these variables are typically well-structured and easier to extract. Other anthropometric measures, such as BMI, were not consistently available or structured in a way that would allow for reliable inclusion in the analysis. The difficulties in processing and polishing EHR data to a usable state limit the integration of more detailed anthropometric information, despite its potential value in patient assessment.
2.3 Medication Processing
Medications represent a critical unstructured feature, originally recorded in the EHRs by their trade or generic names. To ensure semantic consistency and resolve variability between different denominations for the same active ingredient, all medication entries were mapped to the Anatomical Therapeutic Chemical (ATC) classification system [26], developed by the World Health Organization. For each patient, the pharmacological history spanning the observation year was transformed from raw prescription names into a list of decompressed ATC code descriptions. This standardization step converts heterogeneous drug data into a unified textual format, serving as the input for the semantic embedding and aggregation pipeline detailed in the Methodology.
3 Methodology
Data from consultations, medications, and anthropometric measurements acquired for a single year were selected from patients' EHRs, utilizing the most recent medical appointment as the reference time-point.
To systematically evaluate the impact of integrating these distinct modalities across all our evaluated models, we investigated two modeling strategies: a unimodal text-only approach and a multimodal late fusion approach (Figure 2) [25, 19]. The text-only strategy serves as a baseline, relying solely on the text encoders to classify the clinical narratives. Conversely, the late fusion strategy processes the unstructured text independently and concatenates the resulting feature representation with the structured data vectors (medications and anthropometrics) just prior to the final classification head. We selected this specific fusion architecture because it allows the core sequence-modeling networks to be utilized without requiring complex internal modifications for multimodal processing, while still empirically maintaining robust predictive performance [18]. To operationalize this late fusion strategy, the supplementary modalities required mathematical representation. To incorporate the pharmacological data, we leveraged the Anatomical Therapeutic Chemical (ATC) codes already recorded in the patient EHRs [26]. We extracted the decompressed textual description associated with each code to capture the semantic nuance of the prescribed treatments. Subsequently, these descriptive strings were converted into numerical embeddings using the BioLORD-2023 model, a Sentence-Transformer [21, 14]. This model was selected for its pre-training on the UMLS medical ontology, which includes descriptions of ATC codes, enabling it to map semantically similar drugs to proximate coordinates within the vector space [2]. By feeding the ATC descriptions into this model, we obtained 768-dimensional embeddings, which were subsequently condensed into a single representative vector per patient to serve as the structured input for the late fusion layer.
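The per-patient condensation of the drug embeddings and the late fusion concatenation can be sketched in plain Python. Mean pooling and the toy dimensions are assumptions for illustration: the paper only states that the 768-dimensional embeddings are "condensed into a single representative vector" and fused with the text features before the classification head.

```python
# Sketch of the medication-embedding condensation and late fusion steps.
# Mean pooling and the toy dimensions are illustrative assumptions.

def mean_pool(vectors):
    """Condense several per-drug embeddings into one patient-level vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def late_fuse(text_features, med_vector, anthropometrics):
    """Late fusion: concatenate modality vectors just before the MLP head."""
    return list(text_features) + list(med_vector) + list(anthropometrics)

# Hypothetical shapes: 512-d text-encoder output, 768-d drug embeddings
# (BioLORD-2023 produces 768-d sentence embeddings), [age, gender] features.
drug_embeddings = [[0.0] * 768, [1.0] * 768]   # two prescriptions
patient_meds = mean_pool(drug_embeddings)
fused = late_fuse([0.1] * 512, patient_meds, [74.0, 1.0])
# fused has 512 + 768 + 2 = 1282 features
```

Keeping the fusion this late means the text encoder and the structured branches can be trained or swapped independently, which is the design rationale the paragraph above describes.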
We benchmarked our proposed architecture against three distinct families of modeling paradigms, applying both the text-only and late fusion strategies where applicable. First, to establish traditional machine learning baselines, we employed a stratified dummy classifier and a Linear Support Vector Classifier (LinearSVC) trained on Term Frequency–Inverse Document Frequency (TF-IDF) representations [15, 16]. Second, to evaluate the efficacy of attention mechanisms against convolutional approaches for sequential medical text, we implemented a custom one-dimensional ResNet [8]. Finally, we compared our supervised architectures against general-purpose LLMs applied in a zero-shot classification setting [28, 11]. To process the longitudinal consultation narratives, we developed a custom encoder-only Transformer architecture designed specifically for large-context sequences. The input text is first encoded using a pre-trained tokenizer, mapping each token ID to a 512-dimensional embedding space. The sequence is padded or truncated to a maximum length of 8192 tokens. To track token positions within the input text, a rotary positional embedding [22] is applied. For the final classification mechanism, we investigated two distinct aggregation strategies. The first approach utilizes a Classification (CLS) token [4] prepended to the embedded sequence. During backpropagation, this randomly initialized token accumulates global classification information; after the Transformer Encoder stack, only the CLS token is pooled and passed to a Multi-Layer Perceptron (MLP) to output class predictions, as detailed in Fig. 2. Alternatively, we explored a Global Average Pooling strategy, where the output token sequence is simply averaged to form the final representation, eliminating the need for a CLS token.
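The two aggregation strategies can be sketched with plain Python lists standing in for the encoder's hidden states (the real model operates on tensors; the shapes here are toy values):

```python
# Minimal sketch of the two aggregation strategies over a seq_len x hidden_dim
# list of encoder hidden states.

def cls_pool(hidden_states):
    """CLS pooling: only the prepended classification token (position 0)
    is forwarded to the MLP head."""
    return hidden_states[0]

def global_average_pool(hidden_states):
    """Average pooling: mean over all token positions; no CLS token needed."""
    dim = len(hidden_states[0])
    n = len(hidden_states)
    return [sum(tok[d] for tok in hidden_states) / n for d in range(dim)]

states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, hidden size 2
```

CLS pooling lets the model learn what to aggregate via attention to the special token, while average pooling weighs every position equally; the results below suggest the learned aggregation is slightly better suited to consultation texts.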
Fig. 2. Schematic overview of the Hierarchical Transformer classification pipeline. The architecture processes concatenated consults (max sequence length 8192 tokens) via BPE tokenization and hierarchical encoding, utilizing CLS Classification Token (orange arrows) or Global Average Pooling (purple arrows) strategies, with optional late fusion with anthropometric measurements before the MLP head.
Each Transformer Encoder layer is composed of a Hierarchical Attention Layer and a Feed-Forward network layer; the output of each layer is added via a skip connection and layer-normalised as explained in [24]. The token sequence flows into the Transformer encoder, where hierarchical attention is computed as shown in [29]. This approach allows large context-length text to be processed with acceptable computational time and memory complexity. This modified version of attention requires that the input is padded to a power of two, as attention is computed hierarchically on token pairs, prioritizing local dependencies over distant ones [29]. In parallel, we implemented a custom ResNet-1D architecture tailored to sequential representations, providing a convolutional baseline for comparison with Transformer-based methods. The hyperparameters and structural configurations detailed in Table 2 were selected following a grid search constrained by our available GPU resources.
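The power-of-two padding constraint described above can be sketched as follows; `pad_id` and the function names are illustrative assumptions, not taken from the paper's code:

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two greater than or equal to n."""
    return 1 if n <= 1 else 2 ** (n - 1).bit_length()

def pad_for_hierarchical_attention(token_ids, pad_id=0, max_len=8192):
    """Truncate to max_len, then right-pad so the sequence length is a power
    of two, as hierarchical attention requires. Since max_len (8192) is
    itself a power of two, the result never exceeds it."""
    token_ids = list(token_ids)[:max_len]
    target = next_power_of_two(len(token_ids))
    return token_ids + [pad_id] * (target - len(token_ids))
```

Padding only to the next power of two (rather than always to 8192) keeps short consults cheap, since the hierarchical pairing cost scales with the padded length.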
We observed that variations among the top-performing configurations yielded no statistically significant improvements; therefore, for clarity and brevity, we report only the hyperparameter set that achieved the highest validation metrics.
Table 2. Model Architecture and Hyperparameter Summary: detailed breakdown of the Hierarchical Transformers (HTrans), ResNet-1D, and LinearSVC baseline. Note the distinct classification head structures and dropout rates.

| Feature | HTrans (Avg) | HTrans (CLS) | ResNet | LinearSVC |
| --- | --- | --- | --- | --- |
| Architecture Base | Hierarchical Transformer | Hierarchical Transformer | ResConsultNet (CNN) | SVM (Linear) |
| Input Representation | Learned Embeddings | Learned Embeddings | Learned Embeddings | TF-IDF |
| Aggregation | Global Avg Pool | CLS Token | Max Pooling (Inter-layer) | Hyperplane Dist. |
| Total Layers | 3 | 3 | 8 | – |
| Block Composition | PreNorm→Attn/FF→Add | PreNorm→Attn/FF→Add | Conv1D→BN→ReLU | – |
| Positional Embedding | Rotary (RoPE) | Rotary (RoPE) | – | – |
| Heads / Filters | 4 Heads | 4 Heads | 16 (Base) | – |
| Dim Head / Kernel | 32 | 32 | 3 | – |
| Block Size | 32 | 32 | – | – |
| FF Multiplier | 4 | 4 | – | – |
| Head Structure (Hidden Dims) | 256→128 | 256→128 | 20 | – |
| Dropout (Head) | 0.2 | 0.2 | 0.5 | – |
| Normalization (Head) | BatchNorm1d | BatchNorm1d | BatchNorm1d | – |
| Regularization (C) | – | – | – | 1.0 |
| Loss Function | – | – | – | Squared Hinge |
| Tolerance | – | – | – | 1×10⁻⁴ |

3.1 Training and Optimization
All deep learning models were trained using the AMSGrad [20] optimizer. Given the unbalanced nature of the dataset, we employed a Stratified 5-fold cross-validation strategy. This ensures that the proportion of samples for each class is preserved in each fold, providing a robust evaluation metric. The specific training hyperparameters are detailed in Table 3.
Table 3. Training & Optimization Configuration. These hyperparameters are shared across all model variations in the experimental setup.
| Parameter | Value |
| --- | --- |
| Epochs | 30 |
| Batch Size | 12 |
| Learning Rate | 3×10⁻⁵ |
| Weight Decay | 1×10⁻⁴ |
| Optimizer Flag | Adam (AMSGrad) |
| Cross-Validation | Stratified K-Fold |
| K-Fold Splits | 5 |
| Test Size | 0.2 (20%) |
| Seed | 42 |

Finally, we evaluated a few LLMs in a zero-shot classification setting with a two-prompt approach: the first prompt translates from Dutch to English, and the second prompt extracts the binary label, where we add an extract of the CVRM guidelines to the system prompt. The LLMs used were GPT-4o, GPT-4o-mini, GPT-4.1 and GPT-4.1-mini (the O-models and the GPT-5.1 models were not yet available in the secure Azure environment for West-Europe). Before the API is called, the consult texts are de-identified using DEDUCE [17] (https://github.com/vmenger/deduce). We prepend the consult texts with the age and the gender of the patient. Note that the CVRM guidelines are over 400 pages. In the spirit of this approach, we prompted GPT 5.2 to make an extractive summary of the guidelines regarding the core CVRM factors (see Appendix A) and appended this extractive summary to the system prompt.
Prompt Group: Two-step setup
Prompt 1.1 (System): You are a faithful and truthful label extractor in the cardio/geriatrics domain. You are an expert in cardiovascular risk management. You assign people to the cardiovascular risk management regime based on the Dutch guidelines. This is an extract of the CVRM guidelines: [SUMMARY OF CVRM GUIDELINES]
Prompt 1.2 (Translation): Translate this Dutch geriatrics consult to English.
Prompt 1.3 (Extraction): We want to know whether the patient, based on this medical consult text and the CVRM guidelines, has an elevated risk for cardiovascular disease. Only respond with yes / no.
Prompt 1. Two-step prompt: first we translate, then we extract.
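A minimal sketch of how the two-step prompt set might be assembled. `build_messages` is a hypothetical helper: the actual API client calls and the DEDUCE de-identification step are omitted, and the system text is abbreviated with an ellipsis.

```python
# Sketch of the two-step zero-shot setup; the API calls themselves are
# omitted, and `build_messages` is an illustrative helper.

def build_messages(consult_nl, age, gender, guideline_summary):
    system = (
        "You are a faithful and truthful label extractor in the "
        "cardio/geriatrics domain. ... This is an extract of the CVRM "
        "guidelines: " + guideline_summary
    )
    record = f"Age: {age}, Gender: {gender}\n{consult_nl}"
    # Step 1: translation prompt (Dutch -> English).
    translate = [
        {"role": "system", "content": system},
        {"role": "user",
         "content": "Translate this Dutch geriatrics consult to English.\n" + record},
    ]
    # Step 2: extraction prompt; the English text returned by the first
    # call would replace the placeholder below.
    extract = [
        {"role": "system", "content": system},
        {"role": "user",
         "content": ("We want to know whether the patient, based on this "
                     "medical consult text and the CVRM guidelines, has an "
                     "elevated risk for cardiovascular disease. Only respond "
                     "with yes / no.\n<TRANSLATED CONSULT>")},
    ]
    return translate, extract
```

Splitting translation and extraction into two calls keeps each task simple and lets the guideline summary condition both steps via the shared system message.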
4 Results and Discussion
Table 4 reports the performance of all models using consult texts only as input, while the late-fusion approach, which combines textual data with patient anthropometric features, is summarized in Table 5. Both tables report F1-score, Precision, Recall, and Matthews Correlation Coefficient (MCC) as evaluation metrics:

$$\mathrm{Precision} = \frac{TP}{TP+FP} \qquad \mathrm{Recall} = \frac{TP}{TP+FN}$$
$$\mathrm{F1\text{-}score} = 2\times\frac{\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} = \frac{2TP}{2TP+FP+FN}$$
$$\mathrm{MCC} = \frac{TP\times TN - FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{1}$$

Table 4. Performance of CVRM prediction based on only consult texts as input. Values are shown as mean (std).

| Models | F1-Score (%) | Precision (%) | Recall (%) | MCC ⊕ |
| --- | --- | --- | --- | --- |
| Dummy Classifier | 31.17 (0.53) | 19.36 (0.33) | 79.94 (1.47) | -0.0013 (0.016) |
| Linear SVC △ | 86.13 (0.59) | 82.80 (0.57) | 92.78 (0.48) | 0.749 (0.011) |
| H-transformer 1D † | 89.73 (3.39) | 89.73 (3.39) | 91.48 (1.25) | 0.718 (0.049) |
| H-transformer 1D ‡ | 91.48 (1.25) | 91.48 (1.25) | 89.73 (3.39) | 0.646 (0.144) |
| ResNet1D | 78.54 (2.16) | 78.54 (2.16) | 78.54 (2.16) | 0.371 (0.102) |
| Zero-Shot LLM GPT-4o-mini | 34.81 | 21.31 | 94.8 | 0.12 |
| Zero-Shot LLM GPT-4o | 35.29 | 21.63 | 95.7 | 0.14 |
| Zero-Shot LLM GPT-4.1-mini | 34.96 | 21.7 | 89.3 | 0.12 |
| Zero-Shot LLM GPT-4.1 | 33.45 | 20.3 | 93.3 | 0.07 |

⊕: Matthews Correlation Coefficient; †: Hierarchical Transformer with Classification Token Pooling; ‡: Hierarchical Transformer with average pooling; △: Support Vector Classifier with Linear Kernel.

As shown in Table 4 and Table 5, hierarchical Transformer models consistently outperform traditional machine learning baselines and state-of-the-art zero-shot LLMs in both text-only and late-fusion settings, demonstrating that their architecture effectively captures the complex structure of clinical texts. In the text-only setting (Table 4), the classification token variant (†) achieved an F1-score of 92.48% with an MCC of 0.758, while the average pooling variant (‡) yielded slightly lower yet still strong performance (F1-score: 91.02%, MCC: 0.730).
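The metrics in Eq. (1) map directly onto confusion-matrix counts; a minimal sketch with toy counts (not taken from the paper):

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Precision, Recall, F1 and MCC from confusion-matrix counts, per Eq. (1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, f1, mcc

# Toy confusion matrix, illustrative only.
precision, recall, f1, mcc = binary_metrics(tp=40, fp=10, fn=10, tn=140)
```

Unlike F1, MCC uses all four cells of the confusion matrix, which is why it is the more informative headline metric on a dataset with roughly 80/20 class imbalance.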
When anthropometric features were incorporated via late fusion (Table 5), both variants maintained top performance, with F1-scores ranging from 89.73% to 91.48% and MCC values between 0.646 and 0.718.
Table 5. Performance of CVRM prediction based on consult texts and anthropometric measures as input. Values are shown as mean (std).

| Models | F1 (%) | Precision (%) | Recall (%) | MCC ⊕ |
| --- | --- | --- | --- | --- |
| Dummy Classifier | 50.76 (0.0) | 50.75 (0.0) | 50.78 (0.0) | 0.015 (0.0) |
| Linear SVM △ | 86.30 (0.56) | 82.97 (0.54) | 92.87 (0.43) | 0.752 (0.010) |
| H-transformer1D † | 92.48 (0.67) | 92.48 (0.67) | 92.48 (0.67) | 0.758 (0.024) |
| H-transformer1D ‡ | 91.02 (1.11) | 91.02 (1.11) | 91.02 (1.11) | 0.730 (0.028) |
| ResNet1D | 85.48 (0.69) | 85.48 (0.69) | 85.48 (0.69) | 0.565 (0.023) |

⊕: Matthews Correlation Coefficient; †: Hierarchical Transformer with Classification Token Pooling; ‡: Hierarchical Transformer with Average Pooling; △: Support Vector Classifier with Linear Kernel.

Although the choice of pooling strategy has a minimal impact on overall performance, the classification token-based variant consistently achieves the highest scores, suggesting that this strategy is better suited to capturing the most informative content present in medical consultation texts. Among the evaluated baselines, the SVM achieved strong performance across the considered metrics. Despite its relatively simple architecture, the SVM yielded results that were only moderately lower than those obtained by our proposed model, confirming its effectiveness in this clinical prediction task. This result is consistent with previous evidence highlighting the robustness of SVMs in clinical and healthcare applications, particularly in settings characterized by limited data availability and high-dimensional feature spaces. Prior to the widespread adoption of transformer-based architectures, SVMs were considered a strong methodological standard in clinical machine learning and continue to represent a competitive baseline in such scenarios [6].
In contrast, ResNet shows substantially lower performance, particularly when trained using only medical consultations. This suggests that deep learning architectures may struggle in clinical tasks when training data are limited. Such models typically rely on large amounts of labeled data to effectively learn task-relevant representations, a requirement that is often difficult to satisfy in real-world clinical settings. Indeed, this trend is even more pronounced for LLMs evaluated in a zero-shot configuration, which exhibited the weakest performance. These findings suggest that, in the absence of domain adaptation or fine-tuning, LLMs may lack adequate understanding of the clinical context required for reliable prediction. Nonetheless, all models achieved performance levels superior to the dummy classifier, demonstrating their capacity to extract meaningful information from both structured and unstructured data. All in all, the superior performance of the H-transformer-1D highlights the importance of hierarchical attention when processing the complex, multi-dimensional evaluations typical of geriatric consultation reports. Unlike standard models, the hierarchical approach efficiently captures long-range dependencies within the clinical history, which is essential given that geriatric patients often present with multiple comorbidities and polypharmacy that synergistically contribute to cardiovascular risk.
5 Conclusion
This study demonstrates that deep learning architectures can successfully automate the identification of cardiovascular risk management eligibility using unstructured electronic health records. The proposed custom Hierarchical Transformer effectively handled longitudinal clinical narratives, yielding higher classification performance compared to both traditional machine learning baselines and general-purpose Large Language Models (LLMs) applied in a zero-shot setting.
Our findings indicate that while general-purpose generative models offer broad capabilities, they currently lack the precision required for this specific non-English clinical task without explicit supervision. Although evaluating fine-tuned, domain-specific clinical LLMs presents an interesting future avenue, the choice to develop a bespoke encoder architecture was driven by a clear operational rationale. Strict data privacy regulations necessitate that sensitive patient EHRs remain within secure, compute-constrained, on-premise hospital environments. In such infrastructural settings, fine-tuning massive autoregressive generative models is often computationally prohibitive. Conversely, our proposed Transformer, while architecturally sophisticated in its attention mechanism, maintains a significantly reduced training footprint. It is inherently optimized for sequence classification rather than text generation, making it both highly efficient to train from scratch and practically deployable. Consequently, task-specific encoder architectures remain a necessary and pragmatic solution for reliable risk stratification. By mitigating the selection errors associated with manual administrative coding, this automated approach supports the operational goals of a Learning Healthcare System. Future implementation of such models in routine practice offers a scalable, privacy-compliant method to utilize accumulated patient data for continuous quality improvement and more accurate geriatric risk assessment.
Declarations
Ethics approval and consent to participate
The UMCU quality assurance research officer confirmed under project number 22U-0292 that this study does not fall under the scope of the Dutch Medical Research Involving Human Subjects Act (WMO) and therefore does not require approval from an accredited medical ethics committee. The study was performed compliant with local legislation and regulations.
All patient data were deidentified in compliance with the European Union General Data Protection Regulation, and as a result, written informed consent was not required by the UMCU ethical committee.
Availability of data and materials
The datasets generated and/or analysed during the current study are not publicly available due to potential privacy-sensitive information, but are available from the corresponding author upon reasonable request and local institutional approval. Research code is publicly available on GitHub, via the code repository for this work.
Competing interests
The authors declare that they have no competing interests.
Funding
The work was supported in part by the ODIN Project, as part of the European Union's Horizon 2020 Research and Innovation Program, under Grant 101017331.
Authors' contributions
Jacopo Vitale: Conceptualisation, Methodology, Data Curation, Writing - Original Draft, Investigation. David Della Morte: Methodology, Investigation, Data Curation, Writing - Original Draft. Luca Bacco: Methodology, Writing - Review & Editing. Mario Merone: Writing - Review & Editing, Supervision. Mark C.H. de Groot: Writing - Review & Editing, Resources. Saskia Haitjema: Resources, Writing - Review & Editing. Leandro Pecchia: Project Administration, Funding Acquisition, Supervision. Bram van Es: Conceptualisation, Writing - Original Draft, Data Curation, Methodology, Investigation, Supervision.
A GPT summary of CVRM guidelines
Prompt: I want to use ChatGPT to extract labels from patient consults regarding cardiovascular risk management. To guide the model I want to include guidelines but these are expansive, with over 400 pages of text. Could you extract from these guidelines the core points regarding co-morbidities, symptoms, prior history, risk scores and other factors that are likely included in these consults? Keep it brief. No extra explanatory text, just an extractive summary.
Response:

## Core CVRM factors commonly documented in patient consults

### Demographics
- Age
- Sex

### Symptoms / Clinical presentation
- Chest pain / angina
- Dyspnea
- Palpitations (e.g. atrial fibrillation)
- Neurological deficits / TIA / stroke symptoms
- Claudication (peripheral arterial disease)
- Signs of heart failure

### Prior history (established disease)
- Documented atherosclerotic cardiovascular disease (coronary artery disease, myocardial infarction, stroke/TIA, peripheral arterial disease, aortic aneurysm)
- Diabetes mellitus (type 1 or 2; duration, complications)
- Chronic kidney disease (eGFR, albuminuria)
- Familial hypercholesterolemia
- Atrial fibrillation
- Heart failure
- Hypertension
- Prior revascularization or vascular procedures

### Risk factors (anamnestic)
- Smoking status (current/former, pack-years)
- Family history of premature cardiovascular disease
- Diet quality
- Physical inactivity / sedentary behavior
- Alcohol use
- Psychosocial stress, depression, low socioeconomic status

### Risk factors (clinical / measurements)
- Systolic blood pressure
- Body mass index (BMI)
- Waist circumference

### Laboratory factors
- LDL-cholesterol
- Non-HDL-cholesterol
- Total cholesterol
- HDL-cholesterol
- Triglycerides
- Fasting glucose
- Serum creatinine / eGFR
- Urine albumin-creatinine ratio

### Risk scores
- SCORE2 (ages 40-70)
- SCORE2-OP (ages 70-90)
- SMART2 / SMART-REACH (established cardiovascular disease)
- DIAL2 (diabetes mellitus)

### Risk modifiers / additional factors
- Coronary artery calcium score
- Psychosocial factors
- Ethnic background
- Chronic inflammatory diseases (rheumatoid arthritis, psoriatic arthritis, ankylosing spondylitis)
- COPD
- Gout
- HIV infection
- Inflammatory bowel disease
- Obstructive sleep apnea
- History of pre-eclampsia or pregnancy-related hypertension
- Severe psychiatric disorders
- Prior chemo- or radiotherapy

### Treatment-related context often noted
- Current antihypertensive therapy
- Current lipid-lowering therapy (statins, ezetimibe, PCSK9 inhibitors)
- Blood pressure target attainment
- LDL-C target attainment
- Medication adherence
- Polypharmacy and frailty (especially in older adults)