
Paper deep dive

A Lightweight Multi-Cancer Tumor Localization Framework for Deployable Digital Pathology

Brian Isett, Rebekah Dadey, Aofei Li, Ryan C. Augustin, Kate Smith, Aatur D. Singhi, Qiangqiang Gu, Riyue Bao

Year: 2026 · Venue: arXiv preprint · Area: cs.CV · Type: Preprint · Embeddings: 36

Abstract

Accurate localization of tumor regions from hematoxylin and eosin-stained whole-slide images is fundamental for translational research including spatial analysis, molecular profiling, and tissue architecture investigation. However, deep learning-based tumor detection trained within specific cancers may exhibit reduced robustness when applied across different tumor types. We investigated whether balanced training across cancers at modest scale can achieve high performance and generalize to unseen tumor types. A multi-cancer tumor localization model (MuCTaL) was trained on 79,984 non-overlapping tiles from four cancers (melanoma, hepatocellular carcinoma, colorectal cancer, and non-small cell lung cancer) using transfer learning with DenseNet169. The model achieved a tile-level ROC-AUC of 0.97 in validation data from the four training cancers, and 0.71 on an independent pancreatic ductal adenocarcinoma cohort. A scalable inference workflow was built to generate spatial tumor probability heatmaps compatible with existing digital pathology tools. Code and models are publicly available at https://github.com/AivaraX-AI/MuCTaL.

Tags

ai-safety (imported, 100%) · cs.CV (suggested, 92%) · preprint (suggested, 88%)


Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 98%

Last extracted: 3/13/2026, 12:55:49 AM

Summary

The paper introduces MuCTaL, a lightweight multi-cancer tumor localization framework using a DenseNet169 architecture trained on 79,984 tiles from four cancer types (melanoma, hepatocellular carcinoma, colorectal cancer, and non-small cell lung cancer). The model demonstrates high performance (ROC-AUC 0.97) and generalizes to unseen pancreatic ductal adenocarcinoma (PDAC) with an AUC of 0.71, providing a scalable, deployable solution for digital pathology workflows.

Entities (8)

Colorectal cancer · cancer-type · 100%
DenseNet169 · architecture · 100%
Hepatocellular carcinoma · cancer-type · 100%
Melanoma · cancer-type · 100%
MuCTaL · model · 100%
Non-small cell lung cancer · cancer-type · 100%
Pancreatic ductal adenocarcinoma · cancer-type · 100%
QuPath · software · 95%

Relation Signals (3)

MuCTaL trained on Melanoma

confidence 100% · MuCTaL was trained on 79,984 non-overlapping tiles from four cancers (melanoma...)

MuCTaL uses architecture DenseNet169

confidence 100% · A multi-cancer tumor localization model (MuCTaL) was trained... using transfer learning with DenseNet169.

MuCTaL generalizes to Pancreatic ductal adenocarcinoma

confidence 90% · the model was applied to an independent dataset of PDAC slides... achieving an AUC of 0.71

Cypher Suggestions (2)

Find all cancer types used to train the MuCTaL model · confidence 95% · unvalidated

MATCH (m:Model {name: 'MuCTaL'})-[:TRAINED_ON]->(c:CancerType) RETURN c.name

Identify the architecture used by the model · confidence 95% · unvalidated

MATCH (m:Model {name: 'MuCTaL'})-[:USES_ARCHITECTURE]->(a:Architecture) RETURN a.name

Full Text

35,855 characters extracted from source content.


A Lightweight Multi-Cancer Tumor Localization Framework for Deployable Digital Pathology

Brian Isett, PhD 1,2,*; Rebekah Dadey, PhD 1,2; Aofei Li, MD 3,+; Ryan C. Augustin, MD 1,2,++; Kate Smith, MS 1; Aatur D. Singhi, MD 3; Qiangqiang Gu, PhD 3,*; Riyue Bao, PhD 1,2,*

1 UPMC Hillman Cancer Center, Pittsburgh, PA, USA; 2 Malignant Hematology and Medical Oncology Division, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; 3 Department of Pathology, University of Pittsburgh, Pittsburgh, PA, USA; + Current address: Division of Dermatopathology, Indiana University, Indianapolis, IN, USA; ++ Current address: Hematology and Medical Oncology Division, Mayo Clinic, Rochester, MN, USA; * Corresponding authors: B.I., bri8@pitt.edu; Q.G., qug14@pitt.edu; R.B., rib37@pitt.edu

Abstract

Accurate localization of tumor regions from hematoxylin and eosin-stained whole-slide images is fundamental for translational research including spatial analysis, molecular profiling, and tissue architecture investigation. However, deep learning-based tumor detection trained within specific cancers may exhibit reduced robustness when applied across different tumor types. We investigated whether balanced training across cancers at modest scale can achieve high performance and generalize to unseen tumor types. A multi-cancer tumor localization model (MuCTaL) was trained on 79,984 non-overlapping tiles from four cancers (melanoma, hepatocellular carcinoma, colorectal cancer, and non-small cell lung cancer) using transfer learning with DenseNet169. The model achieved a tile-level ROC-AUC of 0.97 in validation data from the four training cancers, and 0.71 on an independent pancreatic ductal adenocarcinoma cohort. A scalable inference workflow was built to generate spatial tumor probability heatmaps compatible with existing digital pathology tools. Code and models are publicly available at https://github.com/AivaraX-AI/MuCTaL.
Introduction

The adoption of whole-slide imaging (WSI) in digital pathology workflows has enabled large-scale computational analysis of tumor morphology and microenvironment architecture from routine hematoxylin and eosin (H&E) histology slides 1,2. Manual tumor annotation on WSIs is labor-intensive and often infeasible at scale, motivating the development of automated tumor localization methods 3–7. Deep learning approaches have substantially advanced tumor detection in histopathology, with many studies demonstrating high performance in specific cancer types. For example, in stomach cancer and colorectal cancer (CRC), convolutional neural networks (CNNs) were used to identify tumor and predict clinically relevant molecular phenotypes such as microsatellite instability directly from H&E slides, supporting the concept that tumor-associated morphological patterns can be learned from routine histology images 8. In skin cancer, CNNs achieved dermatopathologist-level competency in classifying melanoma (MEL) and other malignant lesions from benign tissue 9, and the collective intelligence of machine-learning algorithms outperformed experts on the diagnosis of pigmented lesions 10. In hepatocellular carcinoma (HCC), the Inception V3 network 11 demonstrated high accuracy in predicting malignancy, tumor differentiation, and mutated genes 12. Similarly, in non-small cell lung cancer (NSCLC), deep learning models trained on H&E WSIs have enabled automated identification of regions containing neoplastic cells, with subsequent slide-level disease subtype classification 13,14 and survival prediction 15,16. A persistent challenge in computational pathology is the limited generalizability of models trained on single-cancer datasets.
Variability in tissue morphology, staining protocols, slide preparation, and scanner characteristics can introduce substantial domain shift between datasets, leading to degraded performance when models are applied to slides originating from different tumor types 17,18. Conversely, large-scale foundation models trained on thousands of WSIs have demonstrated impressive generalizability across cancers 19–26. While these models represent a major step toward universal histopathologic feature extraction, their development typically requires extensive multi-institutional data aggregation, centralized harmonization, and substantial computational infrastructure that may not be readily available in many translational research settings 1,27. Recent studies have shown that deep learning models trained on histopathology images from one cancer type can capture morphological features that generalize to other tumor types without requiring foundation-scale datasets, revealing conserved spatial patterns in tumor histology 28. In practice, research teams at community hospitals and academic medical centers often work with limited, heterogeneous datasets assembled across multiple independent projects over time 29. Developing computational frameworks that leverage balanced, modest-scale multi-cancer training cohorts therefore represents a practical and scalable approach for translational applications 29, yet remains underexplored. Here, we investigate whether a lightweight multi-cancer training strategy can provide sufficient morphological diversity to support robust tumor localization across heterogeneous histopathology datasets, while remaining computationally tractable. We developed a multi-cancer tumor localization (MuCTaL) model trained on image tiles derived from four tumor types (MEL, HCC, CRC, and NSCLC), and evaluated its performance within each cancer as well as its ability to generalize to an unseen tumor type, pancreatic ductal adenocarcinoma (PDAC).
To support integration with digital pathology workflows, we generated heatmap-based visualizations to highlight tumor regions within WSIs and exported the spatial coordinates of tile-wise classes compatible with open-source tools such as QuPath 30, providing interpretable insights into tumor localization within complex histologic landscapes.

Materials and Methods

Study cohorts. Human specimens of MEL, HCC, and PDAC were collected at the University of Pittsburgh Medical Center (UPMC). Specimens were evaluated by two pathologists (A.L. for MEL, A.S. for HCC and PDAC) and one oncologist (R.C.A.). Published datasets were obtained for CRC 31 and NSCLC 32. All patients were consented prior to sample collection. The study was approved by the University of Pittsburgh Institutional Review Board (IRB) (IRB18-177).

H&E staining and image acquisition. Data were generated for the institutional cohorts following published protocols 33–35. All staining procedures were performed at the UPMC Translational Oncologic Pathology Services (TOPS). In brief, formalin-fixed paraffin-embedded (FFPE) tissue blocks were sectioned at 4 μm thickness, and slides were baked at 60 °C for one hour prior to processing. Sections were subsequently cooled, deparaffinized, and rehydrated in distilled water. H&E staining was performed using Hematoxylin 560 MX (Cat# 3801576) and Eosin Phloxine 515 (Cat# 3801606) with Define MX-aq (Cat# 3803598) and Blue Buffer 8 (Cat# 3802918) reagents (Leica Biosystems), according to the manufacturer’s protocols. Stained slides were digitized at 40× magnification using a Leica AT2 whole-slide scanner to generate high-resolution digital images for this study.

Training, validation, and test sets. We created a dataset of 79,984 tiles from the four tumor types, with 90% used for training and 10% reserved for validation. An independent cohort of PDAC (7,346 tiles) was used as the test set.

Training dataset construction.
We constructed a multi-cancer tile-level training dataset across four tumor types (MEL, HCC, CRC, NSCLC). From each cancer, non-overlapping image tiles (224×224 pixels) were extracted from annotated tumor and non-tumor regions on WSIs. Preprocessing included several steps: (1) tiles were filtered to remove artifacts and blank tiles, retaining only tiles with >70% tissue; (2) blurred tiles and tiles containing blood clots were removed using OpenCV 36-based image quality filtering; and (3) color normalization and stain augmentation were applied using the Macenko method 37. To minimize class imbalance and ensure comparable representation across tumor types, we resampled tiles to generate a balanced dataset with a 50:50 tumor/non-tumor tile ratio and approximately 20,000 tiles per tumor type, totaling 79,984 training tiles across the four tumor types. Tile extraction and preprocessing were implemented using the PathML framework 38 (v2.1.1) for standardized slide-to-tile conversion. To ensure compatibility between published and institutional cohorts, evaluation was performed at the tile level, a commonly used approach when publicly available histopathology tile datasets are involved. For the institutional cohorts where de-identified patient IDs were available (MEL and HCC), data were balanced by patient within each tumor type, ensuring equal representation of patients among the ~20k tiles per cancer via resampling.

Model architecture and training procedure. We implemented a CNN using transfer learning from a pretrained DenseNet169 backbone in PyTorch 39. The output layer was modified to perform binary classification (tumor versus non-tumor). The model was trained for 10 epochs with early layer weights frozen, followed by 20 epochs of fine-tuning with a maximum learning rate (LR) of 3e-3, 5 epochs with a base LR of 1e-5, and 5 epochs with a base LR of 5e-5 to estimate learning-rate gradient adjustments.
During training, tiles were pre-sized, randomly rotated by up to ±45 degrees, flipped, and cropped with no re-scaling as part of data augmentation. Training was performed on an Nvidia A100 GPU with batch_size = 375 using the Fastai (v2.7) deep learning framework 40. All training jobs were executed on the high-performance computing (HPC) clusters at the University of Pittsburgh Center for Research Computing and Data (CRCD).

Inference procedure. To enable scalable deployment on WSIs, we built a distributed inference workflow by assigning each WSI to a separate job within the HPC environment using a SLURM scheduler at the University of Pittsburgh CRCD. For each slide, non-overlapping tiles were extracted using the same preprocessing pipeline applied during training. Tile-level tumor detection probabilities were computed using the trained classifier. Predicted probabilities were spatially reassembled to generate slide-level tumor probability heatmaps. To improve spatial coherence, probability maps were smoothed using Gaussian filtering prior to thresholding. Tumor regions were delineated by applying a probability threshold of 0.5 (reflecting the 50:50 class distribution during training), followed by extraction of contiguous regions using contour detection. Identified tumor contours were rescaled to the original slide resolution and exported as GeoJSON objects compatible with QuPath 30 (v5.0).

Model evaluation strategy. Model performance was evaluated at the tile level on the validation and test sets using F1 score, sensitivity, specificity, and receiver operating characteristic-area under the curve (ROC-AUC). Performance metrics were computed both overall and stratified by tumor type to assess variability across datasets.

Results

The overall workflow for multi-cancer tumor localization is illustrated in Fig. 1.
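The inference post-processing described in Methods (reassembling tile probabilities into a heatmap, smoothing, thresholding at 0.5, and exporting regions as GeoJSON) can be sketched as follows. This is a simplified stand-in: the paper uses Gaussian filtering and contour detection, while this sketch substitutes a 3×3 mean filter, 4-connected components, and bounding-box polygons; all function names are illustrative.

```python
import numpy as np

def assemble_heatmap(tile_probs, grid_shape):
    """Place tile-level tumor probabilities {(row, col): p} on a slide grid."""
    heat = np.zeros(grid_shape, dtype=float)
    for (r, c), p in tile_probs.items():
        heat[r, c] = p
    return heat

def smooth(heat):
    """3x3 mean filter; a simple stand-in for the paper's Gaussian smoothing."""
    padded = np.pad(heat, 1, mode="edge")
    out = np.zeros_like(heat)
    h, w = heat.shape
    for dr in (0, 1, 2):
        for dc in (0, 1, 2):
            out += padded[dr:dr + h, dc:dc + w]
    return out / 9.0

def tumor_regions(heat, thresh=0.5):
    """Contiguous tumor regions as 4-connected components above threshold."""
    mask = heat > thresh
    seen = np.zeros_like(mask)
    regions = []
    h, w = mask.shape
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                stack, comp = [(r, c)], []
                seen[r, c] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                regions.append(comp)
    return regions

def region_to_geojson(comp, tile_px=224):
    """Bounding box of a tile region as a GeoJSON Polygon in slide pixels
    (the paper exports true contours; a box keeps the sketch short)."""
    rows = [p[0] for p in comp]
    cols = [p[1] for p in comp]
    x0, x1 = min(cols) * tile_px, (max(cols) + 1) * tile_px
    y0, y1 = min(rows) * tile_px, (max(rows) + 1) * tile_px
    return {"type": "Feature",
            "geometry": {"type": "Polygon",
                         "coordinates": [[[x0, y0], [x1, y0], [x1, y1],
                                          [x0, y1], [x0, y0]]]},
            "properties": {"classification": "Tumor"}}
```

A list of such Feature objects, wrapped in a FeatureCollection, is the kind of GeoJSON that QuPath can import as annotations.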
WSIs were partitioned into tiles, filtered for quality, normalized, and classified using DenseNet169 trained on a multi-cancer dataset with transfer learning (Fig. 1A). Tile-level tumor predictions were subsequently aggregated to generate spatial heatmaps of tumor detection probability across entire slides (Fig. 1B). Across all validation tiles, the multi-cancer classifier achieved high overall performance with a ROC-AUC of 0.97, F1-score of 0.90, sensitivity of 0.94, and specificity of 0.86 (Fig. 2A). These results indicate that the model can accurately distinguish tumor from non-tumor regions at the tile level across heterogeneous cancer datasets.

Figure 1. MuCTaL framework for multi-cancer tumor localization. (A) Training and validation workflow. Whole-slide images (WSIs) from multiple tumor types were partitioned into image tiles and subjected to preprocessing including tissue filtering, artifact removal, and stain normalization. Tiles were then used to train a convolutional neural network (CNN) using transfer learning from a pretrained DenseNet169 architecture to classify tumor versus non-tumor at the tile level. (B) Slide-level visualization workflow. Tile-level tumor probabilities were assigned color values and spatially reconstructed to generate spatial heatmaps and contiguous tumor localization masks across the whole slide.

When stratified by tumor type, classification performance varied across datasets (Fig. 2B; Table 1). The model achieved near-perfect performance for CRC (AUC = 0.9999) and NSCLC (AUC = 1.00), with corresponding F1 scores approaching 1.0. MEL tiles also demonstrated high performance (AUC = 0.96, F1 = 0.89). Performance was lower for HCC (AUC = 0.79, F1 = 0.74). Examination of misclassification rates revealed that the highest proportion of misclassified tiles occurred in the HCC cohort (27%), followed by MEL (11%), whereas the CRC and NSCLC cohorts exhibited minimal errors.
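The tile-level metrics reported above (ROC-AUC, F1, sensitivity, specificity) can be computed from predicted probabilities with a few lines of dependency-free Python. The pairwise ROC-AUC below is the Mann-Whitney formulation, and the 0.5 threshold mirrors the paper's inference setting; the function names are illustrative, not from the MuCTaL codebase.

```python
def roc_auc(y_true, y_score):
    """Pairwise (Mann-Whitney) ROC-AUC: the probability that a randomly
    chosen tumor tile is scored above a randomly chosen non-tumor tile."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tile_metrics(y_true, y_score, thresh=0.5):
    """F1, sensitivity, and specificity at the paper's 0.5 threshold,
    plus ROC-AUC, from binary labels and predicted probabilities."""
    y_pred = [int(s > thresh) for s in y_score]
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(y_pred, y_true) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(y_pred, y_true) if p == 0 and t == 1)
    return {"roc_auc": roc_auc(y_true, y_score),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "f1": 2 * tp / (2 * tp + fp + fn)}
```

The pairwise AUC is O(n²) and meant only to make the definition concrete; for tens of thousands of tiles a rank-based implementation (or scikit-learn's `roc_auc_score`) would be the practical choice.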
To evaluate whether the multi-cancer classifier generalizes, the model was applied to an independent dataset of PDAC slides that were not included in training, achieving an AUC of 0.71 (Fig. 2B, Table 1). Together, these findings demonstrate that multi-cancer training accurately localized tumor and retained the ability to identify malignant morphology that extends beyond individual tumor types. Beyond classification, we reconstructed spatial tumor localization across WSIs. As shown in Fig. 2C, tile-level tumor probabilities were aggregated to generate slide-level heatmaps. These heatmaps highlight tumor-enriched regions and spatial patterns of tumor localization within the surrounding tissue architecture. To facilitate downstream analysis, the probability maps were subsequently processed using Gaussian smoothing and thresholding to generate contiguous tumor masks. Extracted tumor contours were then rescaled to the original slide resolution and exported as GeoJSON objects. These results can be directly imported into open-source platforms such as QuPath for interactive exploration, providing a practical tool for integrative analysis within translational research workflows.

Discussion

In this study, we evaluated whether multi-cancer training at modest scale can support robust tumor localization in translational research environments. Our results demonstrate that introducing morphological diversity through multi-cancer training captures shared features of malignancy across tumor types while remaining computationally efficient. This strategy provides a practical framework for developing deployable tumor localization models that can operate across heterogeneous histopathology datasets. Prior studies often focused on single-cancer training paradigms, using highly curated datasets such as Camelyon16 and Camelyon17 41–43 and the Breast Cancer Histology (BACH) challenge datasets 44–46.
While these models can achieve high performance within domain, their performance may degrade when applied to slides from different tumor types. These observations are consistent with broader findings emphasizing that domain shift remains a major challenge for generalizable histopathology models 17. At the other end of the spectrum, large-scale models trained on tens of thousands of WSIs across diverse cancers have shown strong cross-cancer transferability. For example, a pan-cancer classifier trained on 27,000+ WSIs from 19 tumor types achieved a ROC-AUC of 0.99 on tumor/normal status 28. Moreover, while foundation models offer powerful general-purpose representations 19–26, their deployment often relies on access to pretrained models and additional fine-tuning on specialized infrastructure, which may not always be readily available in translational research environments. Our work occupies an intermediate space between these paradigms. Rather than maximizing scale, we investigated whether balanced sampling across multiple tumor types can support model robustness while maintaining a modest dataset size. In this context, multi-cancer training can be viewed as a data-efficiency strategy for learning tumor-associated histological patterns that potentially generalize across cancers. Consistent with this hypothesis, the multi-cancer classifier demonstrated high tumor localization performance across MEL, HCC, CRC, and NSCLC, and retained the ability to detect malignant morphology in PDAC. We observed differential error rates across tumor cohorts, with increased misclassification in the institutional HCC cohort. This may reflect the inherent morphological heterogeneity of HCC tumors 47. Similar challenges have been reported in prior studies, which highlight the sensitivity of histopathology models to tissue heterogeneity 18,48. Proposed mitigation strategies include stain normalization, color augmentation, and domain-adversarial training 18,49.
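Stain normalization, the first mitigation strategy listed above, can be illustrated with a minimal channel-statistics sketch. Note the paper itself uses the Macenko method; this simpler Reinhard-style per-channel matching, the function name, and the reference statistics below are all illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def reinhard_normalize(tile, ref_mean=(180.0, 120.0, 160.0),
                       ref_std=(35.0, 40.0, 30.0)):
    """Match each RGB channel's mean/std to reference statistics.
    A simplified stand-in for Macenko stain normalization; the reference
    values are arbitrary placeholders, not measured H&E targets."""
    x = tile.astype(float).reshape(-1, 3)
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    out = (x - mean) / std * np.asarray(ref_std) + np.asarray(ref_mean)
    return np.clip(out, 0, 255).reshape(tile.shape).astype(np.uint8)
```

Macenko normalization instead decomposes optical density into hematoxylin and eosin stain vectors before matching, which is why it handles stain-specific variation better than this per-channel approximation.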
While our current framework incorporated standard augmentation approaches, future work will explore explicit domain adaptation strategies to further improve cross-tumor generalization. Beyond classification accuracy, we built a deployable inference workflow with reconstruction of whole-slide tumor probability maps and tumor contours as GeoJSON objects compatible with digital pathology platforms. Our framework enables rapid tumor region identification and extraction without manual slide annotation for downstream spatial and molecular analyses, and facilitates the integration of tumor localization models into practical research workflows. Several limitations should be noted. Our evaluation was conducted on five datasets and did not include large-scale multi-institution benchmarking. Slide-level metadata were not available for some public datasets used in this study, preventing strict enforcement of slide-level independence between training and validation tiles. Finally, although the model demonstrated generalization to an unseen tumor type, broader validation across additional cancers will be necessary to fully characterize cross-tumor robustness. Future studies will extend this framework to larger and more diverse datasets and systematically evaluate domain-shift effects across institutions and imaging platforms. In conclusion, multi-cancer training at modest scale provides a practical strategy for tumor localization across heterogeneous cancer datasets. Lightweight multi-cancer models offer a feasible alternative to both single-cancer models and foundation-scale approaches, providing a scalable pathway for broader adoption of artificial intelligence-driven tumor localization in translational research environments.

Figure 2. Performance and spatial visualization of the multi-cancer tumor localization model. (A) Receiver operating characteristic (ROC) curve for tile-level classification across all validation datasets. (B) ROC curves stratified by tumor type for melanoma (MEL), hepatocellular carcinoma (HCC), colorectal cancer (CRC), and non-small cell lung cancer (NSCLC). Pancreatic ductal adenocarcinoma (PDAC), which was not included during model training, was used as an independent evaluation dataset to assess model generalization. (C) Example of slide-level tumor localization generated from tile-level predictions (shown is MEL). Tumor probability scores were assigned to individual tiles and spatially reconstructed into a whole-slide probability heatmap. For the visualization of tumor-enriched regions within the tissue section, warmer colors indicate higher predicted probability of tumor presence. P(Pos) denotes the predicted probability of tiles on the positive class (tumor).

Table 1. Tile-level classification performance of the multi-cancer tumor localization classifier across tumor types. Performance metrics are reported for each tumor cohort in the validation and test sets.

Acknowledgement

We thank the patients and families for their participation in this study. We thank Dr. Fangping Mu for technical assistance at the University of Pittsburgh Center for Research Computing and Data (CRCD) high-performance computing (HPC) clusters. We acknowledge the Hillman Career Acceleration Fellow for Innovative Cancer Research award (R.B.) from the Hillman Program made possible by the Henry L. Hillman Foundation.

Funding. This work was supported in part by the National Cancer Institute (NCI) through the UPMC Hillman Cancer Center CCSG award P30CA047904 (R.B.), and by the University of Pittsburgh CRCD through the resources provided, specifically the GPU clusters supported by NIH S10OD028483. This project used the UPMC HCC Cancer Bioinformatics Facility (CBS) and Translational Oncologic Pathology Services (TOPS).

Role of funding sources.
The funding sources had no role in the study design, data collection, data analysis, interpretation, or writing of the manuscript.

Author’s contributions

R.B. and B.I. conceived the study. R.B. supervised the study and provided funding. B.I. designed the methodology, implemented and built the codebase, and generated results, including image processing, model training, and validation. R.D. obtained specimens and organized data generation. A.L. performed pathology annotation in melanoma samples. A.D.S. performed pathology annotation in liver and pancreatic samples. R.C.A. provided oncology insights and reviewed clinical data. K.S. scanned the H&E slides and acquired the images. Q.G. organized and cleaned the codebase. B.I., Q.G., and R.B. wrote the manuscript. R.B. and Q.G. edited the manuscript. All authors provided feedback and approved the manuscript.

Declaration of Competing Interests

R.B. declares PCT/US15/612657 (Cancer Immunotherapy), PCT/US18/36052 (Microbiome Biomarkers for Anti-PD-1/PD-L1 Responsiveness: Diagnostic, Prognostic and Therapeutic Uses Thereof), and PCT/US63/055227 (Methods and Compositions for Treating Autoimmune and Allergic Disorders). The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Public datasets used in this study, including the CRC and NSCLC datasets, are available from the original publications cited in the manuscript. Institutional datasets used in this study are not publicly available due to patient privacy and institutional data use restrictions. Correspondence and requests for materials should be addressed to R.B. (rib37@pitt.edu). Code and trained models for the MuCTaL framework are publicly available at https://github.com/AivaraX-AI/MuCTaL.

References

1. Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A.
Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nat Rev Clin Oncol. 2019 Nov;16(11):703–15. doi:10.1038/s41571-019-0252-y PubMed PMID: 31399699; PubMed Central PMCID: PMC6880861. 2. Marx V. Method of the Year: spatially resolved transcriptomics. Nat Methods. 2021 Jan;18(1):9–14. doi:10.1038/s41592-020-01033-y PubMed PMID: 33408395. 3. Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019 Aug;25(8):1301–9. doi:10.1038/s41591-019-0508-1 PubMed PMID: 31308507; PubMed Central PMCID: PMC7418463. 4. Janowczyk A, Madabhushi A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J Pathol Inform. 2016;7:29. doi:10.4103/2153-3539.186902 PubMed PMID: 27563488; PubMed Central PMCID: PMC4977982. 5. Verma S, Malusare A, Wang M, Wang L, Mahapatra A, English A, et al. AnnotateAnyCell: Open-Source AI Framework for Efficient Annotation in Digital Pathology [Internet]. Pathology; 2025 [cited 2026 Mar 5]. Available from: http://biorxiv.org/lookup/doi/10.1101/2025.11.02.686114 doi:10.1101/2025.11.02.686114 6. Doerfler R, Chen J, Kim C, Smith JD, Harris M, Singh KB, et al. Integrating artificial intelligence-driven digital pathology and genomics to establish patient-derived organoids as new approach methodologies for drug response in head and neck cancer. Oral Oncol. 2025 Dec 1;171:107742. doi:10.1016/j.oraloncology.2025.107742 7. Xia R, Littlefield N, Bao R, Gu Q. Beyond algorithmic performance: translational gaps in implementing artificial intelligence for clinical digital pathology. J Histotechnol. 2026 Jan 2;49(1):1–3. doi:10.1080/01478885.2026.2630525 8. Kather JN, Pearson AT, Halama N, Jäger D, Krause J, Loosen SH, et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat Med.
2019 Jul;25(7):1054–6. doi:10.1038/s41591-019-0462-y PubMed PMID: 31160815; PubMed Central PMCID: PMC7423299. 9. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017 Feb 2;542(7639):115–8. doi:10.1038/nature21056 PubMed PMID: 28117445; PubMed Central PMCID: PMC8382232. 10. Tschandl P, Codella N, Akay BN, Argenziano G, Braun RP, Cabo H, et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. 2019 Jul;20(7):938–47. doi:10.1016/S1470-2045(19)30333-X PubMed PMID: 31201137; PubMed Central PMCID: PMC8237239. 11. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision [Internet]. arXiv; 2015 [cited 2026 Mar 6]. Available from: https://arxiv.org/abs/1512.00567 doi:10.48550/ARXIV.1512.00567 12. Liao H, Long Y, Han R, Wang W, Xu L, Liao M, et al. Deep learning-based classification and mutation prediction from histopathological images of hepatocellular carcinoma. Clin Transl Med. 2020 Jun;10(2):e102. doi:10.1002/ctm2.102 PubMed PMID: 32536036; PubMed Central PMCID: PMC7403820. 13. Wei JW, Tafe LJ, Linnik YA, Vaickus LJ, Tomita N, Hassanpour S. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci Rep. 2019 Mar 4;9(1):3358. doi:10.1038/s41598-019-40041-7 PubMed PMID: 30833650; PubMed Central PMCID: PMC6399447. 14. Diosdado J, Gilabert P, Seguí S, Borrego H. LungHist700: A dataset of histological images for deep learning in pulmonary pathology. Sci Data. 2024 Oct 5;11(1):1088. doi:10.1038/s41597-024-03944-3 PubMed PMID: 39368979; PubMed Central PMCID: PMC11455975. 15. Wang S, Chen A, Yang L, Cai L, Xie Y, Fujimoto J, et al.
Comprehensive analysis of lung cancer pathology images to discover tumor shape and boundary features that predict survival outcome. Sci Rep. 2018 Jul 10;8(1):10393. doi:10.1038/s41598-018-27707-4 PubMed PMID: 29991684; PubMed Central PMCID: PMC6039531. 16. Kludt C, Wang Y, Ahmad W, Bychkov A, Fukuoka J, Gaisa N, et al. Next-generation lung cancer pathology: Development and validation of diagnostic and prognostic algorithms. Cell Rep Med. 2024 Sep 17;5(9):101697. doi:10.1016/j.xcrm.2024.101697 PubMed PMID: 39178857; PubMed Central PMCID: PMC11524894. 17. Stacke K, Eilertsen G, Unger J, Lundstrom C. Measuring Domain Shift for Deep Learning in Histopathology. IEEE J Biomed Health Inform. 2021 Feb;25(2):325–36. doi:10.1109/JBHI.2020.3032060 PubMed PMID: 33085623. 18. Tellez D, Litjens G, Bándi P, Bulten W, Bokhorst JM, Ciompi F, et al. Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med Image Anal. 2019 Dec;58:101544. doi:10.1016/j.media.2019.101544 PubMed PMID: 31466046. 19. Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, et al. A multimodal whole-slide foundation model for pathology. Nat Med. 2025 Nov;31(11):3749–61. doi:10.1038/s41591-025-03982-3 PubMed PMID: 41193692; PubMed Central PMCID: PMC12618242. 20. Vorontsov E, Bozkurt A, Casson A, Shaikovski G, Zelechowski M, Severson K, et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat Med. 2024 Oct;30(10):2924–35. doi:10.1038/s41591-024-03141-0 PubMed PMID: 39039250; PubMed Central PMCID: PMC11485232. 21. Yang Z, Wei T, Liang Y, Yuan X, Gao R, Xia Y, et al. A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images. Nat Commun. 2025 Mar 10;16(1):2366. doi:10.1038/s41467-025-57587-y PubMed PMID: 40064883; PubMed Central PMCID: PMC11894166. 22. Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, et al. 
A visual-language foundation model for computational pathology. Nat Med. 2024 Mar;30(3):863–74. doi:10.1038/s41591-024-02856-4 PubMed PMID: 38504017; PubMed Central PMCID: PMC11384335. 23. Huang Z, Bianchi F, Yuksekgonul M, Montine TJ, Zou J. A visual-language foundation model for pathology image analysis using medical Twitter. Nat Med. 2023 Sep;29(9):2307–16. doi:10.1038/s41591-023-02504-3 PubMed PMID: 37592105. 24. Neidlinger P, El Nahhas OSM, Muti HS, Lenz T, Hoffmeister M, Brenner H, et al. Benchmarking foundation models as feature extractors for weakly supervised computational pathology. Nat Biomed Eng. 2025 Oct 1. doi:10.1038/s41551-025-01516-3 PubMed PMID: 41034516. 25. Xu H, Usuyama N, Bagga J, Zhang S, Rao R, Naumann T, et al. A whole-slide foundation model for digital pathology from real-world data. Nature. 2024 Jun;630(8015):181–8. doi:10.1038/s41586-024-07441-w PubMed PMID: 38778098; PubMed Central PMCID: PMC11153137. 26. Wang X, Zhao J, Marostica E, Yuan W, Jin J, Zhang J, et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024 Oct;634(8035):970–8. doi:10.1038/s41586-024-07894-z PubMed PMID: 39232164; PubMed Central PMCID: PMC12186853. 27. Lu MY, Williamson DFK, Chen TY, Chen RJ, Barbieri M, Mahmood F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng. 2021 Jun;5(6):555–70. doi:10.1038/s41551- 020-00682-w PubMed PMID: 33649564; PubMed Central PMCID: PMC8711640. 28. Noorbakhsh J, Farahmand S, Foroughi Pour A, Namburi S, Caruana D, Rimm D, et al. Deep learning-based cross-classifications reveal conserved spatial behaviors within tumor histological images. Nat Commun. 2020 Dec 11;11(1):6367. doi:10.1038/s41467-020-20030-5 PubMed PMID: 33311458; PubMed Central PMCID: PMC7733499. 29. Hosseini MS, Bejnordi BE, Trinh VQH, Chan L, Hasan D, Li X, et al. Computational pathology: A survey review and the way forward. J Pathol Inform. 2024 Dec;15:100357. 
doi:10.1016/j.jpi.2023.100357 PubMed PMID: 38420608; PubMed Central PMCID: PMC10900832. 30. Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG, Dunne PD, et al. QuPath: Open source software for digital pathology image analysis. Sci Rep. 2017 Dec 4;7(1):16878. doi:10.1038/s41598-017- 17204-5 PubMed PMID: 29203879; PubMed Central PMCID: PMC5715110. 31. Kather JN, Halama N, Marx A. 100,000 Histological Images Of Human Colorectal Cancer And Healthy Tissue [Internet]. Zenodo; 2018 [cited 2026 Mar 2]. Available from: https://zenodo.org/record/1214456 doi:10.5281/ZENODO.1214456 32. Borkowski A, Bui M, Thomas LB, Wilson CP, DeLand LA, Mastorides SM. Lung and Colon Cancer Histopathological Image Dataset (LC25000) [Internet]. arXiv; 2019 [cited 2026 Mar 2]. Available from: https://arxiv.org/abs/1912.12142 doi:10.48550/ARXIV.1912.12142 33. Dadey RE, Li R, Griner J, Chen J, Singh A, Isett B, et al. Multiomics identifies tumor-intrinsic SREBP1 driving immune exclusion in hepatocellular carcinoma. J Immunother Cancer. 2025 Jun 15;13(6):e011537. doi:10.1136/jitc-2025-011537 PubMed PMID: 40518290; PubMed Central PMCID: PMC12314812. 34. Augustin RC, Newman S, Li A, Joy M, Lyons M, Pham MP, et al. Identification of tumor-intrinsic drivers of immune exclusion in acral melanoma. J Immunother Cancer. 2023 Oct;11(10):e007567. doi:10.1136/jitc-2023- 007567 PubMed PMID: 37857525; PubMed Central PMCID: PMC10603348. 35. Nguyen MK, Jelinek M, Singh A, Isett B, Myers ES, Mullett SJ, et al. Clinical and translational study of ivosidenib plus nivolumab in advanced solid tumors harboring IDH1 mutations. The Oncologist. 2025 Nov 11;30(11):oyaf362. doi:10.1093/oncolo/oyaf362 PubMed PMID: 41138165; PubMed Central PMCID: PMC12605714. 36. Bradski G. The OpenCV Library. Dr Dobbs J Softw Tools. 2000. 37. Niethammer M, Borland D, Marron JS, Woosley J, Thomas NE. Appearance Normalization of Histology Slides. Mach Learn Med Imaging MLMI. 2010;6357:58–66. 
doi:10.1007/978-3-642-15948-0_8 PubMed PMID: 25360444; PubMed Central PMCID: PMC4211434. 38. Rosenthal J, Carelli R, Omar M, Brundage D, Halbert E, Nyman J, et al. Building Tools for Machine Learning and Artificial Intelligence in Cancer Research: Best Practices and a Case Study with the PathML Toolkit for Computational Pathology. Mol Cancer Res MCR. 2022 Feb;20(2):202–6. doi:10.1158/1541-7786.MCR-21- 0665 PubMed PMID: 34880124; PubMed Central PMCID: PMC9127877. 39. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High- Performance Deep Learning Library [Internet]. arXiv; 2019 [cited 2026 Mar 6]. Available from: http://arxiv.org/abs/1912.01703 doi:10.48550/arXiv.1912.01703 40. Howard J, Gugger S. Fastai: A Layered API for Deep Learning. Information. 2020 Feb 16;11(2):108. doi:10.3390/info11020108 41. Litjens G, Bandi P, Ehteshami Bejnordi B, Geessink O, Balkenhol M, Bult P, et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience. 2018 Jun 1;7(6):giy065. doi:10.1093/gigascience/giy065 PubMed PMID: 29860392; PubMed Central PMCID: PMC6007545. 42. Bandi P, Geessink O, Manson Q, Van Dijk M, Balkenhol M, Hermsen M, et al. From Detection of Individual Metastases to Classification of Lymph Node Status at the Patient Level: The CAMELYON17 Challenge. IEEE Trans Med Imaging. 2019 Feb;38(2):550–60. doi:10.1109/TMI.2018.2867350 PubMed PMID: 30716025. 43. Kim YG, Kim S, Cho CE, Song IH, Lee HJ, Ahn S, et al. Effectiveness of transfer learning for enhancing tumor classification with a convolutional neural network on frozen sections. Sci Rep. 2020 Dec 14;10(1):21899. doi:10.1038/s41598-020-78129-0 44. Aresta G, Araújo T, Kwok S, Chennamsetty S, Safwan M, Alex V, et al. BACH: Grand challenge on breast cancer histology images. Med Image Anal. 2019 Aug;56:122–39. doi:10.1016/j.media.2019.05.010 PubMed PMID: 31226662. 45. Gu Q. Deep Learning in Digital Pathology [Internet]. 
2023 Jul [cited 2026 Mar 7]. Available from: https://hdl.handle.net/11299/259718 46. Gu Q, Prodduturi N, Hart SN. Deep Learning in Automating Breast Cancer Diagnosis from Microscopy Images. In. American Society of Mechanical Engineers Digital Collection; 2024 [cited 2026 Mar 7]. Available from: https://dx.doi.org/10.1115/DMD2024-1017 doi:10.1115/DMD2024-1017 47. Torbenson MS. Hepatocellular carcinoma: making sense of morphological heterogeneity, growth patterns, and subtypes. Hum Pathol. 2021 Jun;112:86–101. doi:10.1016/j.humpath.2020.12.009 PubMed PMID: 33387587; PubMed Central PMCID: PMC9258523. 48. Komura D, Ochi M, Ishikawa S. Machine learning methods for histopathological image analysis: Updates in 2024. Comput Struct Biotechnol J. 2025;27:383–400. doi:10.1016/j.csbj.2024.12.033 PubMed PMID: 39897057; PubMed Central PMCID: PMC11786909. 49. Lafarge MW, Pluim JPW, Eppenhof KAJ, Moeskops P, Veta M. Domain-adversarial neural networks to address the appearance variability of histopathology images [Internet]. 2017. doi:10.48550/ARXIV.1707.06183