
Paper deep dive

AI End-to-End Radiation Treatment Planning Under One Second

Simon Arberet, Riqiang Gao, Martin Kraus, Florin C. Ghesu, Wilko Verbakel, Mamadou Diallo, Anthony Magliari, Venkatesan Karuppusamy, Sushil Beriwal, REQUITE Consortium, Ali Kamen, Dorin Comaniciu

Year: 2026 · Venue: arXiv preprint · Area: eess.IV · Type: Preprint · Embeddings: 82

Abstract

Artificial intelligence-based radiation therapy (RT) planning has the potential to reduce planning time and inter-planner variability, improving efficiency and consistency in clinical workflows. Most existing automated approaches rely on multiple dose evaluations and corrections, resulting in plan generation times of several minutes. We introduce AIRT (Artificial Intelligence-based Radiotherapy), an end-to-end deep-learning framework that directly infers deliverable treatment plans from CT images and structure contours. AIRT generates single-arc VMAT prostate plans, from imaging and anatomical inputs to leaf sequencing, in under one second on a single Nvidia A100 GPU. The framework includes a differentiable dose feedback, an adversarial fluence map shaping, and a plan generation augmentation to improve plan quality and robustness. The model was trained on more than 10,000 intact prostate cases. Non-inferiority to RapidPlan Eclipse was demonstrated across target coverage and OAR sparing metrics. Target homogeneity (HI = 0.10 ± 0.01) and OAR sparing were similar to reference plans when evaluated using AcurosXB. These results represent a significant step toward ultra-fast standardized RT planning and a streamlined clinical workflow.

Tags

ai-safety (imported, 100%) · eessiv (suggested, 92%) · preprint (suggested, 88%)

Links

PDF not stored locally. Use the link above to view on the source site.

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%

Last extracted: 3/13/2026, 12:18:43 AM

Summary

AIRT is an end-to-end deep-learning framework that generates deliverable single-arc VMAT prostate radiation treatment plans from CT images and structure contours in under one second. It utilizes a differentiable dose-feedback mechanism and adversarial fluence map shaping to achieve plan quality comparable to clinical standards like RapidPlan Eclipse, significantly reducing planning time and inter-planner variability.

Entities (5)

AIRT · framework · 100%
AcurosXB · dose-engine · 95%
RapidPlan Eclipse · software · 95%
VMAT · technique · 95%
Nvidia A100 · hardware · 90%

Relation Signals (3)

AIRT compared to RapidPlan Eclipse

confidence 95% · Non-inferiority to RapidPlan Eclipse was demonstrated

AIRT generates VMAT

confidence 95% · AIRT generates single-arc VMAT prostate plans

AIRT runs on Nvidia A100

confidence 90% · in under one second on a single Nvidia A100 GPU

Cypher Suggestions (2)

Identify comparative studies between AI frameworks and clinical software · confidence 95% · unvalidated

MATCH (a:Framework)-[:COMPARED_TO]->(s:Software) RETURN a.name, s.name

Find all frameworks and their associated hardware requirements · confidence 90% · unvalidated

MATCH (f:Framework)-[:RUNS_ON]->(h:Hardware) RETURN f.name, h.name

Full Text

81,529 characters extracted from source content.


AI End-to-End Radiation Treatment Planning Under One Second

Simon Arberet 1, Riqiang Gao 1, Martin Kraus 2, Florin C. Ghesu 2, Wilko Verbakel 3, Mamadou Diallo 2, Anthony Magliari 3, Venkatesan Karuppusamy 1, Sushil Beriwal 3, REQUITE Consortium, Ali Kamen 1, Dorin Comaniciu 1

1 Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA. 2 Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany. 3 Varian Medical Affairs, Siemens Healthineers, Palo Alto, CA, USA. Contributing authors: simon.arberet@siemens-healthineers.com

Abstract: Artificial intelligence-based radiation therapy (RT) planning has the potential to reduce planning time and inter-planner variability, improving efficiency and consistency in clinical workflows. Most existing automated approaches rely on multiple dose evaluations and corrections, resulting in plan generation times of several minutes. We introduce AIRT (Artificial Intelligence-based Radiotherapy), an end-to-end deep-learning framework that directly infers deliverable treatment plans from CT images and structure contours. AIRT generates single-arc VMAT prostate plans, from imaging and anatomical inputs to leaf sequencing, in under one second on a single Nvidia A100 GPU. The framework includes a differentiable dose feedback, an adversarial fluence map shaping, and a plan generation augmentation to improve plan quality and robustness. The model was trained on more than 10,000 intact prostate cases. Non-inferiority to RapidPlan Eclipse was demonstrated across target coverage and OAR sparing metrics. Target homogeneity (HI = 0.10 ± 0.01) and OAR sparing were similar to reference plans when evaluated using AcurosXB. These results represent a significant step toward ultra-fast standardized RT planning and a streamlined clinical workflow.
Keywords: Radiation therapy, Treatment planning, Fluence map optimization, Multi-leaf collimator (MLC), Leaf sequencing algorithm, Dose distribution, Deep learning, VMAT (Volumetric Modulated Arc Therapy), Automated planning

arXiv:2603.06338v1 [eess.IV] 6 Mar 2026

1 Introduction

The main goals of automatic radiation therapy are to reduce planning time, improve inter-planner dose consistency, and maintain or enhance plan quality [1, 2]. However, conventional inverse planning techniques rely on lengthy iterative optimization algorithms [3, 4] and require substantial planner expertise [2, 5]. These dependencies prevent the availability of fast, consistent, and high-quality treatment planning everywhere in the world without the need for an experienced planner.

Most recent AI-based approaches for automatic radiation therapy planning are based on reinforcement learning (RL) [6–11] or model predictive control (MPC) [12]. These methods often involve multiple time-consuming iterative steps and/or interactions with the treatment planning system (TPS), resulting in planning times on the order of minutes. Some recent RL methods can achieve sub-minute inference time [10], but then rely on additional preprocessing and coarse dose resolutions to remain tractable, which can limit their plan quality. In addition, RL-based methods typically decompose their planning into a sequence of local control-point steps, favoring computational tractability. This may come at the expense of global plan optimality. In this work, we present a novel deep-learning (DL) method that can generate single-arc VMAT plans for prostate (without lymph nodes), including leaf sequencing, in under one second.
Built around recent studies by the authors on AI for automatic radiation therapy planning [13–17], the method produces clinically deliverable plans directly from patient anatomy and achieves, for the first time, fully automated single-arc VMAT planning with a dosimetric quality similar to clinically used TPS optimization (RapidPlan Eclipse). The method learns from a large dataset of TPS plans while leveraging a differentiable dose-feedback mechanism that refines the fluence maps to improve clinically relevant objectives, particularly target dose homogeneity, all within an end-to-end pipeline with sub-second inference time. The dose-feedback mechanism includes a DL dose computation internally, distinct from the final dose evaluation. This mechanism improves plan quality relative to simple feed-forward approaches [13, 18] and improves planning speed compared to methods requiring multiple dose calculations [6–11]. The dose-feedback mechanism also enables an MCO-like approach in which the user can manipulate a slider to vary the target homogeneity vs. OAR sparing trade-off. However, this feature was only partially explored and is not the main focus of the present work. By generating high-quality and consistent plans automatically, the present approach could improve access to advanced radiation therapy techniques in regions of the world where there is a scarcity of qualified dosimetrists and physicists.

2 Related Work

We can identify four directions of work in the field of AI-based automatic radiation treatment planning: 1) reinforcement learning (RL) approaches that predict fluence maps or machine parameters [6–10]; 2) RL methods that adjust the dose-volume objectives with the treatment planning system (TPS) in the loop [11]; 3) model-predictive control (MPC) methods that predict the plan quality and optimize objectives [12]; 4) direct approaches which predict fluence maps with a feed-forward network [18].
Recent studies have demonstrated the feasibility of deep RL approaches for IMRT fluence map painting and machine parameter optimization, including single and dual-arc VMAT [6–8]. A recent study by the authors [9] also uses deep RL for leaf-sequencing VMAT fluence maps, but without providing a robust TPS-free planning pipeline from CT imaging to final plan. A very recent TPS-free RL method [10] reports a multi-second inference time; however, it depends on additional pre-processing on the order of more than 20 seconds. Other works adjust objective constraints while invoking the TPS after each step [11]. MPC has also been used to predict future dose responses and optimize discrete objective changes [12]. In this line of work, the treatment planning system (TPS) is used for iterative refinement and/or episodic rewards. When TPS warm-start and refinements are used, planning time ranges from 80 to 100 seconds. Beyond RL/MPC, other approaches explored learning-based dose and fluence prediction in order to accelerate planning. For example, transformer models have been used to predict fluence maps for multi-beam IMRT plans, but they do not close the loop with a differentiable dose feedback [18]. In another line of research, physics-informed differentiable dose engines enabled gradient-based optimization, but they still require an optimization loop to yield a complete clinical artifact [19] and require even longer optimization time than classical TPS optimization.

Our approach differs in several respects. It is a full end-to-end pipeline without any TPS interaction. It contains a sequence of feed-forward modules for ultra-fast execution: auto-contouring, dose proposal, Bev2Fluence prediction, differentiable dose computation, single-pass dose-error correction, and finally leaf sequencing. One main contribution is a differentiable dose feedback mechanism which can optimize dose metrics accurately in a single feed-forward pass, as opposed to multiple TPS refinements.
We also address sequencability using an adversarial loss on the fluence maps and an ad-hoc VMAT leaf sequencer. The pipeline is fast and can generate single-arc VMAT plans in <1 s (including leaf sequencing), which is orders of magnitude faster than RL+TPS pipelines that require ≈80–100 s including TPS refinement [7, 8]. By coupling a full AI feed-forward end-to-end pipeline with a differentiable dose-feedback mechanism before sequencing, this method addresses missing capabilities essential for ultra-fast, deployable VMAT planning.

Fig. 1: AIRT end-to-end pipeline for AI VMAT plan generation.

3 Results

3.1 Overview

The end-to-end AI-based VMAT planning pipeline (AIRT) was evaluated on a dataset of intact prostate cases, against RapidPlan plans [20, 21] optimized in Eclipse using the Photon Optimizer (PO) algorithm with AcurosXB (see Methods section 5 for details). Performance was assessed in terms of dosimetric quality, statistical non-inferiority to clinical plans, computational efficiency, adaptability to user-defined organ sparing, and robustness across different patient anatomies. For the dosimetric evaluations, multiple dose computation engines were employed: 1) a physics-informed DL engine [15], which is also integrated in the dose feedback mechanism (see Methods section); 2) an in-house LTBE solver (LTBS in short) derived from the AcurosXB codebase as described in [15]; 3) the Eclipse A dose engine; 4) the Eclipse AcurosXB dose engine. Unlike Eclipse AcurosXB and A, the DL dose and LTBS dose engines use a simplified source model. As a result, metrics may differ when evaluated with clinical dose engines (A and AcurosXB).
For conciseness, only results obtained with the DL dose engine and Eclipse AcurosXB are reported in the main manuscript, while results from all four dose engines are provided in the supplementary material. The pipeline delivers VMAT plans in under one second per case on a single GPU, demonstrating that it is both clinically feasible and computationally efficient under steady-state inference conditions (i.e., after one-time model and GPU initialization). A compute time breakdown analysis of our pipeline is available in Supplementary Table S4. Figure 1 illustrates the full pipeline.

3.2 Dosimetric Performance

All dose metrics were evaluated using the DL dose engine and AcurosXB (see above). It is important to note that AcurosXB, while widely considered a clinical reference standard for dose calculation, is itself a computational model and not the absolute ground truth. Each planning method naturally looks best when evaluated with its "native" dose engine (AIRT with DL dose, Eclipse with AcurosXB). Plans generated by the AIRT method were comparable in quality to the Eclipse plans. When evaluated with the DL dose engine, AIRT plans had lower target homogeneity index (HI) values (mean PTV HI = 0.11) than Eclipse plans (mean PTV HI = 0.16). However, this difference likely reflects the difference between the DL dose and AcurosXB rather than an improvement in plan quality. To ensure a fair comparison, we also evaluated plans generated by both methods using the AcurosXB dose engine. In this case, HI values were similar between the two methods even though the AIRT method used the DL dose in both its training and inference. This demonstrates that AIRT closely matches Eclipse optimized plan quality and suggests that further improvements may be achieved if the DL dose model were more closely aligned with AcurosXB. Table 1 shows dose-volume histogram (DVH) metrics of the AIRT method vs. Eclipse plans using AcurosXB.
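As a concrete illustration of the DVH metrics discussed above, the ICRU homogeneity index can be computed directly from a structure's voxel doses: D_x% is the dose received by the hottest x% of the structure volume, i.e., the (100 − x)th percentile of the voxel-dose distribution. A minimal numpy sketch (function names and the toy dose distribution are ours, not from the paper):

```python
import numpy as np

def dose_at_volume(struct_doses, volume_pct):
    """D_x%: dose received by at least x% of the structure's voxels,
    i.e., the (100 - x)th percentile of the voxel-dose distribution."""
    return np.percentile(struct_doses, 100.0 - volume_pct)

def homogeneity_index(ptv_doses):
    """ICRU homogeneity index (D2% - D98%) / D50%; lower is better."""
    d2 = dose_at_volume(ptv_doses, 2.0)
    d98 = dose_at_volume(ptv_doses, 98.0)
    d50 = dose_at_volume(ptv_doses, 50.0)
    return (d2 - d98) / d50

# toy example: near-uniform 40 Gy PTV dose with mild spread
rng = np.random.default_rng(0)
ptv = rng.normal(40.0, 1.0, size=100_000)
hi = homogeneity_index(ptv)
```

For a near-uniform 40 Gy target with 1 Gy spread, this yields an HI of roughly 0.1, in the same range as the values reported in Table 1.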
DVHs using the other dose engines are provided in the supplementary material. Organs-at-risk (OARs) received comparable dose metrics with both planning methods. When evaluating with the DL dose engine, rectum doses were slightly lower for AIRT plans while bladder doses were slightly higher. Similar trends were observed for the Rectum D50 when evaluated with AcurosXB.

Table 1: Comparison of Eclipse and AIRT planning for intact prostate VMAT cases. All Eclipse plans were optimized using AcurosXB and AIRT plans used the DL dose. For fair comparison, dose for both plan types was calculated using the DL dose and AcurosXB engines. The Homogeneity Index (HI) is defined as (D2% − D98%) / D50% following the ICRU convention (lower is better). Data are presented as mean (SD). A * indicates p < 0.05 (Wilcoxon signed-rank test).

| Structure | Metric | Eclipse (DL dose) | AI (DL dose) | Eclipse (AcurosXB) | AI (AcurosXB) |
|-----------|------------|-------------------|--------------|--------------------|---------------|
| PTV | HI | 0.16 (0.03)* | 0.11 (0.02)* | 0.10 (0.01)* | 0.10 (0.01)* |
| PTV | D98 (Gy) | 38.4 (0.8)* | 38.8 (0.4)* | 39.3 (0.2) | 39.3 (0.2) |
| Bladder | Dmean (Gy) | 6.7 (3.3)* | 6.9 (3.5)* | 7.3 (3.5)* | 7.7 (3.8)* |
| Bladder | D50 (Gy) | 2.9 (2.5) | 3.0 (2.7) | 3.3 (2.7)* | 3.6 (3.0)* |
| Bladder | D2 (Gy) | 34.8 (6.8) | 35.1 (6.6) | 35.8 (6.0)* | 37.0 (5.9)* |
| Rectum | Dmean (Gy) | 5.5 (1.4)* | 5.3 (1.4)* | 5.4 (1.4) | 5.5 (1.4) |
| Rectum | D50 (Gy) | 2.7 (1.2)* | 2.5 (1.1)* | 2.7 (1.0)* | 2.6 (1.0)* |
| Rectum | D2 (Gy) | 30.1 (5.5)* | 30.5 (5.8)* | 31.4 (5.6)* | 32.5 (6.0)* |

3.3 Statistical Validation and Non-Inferiority

To evaluate statistical equivalence between the AIRT method and Eclipse planning, we performed a non-inferiority test on the 62 validation cases of the dataset using the Eclipse AcurosXB dose engine. For this test, we used a margin of 0.01 for the homogeneity index (HI), and a margin of 1.5 Gy for all other dose metrics. Among the tested DVH metrics, the AIRT method met the non-inferiority test margins at p < 0.05. Detailed results for each DVH metric are provided in Supplementary Table S3.
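One common way to formalize a non-inferiority test like the one above is a one-sided Wilcoxon signed-rank test on margin-shifted paired differences: shift each AI-minus-reference difference by the margin and test whether the shifted median is below zero. The sketch below uses the normal approximation without tie correction and synthetic data; it illustrates the idea only and is not necessarily the authors' exact procedure:

```python
import math
import numpy as np

def wilcoxon_noninferiority_p(ai_metric, ref_metric, margin):
    """One-sided Wilcoxon signed-rank p-value for H1: the AI-minus-reference
    difference lies below the non-inferiority margin (lower metric = better).
    Normal approximation, no tie correction -- illustrative only."""
    d = (np.asarray(ai_metric) - np.asarray(ref_metric)) - margin
    d = d[d != 0.0]                         # drop exact zeros, per convention
    n = len(d)
    ranks = np.empty(n)
    ranks[np.argsort(np.abs(d))] = np.arange(1, n + 1)
    w_plus = ranks[d > 0].sum()             # small W+ supports non-inferiority
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # P(Z <= z)

# toy paired HI values for 62 validation cases, margin 0.01
rng = np.random.default_rng(1)
ref = rng.normal(0.10, 0.01, size=62)
ai = ref + rng.normal(0.0, 0.003, size=62)  # essentially equivalent plans
p = wilcoxon_noninferiority_p(ai, ref, margin=0.01)
```

With near-equivalent paired metrics, the shifted differences sit well below zero and the test rejects inferiority at p < 0.05, mirroring the paper's reported outcome qualitatively.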
3.4 Qualitative Results: Representative Cases

Figure 2 shows the DVH curves of the AIRT method and Eclipse planning on six representative cases spanning the full range of PTV sizes of the validation dataset, evaluated using the Eclipse AcurosXB dose engine. To ensure an impartial and systematic selection process that avoids cherry-picking, we ranked all cases of the validation dataset by PTV volume and selected the 0th (minimum), 20th, 40th, 50th (median), 80th, and 100th (maximum) percentiles. This approach provides both representative coverage and an unbiased selection of cases. Additionally, for full transparency, we also provide the mean DVH curves across patients, as well as the results for every individual case, for the AIRT method and the Eclipse optimized plans in the Supplementary Material. Results using the other dose engines (DL dose, LTBS, Eclipse A) are also provided in the supplementary material. Visualizations of the corresponding 3D dose distributions with the various dose engines are provided in supplementary Figures S15 to S22. These results demonstrate that the AIRT method maintained PTV coverage and OAR sparing over the full range of PTV sizes, showcasing the robustness of the AIRT method to various anatomies.

Fig. 2: Dose–volume histograms (DVHs) for six cases corresponding to the 0th (minimum), 20th, 40th, 50th (median), 80th, and 100th (maximum) percentiles of PTV size in the validation dataset, using the Eclipse AcurosXB dose engine.

3.5 Fluence Map Quality and Leaf Sequencability

In order to maintain the leaf sequencability of the fluence maps, an adversarial loss was used during training.
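The percentile-based case selection and the cumulative DVH curves described above can be sketched in a few lines (a minimal numpy illustration with synthetic volumes and doses; all names are ours, not the paper's):

```python
import numpy as np

def select_percentile_cases(ptv_volumes, percentiles=(0, 20, 40, 50, 80, 100)):
    """Rank cases by PTV volume and pick the case sitting at each
    percentile of the ranking (0 = smallest case, 100 = largest)."""
    order = np.argsort(ptv_volumes)
    n = len(order)
    return [int(order[round(p / 100 * (n - 1))]) for p in percentiles]

def cumulative_dvh(struct_doses, dose_bins):
    """Percent of the structure volume receiving at least each bin dose."""
    doses = np.asarray(struct_doses)
    return np.array([(doses >= d).mean() for d in dose_bins]) * 100.0

# toy validation set: 62 PTV volumes (cc) and one case's PTV voxel doses
rng = np.random.default_rng(2)
volumes = rng.uniform(40.0, 160.0, size=62)
cases = select_percentile_cases(volumes)
bins = np.linspace(0.0, 42.0, 100)
dvh = cumulative_dvh(rng.normal(40.0, 1.0, size=20_000), bins)
```

By construction the first and last selected cases are the minimum- and maximum-volume cases, and each cumulative DVH starts at 100% and decreases monotonically with dose.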
Supplementary Figure S1 shows some examples of fluence maps from the AIRT method, compared to the fluence maps generated by the pipeline without adversarial loss, as well as the target fluence maps from Eclipse optimization. The fluence maps generated with adversarial loss look more homogeneous and closer in distribution to the reference fluence maps. The results from the ablation study in the next section also show that without adversarial loss, the PTV homogeneity is significantly worse, indicating issues in leaf sequencability if adversarial loss is not used. Note that all the dose metrics and DVH results reported in this study are based on generated plans after leaf sequencing.

3.6 Ablation Study

To isolate the contribution of the main components of the AIRT method, we conducted an ablation study. In particular, we isolated: data augmentation (Aug), dose-feedback correction (DF), and adversarial loss (GAN). Across the variants, data augmentation systematically improved the results. Dose feedback improved PTV homogeneity, but without adversarial loss, DF alone can also push the fluence maps away from the manifold of leaf-sequenced fluence maps and thus complicate leaf sequencability (see section 3.5). For this reason, the best results are obtained when DF and GAN are used together. Note that when DF is used, relative metrics can get worse because DF focuses on improving absolute metrics (PTV homogeneity in this study), which can conflict with the relative metrics. This is, for example, the case when the AIRT plan has a PTV homogeneity better than the target.
Table 2: Ablation results for AIRT variants, evaluated on 62 cases of the validation set. Relative metrics (fluence-domain PSNR, SSIM, MAE; dose-domain PTV MAE) are computed in comparison with target plans. Except for the fluence metrics (PSNR and SSIM), all the other metrics are dose-domain metrics, computed after leaf sequencing using the DL dose engine [15]. Absolute DVH metrics are reported for PTV, bladder, and rectum. "-Aug" indicates that data augmentation was removed, "-DF" that dose feedback was removed, and "-GAN" that adversarial loss was removed; "full" indicates that all the components of the AIRT method are used. Units: bladder and rectum dose metrics are in Gy; PSNR is in dB; SSIM, HI, and CI are unitless.

| Method | PSNR↑ | SSIM↑ | MAE↓ | PTV MAE↓ | HI↓ | CI∼ | Bladder mean↓ | Bladder D2↓ | Rectum mean↓ | Rectum D2↓ |
|--------|-------|-------|------|----------|-----|-----|---------------|-------------|--------------|-------------|
| AIRT (-Aug, -DF, -GAN) | 29.41 | 0.944 | 0.191 | 1.711 | 0.162 | 0.791 | 6.94 | 35.48 | 4.69 | 28.58 |
| AIRT (-DF, -GAN) | 29.68 | 0.949 | 0.174 | 1.569 | 0.154 | 0.801 | 6.71 | 35.19 | 4.77 | 28.82 |
| AIRT (-GAN) | 29.67 | 0.951 | 0.166 | 1.293 | 0.148 | 0.801 | 6.93 | 35.98 | 4.98 | 29.43 |
| AIRT (-Aug) | 28.70 | 0.942 | 0.174 | 1.167 | 0.144 | 0.802 | 6.76 | 34.93 | 5.19 | 29.39 |
| AIRT (full) | 29.09 | 0.945 | 0.167 | 1.575 | 0.107 | 0.822 | 6.85 | 35.11 | 5.29 | 30.51 |
| Eclipse Planning | N.A. | N.A. | N.A. | N.A. | 0.158 | 0.764 | 6.73 | 34.81 | 5.46 | 30.14 |

3.7 Adaptability: OARs Sparing Adjustments

An additional benefit of the AIRT framework is its flexibility. As described in the Methods section, the dose feedback mechanism allows the user to optionally apply scalar penalties to excess dose in OARs during inference. While this functionality is not the main focus of the present work and is not yet fully explored, we include it here to demonstrate the flexibility of the AIRT framework. With appropriate training to accept scalar control inputs, the system enables per-patient dose adaptation for OAR sparing using simple user controls at inference time. For each organ at risk, the user can input a scalar value to increase the dose penalty in that organ.
This enables the generation of plans with various PTV-OAR trade-offs, without the need to retrain or replan. This approach differs from the authors' previous work [16], which focused on dose prediction rather than direct generation of deliverable plans. To validate this adaptability, we generated all the plans of the validation dataset for different OAR scalar values for the bladder and rectum. Figure 3 shows the averaged DVHs over the validation dataset, evaluated using the DL dose engine, of AIRT for four different settings (no control, increased bladder sparing, increased rectum sparing, increased sparing on both bladder and rectum). Additional DVH trade-offs are provided in the Supplementary Material. Results show that increasing the sparing factor of the bladder or rectum independently decreases the dose in the corresponding organ, with very limited effect on the other organ. However, as expected, increasing the OAR sparing on either of the organs, and even more on both, decreases the PTV homogeneity. This reflects the classical trade-off between PTV homogeneity and OAR sparing in radiation therapy.

Fig. 3: Averaged DVHs (over the 62 cases of the validation dataset) of the AIRT method for various OAR sparing controls. The "(baseline)" planning in the legend means that no input OAR sparing control was used (equivalent to s_r = s_b = 0). s_b = 2% in the legend means that the dose feedback mechanism tried to decrease the dose in the bladder by 2% voxel-wise (likewise for the rectum when s_r = 2%) compared to its input dose distribution.

4 Discussion

In this work, we show that an AI end-to-end pipeline can generate clinically deliverable single-arc VMAT plans from CT and contours in under one second. By combining a differentiable dose-feedback correction with an adversarial fluence loss, our pipeline produces fluence maps that can reliably be translated into MLC sequences.
Trained on > 10,000 intact-prostate plans generated in Eclipse using a RapidPlan model specifically developed for this indication [20–22], it achieves plan quality competitive with TPS baselines in under one second, avoiding the iterative TPS optimization loop (PO iterations). All results reported are obtained without warm-start refinements; however, the method may optionally be used as a warm-start for further manual refinement if desired. This would extend the usability and scope of our approach by enabling the clinician to customize plans for special cases or needs beyond those addressed by our AI pipeline.

Reducing planning times from minutes to under a second could transform clinical workflow. By generating standardized, clinically deliverable plans almost instantaneously, the pipeline can increase throughput in high-volume centers and deliver standardized plans learned from a large set of cases, with consistent planning quality. This planning speed enables an interactive mode in which clinicians can review multiple candidate plans in real time, discuss trade-offs, and converge on an acceptable deliverable plan within a single session. Because the pipeline produces DICOM RT Plans, it can easily be integrated into clinical systems.

In this work, we showed that we can achieve clinically deliverable fluence maps with a single feed-forward network, by using a differentiable dose error correction module, sufficient network capacity, and a large training set size. Unlike RL and MPC methods, which rely on multiple TPS calls and episodic rewards, our current pipeline maintains end-to-end differentiability during training, allowing dose objectives to be optimized via backpropagation through the different modules of the pipeline. The adversarial loss enforces consistency with deliverable fluence maps and allows leaf sequencing in a single shot without TPS refinements.
The ablation study demonstrated the importance of the dose feedback combined with the adversarial training in improving target homogeneity. Importantly, the dose-feedback mechanism itself is adaptable to multiple dose objectives: while we report results for a specific set of dose objectives chosen as proof-of-concept (e.g., reducing PTV inhomogeneity and possibly adding a scalar penalty for OAR sparing), the same framework could be adapted to other differentiable dose objectives. This functionality presents an alternative to traditional Multi-Criteria Optimization (MCO), allowing clinicians to explore a range of trade-offs, adjusting objectives and viewing the results almost instantaneously, with the difference that no replanning optimization is needed.

Generalization is key to translating this method into clinical practice. While large-scale training and data augmentation improve generalization, as validated in our study, it is essential to validate the performance of our method across various clinical institutions and anatomies. In this context, the American Oncology Institute (AOI) in India has independently tested our AI-generated plans, achieving target coverage comparable to manually crafted plans and successfully passing patient-specific quality assurance testing for deliverability. These clinical results, currently unpublished, will be submitted as a conference abstract and a journal publication.

This study has some limitations. It is focused on single-arc VMAT intact prostate planning, and would require adaptations to address different body regions (lung, breast, head & neck) or radiation therapy techniques such as multi-arc VMAT or simultaneous integrated boost (SIB).
The pipeline's architecture could be extended to multiple arcs or other more complex protocols by adapting the network inputs/outputs, generating corresponding large-scale datasets, and possibly using multiple dose-feedback loops to address the increased underdeterminacy of such problems. The current model was trained and validated for the Varian Millennium 120 (M120) MLC. Retraining may be necessary to adapt to other MLC models, especially if the leaf width differs, as the resulting fluence maps would have a different distribution. A flexible foundation model addressing multiple configurations could also be developed.

Our dose engine, while efficient and differentiable, is an approximation of clinical dose calculation. It relies on a simplified source model, with a spatially constant photon spectrum and without electron contamination effects. In contrast, AcurosXB uses a spatially varying spectrum and models secondary electrons. The observed differences between the DL dose engine and AcurosXB, particularly with respect to the homogeneity index, suggest that a closer alignment of source and beam models between the two could further improve the overall quality of the AI-generated plans. Interestingly, the AI-generated plans showed less variation in homogeneity index across the different dose engines compared to Eclipse plans, suggesting that the AI method may be more robust to changes in dose modeling and potentially to the true clinical dose. Future improvements will focus on more realistic source modeling to better match clinical solvers such as AcurosXB. Also, in order to facilitate rapid inference and manageable memory use, the deep-learning dose engine used in this study operates on a 4 mm grid resolution, which is slightly coarser than typical TPS dose engines.
Progress in differentiable dose engine efficiency could enable higher dose resolution in our end-to-end pipeline, and potentially improve PTV-OAR sparing or, in general, address more targeted dose objectives (e.g., SIB). Future work includes generalizing our end-to-end pipeline to multi-arc VMAT, addressing other body regions (lung, breast, head & neck), as well as more complex and adaptable dose objectives. Overall, by eliminating TPS-in-the-loop optimization while preserving deliverability, this pipeline points toward real-time radiotherapy planning that can complement current clinical workflows with faster iteration and consistent plan quality.

5 Methods

5.1 Dataset and Augmentation

The dataset consists of CT images, primarily from the REQUITE prostate dataset (1001 scans) and a few smaller in-house CT datasets, resulting in a total of 1,277 CT patient scans after curation. The first step of the data creation process is the automatic segmentation of organs and helper structures, which are used by RapidPlan. AI-Rad Companion Organs RT [23] was used as the auto-contouring solution. Helper structures were generated via an in-house data curation pipeline implemented in C++ [17]. The full list of contours and helper structures is given in supplementary Table S1. Then, for each CT and its associated contour set (RT STRUCT), we created an RT plan in Eclipse with a publicly available RapidPlan model for prostate SBRT [20, 21], adapted for intact prostate (with PTV, CTV, and PTV ∩ Rectum all set to 40 Gy) and delivered as a single-arc VMAT with a Varian Millennium 120 (M120) multileaf collimator (MLC). Out of the 1,277 patients, 1,122 were randomly selected for training, 62 for validation, and 60 for testing. In order to increase network performance and generalization, we developed and applied a data-augmentation strategy to expand the training dataset from 1,122 training plans to 12,302.
The augmentation strategy includes the following transforms applied to the CT and contours: random image scaling, slight rotation, non-rigid deformation, and margin adjustments to PTV and rectum contours within clinically plausible ranges. All the augmented CTs and their corresponding contours were then used to create new plans in Eclipse, following the same RapidPlan model.

5.2 Pipeline Architecture

The modular structure of the end-to-end pipeline is depicted in Figure 1. It takes the CT volume and contours (Body, PTV, OARs) as input and produces a leaf-sequenced plan as output. This feed-forward design allows ultra-fast (<1 s) generation of VMAT plans. In the following, we detail each module of the pipeline.

Dose proposer

The first module of the pipeline is the DoseProposer, which takes the CT volume and contours (Body, PTV, OARs) as input and predicts a 3D dose distribution [14]. The architecture of this network is a 3D ResUNet, i.e., a U-Net encoder-decoder architecture with ResNet-style residual blocks and skip connections.

BEV projection (parameter-free)

The 3D dose volume from the DoseProposer is projected into the beam's eye view (BEV) of every control point (CP) of the VMAT plan, in order to place the dose in the same geometry as the fluence maps [13]. This alignment simplifies the task of the Bev2Fluence network by creating a spatial correspondence between its input (dose BEV) and its output (fluence maps).

Bev2Fluence

The Bev2Fluence network predicts the 180 fluence maps for the VMAT plan from its stack of 180 input beam's eye view dose projections [13]. These fluence maps are predicted jointly due to the strong coupling between control points. This coupling is due to the VMAT delivery constraints (e.g., maximum leaf speed), but also because the resulting dose is the accumulation of every control point's contribution. The architecture of the Bev2Fluence network is a 3D convolutional network based on the MedNeXT backbone [24].
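Structurally, the module sequence described in this section amounts to a single feed-forward composition. The sketch below wires trivial stand-ins together to show the dataflow only; shapes, grid sizes, and function bodies are hypothetical assumptions, and the real modules are the learned ResUNet/MedNeXT networks and rule-based sequencer described in the text:

```python
import numpy as np

N_CP = 180               # control points in a single VMAT arc (per the paper)
BEV_H, BEV_W = 64, 64    # hypothetical fluence-map grid

# Trivial stand-ins for the pipeline modules -- shapes only, no physics.
def dose_proposer(ct, contours):          # 3D ResUNet in the paper
    return np.zeros_like(ct)
def project_bev(volume):                  # parameter-free BEV projection
    return np.zeros((N_CP, BEV_H, BEV_W))
def bev2fluence(bev_stack):               # MedNeXT-based network
    return np.maximum(bev_stack, 0.0)
def dl_dose(fluence, ct):                 # frozen physics-informed dose engine
    return np.zeros_like(ct)
def dose_error(pred, proposed):           # Err module (asymmetric PTV metric)
    return pred - proposed
def correct_fluence(fluence, bev_err):    # second Bev2Fluence network
    return fluence - 0.1 * bev_err
def leaf_sequence(fluence):               # rule-based leaf sequencer
    return {"n_control_points": fluence.shape[0]}

def airt_pipeline(ct, contours):
    proposed = dose_proposer(ct, contours)
    fluence = bev2fluence(project_bev(proposed))
    err = dose_error(dl_dose(fluence, ct), proposed)          # dose feedback
    fluence = correct_fluence(fluence, project_bev(err))
    return leaf_sequence(fluence)

plan = airt_pipeline(np.zeros((96, 96, 64)), contours=None)
```

The point of the sketch is that every stage is a single forward pass, so the whole chain runs once per patient with no optimization loop.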
Physics-informed dose computation

To perform inference-time dose correction, a differentiable dose engine, which computes the dose from the input fluence maps, is used in the pipeline. This dose engine is a deep-learning network using a physics-informed approach [15]. Note that this dose engine was trained on an LBTE (linear Boltzmann transport equation) solver leveraging transport physics equations similar to AcurosXB, but employs a simplified beam model. Its architecture is based on a two-step design, where the fluence maps are first accumulated, together with the CT, into a 3D volume using spherical harmonics, and then an image-to-image network based on the MedNeXt backbone [24] is used to predict the final dose. Its inference time scales sub-linearly with the number of control points. For stability purposes, the network was trained beforehand [15] and is kept frozen. We use the 4 mm dose resolution version of this network in order to keep the inference time below one second.

Dose error module (parameter-free): Err

This module generates a 3D dose error map by comparing the predicted dose and the desired dose distribution. We focus primarily on PTV homogeneity by using an asymmetric metric that penalizes hot spots more than cold spots. Optionally, this module can take scalar dose inputs, enabling the user to penalize the excess dose in each OAR. Each scalar input acts as a weight that scales how much dose should be removed from a particular OAR relative to the baseline plan (i.e. the plan generated with a zero or absent OAR scalar dose input). By adjusting these values, the clinician can interactively explore different trade-offs between target homogeneity and OAR sparing. When such a scalar is used, a 3D dose error component proportional to that weight is added to the 3D dose error map, which is then fed to the Bev2Fluence correction module, whose role is to adjust the fluence maps to reduce that error.
The dose error framework is general and can, in principle, accommodate any differentiable dose metric (see details in Section 5.3 and Supplementary Note S1). This scalar dose input mechanism is one possible implementation. The resulting dose error map is projected into the BEV before being provided to the Bev2Fluence correction network.

Bev2Fluence correction

A second Bev2Fluence network refines the fluence maps using as inputs the initial fluence map prediction and the BEV-projected dose error. This network, which is based on a similar architecture to the first Bev2Fluence but adapted to take additional inputs, corrects the input fluence maps using the BEV dose error as guidance. This module closes the dose feedback loop of the pipeline.

Leaf sequencing (rule-based)

VMAT fluence exhibits a near two-level structure (open aperture vs. near-zero outside). Fluence maps are finally converted into leaf positions and monitor units with a rule-based leaf sequencer which models partial-pixel effects at moving leaf boundaries and the dosimetric leaf gap. The code is parallelized and implemented in Cython/C for speed. Further implementation details are provided in Section 5.6.

RT Plan export

Finally, the leaf sequences and MU values can be exported into a DICOM RT Plan to be imported into a TPS such as Eclipse for review or direct delivery. Alternatively, to streamline the workflow, the pipeline could be integrated into a TPS.

5.3 Dose Feedback Mechanism

One of the main contributions of the AIRT method is a fully differentiable dose feedback mechanism embedded in the architecture of our network. It allows the network to refine the fluence maps based on dose discrepancies, improving target homogeneity and OAR sparing. The end-to-end pipeline begins with the DoseProposer network, which predicts a clinically plausible 3D dose distribution. From this dose, a first Bev2Fluence network infers 180 fluence maps.
Coordinating these fluence maps so that their combined effect yields the desired dose distribution is a highly challenging task. To tackle this challenge, the dose feedback mechanism operates as follows:
• The dose distribution delivered by the initial predicted fluence maps is computed with the differentiable dose engine.
• A 3D dose error map is computed by the Err module based on the dose discrepancy with respect to some dose metrics. This dose error map construction is very general and can, in principle, be instantiated by the gradient, with respect to the dose, of any differentiable dose planning objective. In this study, we implement it as a penalization of non-homogeneity within the PTV and, optionally, user-steered scalar penalties for excess dose in OARs. Further details on this penalty implementation are provided in the Supplementary Material.
• This dose error map is projected into the beam's eye view and combined with the initial fluence maps to form the inputs of a second Bev2Fluence network, which is trained to produce a new set of fluence maps that reduce that dose error.
To train the Bev2Fluence Correction module to reduce that dose error, the same error metric (produced by the Err module) is recomputed at training time on the updated fluence maps, and a loss term penalizing the L1 norm of that final dose error map is added to the training objective. This dose feedback correction could be applied multiple times, like an unrolled iterative gradient correction [25]. However, to maintain computational efficiency, and as the results were already sufficient with a single pass, we used only one iteration of the dose feedback. Without dose feedback, achieving target homogeneity and clinically acceptable dose distributions with a single prediction step would be extremely difficult, due to the complex coordination required among the large number of fluence maps.
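The feedback steps above can be sketched as a single, optionally unrolled, correction pass. This is a toy sketch on flat lists of "voxel" values: the stand-in modules below (a linear dose engine, an asymmetric error, a gradient-like corrector) are placeholders for the paper's learned networks, not their actual implementations:

```python
import random

rng = random.Random(7)

def dl_dose(fluence):
    # stand-in for the differentiable dose engine (hypothetical linear model)
    return [0.9 * f for f in fluence]

def err_module(dose, rx=1.0):
    # asymmetric error: hot spots (dose > rx) weighted more than cold spots
    lam_pos, lam_neg = 2.0, 1.0
    return [lam_pos * max(d - rx, 0.0) - lam_neg * max(rx - d, 0.0) for d in dose]

def bev2fluence_correction(fluence, error):
    # stand-in for the learned corrector: a simple gradient-like step
    return [f - 0.5 * e for f, e in zip(fluence, error)]

def dose_feedback(fluence, n_iters=1):
    """One (or more) unrolled dose-feedback correction passes."""
    for _ in range(n_iters):
        dose = dl_dose(fluence)
        error = err_module(dose)
        fluence = bev2fluence_correction(fluence, error)
    return fluence

initial = [rng.uniform(0.8, 1.6) for _ in range(16)]
corrected = dose_feedback(initial, n_iters=1)
```

Even with these toy modules, a single pass moves every voxel's delivered dose toward the prescription, which mirrors why one feedback iteration sufficed in the paper.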
5.4 Adversarial Fluence Shaping

When optimizing with a dose-based loss, the predicted fluence maps can deviate from the manifold of deliverable VMAT fluence maps, which have a characteristic two-level pattern that facilitates feasible leaf sequencing. To keep the fluence maps deliverable, we introduce an adversarial loss with a discriminator network that is trained to distinguish between fluence maps generated by the pipeline and target fluence maps generated from Eclipse-optimized plans. Additional technical details are provided in Supplementary Note S2.

5.5 Training Strategy

We train the AIRT network in two stages: (i) the first stage trains the pipeline to predict fluence maps under a stable loss function, and (ii) the second stage enforces deliverable fluence maps and dose-feedback behavior under optional user-specified controls (Fig. 4).

Stage 1 (non-adversarial pretraining). We first train the full pipeline with reconstruction losses and without the adversarial loss. The goal is to establish stable training without the complications introduced by the deliverability constraints and the adversarial loss. As depicted in Figure 4, the reconstruction losses include:
• the "Dose Proposer loss" and the "Dose loss stage 1", which are L1 losses between predicted doses at different stages of the pipeline and the target dose;
• the "Fluence maps loss stage 1" and the "Fluence maps loss", which are L1 losses between predicted fluence maps at different stages of the pipeline and the target fluence maps;
• the "Dose error loss stage 1" and the "Dose error loss", which are L1 losses on the dose error maps at different stages of the pipeline.

Stage 2 (fluence correction with adversarial regularization and controls). To facilitate leaf-sequencability, an adversarial loss is introduced in the second stage of training. Its goal is to force the fluence maps to stay within the manifold of realistic VMAT fluence maps.
During this training stage, optional user-control parameters are randomly generated in a range consistent with their expected usage and passed to the two Err modules (the one in the network pipeline and the one outside the pipeline, used for the "Dose error loss" computation). This ensures consistency between the dose error correction module and the dose loss function. For stability and efficiency reasons, the part of the network before the Bev2Fluence Correction module is frozen during this training stage. See Supplementary Note S2 for additional details.

Fig. 4: Two-stage training of the AIRT end-to-end VMAT planning pipeline. Top: Stage 1, full pipeline training without adversarial loss. Bottom: Stage 2, fluence correction network training with adversarial loss. The diagram distinguishes trainable modules, frozen pretrained modules, non-trainable algorithmic blocks, evaluation blocks used only for loss computation, and the discriminator trained outside the main pipeline.

5.6 Leaf Sequencing

To translate the fluence maps of the AIRT pipeline into an RT Plan, we developed a rule-based leaf-sequencing algorithm adapted for VMAT plans.
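A simplified sketch of the per-row rule is given below. The threshold is a hypothetical illustration, and the real sequencer additionally models sub-pixel leaf positions and the dosimetric leaf gap:

```python
def row_aperture(row, threshold=0.1):
    """Largest contiguous run of significant fluence in one MLC row.

    Returns (left, right) pixel indices of the aperture (right exclusive),
    or None if the row carries no significant fluence.
    """
    best, start = None, None
    for i, v in enumerate(list(row) + [0.0]):  # sentinel closes a trailing run
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

def sequence_cp(fluence_map, threshold=0.1):
    """Aperture per MLC row plus a monitor-unit estimate for one control point."""
    apertures = [row_aperture(row, threshold) for row in fluence_map]
    vals = [v for row, ap in zip(fluence_map, apertures) if ap
            for v in row[ap[0]:ap[1]]]
    mu = sum(vals) / len(vals) if vals else 0.0  # MU ~ mean in-aperture fluence
    return apertures, mu
```

For a row like [0, 0.9, 1.0, 0, 0.8], the rule keeps the two-pixel run and discards the isolated trailing pixel, which is exactly why the near two-level structure of VMAT fluence makes this conversion well posed.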
The algorithm starts by converting the fluence map of each control point (CP) into a discrete aperture, defined by the positions of the left and right leaves of each row of the multi-leaf collimator (MLC). For each row of the MLC, the algorithm identifies the largest contiguous region of significant fluence in order to predict an initial aperture shape. The Monitor Unit (MU), i.e., the radiation intensity of this control point, is computed by averaging the fluence pixels in that initial aperture shape. Post-processing steps then refine the positions of the leaves to: 1) account for the fact that leaf positions can lie at sub-pixel locations, leading to partially exposed pixels, and 2) account for the dosimetric leaf gap. To efficiently process the 180 fluence maps of the VMAT plans, the leaf-sequencing algorithm is optimized for speed, runs in parallel for each control point, and is implemented in C.

Supplementary information. Supplementary material is available for this paper.

Disclaimer. The concepts and information presented in this paper/presentation are based on research results that are not commercially available. Future commercial availability cannot be guaranteed.

Acknowledgements. We thank James Robar, Medical Physicist and Professor of Radiation Oncology at Nova Scotia Health Authority, for his valuable clinical insight and feedback. We also thank Laura Balascuta, Liang Gao and Ioan-Marius Popdan for their help with data curation, ESAPI scripting and dose computation in Eclipse. We thank all the contributors to the REQUITE project, including the patients, clinicians and nurses. The core REQUITE consortium consists of David Azria, Erik Briers, Jenny Chang-Claude, Alison M. Dunning, Rebecca M. Elliott, Corinne Faivre-Finn, Sara Gutiérrez-Enriquez, Kerstie Johnson, Zoe Lingard, Tiziana Rancati, Tim Rattay, Barry S. Rosenstein, Dirk De Ruysscher, Petra Seibold, Elena Sperk, R.
Paul Symonds, Hilary Stobart, Christopher Talbot, Ana Vega, Liv Veldeman, Tim Ward, Adam Webb and Catharine M.L. West.

Supplementary Note S1: Architecture and Implementation Details

Structures and derivations

Table S1 lists the structures, i.e., segmented organs and additional "helper" structures derived from a mathematical formula. These helper structures are used by the RapidPlan / Eclipse optimization. The structures used by our AI pipeline are: PTV, bladder, rectum, and body.

Table S1: Target and organ-at-risk (OAR) structures used in our Prostate RapidPlan.

Name         | Type      | Definition / Formula
CTV          | Target    | Prostate ∪ ((Prostate + 3 mm) ∩ Seminal Vesicles)
PTV          | Target    | CTV + 3 mm
PTV ∩ Rectum | Derived   | PTV ∩ Rectum
PTV Ring     | Derived   | (PTV + 20 mm) − (PTV + 5 mm)
PTV − CTV    | Derived   | PTV − CTV
50% Ring     | Derived   | (PTV + 50 mm) − (PTV + 15 mm)
Body Ring    | Derived   | Body − (Body − 35 mm)
Rectum       | OAR       |
BowelSmall   | OAR       |
BowelLarge   | OAR       |
Bladder      | OAR       |
FemurHeadL   | OAR       |
FemurHeadR   | OAR       |
Body         | Reference |

Data Augmentation

We developed a Python script that applies different randomized transforms to the CT and contours in order to produce new pairs of CT and RT STRUCT files, which are then used to create additional plans in Eclipse. The transformations were: uniform scaling of the image size (in the range [−20%, +20%]), rotations around different axes (in the range [−5°, +5°]), non-rigid deformation using the Continuous Piecewise-Affine Based (CPAB) transformation [26, 27], and random margin adjustments for the PTV and rectum contours in the ranges [3 mm, 6 mm] and [0 mm, 5 mm], respectively. Each of these transforms was applied with a probability of 0.5, and 10 rounds of augmentation were performed, adding 11,180 plans to the original 1,277 plans, for a total of 12,457 plans.

Auto-Contouring (pre-processing)

Our end-to-end pipeline takes the CT and contours as input.
As the contours are derived from the CT, the only pre-processing step needed to use our pipeline is to segment the organs at risk (OARs), the body, and the planning target volume (PTV). Additional contours and helper structures are used by RapidPlan [20, 22] for the planning of our targets, but are not used during inference. We used AI-Rad Companion Organs RT [28] as our automatic segmentation tool. PyESAPI [29] and Python were used for scripting and interaction with the Eclipse treatment planning system (TPS). A deeper description of our methodology is available in our previous publication [17].

In the following, we describe the modules of our end-to-end pipeline and use the same module names as in Fig. 4 of the main manuscript. We use the following notation to describe each module as a function: $f_{\text{ModuleName}}(\cdot)$.

Dose proposer Network

The first module of our end-to-end network is the dose proposer [14], a CNN with a 3D Res-UNet backbone. It processes four input channels: the CT volume and the RT structures (Body, PTV, OARs). The two OAR contours, i.e. bladder and rectum, are merged into one channel using integer labels (1 for bladder, 2 for rectum, 3 for their overlap). CT Hounsfield Unit values are clipped to the range [−900, +900] and rescaled to the range [0, 1]. We center the target dose around the isocenter and crop it along the inferior-superior direction to an extent of 64 mm. The dose proposer module can be interpreted as a function that predicts a dose distribution $D_{\text{target}}$ conditionally on its input CT and structures S:

$$ D_{\text{target}} = f_{\text{DoseProposer}}(\mathrm{CT}, S) \quad (1) $$

The architecture of the network is depicted in Figure S1. The network contains 17 million parameters. It begins with a 3D convolutional (conv3D) stem (kernel size 3x3x3, stride 1, padding 1), followed by four residual blocks. Each residual block contains a conv3D (kernel size 3x3x3, stride 1, padding 1), followed by an instance normalization and a ReLU activation, then another conv3D and another instance normalization.
In parallel, the shortcut path employs a 1x1x1 convolution and an instance normalization. A 3D max-pooling layer (kernel size 3, stride (2,2,2), padding 1) then performs the downsampling step. The number of channels increases from 4 at the input to 16 after the stem convolution, and then to 64, 128, and 256 at the first conv3D layer of the next three residual blocks. The final residual block maintains 256 channels. The decoder uses conv blocks containing a conv3D (kernel size 3, stride 1, padding 1), followed by an instance normalization and a ReLU activation. The upsampling step is implemented with trilinear interpolations. The end of the network contains a 1x1x1 convolution layer to project the result to a single channel, followed by a Softplus activation function.

Beam's-Eye-View projections: BEV Proj

The beam's-eye-view (BEV) projection module, depicted in Figure S2, projects a 3D dose into the BEV corresponding to each control point of the VMAT plan, i.e., into the same reference geometry as the fluence maps. This geometric alignment between the inputs and outputs of the Bev2Fluence network helps it perform its task efficiently.

Fig. S1: Architecture of the DoseProposer network (inputs: CT, 32×128×128, and contours (Body, PTV, OARs), 3×32×128×128; output: predicted 3D dose map, 32×128×128).

The implementation of this projection was detailed in our previous article [13] on the Bev2Fluence network. We re-implemented this module, which has no trainable parameters, in PyTorch to obtain a differentiable module running efficiently on GPU.
Our BEV has a 5 mm isotropic resolution and a 20 cm x 20 cm field-of-view. We compute the BEV projection $\Pi^{(cp)}_{\text{BEVProj}}(\cdot)$ for each control point $cp \in \{1, \ldots, N_{cp}\}$ and stack the results to create a 3D volume:

$$ B_{\text{target}} = \left[ \Pi^{(cp)}_{\text{BEVProj}}\left( D_{\text{target}} \right) \right]_{cp=1}^{N_{cp}} \quad (2) $$

Fig. S2: Beam's eye view (BEV) transform (input: 3D dose map, 32×128×128; output: 3D dose in BEV space, 40×40×180).

Bev2Fluence Network

The Bev2Fluence network [13], depicted in Figure S3, is a 40.8 M-parameter 3D CNN with a MedNeXt backbone architecture [24] that predicts the 180 VMAT fluence maps from the 180 BEV input projections. The 3D MedNeXt backbone [24] is an encoder-decoder structure with four resolution levels. Each level contains MedNeXt blocks, followed by downsampling layers implemented via strided depthwise convolutions. The decoder mirrors the encoder structure with transposed convolutions and skip connections. Each MedNeXt block contains internal channel expansion, including in the bottleneck. Each convolution block uses group normalization and GELU activation functions. The Bev2Fluence network [13] first projects the input BEV dose volume to 64 channels via a 1x1x1 stem convolution. The number of channels then increases from 64 to 128, 256, and 512 at each downsampling step. The bottleneck operates at 1024 channels with internal expansion to 4096 channels. The decoder mirrors the encoder structure, and a final 1x1x1 transposed convolution projects the 64-channel volume into a single-channel output, followed by an abs activation layer to prevent negative coefficients from being fed into the DL dose module. We refer the reader to our previous work [13] and the original MedNeXt article [24], which contain additional details about this network architecture. The functional description of the Bev2Fluence network is as follows:

$$ F^{(0)} = f_{\text{Bev2Fluence}}\left( B_{\text{target}} \right) \quad (3) $$

where $F^{(0)} = \left[ F^{(0)}_{cp} \right]_{cp=1}^{N_{cp}}$ is the stack of predicted fluence maps.
Fig. S3: Architecture of the Bev2Fluence network (input: dose BEV at 180 control points, 40×40×180; output: fluence maps at 180 control points, 40×40×180; MedNeXt-style blocks with depthwise 3x3x3 convolutions, group normalization, GELU, and 1x1x1 expansion/compression convolutions; the gantry dimension is padded to 192 internally and cropped back to 180).

Dose Computation Network

The dose computation network uses a physics-informed approach [15] that predicts the 3D dose volume from a stack of input fluence maps and their physical descriptions (gantry angles, collimator angle, field-of-view, etc.). We use this module as a differentiable evaluator to obtain the dose delivered by the predicted fluence maps:

$$ D^{(0)} = f_{\text{DLDoseComputation}}\left( \mathrm{CT}, F^{(0)} \right) \quad (4) $$

It adopts a two-stage design, depicted in Figure S4: the un-collided 3D fluence is first computed and accumulated as spherical harmonics coefficients per voxel; in stage 2, these coefficients, together with the CT, are processed through a MedNeXt image-to-image network to compute the final 3D dose map. This network contains 2.8 M parameters and begins with a 1x1x1 stem convolution layer that projects the 26-channel input volume into a 16-channel feature map. The number of feature channels increases at each downsampling operation from 16 to 32, 64, 128, and 256. The bottleneck operates at 256 channels with internal expansion to 1024 channels.
The decoder mirrors the encoder structure and ends with a 1x1x1 convolution projecting the 16-channel feature maps to a single-channel output, followed by a leaky-ReLU activation (negative slope 1e-2).

Dose Error Correction: (Err)

The purpose of the dose error correction module (Err), which has no trainable parameters, is to generate a dose error signal based on the input dose map and some dose metrics (e.g. target inhomogeneity). This dose error signal is used in the pipeline to correct the initial set of fluence maps (at both training and inference), as well as in the loss to reduce that error based on the training population data (at training only). Without it, predicting 180 VMAT fluence maps in a single shot is extremely challenging. In the following, we denote the voxel-wise error map as $E^{(0)}$.

Error map as the gradient of a dose objective. We define the error map in general terms as the gradient, with respect to the 3D dose map, of a differentiable objective function $J(D)$:

$$ E^{(0)} = f_{\text{Err}}\left( D^{(0)}, \cdot \right) \approx \nabla_D J\left( D^{(0)} \right) \quad (5) $$

where $D^{(0)} \in \mathbb{R}^{|\Omega|}$ is the dose computed from the initial fluence maps, and $|\Omega|$ is the number of voxels in the dose grid. In its generic form, the dose objective $J(D)$ can be written as a sum over the structures s (PTV, bladder, rectum, etc.) and, for each structure, a sum over voxels of structure-specific objectives:

$$ J(D) = \sum_{s \in S} \sum_{i \in \Omega} M_s(i)\, \phi_s\!\left( D_i - R_{s,i} \right) \quad (6) $$

where $M_s(i) \ge 0$ is a mask (over a grid) or weight (scalar) for structure s, $R_s \in \mathbb{R}^{|\Omega|}$ is a clinical goal or reference, and $\phi_s(\cdot)$ is a function that encodes the penalty (e.g. L2 norm, ReLU).
Fig. S4: Two-stage deep-learning dose computation. a, The first stage computes the voxel-wise spherical harmonics coefficients (up to degree l = 4; 25×128×128×128) from the planning CT (128×128×128) and the fluence maps at 180 control points (40×40×180). b, The second stage computes the 3D dose map from the CT and spherical harmonics coefficients using a MedNeXt network architecture, ending with a leaky-ReLU activation.

PTV term: asymmetric hot-spot suppression. In our study, we designed the PTV term in an asymmetric manner, in order to penalize hot spots (overdose) more strongly than cold spots (underdose). Our PTV homogeneity objective can be written as:

$$ J_{\text{PTV}}(D) = \sum_{i \in \Omega} M_{\text{PTV}}(i) \left[ \frac{\lambda_+}{2}\, \mathrm{ReLU}(x_i)^2 + \frac{\lambda_-}{2}\, \mathrm{ReLU}(-x_i)^2 \right] \quad (7) $$

where $M_{\text{PTV}}$ denotes the PTV mask, $D_{\mathrm{Rx}}$ the prescribed dose (scalar), and $x_i = D_i - D_{\mathrm{Rx},i}$. Deriving the voxel-wise gradient of Eq. (7) with respect to the input dose D leads to the definition of the correction signal we used in our study for the PTV:

$$ \left[ E^{(0)}_{\text{PTV}} \right]_i = \left[ \nabla_D J_{\text{PTV}}(D) \right]_i \Big|_{D = D^{(0)}} = M_{\text{PTV}}(i) \left[ \lambda_+\, \mathrm{ReLU}\!\left( D^{(0)}_i - D_{\mathrm{Rx},i} \right) - \lambda_-\, \mathrm{ReLU}\!\left( D_{\mathrm{Rx},i} - D^{(0)}_i \right) \right] \quad (8) $$

In order to penalize the hot spots more than the cold spots, we used $\lambda_+ > \lambda_- > 0$.
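Eq. (8) amounts to a per-voxel piecewise-linear map. A minimal sketch in pure Python follows; the λ values are illustrative (the paper only requires λ+ > λ− > 0), and the real module operates on 3D tensors rather than flat lists:

```python
def relu(x):
    return x if x > 0.0 else 0.0

def ptv_error(dose, ptv_mask, d_rx, lam_pos=2.0, lam_neg=1.0):
    """Voxel-wise PTV error map of Eq. (8): hot spots (dose > prescription)
    are weighted by lam_pos, cold spots by lam_neg, with lam_pos > lam_neg."""
    return [m * (lam_pos * relu(d - d_rx) - lam_neg * relu(d_rx - d))
            for d, m in zip(dose, ptv_mask)]

# toy example: prescription 40 Gy; one hot voxel, one cold voxel, one outside PTV
err = ptv_error(dose=[42.0, 38.0, 45.0], ptv_mask=[1.0, 1.0, 0.0], d_rx=40.0)
# err[0] = 2*(42-40) = 4.0  (hot spot, pushed down)
# err[1] = -1*(40-38) = -2.0 (cold spot, pushed up, with the smaller weight)
# err[2] = 0.0               (outside the PTV mask)
```

The sign convention matters downstream: a positive error asks the correction network to remove fluence, a negative one to add it.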
Optional OAR term: user-controlled dose suppression. Rather than using a complex set of DVH-based clinical goals, we opted to provide the user with a simple way to control OAR sparing. We define a single scalar suppression factor $s_o$ which can steer the baseline plan, i.e., the plan predicted by our pipeline when no OAR sparing is used. For a given OAR o and its suppression factor $s_o$, we define the reference field as:

$$ R_{o,i} = (1 - s_o)\, D^{(0)}_i \quad (9) $$

where $D^{(0)}$ is the dose predicted at stage 0. We then use the squared-hinge objective function:

$$ J_o(D) = \frac{1}{2} \sum_{i \in \Omega} M_o(i)\, \max\!\left( D_i - R_{o,i},\, 0 \right)^2 \quad (10) $$

whose gradient, evaluated at $D^{(0)}$, produces the OAR dose correction term used in our implementation:

$$ \left[ E^{(0)}_{\mathrm{OAR},o} \right]_i = \left[ \nabla_D J_o(D) \right]_i \Big|_{D = D^{(0)}} = M_o(i)\, \max\!\left( D^{(0)}_i - R_{o,i},\, 0 \right) \quad (11) $$

Note that when $s_o = 0$, the OAR penalty is disabled ($R_o = D^{(0)}$), while increasing $s_o > 0$ decreases the reference $R_o$ and increases the dose penalty for that OAR.

Total error map and BEV projection. The full error map is obtained by combining all the error terms (PTV and optional OARs):

$$ E^{(0)} = E^{(0)}_{\text{PTV}} + \sum_{o \in O} E^{(0)}_{\mathrm{OAR},o} \quad (12) $$

and is then projected into the BEV of each control point:

$$ B^{(0)}_{\text{err}} = \left[ \Pi^{(cp)}_{\text{BEVProj}}\left( E^{(0)} \right) \right]_{cp=1}^{N_{cp}} \quad (13) $$

This projected tensor is then used as the input of the Bev2Fluence correction network, which completes the dose feedback loop. The Bev2Fluence correction network, depicted in Figure S5, has the same architecture as the Bev2Fluence network, except that it accepts two input volumes instead of one.
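The steering behavior of Eqs. (9) and (11) can be sketched per voxel in pure Python; the dose values and suppression factors below are illustrative:

```python
def oar_error(dose0, oar_mask, s_o):
    """OAR correction term of Eq. (11) evaluated at the stage-0 dose.

    The reference is R = (1 - s_o) * dose0, so s_o = 0 disables the
    penalty entirely and larger s_o asks for proportionally more dose
    to be removed from the OAR.
    """
    return [m * max(d - (1.0 - s_o) * d, 0.0)  # = m * s_o * d for d >= 0
            for d, m in zip(dose0, oar_mask)]

dose0 = [20.0, 10.0, 30.0]   # stage-0 dose (Gy); third voxel outside the OAR
mask = [1.0, 1.0, 0.0]

assert oar_error(dose0, mask, s_o=0.0) == [0.0, 0.0, 0.0]    # penalty disabled
assert oar_error(dose0, mask, s_o=0.5) == [10.0, 5.0, 0.0]   # 50% suppression
```

Because the reference is defined relative to the baseline dose itself, the control is self-normalizing: the same $s_o$ requests the same fractional dose reduction regardless of the absolute OAR dose.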
Fig. S5: Architecture of the Bev2Fluence correction network (inputs: fluence maps and dose BEV error at 180 control points, each 40×40×180; output: corrected fluence maps at 180 control points, 40×40×180).

Fluence Correction Network

The fluence correction network adopts the same architecture as the Bev2Fluence network, except that its number of input channels is increased to receive the initial fluence maps in addition to the BEV dose error projection. It outputs the corrected fluence maps $F^{(1)}$:

$$ F^{(1)} = f_{\text{Bev2FluenceCorrection}}\left( F^{(0)}, B^{(0)}_{\text{err}} \right) \quad (14) $$

This module could be used for multiple iterations k, i.e. $F^{(k+1)} = f_{\text{Bev2FluenceCorrection}}\left( F^{(k)}, B^{(k)}_{\text{err}} \right)$, with either shared or separate weights (unrolling), in order to further improve the dose performance of the pipeline at the expense of added computation. In our study, we used only one correction loop in order to preserve computational efficiency and minimize memory usage.

Adversarial Fluence Shaping: Implementation Details

Our adversarial training approach, applied to our fluence maps, consists of a discriminator trained to classify whether a fluence map has been generated by our end-to-end pipeline or comes from a VMAT plan optimized in Eclipse. Our discriminator implementation is similar to the one used in StyleGAN2 [30, 31]. The architecture of the network is depicted in Figure S6. It consists of a sequence of downsampling blocks that progressively downsample the input image while increasing the feature dimension. The network has 22.7 M parameters and receives a single 64 x 64 2D fluence map as input, zero-padded from its original 40 x 40 size. The 180 fluence maps are processed in parallel by stacking them along the batch dimension.
The first block projects the 2D input to 64 feature channels using a 1 x 1 convolution with stride 2, followed by two 3 x 3 convolutions with leaky-ReLU activations (negative slope 0.2), as well as a downsampling operation performed via a combination of low-pass filtering (Blur) and a strided 3 x 3 convolution. The downsampling block ends with a residual connection. This downsampling block is repeated five times, with feature resolutions decreasing from 64 to 2 by factors of 2 while the number of feature channels increases from 64 to 512 by factors of 2 (the final convolution block maintains 512 channels). The network then ends with a final 3 x 3 convolution, a flattening layer reshaping the features to a single vector of size 2048, and a linear (fully-connected) layer to produce the scalar logit output.

Fig. S6: Architecture of the fluence map discriminator network (input: zero-padded 64×64 fluence map; downsampling blocks 64x64 → 64x32x32 → 128x16x16 → 256x8x8 → 512x4x4 → 512x2x2, followed by flattening to 2048 and a linear layer producing the logit).

The discriminator was trained using a Hinge loss with an R1 gradient penalty on real samples (target fluence maps) in order to encourage local smoothness of the discriminator function.
In other words, if D(·) denotes the discriminator output (without sigmoid) and G(·) the generator (the end-to-end pipeline in our case), the total discriminator loss can be written as:

$$ \mathcal{L}^{\text{total}}_D = \mathcal{L}_D + \frac{\gamma}{2}\, \mathbb{E}_{x \sim p_{\text{data}}}\!\left[ \left\| \nabla_x D(x) \right\|_2^2 \right] \quad (15) $$

where the second term is the R1 penalty, and the first term $\mathcal{L}_D$ is the adversarial Hinge loss defined by:

$$ \mathcal{L}_D = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[ \max(0,\, 1 - D(x)) \right] + \mathbb{E}_{z \sim p(z)}\!\left[ \max(0,\, 1 + D(G(z))) \right] \quad (16) $$

The generator loss, i.e., the loss term used to regularize the fluence maps generated by our end-to-end pipeline, was a generator Hinge loss, which is the negative of the average discriminator score of "fake" samples (i.e. samples generated by our end-to-end pipeline):

$$ \mathcal{L}_G = -\mathbb{E}_{z \sim p(z)}\!\left[ D(G(z)) \right] \quad (17) $$

The effects of our adversarial training on the fluence maps produced by our pipeline are illustrated in Fig. S7.

Supplementary Note S2: Training Strategy and Optimization Details

Stage 1: Full pipeline training without adversarial loss

We first train the pipeline without the adversarial loss, as illustrated in the top panel of Fig. 4 of the main manuscript. In both training stages, we use the Adam optimizer from PyTorch with a fixed learning rate of 1e-4 and a batch size of 1. We use the following losses:
• Dose Proposer loss: To train the Dose Proposer network to generate plausible 3D dose distributions, we minimize the L1 loss between the network's predicted dose and the target dose, where the target was computed using our deep-learning dose calculation applied to the Eclipse-optimized plans of our training dataset.
• Fluence maps loss stage 1: To train the Bev2Fluence network to generate plausible fluence maps, we minimize the L1 loss between this network's predicted fluence maps and the target fluence maps, i.e. the fluence maps of the Eclipse-optimized plans of our training dataset.
• Dose loss stage 1: The fluence maps predicted by the Bev2Fluence module are converted to a 3D dose map using our DL dose computation, and an L1 loss is computed between this predicted dose map and the target dose map.

• Dose error loss stage 1: This loss is a light regularizer (used with a very small weight) that encourages PTV homogeneity. Notably, it is only meaningful when the Err metric does not include variable OAR sparing controls, since at this stage the network cannot yet respond to such commands: it first needs to predict a baseline for stage 1. The "customization" of that baseline happens in stage 2 of training, using the Bev2Fluence correction network.

• Fluence maps loss: To train the Bev2Fluence Correction Network, which predicts the final fluence maps, we again apply an L1 loss between the predicted fluence maps and the target fluence maps, similarly to the Fluence maps loss stage 1.

• Dose error loss: Finally, to train the Bev2Fluence Correction Network to effectively utilize its input 3D dose error map as a correction signal, we recompute a new 3D dose error map using the DL dose and Err modules, and apply a loss function that minimizes the L1 norm of that final error.

Fig. S7: Adversarial fluence shaping results. (Top) Mosaic view comparing fluence maps generated by (a) AIRT (-GAN, i.e., without adversarial loss), (b) AIRT (full, i.e., with adversarial loss), and (c) Eclipse clinical plans. (Bottom) Zoom-in of representative fluence maps highlighting the improved realism achieved with adversarial training. Both views illustrate the discriminator's impact in enforcing a clinically plausible, approximately two-level structure in the generated fluence maps.
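The Hinge losses of Eqs. (16) and (17) can be sketched in plain Python over batches of raw discriminator logits (the R1 gradient penalty of Eq. (15) requires an autograd framework and is omitted here):

```python
def discriminator_hinge_loss(d_real, d_fake):
    """Eq. (16): hinge loss over real and generated discriminator scores.
    Real samples are pushed above +1, generated samples below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in d_real) / len(d_real)
    fake_term = sum(max(0.0, 1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_hinge_loss(d_fake):
    """Eq. (17): negative mean discriminator score of generated samples."""
    return -sum(d_fake) / len(d_fake)

# Example: a well-scored real sample (2.0) contributes nothing to the loss,
# a marginal one (0.5) contributes 0.5; symmetrically for fakes.
loss_d = discriminator_hinge_loss([2.0, 0.5], [-2.0, -0.5])  # 0.5
loss_g = generator_hinge_loss([-2.0, -0.5])                  # 1.25
```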
We used the following weights for each loss of stage 1 training: dose proposer loss: 0.3, fluence maps loss stage 1: 0.1, dose loss stage 1: 0.1, dose error loss stage 1: 0.0003, fluence maps loss: 0.4, dose error loss: 0.001.

Stage 2: Fluence correction network training with adversarial loss

The second stage of training has two main objectives: (1) adding the adversarial loss to force the fluence maps predicted by the Bev2Fluence correction module to lie close to the manifold of VMAT-optimized fluence maps, and thereby facilitate leaf-sequencability; and (2) training the Bev2Fluence correction module to adjust its behavior to variable user-specified OAR sparing controls, in addition to maintaining PTV homogeneity. To implement these two objectives, we (1) trained a discriminator with the loss function of Eq. (15) and added the term of Eq. (17) to the generator (pipeline) loss, as detailed in the subsection on adversarial fluence shaping in Section 1, and (2) randomized the parameters of the Err module for each training batch to cover the variability of dose-objective (OAR sparing factor) use cases.

During this second stage of training, we froze all the network components before the Bev2Fluence Correction module: these components, which were trained in stage 1, are expected to predict initial baseline fluence maps that are not influenced by the variability of dose objectives. Unfreezing them would create contradictory backpropagation gradients, making the training unstable. Another advantage of freezing these components is that training is significantly faster, as it eliminates the need to backpropagate through all these modules, particularly the DL dose computation module embedded in the pipeline, which is the most computationally intensive.
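The stage-1 objective above can be assembled as a weighted sum of the individual terms. The sketch below uses the reported weights; the per-term loss values are placeholders standing in for the actual L1 terms computed by the pipeline:

```python
# Stage-1 loss weights as reported above; the key names are ours.
STAGE1_WEIGHTS = {
    "dose_proposer": 0.3,
    "fluence_maps_stage1": 0.1,
    "dose_stage1": 0.1,
    "dose_error_stage1": 0.0003,  # very small weight: a light PTV-homogeneity regularizer
    "fluence_maps": 0.4,
    "dose_error": 0.001,
}

def total_stage1_loss(term_values):
    """Weighted sum of the six stage-1 loss terms."""
    return sum(STAGE1_WEIGHTS[name] * v for name, v in term_values.items())
```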
We used the following weights for each loss of stage 2 training. For the generator (pipeline) part: fluence maps loss: 0.2, generator hinge loss: 0.04, dose error loss: 0.001. For the discriminator part, we used the Hinge loss function of Eq. (15), with γ = 0.0025.

Supplementary Note S3: Dosimetric Performance

Table S2 reports the dosimetric performance of the AIRT method and of RapidPlan Eclipse, evaluated with four different dose engines: DL dose, LTBS, Eclipse A, and Eclipse AcurosXB.

Supplementary Note S4: Statistical Validation and Non-Inferiority

To assess statistical equivalence between our end-to-end AI plans and the Eclipse plans, we conducted a non-inferiority test on the main DVH metrics, using the Eclipse AcurosXB dose engine for the evaluation. For each metric, we used a significance level of p < 0.05 and a clinical margin of 0.01 for the homogeneity index metrics and 1.5 Gy for the organ-at-risk (OAR) dose metrics. As shown in Table S3, AI planning met the non-inferiority criteria on all tested metrics.

Supplementary Note S5: Qualitative Results: Representative Cases

We depict the DVHs of six representative cases spanning the full spectrum of PTV sizes (0th (minimum), 20th, 40th, 50th (median), 80th, and 100th (maximum) percentiles) of our validation dataset using different dose engines (DL dose, LTBS, Eclipse A). The results using the Eclipse AcurosXB dose engine are in the main document.

Fig. S8: Dose–volume histograms (DVHs) for six cases corresponding to the 0th (minimum), 20th, 40th, 50th (median), 80th, and 100th (maximum) percentiles of PTV size in the validation dataset, using the DL dose engine. Curves are shown for the PTV, rectum, and bladder for both Eclipse and AI plans.
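The DVH metrics used throughout these figures and tables (D98, D50, D2) and the homogeneity index HI = (D2% − D98%)/D50% can be sketched from a structure's voxel doses. This is an illustrative voxel-counting implementation; the exact percentile and interpolation convention of the planning system is not specified in the text:

```python
import math

def d_x(voxel_doses, x):
    """D_x: the minimum dose received by the hottest x% of the structure
    volume, i.e. the dose level that at least x% of voxels meet or exceed."""
    hottest_first = sorted(voxel_doses, reverse=True)
    k = max(1, math.ceil(x / 100.0 * len(hottest_first)))
    return hottest_first[k - 1]

def homogeneity_index(voxel_doses):
    """HI = (D2% - D98%) / D50%, following the ICRU convention (lower is better)."""
    return (d_x(voxel_doses, 2) - d_x(voxel_doses, 98)) / d_x(voxel_doses, 50)

# Toy structure of 10 voxels with doses 10..100 Gy:
doses = list(range(10, 101, 10))
# D98 = 10 (98% of the volume receives at least 10 Gy), D50 = 60, D2 = 100,
# so HI = (100 - 10) / 60 = 1.5 (a deliberately inhomogeneous toy example).
```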
Fig. S9: Dose–volume histograms (DVHs) for six cases corresponding to the 0th (minimum), 20th, 40th, 50th (median), 80th, and 100th (maximum) percentiles of PTV size in the validation dataset, using the LTBS dose engine.

Fig. S10: Dose–volume histograms (DVHs) for six cases corresponding to the 0th (minimum), 20th, 40th, 50th (median), 80th, and 100th (maximum) percentiles of PTV size in the validation dataset, using the Eclipse A dose engine.

We depict the mean curves across patients for our AIRT method and the Eclipse-optimized RapidPlan, evaluated on the DL dose engine in Figure S11, on the LTBS dose engine in Figure S12, on Eclipse A in Figure S13, and on Eclipse AcurosXB in Figure S14. Figure S15 shows the dose distribution of the median validation case (by PTV volume) using DL dose evaluation, complementing the DVH comparisons in Figure S8. Figure S16 shows the dose distribution of the median validation case using LTBS dose evaluation, complementing the DVH comparisons in Figure S9. Figure S17 shows the dose distribution of the median validation case using Eclipse A dose evaluation, complementing the DVH comparisons in Figure S10. Figure S18 shows the dose distribution of the median validation case using Eclipse AcurosXB evaluation, complementing the DVH comparisons in Figure 2 of the main manuscript.

Fig. S11: DVH curves across all validation patients showing dose distributions for the PTV and OARs using the DL dose engine. Mean curves across patients are shown in bold.

Fig. S12: DVH curves across all validation patients showing dose distributions for the PTV and OARs using the LTBS dose engine.
Mean curves across patients are shown in bold.

Fig. S13: DVH curves across all validation patients showing dose distributions for the PTV and OARs using the Eclipse A dose engine. Mean curves across patients are shown in bold.

Fig. S14: DVH curves across all validation patients showing dose distributions for the PTV and OARs using the Eclipse AcurosXB dose engine. Mean curves across patients are shown in bold.

Fig. S15: Dose distributions evaluated with the DL dose engine for the median validation case (Case 4, as shown in the corresponding DVH figure). Left: reference dose (Eclipse/target); right: AIRT predicted dose. CT slices and contours are overlaid.

Fig. S16: Dose distributions evaluated with the LTBS dose engine for the median validation case (Case 4). Left: reference dose (Eclipse/target); right: AIRT predicted dose. CT slices and contours are overlaid.

Fig. S17: Dose distributions evaluated with the Eclipse A dose engine for the median validation case (Case 4). Left: reference dose (Eclipse/target); right: AIRT predicted dose. CT slices and contours are overlaid.

Fig. S18: Dose distributions evaluated with the Eclipse AcurosXB dose engine for the median validation case (Case 4). Left: reference dose (Eclipse/target); right: AIRT predicted dose. CT slices and contours are overlaid.
Supplementary DVHs demonstrating user-controlled OAR sparing

To illustrate the adaptability of our end-to-end AI method to user-controlled inputs, we show the average DVH curves across the entire validation dataset for the PTV, rectum, and bladder for various user inputs s_b (for the bladder) and s_r (for the rectum), which control the degree of additional dose reduction in each OAR. Fig. S19 illustrates the achievable dose tradeoffs, evaluated using the DL dose engine, with more aggressive attenuation than in the main document.

Fig. S19: Mean DVHs for the PTV, bladder, and rectum, averaged over the entire validation set. Suppression factors are reported as a percentage of the predicted baseline dose. The "baseline" curve corresponds to no additional OAR control (s_b = s_r = 0). The figure shows that the parameters s_b and s_r independently reduce the dose to each OAR, with an expected tradeoff on PTV homogeneity, which worsens as OAR sparing increases.

Supplementary Note S6: Computational Performance

Table S4 reports the module-wise computational time of the proposed end-to-end VMAT planning pipeline (organ auto-contouring and DICOM export are not included). All runtime measurements were performed on a single NVIDIA A100 GPU (80 GB). Reported values are the average steady-state inference runtime per patient over the 62 validation cases. Two dummy cases were run before timing to account for the one-time initialization overheads associated with model loading, CUDA context creation, and GPU kernel autotuning and caching. The most computationally intensive step is the deep-learning-based dose calculation (654 ms). As a comparison, the corresponding optimization in Eclipse takes several (3–5) minutes per plan.
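The steady-state timing protocol above (untimed warm-up runs before measurement) can be sketched as follows; `run_pipeline` is a stand-in for the actual inference call:

```python
import time

def benchmark(run_pipeline, cases, n_warmup=2):
    """Average per-case runtime after warm-up. The warm-up passes absorb
    one-time initialization costs (model loading, CUDA context creation,
    kernel autotuning) so only steady-state inference time is measured."""
    for case in cases[:n_warmup]:      # untimed warm-up passes
        run_pipeline(case)
    start = time.perf_counter()
    for case in cases:                 # timed steady-state passes
        run_pipeline(case)
    return (time.perf_counter() - start) / len(cases)
```

With GPU workloads, one would additionally synchronize the device before reading the clock, since kernel launches are asynchronous.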
Table S2: Comparison of Eclipse and AIRT planning for intact prostate VMAT cases across four dose calculation engines. The homogeneity index (HI) is defined as (D2% − D98%)/D50%, following the ICRU convention (lower is better). All metrics are reported as mean ± standard deviation over the validation dataset. P-values are from the Wilcoxon signed-rank test.
DL dose engine

Structure  Metric      Eclipse Plans  AIRT Plans   p-value
PTV        HI          0.16 ± 0.03    0.11 ± 0.02  < 0.001
PTV        D98 (Gy)    38.4 ± 0.8     38.8 ± 0.4   < 0.001
Bladder    Dmean (Gy)  6.7 ± 3.3      6.9 ± 3.5    0.01
Bladder    D50 (Gy)    2.9 ± 2.5      3.0 ± 2.7    0.29
Bladder    D2 (Gy)     34.8 ± 6.8     35.1 ± 6.6   0.07
Rectum     Dmean (Gy)  5.5 ± 1.4      5.3 ± 1.4    0.002
Rectum     D50 (Gy)    2.7 ± 1.2      2.5 ± 1.1    < 0.001
Rectum     D2 (Gy)     30.1 ± 5.5     30.5 ± 5.8   0.04

LTBS dose engine

Structure  Metric      Eclipse Plans  AIRT Plans   p-value
PTV        HI          0.19 ± 0.03    0.15 ± 0.03  < 0.001
PTV        D98 (Gy)    37.8 ± 0.9     38.3 ± 0.6   < 0.001
Bladder    Dmean (Gy)  6.8 ± 3.4      6.9 ± 3.5    0.05
Bladder    D50 (Gy)    2.8 ± 2.3      2.8 ± 2.5    0.66
Bladder    D2 (Gy)     36.3 ± 7.5     36.5 ± 7.3   0.50
Rectum     Dmean (Gy)  5.1 ± 1.3      4.9 ± 1.3    0.002
Rectum     D50 (Gy)    2.5 ± 1.0      2.2 ± 0.9    < 0.001
Rectum     D2 (Gy)     30.2 ± 6.0     30.5 ± 6.4   0.03

Eclipse AcurosXB engine

Structure  Metric      Eclipse Plans  AIRT Plans   p-value
PTV        HI          0.10 ± 0.01    0.10 ± 0.01  0.03
PTV        D98 (Gy)    39.3 ± 0.2     39.3 ± 0.2   0.36
Bladder    Dmean (Gy)  7.3 ± 3.5      7.7 ± 3.8    < 0.001
Bladder    D50 (Gy)    3.3 ± 2.7      3.6 ± 3.0    < 0.001
Bladder    D2 (Gy)     35.8 ± 6.0     37.0 ± 5.9   < 0.001
Rectum     Dmean (Gy)  5.4 ± 1.4      5.5 ± 1.4    0.66
Rectum     D50 (Gy)    2.7 ± 1.0      2.6 ± 1.0    0.003
Rectum     D2 (Gy)     31.4 ± 5.6     32.5 ± 6.0   < 0.001

Eclipse A engine

Structure  Metric      Eclipse Plans  AIRT Plans   p-value
PTV        HI          0.10 ± 0.01    0.10 ± 0.01  0.28
PTV        D98 (Gy)    39.4 ± 0.2     39.3 ± 0.2   0.12
Bladder    Dmean (Gy)  7.6 ± 3.6      8.0 ± 3.9    < 0.001
Bladder    D50 (Gy)    3.6 ± 2.7      3.9 ± 3.1    < 0.001
Bladder    D2 (Gy)     36.1 ± 6.0     37.2 ± 5.9   < 0.001
Rectum     Dmean (Gy)  5.7 ± 1.4      5.7 ± 1.5    0.61
Rectum     D50 (Gy)    3.0 ± 1.1      2.8 ± 1.1    0.003
Rectum     D2 (Gy)     31.5 ± 5.7     32.6 ± 6.0   < 0.001

Table S3: Non-inferiority test results comparing our AI planning to Eclipse planning on DVH metrics evaluated using the Eclipse AcurosXB dose engine. A metric is considered non-inferior if the one-sided non-inferiority test is statistically significant (p < 0.05) and the difference does not exceed the predefined margin. The mean difference is reported as the AI metric minus the Eclipse one.
Metric             Mean Diff.  P-value  Non-inferior  Margin  N
PTV HI Mean        0.004       0.00     Yes           0.01    62
PTV HI Median      0.004       0.00     Yes           0.01    62
Bladder Mean Dose  0.392       0.00     Yes           1.50    62
Bladder D50        0.346       0.00     Yes           1.50    62
Bladder D2         1.135       0.03     Yes           1.50    62
Rectum Mean Dose   0.031       0.00     Yes           1.50    62
Rectum D50         -0.126      0.00     Yes           1.50    62
Rectum D2          1.137       0.03     Yes           1.50    62

Table S4: Module-wise compute time breakdown for the proposed pipeline.

Module                         Notes                                              Compute Time (ms)
DoseProposer pre-processing    Prepare/crop CT and structure tensors              35
DoseProposer                   Initial dose prediction                            4
BEV Projection (1st pass)      Beam's eye view (BEV) projection of the dose       56
Bev2Fluence                    Dose BEV-to-fluence network                        18
DL Dose Computation            Predicts dose from fluence maps                    654
Dose Error Construction (Err)  Computes 3D dose error map                         1
BEV Projection (2nd pass)      Projects 3D dose error to BEV                      55
Fluence Correction             Network to refine fluence maps                     18
FM GPU to CPU                  Transfers the fluence maps from GPU to CPU         18
Leaf Sequencing                Converts fluence maps to deliverable MLC sequence  30
Total                          End-to-end pipeline time                           889
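The per-metric decision in Table S3 can be sketched as follows. The paper's test is Wilcoxon-based; this illustration substitutes a simpler one-sided confidence bound under a normal approximation, so the function and the z threshold are our choices, not the paper's:

```python
import math

def non_inferior(diffs, margin, z_one_sided=1.645):
    """Declare non-inferiority if the upper bound of a one-sided 95%
    confidence interval for the mean difference (AI minus Eclipse) stays
    below the clinical margin. Normal approximation of the sampling
    distribution; the paper itself used a Wilcoxon-based one-sided test."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    upper = mean + z_one_sided * math.sqrt(var / n)
    return upper < margin

# Hypothetical per-patient dose differences (Gy) against a 1.5 Gy margin:
diffs = [0.2, 0.4, 0.3, 0.5, 0.1, 0.3]
verdict = non_inferior(diffs, margin=1.5)  # True: bound well below 1.5 Gy
```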