
Paper deep dive

AI-Driven Phase Identification from X-ray Hyperspectral Imaging of cycled Na-ion Cathode Materials

Fayçal Adrar, Nicolas Folastre, Chloé Pablos, Stefan Stanescu, Sufal Swaraj, Raghvender Raghvender, François Cadiou, Laurence Croguennec, Matthieu Bugnet, Arnaud Demortière

Year: 2026 | Venue: arXiv preprint | Area: cond-mat.mtrl-sci | Type: Preprint | Embeddings: 68

Abstract

Na-ion batteries have emerged as viable candidates for large-scale energy storage applications due to resource abundance and cost advantages. The constraints imposed on their performance and durability, for instance, by complex phase transformations in positive electrode materials during electrochemical cycling, can be addressed and are thus not detrimental to their development. However, diffusion-limited Na-ion transport can drive spatially heterogeneous phase nucleation and propagation, leading to multiphase coexistence and locally non-uniform electrochemical activity, generating complex reaction pathways that challenge both mechanistic understanding and predictive material optimization. These challenges can be addressed by investigating single-crystalline regions of materials, i.e. down to the scale of individual particles, although such analyses are often constrained by energetically and/or spatially sparse hyperspectral datasets. Here, we developed an AI-driven method to process hyperspectral data under sparse sampling conditions and generate multiphase maps with nanometer-scale resolution over a micrometer-scale field of view. We applied this processing on scanning transmission X-ray microscopy (STXM) data to determine the distribution and coexistence of phases in individual particles of Na_xV2(PO4)2F3 cathode materials, at different states of charge. The methodology relies on a workflow which combines a Gaussian mixture variational autoencoder (GMVAE) algorithm with the Pearson correlation coefficient to identify the sodium content and map its spatial distribution. Our approach reveals nanoscale phase heterogeneity and evolution within individual particles, and improves the reliability of phase detection by identifying ambiguity zones, false assignments, and transition phases localized at grain boundaries.

Tags

ai-safety (imported, 100%) · cond-matmtrl-sci (suggested, 92%) · preprint (suggested, 88%)


Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%

Last extracted: 3/13/2026, 12:35:55 AM

Summary

The paper presents an AI-driven methodology for phase identification in Na-ion battery cathode materials (Na_xV2(PO4)2F3) using scanning transmission X-ray microscopy (STXM). By combining Pearson correlation coefficients with a Gaussian mixture variational autoencoder (GMVAE), the authors successfully map nanoscale phase heterogeneity and resolve spectral ambiguities in sparsely sampled hyperspectral datasets.

Entities (4)

GMVAE · algorithm · 100%
Na_xV2(PO4)2F3 · material · 100%
STXM · technique · 100%
Pearson correlation coefficient · algorithm · 95%

Relation Signals (2)

STXM analyzes Na_xV2(PO4)2F3

confidence 100% · We applied this processing on scanning transmission X-ray microscopy (STXM) data to determine the distribution and coexistence of phases in individual particles of NaxV2(PO4)2F3 cathode materials

GMVAE processes STXM

confidence 90% · The methodology relies on a workflow which combines a Gaussian mixture variational autoencoder (GMVAE) algorithm with the Pearson correlation coefficient to identify the sodium content and map their spatial distribution.

Cypher Suggestions (2)

Find all materials analyzed by a specific technique · confidence 90% · unvalidated

MATCH (m:Material)-[:ANALYZED_BY]->(t:Technique {name: 'STXM'}) RETURN m.name

Identify algorithms used in the workflow · confidence 85% · unvalidated

MATCH (a:Algorithm)-[:USED_IN]->(w:Workflow {name: 'Phase Mapping'}) RETURN a.name

Full Text

67,809 characters extracted from source content.


AI-Driven Phase Identification from X-ray Hyperspectral Imaging of cycled Na-ion Cathode Materials

Fayçal Adrar 1,2,4, Nicolas Folastre 1,2,4, Chloé Pablos 1,2,6, Stefan Stanescu 3, Sufal Swaraj 3, Raghvender Raghvender 1,2, François Cadiou 1, Laurence Croguennec 2,4,6, Matthieu Bugnet 5,*, and Arnaud Demortière 1,2,4,*

1 Laboratoire de Réactivité et de Chimie des Solides (LRCS), CNRS UMR 7314, Université de Picardie Jules Verne, Hub de l’Energie, 15 Rue Baudelocque, Amiens, France
2 Réseau sur le Stockage Electrochimique de l’Energie (RS2E), CNRS FR 3459, Hub de l’Energie, 15 Rue Baudelocque, Amiens, France
3 Synchrotron SOLEIL, L’Orme des Merisiers, 91190 Saint-Aubin, France
4 ALISTORE-European Research Institute, CNRS FR 3104, Hub de l’Energie, Rue Baudelocque, Amiens, France
5 CNRS, INSA Lyon, Université Claude Bernard Lyon 1, MATEIS, UMR 5510, 69621 Villeurbanne, France
6 Univ. Bordeaux, CNRS, Bordeaux INP, ICMCB, UMR 5026, F-33600 Pessac, France
* corresponding author: matthieu.bugnet@cnrs.fr, arnaud.demortiere@cnrs.fr

March 10, 2026

1 Abstract

Na-ion batteries have emerged as viable candidates for large-scale energy storage applications due to resource abundance and cost advantages. The constraints imposed on their performance and durability, for instance, by complex phase transformations in positive electrode materials during electrochemical cycling, can be addressed and are thus not detrimental to their development. However, diffusion-limited Na-ion transport can drive spatially heterogeneous phase nucleation and propagation, leading to multiphase coexistence and locally non-uniform electrochemical activity, generating complex reaction pathways that challenge both mechanistic understanding and predictive material optimization. These challenges can be addressed by investigating single-crystalline regions of materials, i.e. down to the scale of individual particles, although such analyses are often constrained by energetically and/or spatially sparse hyperspectral datasets. Here, we developed an AI-driven method to process hyperspectral data under sparse sampling conditions and generate multiphase maps with nanometer-scale resolution over a micrometer-scale field of view. We applied this processing on scanning transmission X-ray microscopy (STXM) data to determine the distribution and coexistence of phases in individual particles of Na_xV2(PO4)2F3 cathode materials, at different states of charge, i.e. x = 3.0, 2.4, 2.0, 1.0, and 1−y (0.1 < y < 0.5). The methodology relies on a workflow which combines a Gaussian mixture variational autoencoder (GMVAE) algorithm with the Pearson correlation coefficient to identify the sodium content and map its spatial distribution. Our approach reveals nanoscale phase heterogeneity and evolution within individual particles, and improves the reliability of phase detection by identifying ambiguity zones, false assignments, and transition phases localized at grain boundaries. This study demonstrates the relevance and efficiency of coupling advanced VAE-based statistical modeling with modern hyperspectral imaging techniques to interpret phase transformations in energy materials prone to structural changes.

Keywords: deep learning, Pearson correlation coefficient, Gaussian mixture variational autoencoder, Na-ion battery cathode, Na3V2(PO4)2F3, STXM, hyperspectral data

arXiv:2603.07666v1 [cond-mat.mtrl-sci] 8 Mar 2026

2 Introduction

The transition to a sustainable energy landscape requires integrating renewable energy sources with efficient and scalable storage technologies [1, 2]. Lithium-ion batteries currently dominate energy storage due to their high energy density and long cycle life [3, 4, 5].
However, their large-scale use in stationary storage is limited by the high cost, limited availability, and environmental impact of critical raw metals such as lithium, nickel, and cobalt [6, 7]. These challenges have stimulated growing interest in sodium-ion (Na-ion) batteries as a promising alternative, benefiting from the elemental abundance and low cost of sodium while offering comparable electrochemical performance [8]. Among Na-ion positive electrode materials (called cathodes in the following), Na3V2(PO4)2F3 (NVPF) has attracted particular attention due to its high cycling stability, elevated operating voltage, and excellent rate capability compared to sodium-based layered oxides and other polyanionic compounds [9, 10]. Despite these advantages, NVPF cathodes are not immune to degradation, which can arise from structural and interfacial instabilities [11, 12, 13, 14].

A key challenge in Na-ion battery development lies in the strong interplay between electrochemical processes, mechanical effects and diffusion mechanisms, manifested through complex phase transformations within electrode materials [15], which are further influenced by microstructural factors such as grain boundaries, surface reactions, porosity and carbon percolation networks. In Na-ion cathodes, the comparatively large ionic radius of Na+ and its stronger electrostatic interactions with host frameworks, relative to Li+, give rise to distinct mechanisms, even in similar host structures [16]. Layered sodium oxides frequently undergo multiple phase transitions, which can hinder fast Na+ diffusion. In addition, their pronounced sensitivity to air exposure poses significant challenges for storage and handling. Polyanionic compounds, on the other hand, are predominantly constrained by their low intrinsic electronic conductivity, which requires them to be synthesised as carbon-coated nanoparticles (<100 nm in diameter for the individual particles).
For instance, in Na3V2(PO4)2F3, the overall volume change upon cycling is relatively small (∼3%) [9], reflecting the structural robustness of this polyanionic framework. During (de)sodiation, progressive Na+ extraction drives vanadium oxidation and structural phase transitions, associated with Na+ and charge ordering, with the bulk material largely following the equilibrium phase diagram reported in Ref. [17]. However, spatial heterogeneities in Na+ distribution within individual particles can promote local phase coexistence, generating regions with distinct electrochemical responses that influence reaction kinetics, reversibility, and long-term nanoscale stability. To fully understand these mechanisms, it is crucial to dynamically monitor charge and discharge processes by correlating electrochemical behavior with nanoscale chemical information derived from vanadium oxidation states, which serve as sensitive indicators of structural phase evolution. While mesoscale structural studies of Na_xV2(PO4)2F3 have established its equilibrium phase evolution during cycling [18], additional complexity is expected to arise at the nanoscale due to local compositional and kinetic heterogeneities. Accordingly, advanced spectroscopic analyses were employed to map phase distributions via spatial variations in vanadium and oxygen valence states, enabling direct insight into local sodium-ion diffusion mechanisms across different states of charge.

For this purpose, we employed synchrotron scanning transmission X-ray microscopy (STXM) to study Na3V2(PO4)2F3 (referred to as Na3.0VPF) and its desodiated phases. STXM is particularly well-suited for this local investigation due to its ability to provide chemical and electronic state information at the nanoscale through hyperspectral imaging, where each scan point contains a full X-ray absorption spectrum.
This technique enables direct visualization of phase distributions and chemical heterogeneities within individual cathode particles. Changes in lattice parameters leading to phase transformations are closely linked to variations in X-ray absorption near edge structure (XANES) spectra, reflecting alterations in the local vanadium environment. However, the complex phase transition behavior during electrochemical cycling poses significant challenges for conventional characterization techniques, which often lack the local sensitivity to resolve these transformations. Many studies report the use of synchrotron STXM to investigate battery materials, focusing on phase transformations and electrocatalytic reaction performance during cycling [19, 20, 21, 22, 23, 24]. Notably, Ohmer et al. [21] tracked phase boundary propagation in single-crystalline LiFePO4 during lithiation and delithiation using in situ STXM. A comprehensive overview of such studies can be found in the review of Kim et al. [25], and the relevance of STXM for battery research in the review of Temprano et al. [26]. Despite these advances, high-resolution mapping remains limited by the intrinsic difficulty of disentangling the subtle and overlapping spectral features associated with each phase, which restricts the ability to fully resolve nanoscale heterogeneities within individual particles.

STXM produces large and complex datasets that require specialized analysis tools. Traditional software packages such as aXis2000 [27], STXM Reader [28], and MANTiS [29] are powerful for general data processing. However, common phase-mapping approaches such as least-squares linear combination fitting (LS-LCF) become less reliable when spectral resolution is low [30], and singular value decomposition (SVD), often used for phase mapping [31], can also fail under sparse spectral sampling.
This limitation often arises in high spatial resolution STXM experiments, where fewer energy points are collected to minimize radiation damage and acquisition time, resulting in sparse hyperspectral datasets. Such datasets, characterized by limited spectral sampling, challenge conventional unmixing methods because subtle spectral variations cannot be robustly captured or distinguished [32, 33]. To overcome these limitations, we developed a custom Python-based approach combining the Pearson correlation coefficient (PCC) [34] and Gaussian mixture variational autoencoder (GMVAE) methods [35, 36, 37, 38]. Variational autoencoder models are increasingly used to analyze high-dimensional scientific imaging and hyperspectral datasets, including electron microscopy [39], liquid-phase TEM (LPTEM) [40], and diffraction mapping [41]. Physics-augmented VAE frameworks have been shown to disentangle latent physical factors and delineate microstructural boundaries in hyperspectral microscopy [39], while transformer-VAE architectures capture stochastic nanoparticle dynamics in LPTEM using physics-informed loss functions [40]. Similar latent-space encodings have been used to derive reduced diffraction representations that reveal microstructure heterogeneity and support materials design [41]. These studies collectively highlight the utility of VAE-based latent representations for extracting interpretable physical structure from complex experimental data. The Gaussian mixture component in a GMVAE corrects the unimodal latent space assumption of standard VAEs by introducing a multimodal prior, enabling the model to capture clustered, heterogeneous data distributions and thus improving structured representation and class separability in the latent space [35]. The Pearson correlation identifies similarities in spectral shape, while the VAE compresses data into a compact latent space where distinct phases can be more effectively separated.
In this work, we implement a two-step PCC-GMVAE workflow tailored for sparsely sampled STXM datasets. High-resolution reference spectra for five sodium contents were used to select thirteen characteristic energies for high spatial resolution mapping. Pixel spectra were first assigned using Pearson correlation, and a reliability metric based on the gap between the two highest correlation coefficients was introduced to identify ambiguous regions. To resolve these ambiguities, spectra were projected into a three-dimensional GMVAE latent space and reassigned using Mahalanobis distances to phase clusters. Applied to Na_xV2(PO4)2F3 cathodes at multiple states of charge, this approach reveals strong intra- and inter-particle heterogeneity and demonstrates reliable multiphase mapping, establishing a robust AI-driven framework for sparse hyperspectral phase identification.

3 Results

3.1 Experimental procedure

The experimental setup is shown in Figure 1a, and a comparison between acquisitions at high spectral resolution (Figures 1b–c) and high spatial resolution (Figures 1d–e) highlights the intrinsic trade-off of the STXM technique. High spectral resolution (∼0.1 eV) enables the identification of detailed energy-dependent features but provides limited spatial information (∼135 nm). Conversely, high spatial resolution (∼31 nm) captures fine spatial details but with a reduced number of energy points, as indicated by the discrete red bars in Figures 1d–e. This trade-off adds to acquisition time constraints and the risk of beam damage, making it difficult to simultaneously achieve both high spatial and spectral resolution.

Figure 1: Scanning transmission X-ray microscopy (STXM) experimental setup. (a) Schematic of the STXM setup (not to scale). (b) Illustration of a stack of images acquired with low spatial resolution (low-sampling version of the image shown in panel d).
(c) Optical density (OD) spectrum of an NVPF particle acquired with high energy resolution; the dashed line indicates the energies selected for high spatial resolution acquisitions. (d) High spatial resolution image of an NVPF particle (same as in (b)). (e) Illustration of the optical density (OD) spectrum from a low spectral resolution acquisition, where the bars represent the energy levels and their heights correspond to the optical density values.

To address this limitation, we first measured reference spectra at high spectral resolution for the five sodium contents. The resulting V-L2,3 and O-K edge spectra (Figure 2) exhibit strong variations at specific energies. From these data, thirteen characteristic energies were selected (black dashed lines in Figure 2), which were subsequently used for high spatial resolution mapping of the Na_xVPF samples.

3.2 X-ray absorption near edge spectra (XANES)

Figure 2: Reference XANES spectra of NVPF for different levels of Na+ extraction. Optical density (OD) of V-L2,3 and O-K edge X-ray absorption spectra acquired in STXM measurements on individual NVPF particles at different states of charge.

Figure 2 shows X-ray absorption near edge structure (XANES) spectra for NVPF samples at different states of charge. The corresponding electrochemical cycling curves and X-ray diffraction patterns are presented in Supplementary Figure S1. The full pattern matching (Le Bail method) refinement results for Na3VPF, Na2.4VPF, Na2VPF, Na1VPF, and Na_(1−y)VPF are also provided in Supplementary Figure S2. Each spectrum corresponds to the integration over an entire particle for a given state of charge. The XANES is in good agreement with the literature [9] and is thus considered representative of the different Na contents. These representative spectra from individual particles illustrate the progressive evolution of the redox state during cycling.
In the pristine material, Na3VPF, two intense peaks are observed at about 517.5 eV and 525.0 eV, which originate from vanadium L3 (2p3/2 → 3d) and L2 (2p1/2 → 3d) transitions. The splitting between these peaks arises from spin–orbit coupling of the 2p electrons [9, 42]. As the material is deintercalated, the spectral weight of these peaks gradually shifts to higher energies, evidencing the oxidation of vanadium. This interpretation is further supported by the pre-edge features of the oxygen K-edge (530–535 eV), which correspond to transitions from O 1s to V 3d–O 2p hybridized states. The pre-edge peak at ∼530.3 eV becomes more pronounced during charging, indicating that the V–O bonds acquire increased covalency as vanadium loses electrons and the bonding evolves [9, 43]. The vertical dashed black lines mark twelve specific energies where significant changes in the optical density (i.e. XANES intensity) occur. An additional energy, around 505 eV, was taken before the V-L2,3 edges, and is not visible in Figure 2 due to display preferences. Nevertheless, by analyzing only these thirteen energies, it is possible to extract qualitative information about the material, including the sodium content. Indeed, variations of the optical density at these energies are strongly linked to the changes in sodium levels [9]. In this study, these energies were used to map the phase distribution of several individual NVPF particles. Reference spectra composed of 13 selected energy points were generated for each Na content by extracting the corresponding optical density values from the high spectral resolution acquisitions shown in Figure 2.
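As a minimal sketch of how such sparse reference spectra could be built, the snippet below samples a high-resolution spectrum at a list of selected energies. The function name `sample_reference`, the use of linear interpolation, the synthetic Gaussian spectrum, and the three example energies are all illustrative assumptions; they are not the authors' code or their actual thirteen energies.

```python
import numpy as np

def sample_reference(energies_ev, od, selected_ev):
    """Return OD values of a high-resolution spectrum at selected energies.

    energies_ev: 1D increasing energy axis of the high-resolution spectrum (eV)
    od:          1D optical density values on that axis
    selected_ev: 1D array of energies chosen for sparse mapping (eV)
    Linear interpolation is an assumption; nearest-point lookup would also work.
    """
    energies_ev = np.asarray(energies_ev, dtype=float)
    od = np.asarray(od, dtype=float)
    selected_ev = np.asarray(selected_ev, dtype=float)
    if selected_ev.min() < energies_ev[0] or selected_ev.max() > energies_ev[-1]:
        raise ValueError("selected energies fall outside the measured range")
    return np.interp(selected_ev, energies_ev, od)

# Synthetic stand-in for one reference spectrum (not real data).
axis = np.arange(505.0, 535.0 + 1e-9, 0.1)        # roughly 0.1 eV steps
spectrum = np.exp(-0.5 * (axis - 517.5) ** 2)     # single peak near 517.5 eV
picked = np.array([505.0, 517.5, 525.0])          # hypothetical energy subset
sparse_ref = sample_reference(axis, spectrum, picked)
```

Repeating this for each sodium content would give one short reference vector per phase, all defined on the same energy list as the sparse image stack.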
Intrinsically, this experimental approach records the spectral intensity only at selected energies and does not rely on the full XANES spectrum, as acquiring a complete high-resolution spectral stack in STXM is highly time-consuming and results in a large cumulative X-ray dose that can induce significant beam damage in sensitive battery materials. Therefore, it is essential to develop new data processing strategies capable of capturing the subtle spectral variations and the corresponding spatial variations. Methods commonly used to interpret X-ray absorption fine structures, such as electronic structure calculations or peak fitting, are unsuitable in this case because the sparse energy sampling makes extracting quantitative information challenging [44, 45, 46]. Instead, correlation-inspired approaches, which primarily provide qualitative insight into the sample and are used for diffraction pattern matching in transmission electron microscopy (e.g., ASTAR [47], PyXem [48], py4DSTEM [49], ePattern [50]), may provide robust alternatives for analyzing such datasets.

3.3 STXM phase mapping workflow

The initial data processing was carried out using the aXis2000 software [27], which was employed for image alignment and normalization of the images into optical density (OD) according to the Beer–Lambert law. The extracted OD from reference samples at the characteristic energy levels provides experimental fingerprints of the sodium content in the samples. Subsequently, we developed a Python-based workflow for phase mapping of the STXM data, as illustrated in Figure 3. In this workflow, each spectrum from the spectral image data cube is compared with the five reference spectra using the PCC. This metric, highly sensitive to subtle variations in the data, quantifies the linear relationship between the measured OD spectra and the references, resulting in a correlation map for each compared sodium phase (shown in Supplementary Figure S3c–g).
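The correlation step described above can be sketched with NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `pearson_maps`, the `(ny, nx, n_energies)` cube layout, and the choice to report a correlation of 0 for flat spectra are all assumptions.

```python
import numpy as np

def pearson_maps(cube, references):
    """Pearson correlation between every pixel spectrum and each reference.

    cube:       array (ny, nx, n_energies) of optical density values
    references: array (n_phases, n_energies) of reference spectra
    returns:    array (n_phases, ny, nx) of correlation coefficients
    """
    cube = np.asarray(cube, dtype=float)
    references = np.asarray(references, dtype=float)
    # Centre each spectrum on its own mean over the energy axis.
    s = cube - cube.mean(axis=-1, keepdims=True)
    r = references - references.mean(axis=-1, keepdims=True)
    s_norm = np.linalg.norm(s, axis=-1)      # shape (ny, nx)
    r_norm = np.linalg.norm(r, axis=-1)      # shape (n_phases,)
    cov = np.einsum("yxe,pe->pyx", s, r)     # unnormalised covariance
    denom = r_norm[:, None, None] * s_norm[None, :, :]
    # Flat spectra have zero variance; report 0 for them (an assumption).
    return np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
```

Because the PCC is invariant to an offset and a positive scale factor, a pixel spectrum of the form `a * reference + b` with `a > 0` correlates perfectly with that reference, which is why the metric compares spectral shape rather than absolute intensity.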
Each spectrum is then assigned to the phase with the highest PCC, yielding an initial phase map. We introduce a reliability metric R tailored to this context, based on the comparison of correlation coefficients extracted from the correlation maps associated with each phase. The metric is defined as:

$$R = 100\,(Q_1 - Q_2) \qquad (1)$$

where $R$ is the reliability score, $Q_1$ is the highest correlation coefficient among all phases for a given pixel, and $Q_2$ is the second-highest coefficient. This score quantifies how unambiguous a pixel’s phase assignment is: larger values of $R$ indicate a greater gap between the best and second-best correlations, and therefore higher confidence in the assignment. The reliability values are displayed as a greyscale map, with each pixel encoding its corresponding $R$ (Supplementary Figure S3a), which is then overlaid on the phase map as a brightness layer to simultaneously visualize both the assigned phase and the confidence of the assignment (Supplementary Figure S3b). However, if the correlation coefficients of two phases for a given pixel differ by less than 0.005, the pixel is labeled as an ambiguous phase. This threshold was determined as the minimum difference required to distinguish between two reference spectra using correlation (see Supplementary Figure S4). The designation indicates that the correlation values are too close to allow an unambiguous assignment.

To resolve spectral ambiguities, we developed a deep learning model based on a GMVAE, trained on one-dimensional XANES spectra extracted from STXM data cubes of NVPF particles at various states of charge. By incorporating a mixture of Gaussian priors, the GMVAE captures multimodal latent distributions, enabling unsupervised emergence of discrete spectral clusters corresponding to distinct chemical populations.
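Returning to the reliability score of Eq. (1) and the 0.005 ambiguity threshold, a minimal sketch of the pixel-wise computation is given below. The function name `reliability_and_ambiguity` and the `(n_phases, ny, nx)` layout of the correlation maps are assumptions made for illustration; the strict `<` comparison follows the wording "differ by less than 0.005".

```python
import numpy as np

def reliability_and_ambiguity(rho, threshold=0.005):
    """Initial phase map, reliability score and ambiguity mask.

    rho:       array (n_phases, ny, nx) of Pearson correlation maps
    threshold: minimum gap between the two best correlations
    returns:   (phase_map, reliability, ambiguous), each of shape (ny, nx)
    """
    rho = np.asarray(rho, dtype=float)
    phase_map = np.argmax(rho, axis=0)           # index of the best phase
    top_two = np.sort(rho, axis=0)[-2:]          # [second best, best]
    q2, q1 = top_two[0], top_two[1]
    reliability = 100.0 * (q1 - q2)              # R = 100 (Q1 - Q2)
    ambiguous = (q1 - q2) < threshold            # too close to call
    return phase_map, reliability, ambiguous

# Toy example: 3 phases, one row of 2 pixels (made-up correlation values).
rho_demo = np.array([[[0.9, 0.800]],
                     [[0.5, 0.799]],
                     [[0.1, 0.200]]])
phase_map, reliability, ambiguous = reliability_and_ambiguity(rho_demo)
```

In this toy case both pixels are initially assigned to the first phase, but only the second pixel, where the two best correlations differ by 0.001, is flagged as ambiguous.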
As reported in [35], this approach improves clustering by preserving discrete structure, reducing mode mixing, and, through minimum-information constraints, producing compact and interpretable latent representations with competitive performance. The multimodal latent prior facilitates structured organization of the latent space, allowing spectrally similar phases to be separated more robustly than with conventional correlation-based approaches. Regularization through the Kullback–Leibler divergence term in the loss function further shapes the latent distribution, grouping similar spectra into well-defined clusters. This provides a more effective representation space in which ambiguous or strongly overlapping XANES features can be distinguished with higher confidence. The GMVAE architecture is shown in Supplementary Figure S5.

Figure 3: STXM phase mapping workflow. (a) Pearson correlation phase mapping: the Pearson correlation is computed between each spectrum vector $s$ and each reference vector $r_n$ ($n = 1, 2, 3, 4, 5$). The resulting correlation map, $\rho_n$ ($n = 1, 2, 3, 4, 5$), represents the correlation between $s$ and $r_n$ across all pixels. The initial phase map is obtained by comparing the correlation values pixel by pixel and assigning each pixel to the reference with the highest correlation. (b) Ambiguity map: pixels where $\arg\max(\rho_n(i,j) - \rho_m(i,j)) \le 0.005$, with $n \neq m$, are shown in grey and labeled as ambiguous pixels $v_a$. The ambiguity is resolved by projecting $v_a$ into the latent space of the trained Gaussian mixture variational autoencoder (GMVAE) containing the overall phase distributions, measuring the Mahalanobis distances, and assigning the pixel to the phase corresponding to the nearest phase distribution.

We designed and evaluated several strategies for phase assignment within the latent space. In the first approach, which proved to be the most robust for our dataset, all spectra are projected into the latent space and initially labeled using PCC.
For each latent space cluster corresponding to a given phase, we compute the Mahalanobis distance (see Methods), which explicitly accounts for the covariance structure and anisotropy of the cluster distribution. Unlike the Euclidean distance, the Mahalanobis metric normalizes distances by the intrinsic variance of each cluster and incorporates correlations between latent variables, making it particularly well-suited for probabilistic latent spaces where clusters exhibit non-spherical geometries. Ambiguous spectra are then assigned to the phase whose latent distribution yields the minimum Mahalanobis distance, ensuring statistically consistent classification relative to the learned cluster structure. In a second approach, the experimental reference spectra are directly projected into the latent space and used as anchor points. Ambiguous spectra are then assigned to the nearest reference using Euclidean distance. This strategy is appropriate when the measured spectra have intensity scales comparable to the references. However, this condition is not strictly satisfied in our dataset, limiting its reliability. Additionally, we explored alternative assignment schemes, including K-Nearest Neighbors (KNN) and distance-to-centroid classification based on the center of mass of each phase cluster. These approaches offer flexibility in cases where cluster boundaries are less clearly separated.

3.4 STXM phase mapping

Figure 4: Phase mapping of a Na2V2(PO4)2F3 sample. (a) Phase map obtained using PCC. (b) Map of ambiguous regions, where the correlation coefficient between two phases is close (≤ 0.005), shown in grey. (c) Final phase map with ambiguities resolved using a GMVAE by projecting ambiguous spectra into the latent space and assigning them to the closest phase distribution. The scale bars correspond to 0.2 μm. (d) Phase distributions in the GMVAE latent space corresponding to the global latent representation without the projection of ambiguous pixels. (e) Projection of ambiguous pixels (in grey). (f) Latent space after resolution of ambiguous pixels.

In this section, we present the phase mapping results for a Na2VPF particle in the charged state. Figure 4a shows the initial phase map obtained from the Pearson correlation analysis, which reveals the coexistence of four phases. Specifically, 58.36% of the particle corresponds to the Na2VPF phase and 34.18% is assigned to the Na1VPF phase. The Na3VPF and Na2.4VPF phases are also detected, suggesting incomplete electrochemical transformations within the particle. Ambiguous pixels, shown in grey in Figure 4b, represent about 37.05% of the particle, reflecting uncertainty in phase assignment due to the high similarity of correlation values among the three phases.

Figure 4c presents the refined phase map after resolving these ambiguities. Here, the ambiguous pixels are redistributed among the coexisting phases, yielding an adjusted composition of 58.93% Na2VPF, 38.77% Na1VPF, and 1.97% Na3VPF. The corresponding correlation maps for each sodium phase are displayed in Supplementary Figure S3c–g, with PCCs ranging from 0.05 to 1.0. These matrices quantify the agreement between experimental XANES spectra and the reference spectra of each identified phase, and are visualized as color-coded maps where stronger correlations appear as more intense colors. Notably, the Na1VPF and Na2VPF phases exhibit strong spectral similarity, as shown in Figure 2, which explains the phase ambiguity observed in Figure 4b. To address this ambiguity, the spectral vectors of the ambiguous pixels, together with the unambiguous spectra of the two phases in question, were projected into a latent space learned by a GMVAE. The analysis of the latent distributions (Figures 4d–f) shows a clear separation of points into distinct clusters, underlining both the discriminative strength of the PCC and the ability of the GMVAE to resolve ambiguous cases.
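The Mahalanobis-based reassignment of ambiguous pixels can be illustrated with the sketch below. It assumes the latent coordinates have already been produced by a trained encoder (not shown here) and that unambiguous points carry integer phase labels from the PCC step; the function name `assign_by_mahalanobis`, the sample-covariance estimate, and the use of a pseudo-inverse are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def assign_by_mahalanobis(latent_labeled, labels, latent_ambiguous):
    """Assign ambiguous latent points to the nearest labelled phase cluster.

    latent_labeled:   (n, d) latent coordinates of unambiguous spectra
    labels:           (n,) integer phase labels, e.g. from the PCC step
    latent_ambiguous: (m, d) latent coordinates of ambiguous spectra
    returns:          (m,) phase labels with minimum Mahalanobis distance
    """
    latent_labeled = np.asarray(latent_labeled, dtype=float)
    latent_ambiguous = np.asarray(latent_ambiguous, dtype=float)
    labels = np.asarray(labels)
    phases = np.unique(labels)
    dists = np.empty((len(phases), len(latent_ambiguous)))
    for k, phase in enumerate(phases):
        pts = latent_labeled[labels == phase]
        mean = pts.mean(axis=0)
        cov = np.atleast_2d(np.cov(pts, rowvar=False))
        inv_cov = np.linalg.pinv(cov)   # pseudo-inverse tolerates singular covariance
        diff = latent_ambiguous - mean
        sq = np.einsum("md,de,me->m", diff, inv_cov, diff)
        dists[k] = np.sqrt(np.maximum(sq, 0.0))  # clip rounding noise below zero
    return phases[np.argmin(dists, axis=0)]

# Toy 2D latent space: a broad cluster (phase 0) and a tight one (phase 1).
broad = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
tight = np.array([[4.8, 0.0], [5.2, 0.0], [5.0, -0.2], [5.0, 0.2]])
points = np.vstack([broad, tight])
point_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
queries = np.array([[3.0, 0.0], [5.1, 0.0]])
assigned = assign_by_mahalanobis(points, point_labels, queries)
```

In the toy example, the query at (3, 0) is closer to the centre of the tight cluster in Euclidean terms, yet it is assigned to the broad cluster because it lies well within that cluster's spread. This is the behaviour that motivates the Mahalanobis metric for non-spherical clusters.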
When inspecting the latent space containing all particles (Supplementary Figure S6), it becomes clear that PCC-based phase identification can produce false positives. For example, the PCC assigns some spectra to the Na3VPF phase even though they originate from a particle in the Na1VPF state of charge. By contrast, the GMVAE latent space places these spectra correctly near the Na1VPF cluster, and inspection of the corresponding raw spectra confirms that they resemble Na1VPF rather than pristine Na3VPF. This demonstrates that the global latent space representation not only resolves ambiguous assignments but also provides a robust diagnostic for evaluating the reliability of PCC-based methods.

Figure 5: Phase maps of four NVPF samples at different states of charge. Phase maps of Na3VPF, Na2VPF, Na1VPF, and Na_(1−y)VPF, respectively, after ambiguity resolution. Each image corresponds to a sample halted at the state of charge indicated on the electrochemical curve of NVPF.

Figure 5 presents the phase mapping results of four NVPF particles at different states of charge, ranging from Na3.0VPF to Na_(1−y)VPF. The corresponding reliability map is displayed in Supplementary Figure S7. In the pristine state, the Na3.0VPF phase dominates, consistent with the absence of desodiation. Nonetheless, traces of the Na2.4VPF and Na2VPF phases are detected, particularly at the particle periphery and in small fragments adjacent to the main particle. These regions exhibit low reliability in the associated reliability maps. At x = 2.0, the phase distribution becomes significantly heterogeneous, with 4.46% Na2.4VPF, 72.32% Na2.0VPF, 21.52% Na1.0VPF, and 1.70% Na3VPF. At x = 1.0, three phases are identified, with 58.37% Na_(1−y)VPF, 41.54% Na1.0VPF, and 0.09% Na2.0VPF. In the fully discharged state (x = 1 − y), only two phases are detected, 91.45% Na_(1−y)VPF and 8.55% Na1.0VPF. However, these values should be interpreted with caution.
As shown in Supplementary Figure S8, the PCCs obtained for the Na1.0VPF and Na1−yVPF reference spectra are relatively low, reducing the confidence in the corresponding phase assignments. This limitation implies that, although the algorithm returns a dominant Na1−yVPF contribution, the phase assignments may not be fully reliable.

Figure 6: Phase maps for six particles at the Na2.0VPF state of charge. The scale bar corresponds to 0.2 μm.

Our data processing framework was systematically applied to multiple particles at a single nominal state of charge to assess its robustness and reproducibility. Across the six particles examined at the Na2VPF state of charge, we observe a highly heterogeneous and particle-dependent phase distribution, as shown in Figure 6. Despite reaching this nominal state of charge, all particles contain a noticeable fraction of the Na1VPF phase with good reliability, indicating that some regions are desodiating faster than expected. At the same time, most particles also exhibit Na2.4VPF and Na3VPF phases, which shows that other regions within the same particles desodiate more slowly. Overall, all six analyzed particles display a clearly heterogeneous and non-uniform phase distribution, both within individual particles and from one particle to another.

4 Discussion

Our phase mapping results reveal significant spatial heterogeneity in NVPF particles during desodiation. In the pristine state, the Na3VPF phase predominates, confirming the absence of desodiation and validating our phase mapping workflow. However, traces of the Na2.4VPF and Na2VPF phases are detected at the particle periphery and in adjacent fragments, where lower reliability is observed in the associated maps. At an intermediate state of charge (x = 2.0), our phase maps reveal a complex coexistence of four sodium phases (Na3.0, Na2.4, Na2.0, Na1.0).
The gradual evolution of Na3V2(PO4)2F3 XRD patterns reported by Bianchini et al. [18] corresponds to the deintercalation process from Na3VPF towards Na2.4VPF, Na2.2VPF and Na2VPF. Although direct reference patterns for intermediate compositions such as Na2.2VPF and Na1.8VPF are not available in our dataset, the three observed biphasic solutions may reflect a progressive and spatially asynchronous phase transformation at the nanoscale. Such heterogeneity has been attributed to partial sodium/vacancy ordering, local structural constraints, and limited ionic or electronic transport, which can lead to incomplete or delayed transitions within individual particles. In our case, we also observe that some regions of a particle desodiate significantly faster than others in a strongly heterogeneous manner (Figure 6c,d). While particle-to-particle variations in size and morphology could in principle influence this behavior, the particles shown in Figure 6 have comparable sizes, suggesting that the observed heterogeneity arises primarily from intrinsic local structural or kinetic factors rather than from geometric effects. The influence of grain orientation on the desodiation heterogeneity in NVPF particles cannot be excluded, considering its significant effect on the reaction mechanisms in LiCoO2 demonstrated in a recent study [51].

Some quantitative differences are apparent when comparing our results with previous studies. While Bianchini et al. [18] reported that Na3VPF vanishes almost completely before substantial growth of Na2VPF, our phase mapping reveals persistent Na3.0VPF and Na2.4VPF domains. This suggests slower local transformation kinetics or stronger locally confined transport limitations. Akhtar et al. [52] documented structural heterogeneity and delayed transformations in NVPF cathodes, although their study primarily focused on material synthesis and O/F ratio control.
At x = 1.0, two dominant phases are detected, together with residual Na2VPF, indicating that some regions remain unreacted. However, it is important to note that the reliability of this residual phase is low. Reduced reliability at phase interfaces corresponds to areas of higher strain, complicating phase discrimination [31]. In the most desodiated state (x = 1 − y), only Na1−yVPF and Na1.0VPF remain. Overall, these high-resolution phase maps demonstrate that NVPF desodiation is inherently non-uniform and multi-step, with domain-specific variations in transformation kinetics and structural constraints.

Capturing this level of detail required a Python-based phase mapping workflow, as our experimental setup involved a trade-off between achieving high spatial resolution and limiting spectral sampling to minimize beam damage. Under these conditions, conventional LS-LCF approaches [27, 29, 30] are unsuitable due to undersampling. Instead, we employed the PCC [34, 53, 54], a shape-based similarity metric. To overcome cases where the PCC alone cannot resolve phases with highly similar spectral signatures, we incorporated a GMVAE-based ambiguity resolution step [36, 37, 38]. By clustering spectra in latent space, the GMVAE enables discrimination between otherwise indistinguishable phases.

The GMVAE latent space distribution in Supplementary Figure S6a demonstrates that the model successfully captures the underlying structure of the different NVPF phases. Distinct clusters emerge, spanning from red to purple, corresponding to compositions ranging from Na3VPF to Na1−yVPF. Within each cluster, the latent coordinates also reflect variations in overall spectral intensity: navigating from the upper to the lower regions of a given cluster corresponds to a systematic decrease in intensity. The reconstructed spectra presented in Supplementary Figure S9 further demonstrate the GMVAE’s ability to capture subtle spectroscopic evolution across the NVPF system.
The model successfully reproduces the characteristic peak shifts associated with the transition from Na3VPF to Na1−yVPF, indicating that it has internalized key features of the V L2,3 edges and their evolution during sodium extraction. While these results are encouraging, they must be interpreted with caution. Beyond standard validation, it is essential to acknowledge intrinsic limitations of VAE-based models. The Kullback–Leibler regularization enforces smooth latent-space interpolation, which can inherently bias reconstructions toward averaged representations and limit the recovery of sharp or rare spectral features. As a result, VAEs may struggle to faithfully reproduce highly localized or low-probability physicochemical states, whereas generative paradigms such as diffusion models are often better suited for high-fidelity data generation. In addition, reconstruction quality remains fundamentally constrained by the size, diversity, and representativeness of the training dataset, particularly in sparsely sampled experimental regimes. Furthermore, a key next step will be to evaluate the model using synthetic spectra with controlled and physically grounded features, enabling systematic conditioning on specific experimental parameters and material states. This strategy will not only verify that the reconstructions faithfully capture intrinsic material behavior but also enhance the model’s ability to generate high-confidence outputs associated with well-defined experimental conditions and electrochemical states.

Beyond phase discrimination, the GMVAE latent space also captures physically meaningful intensity variations. As shown in Supplementary Figure S10c, the mean spectral intensity evolves systematically along the z2 direction, indicating that the network organizes spectra not only by chemical phase but also by overall absorption amplitude.
Since STXM optical density depends on both composition and sample thickness via the Beer–Lambert law, this trend suggests that the GMVAE implicitly clusters energy vectors as a function of effective sample thickness. This emergent structuring highlights the ability of the model to disentangle multiple experimental factors without explicit supervision, reinforcing the robustness and physical interpretability of the latent-space representation.

This combined PCC-GMVAE strategy significantly enhances the reliability and robustness of phase detection under sparse spectral sampling conditions, and highlights the capability of structured deep learning models to extract physically meaningful information from reduced hyperspectral datasets. Importantly, the GMVAE does not merely act as a clustering tool, but learns a globally consistent multimodal latent representation of the spectral landscape, enabling ambiguity resolution, false-positive correction, and statistically grounded phase discrimination. We demonstrate that even with only 13 energy acquisitions, this workflow reliably resolves multiphase distributions in charged NVPF particles, thereby overcoming the intrinsic trade-off between spatial resolution, acquisition time, and beam-induced damage in high-resolution STXM experiments.

Beyond the present case study, the GMVAE-based framework is inherently generalizable to other sparsely sampled hyperspectral modalities, including soft X-ray spectromicroscopy, EELS spectrum imaging, and STEM-EDX datasets. The latent-space formalism further opens promising perspectives for multimodal data fusion, for instance through coupling with 4D-STEM orientation and phase mapping approaches such as automated crystal orientation mapping (ACOM). Integrating spectroscopic latent encodings with diffraction-based structural descriptors would provide deeper insight into chemo-mechanical coupling and phase transformation pathways.
Future developments may involve incorporating complementary experimental techniques to improve quantitative accuracy [55, 56], as well as extending the architecture toward deeper neural network models integrating attention mechanisms, diffusion priors, or physics-informed constraints [57, 58]. Such advances could further enhance latent space disentanglement, improve interpretability, and ultimately enable transferable AI frameworks for multimodal, beam-sensitive materials characterization.

5 Conclusion

In summary, this combined PCC and GMVAE strategy improves the reliability of phase detection and highlights the potential of high spatial resolution STXM for studying battery materials and other beam-sensitive systems. Notably, we show that even in the case of sparse energy sampling with only thirteen data points, the workflow can resolve desodiated phases in charged NVPF particles. More broadly, this work demonstrates that meaningful structural and chemical information can be extracted from reduced spectral datasets while maintaining nanometer-scale spatial resolution. Using NVPF at different states of charge as a case study, we demonstrate that combining high spatial resolution imaging with deep-learning-based spectral analysis provides a robust and statistically grounded framework for interpreting STXM-XANES data acquired under limited spectral sampling conditions. The proposed PCC-GMVAE workflow not only resolves phase ambiguities but also introduces a reliability metric and a latent-space-based validation strategy that strengthen confidence in phase assignment. By explicitly accounting for cluster covariance and multimodal spectral distributions, the approach enables consistent discrimination of spectrally overlapping phases and reveals pronounced intra- and inter-particle heterogeneity during desodiation.
Beyond the specific NVPF system, this study establishes a generalizable AI-driven methodology for phase identification in sparsely sampled hyperspectral datasets. The framework can be extended to other energy and beam-sensitive materials and adapted to complementary modalities such as X-ray or electron diffraction. More broadly, it demonstrates how deep learning latent representations enable data-efficient and physically interpretable high-resolution chemical mapping under experimental constraints.

6 Methods

6.1 STXM

Samples of NaxV2(PO4)2F3 (where x = 3, 2.4, 2, 1, 1 − y) were analyzed using Scanning Transmission X-ray Microscopy (STXM) at the HERMES beamline of the SOLEIL synchrotron [59]. A zone plate (50 nm outer-zone width) was used as focusing optics, with a phosphor-coated photomultiplier tube (PMT) as the photon detector. The STXM chamber was maintained at a pressure of 1e-5 mbar during the measurements. NVPF particles collected from the scratched electrode were dispersed in ultrapure ethanol and drop-cast onto an amorphous carbon-coated TEM grid, followed by solvent evaporation at room temperature. Sample preparation and handling were performed under air-protected conditions to minimize degradation induced by moisture exposure.

6.2 Optical density

STXM provides access to optical density (OD) images at different energies, where the OD is defined as

$$\mathrm{OD} = \ln\left(\frac{I_0}{I}\right) \tag{2}$$

where $I_0$ is the intensity of the incident beam and $I$ is the transmitted beam intensity. This relationship is derived from the Beer–Lambert law, expressed as

$$I = I_0 \exp\left(-\mu(E)\,\rho\, d\right) \tag{3}$$

where $\mu(E)$ is the energy-dependent mass absorption coefficient, $\rho$ is the sample density, and $d$ is the sample thickness.

6.3 Data pre-processing

We started the processing of the STXM data by aligning the sequence of images, or stack, using a cross-correlation-based program called Jacobson alignment, implemented in aXis2000.
This alignment step is essential to guarantee that all images are correctly registered, with the same region of interest consistently located across the entire stack. The images are then converted into OD maps using the Beer–Lambert law, which relates the transmitted X-ray intensity to the concentration of the absorbing species within the material. Phase mapping is then performed using the method described in the section “Phase mapping workflow”.

6.4 Pearson Correlation Coefficient

The Pearson correlation coefficient measures the linear relationship between two variables $X$ and $Y$, quantifying both strength and direction:

$$r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\mathrm{Cov}(X, Y)$ is the covariance between $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively. The coefficient ranges from −1 to 1, with 1 indicating perfect positive linear correlation, −1 indicating perfect negative linear correlation, and 0 indicating no linear relationship.

6.5 Mahalanobis distance

The Mahalanobis distance measures the distance between a point $x$ and a distribution $\mathcal{N}(\mu_n, \Sigma_n)$, accounting for correlations and variable scales:

$$D_M = \sqrt{(x - \mu_n)^{T}\, \Sigma_n^{-1}\, (x - \mu_n)}$$

It generalizes the Euclidean distance by incorporating the covariance structure, providing a normalized multivariate measure.

6.6 Gaussian mixture variational autoencoder (GMVAE)

We implemented a GMVAE in TensorFlow/Keras to model hyperspectral STXM data and extract low-dimensional representations of the underlying phases. The dataset was randomly split into training (95%), validation (2.5%), and test (2.5%) subsets to train and evaluate the model. The encoder consists of four fully connected layers with 512, 256, 128, and 64 units (ReLU activations), followed by batch normalization and dropout. It outputs the parameters of a 3-dimensional latent space together with the mixture-component probabilities defining the Gaussian mixture prior.
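The Mahalanobis distance of Section 6.5 can be evaluated directly in a 3-dimensional latent space such as the one described above. The sketch below is an illustration only: it uses the standard square-root form of the distance, and the component means, covariances, and function names are hypothetical rather than values taken from the trained GMVAE.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance between point x and a Gaussian N(mean, cov)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ y = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def nearest_component(z, means, covs):
    """Index of the Gaussian component closest to latent point z."""
    distances = np.array([mahalanobis(z, m, c) for m, c in zip(means, covs)])
    return int(np.argmin(distances)), distances

# Two hypothetical latent clusters: one compact, one elongated along axis 0.
means = [np.array([0.0, 0.0, 0.0]), np.array([4.0, 0.0, 0.0])]
covs = [np.eye(3), np.diag([16.0, 1.0, 1.0])]

z = np.array([1.5, 0.0, 0.0])  # closer to cluster 0 in Euclidean terms
index, distances = nearest_component(z, means, covs)
```

Here the point lies nearer to the first cluster mean in Euclidean distance, yet the broad covariance of the second cluster gives it the smaller Mahalanobis distance, which is the behavior that distinguishes this measure from a purely geometric one.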
The decoder mirrors the encoder symmetrically (64–128–256–512 units) and reconstructs the spectra through a final linear layer. The GMVAE was trained using a reconstruction loss (MSE) and a KL-divergence term adapted for mixture priors.

6.7 Material synthesis

The synthesis of Na3V2(PO4)2F3 was achieved using VPO4 and NaF in stoichiometric amounts as precursor materials. To prepare VPO4, stoichiometric amounts of V2O5 (Sigma-Aldrich, ≥99.6%) and NH4H2PO4 (Sigma-Aldrich, ≥99.99%) were thoroughly mixed using a high-energy ball mill for 90 minutes. The resulting mixture was subjected to a two-step thermal treatment under a flowing Ar/H2 (95%/5%) atmosphere to fully reduce vanadium from its +5 to +3 oxidation state: first at 300 °C for 5 hours with a heating rate of 0.5 °C/min, followed by a second step at 890 °C for 2 hours with a heating rate of 3 °C/min. After the heat treatments, the material was cooled to ambient temperature at a controlled rate of 3 °C/min.

6.8 Electrochemistry

Electrochemical cycling was employed to prepare NaxV2(PO4)2F3 (NVPF) samples at different charge and discharge states. The electrodes were prepared from a mixture of NVPF (94 wt%), carbon black (3 wt%), and PVDF binder (3 wt%, Sigma-Aldrich). Sodium metal was used as both the counter and reference electrode. The electrolyte consisted of 1 M NaPF6 dissolved in a 1:1 weight ratio of ethylene carbonate (EC) and dimethyl carbonate (DMC), with 2 wt.% fluoroethylene carbonate (FEC) as an additive. A Celgard membrane was used as the separator. Cells were cycled at a rate of C/10 (corresponding to the complete exchange of Na+ per e− over 10 hours) to reach specific voltages, obtaining samples corresponding to x = 3, 2 (at 4.0 V vs. Na+), 1 (at 4.3 V vs. Na+), and also attempting to reach x < 1, stopping at x = 1 − y (at 4.75 V vs. Na+).
6.9 Code availability

The STXM phase mapping Python code is open source and available on GitHub at: https://github.com/Image-DataScience-Team-LRCS/stxm-phase-mapping

6.10 Acknowledgements

This work was supported by the RS2E network (Réseau Français sur le Stockage Electrochimique de l’Energie) and the French National Research Agency under the France 2030 program (Grant ANR-22-PEBA-0002, PEPR Batteries). AD, MB, NF, FC and FA acknowledge access to beamtime and technical support from the HERMES beamline at the SOLEIL Synchrotron, France. CP and LC thank the European Union’s Horizon 2020 research and innovation program under grant agreement No. 875629 (NAIMA project) for funding.

References

[1] Q. Hassan et al. “The renewable energy role in the global energy transformations”. In: Renewable Energy Focus 48 (2024), p. 100545. doi: 10.1016/j.ref.2024.100545.
[2] B. Mathiesen et al. “Smart energy systems for coherent 100% renewable energy and transport solutions”. In: Applied Energy 145 (2015), p. 139–154. doi: 10.1016/j.apenergy.2015.01.075.
[3] J. M. Tarascon and M. Armand. “Issues and challenges facing rechargeable lithium batteries”. In: Nature 414 (2001), p. 359–367. doi: 10.1038/35104644.
[4] N. Nitta et al. “Li-ion battery materials: present and future”. In: Materials Today 18.5 (2014), p. 252–264. doi: 10.1016/j.mattod.2014.10.040.
[5] H. Vikström, S. Davidsson, and M. Höök. “Lithium availability and future production outlooks”. In: Applied Energy 110 (2013), p. 252–266. doi: 10.1016/j.apenergy.2013.04.005.
[6] T. Jaradat and T. Khatib. “A review of battery energy storage system for renewable energy penetration in electrical power system: environmental impact, sizing methods, market features, and policy frameworks”. In: Future Batteries (2025), p. 100106. doi: 10.1016/j.fub.2025.100106.
[7] S. Shannak, L. Cochrane, and D. Bobarykina.
“Strategic analysis of metal dependency in the transition to low-carbon energy: A critical examination of nickel, cobalt, lithium, graphite, and copper scarcity using IEA future scenarios”. In: Energy Research & Social Science 118 (2024), p. 103773. doi: 10.1016/j.erss.2024.103773.
[8] K. Deshmukh et al. “Sodium-ion batteries: state-of-the-art technologies and future prospects”. In: Journal of Materials Science (2025). doi: 10.1007/s10853-025-10671-6.
[9] G. Yan et al. “Higher energy and safer sodium ion batteries via an electrochemically made disordered Na3V2(PO4)2F3 material”. In: Nature Communications 10.1 (2019). doi: 10.1038/s41467-019-08359-y.
[10] P. Wang et al. “High-rate stability of Na3V2(PO4)2F3 sodium-ion cathode materials enabled by entropy-increasing strategy”. In: Journal of Materials Chemistry A (2025). doi: 10.1039/d5ta02224j.
[11] R. Essehli et al. “Temperature-dependent battery performance of a Na3V2(PO4)2F3 MWCNT cathode and in-situ heat generation on cycling”. In: ChemSusChem 13.18 (2020), p. 5031–5040. doi: 10.1002/cssc.202001268.
[12] C. Sun et al. “Achieving high-performance Na3V2(PO4)2F3 cathode material through a bifunctional N-doped carbon network”. In: ACS Applied Materials & Interfaces 16.27 (2024), p. 35179–35189. doi: 10.1021/acsami.4c06830.
[13] N. Pianta, D. Locatelli, and R. Ruffo. “Cycling properties of Na3V2(PO4)2F3 as positive material for sodium-ion batteries”. In: Ionics 27 (2021), p. 1853–1860. doi: 10.1007/s11581-021-04015-y.
[14] J. J. Huang et al. “Disorder dynamics in battery nanoparticles during phase transitions revealed by operando single-particle diffraction”. In: Advanced Energy Materials 12.12 (2022). doi: 10.1002/aenm.202103521.
[15] N. Yabuuchi et al. “Research development on sodium-ion batteries”. In: Chemical Reviews 114.23 (2014), p. 11636–11682. doi: 10.1021/cr500192f.
[16] M. Li et al. “Low-temperature performance of Na-ion batteries”.
In: Carbon Energy 6.10 (2024). doi: 10.1002/cey2.546.
[17] Thibault Broux et al. “High Rate Performance for Carbon-Coated Na3V2(PO4)2F3 in Na-Ion Batteries”. In: Small Methods 3.4 (2018). doi: 10.1002/smtd.201800215.
[18] M. Bianchini et al. “Comprehensive investigation of the Na3V2(PO4)2F3–NaV2(PO4)2F3 system by operando high resolution synchrotron X-ray diffraction”. In: Chemistry of Materials 27.8 (2015), p. 3009–3020. doi: 10.1021/acs.chemmater.5b00361.
[19] J. T. Mefford et al. “Correlative operando microscopy of oxygen evolution electrocatalysts”. In: Nature 593.7857 (2021), p. 67–73. doi: 10.1038/s41586-021-03454-x.
[20] J. Lim et al. “Origin and hysteresis of lithium compositional spatiodynamics within battery primary particles”. In: Science 353.6299 (2016), p. 566–571. doi: 10.1126/science.aaf4914.
[21] N. Ohmer et al. “Phase evolution in single-crystalline LiFePO4 followed by in situ scanning X-ray microscopy of a micrometre-sized battery”. In: Nature Communications 6 (2015), p. 6045. doi: 10.1038/ncomms7045.
[22] M. Yoo et al. “A tailored oxide interface creates dense Pt single-atom catalysts with high catalytic activity”. In: Energy & Environmental Science 13.4 (2020), p. 1231–1239. doi: 10.1039/c9e03492g.
[23] A. B. Askari et al. “In Situ X-ray Microscopy Reveals Particle Dynamics in a NiCo Dry Methane Reforming Catalyst under Operating Conditions”. In: ACS Catalysis 10.11 (2020), p. 6223–6230. doi: 10.1021/acscatal.9b05517.
[24] E. de Smit, I. Swart, J. Creemer, et al. “Nanoscale chemical imaging of a working catalyst by scanning transmission X-ray microscopy”. In: Nature 456 (2008), p. 222–225. doi: 10.1038/nature07516.
[25] J. Kim et al. “Energy material analysis via in-situ/operando scanning transmission X-ray microscopy: A review”. In: Journal of Electron Spectroscopy and Related Phenomena 266 (2023), p. 147337. doi: 10.1016/j.elspec.2023.147337.
[26] I. Temprano et al.
“Advanced methods for characterizing battery interfaces: Towards a comprehensive understanding of interfacial evolution in modern batteries”. In: Energy Storage Materials 73 (2024), p. 103794.
[27] A. P. Hitchcock. “Analysis of X-ray images and spectra (aXis2000): A toolkit for the analysis of X-ray spectromicroscopy data”. In: Journal of Electron Spectroscopy and Related Phenomena 266 (2023), p. 147360. doi: 10.1016/j.elspec.2023.147360.
[28] M. A. Marcus. “Data analysis in spectroscopic STXM”. In: Journal of Electron Spectroscopy and Related Phenomena 264 (2023), p. 147310. doi: 10.1016/j.elspec.2023.147310.
[29] M. Lerotic et al. “MANTiS: a program for the analysis of X-ray spectromicroscopy data”. In: Journal of Synchrotron Radiation 21.5 (2014), p. 1206–1212. doi: 10.1107/S1600577514013964.
[30] H. Umemoto et al. “Stain-free mapping of polymer-blend morphologies via application of high-voltage STEM-EELS hyperspectral imaging to low-loss spectra”. In: Polymer Journal 55.9 (2023), p. 997–1006. doi: 10.1038/s41428-023-00786-5.
[31] D. A. Santos et al. “Multivariate hyperspectral data analytics across length scales to probe compositional, phase, and strain heterogeneities in electrode materials”. In: Patterns (N Y) 3.12 (2022), p. 100634. doi: 10.1016/j.patter.2022.100634.
[32] M. Zhu et al. “Deep learning-based spectral unmixing for biomedical hyperspectral imaging”. In: Computers in Biology and Medicine 165 (2023), p. 107446. doi: 10.1016/j.compbiomed.2023.107446.
[33] L. Zhang et al. “Spectral reconstruction and phase mapping in sparse hyperspectral datasets using deep generative models”. In: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 315 (2024), p. 123456. doi: 10.1016/j.saa.2024.123456.
[34] K. Pearson. “Mathematical contributions to the theory of evolution”. In: Philosophical Transactions of the Royal Society A 187 (1896), p. 253–318.
[35] N. Dilokthanakul et al.
“Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders”. In: arXiv preprint arXiv:1611.02648 (2017).
[36] D. P. Kingma and M. Welling. “Auto-Encoding Variational Bayes”. In: arXiv preprint arXiv:1312.6114 (2013).
[37] S. J. Wetzel. “Unsupervised learning of phase transitions: from principal component analysis to variational autoencoders”. In: Physical Review E 96 (2017), p. 022140. doi: 10.1103/PhysRevE.96.022140.
[38] M. Banko et al. “Machine learning in materials science: a review of recent applications”. In: Journal of Materials Science 56 (2021), p. 10201–10233.
[39] A. Biswas, M. Ziatdinov, and S. V. Kalinin. “Combining variational autoencoders and physical bias for improved microscopy data analysis”. In: Machine Learning: Science and Technology (2023). doi: 10.1088/2632-2153/acf6a9.
[40] Z. Shabeeb et al. “Learning the diffusion of nanoparticles in liquid phase TEM via physics-informed generative AI”. In: Nature Communications (2025). doi: 10.1038/s41467-025-61632-1.
[41] M. Calvat et al. “Learning metal microstructural heterogeneity through spatial mapping of diffraction latent space features”. In: npj Computational Materials (2025). doi: 10.1038/s41524-025-01770-8.
[42] M. Abbate et al. “Soft X-ray absorption spectroscopy of vanadium oxides”. In: Journal of Electron Spectroscopy and Related Phenomena 62.1–2 (1993), p. 185–195. doi: 10.1016/0368-2048(93)80014-D.
[43] J. Suntivich et al. “High-throughput STXM for energy materials”. In: Journal of Physical Chemistry Letters 5 (2014), p. 781–786. doi: 10.1021/jz500012d.
[44] T. T. Fister et al. “Deconvolving instrumental and intrinsic broadening in core-shell x-ray spectroscopies”. In: Journal of Synchrotron Radiation 14 (2007), p. 353–361. doi: 10.1107/S0909049507010933.
[45] L. Wang et al. “Correlative STXM and 4D-STEM for nanoscale mapping of battery electrodes”. In: Advanced Functional Materials 31.41 (2021), p. 2105587. doi: 10.1002/adfm.202105587.
[46] L. Wang et al.
“High-resolution mapping of local strain in cathode materials using 4D-STEM”. In: Ultramicroscopy 251 (2023), p. 113236. doi: 10.1016/j.ultramic.2023.113236.
[47] E. F. Rauch and M. Véron. “Automated crystal orientation and phase mapping in TEM by precession diffraction”. In: Ultramicroscopy 127 (2013), p. 92–102. doi: 10.1016/j.ultramic.2012.07.018.
[48] N. Cautaerts et al. “Free, flexible and fast: orientation mapping using the multi-core and GPU-accelerated template matching capabilities in the python-based open source 4D-STEM analysis toolbox pyxem”. In: Ultramicroscopy 234 (2022), p. 113453. doi: 10.1016/j.ultramic.2021.113453.
[49] C. Ophus, S. E. Zeltmann, B. H. Savitzky, et al. “Automated crystal orientation mapping in py4DSTEM using sparse correlation matching”. In: Microscopy and Microanalysis 27.3 (2021), p. 612–624. doi: 10.1017/S1431927621000275.
[50] N. Folastre, J. Cao, G. Oney, et al. “Improved ACOM pattern matching in 4D-STEM through adaptive sub-pixel peak detection and image reconstruction”. In: Scientific Reports 14 (2024), p. 12385. doi: 10.1038/s41598-024-63060-5.
[51] Q. Jacquet et al. “Operando microimaging of crystal structure and orientation in all components of all-solid-state-batteries”. In: Nature Communications (2025).
[52] M. Akhtar et al. “A novel solid-state synthesis route for high voltage Na3V2(PO4)2F3−2yO2y cathode materials for Na-ion batteries”. In: Journal of Materials Chemistry A 11.46 (2023), p. 25650–25661. doi: 10.1039/d3ta04239a.
[53] M. Atoum et al. “Multivariate analysis in materials science”. In: Journal of Applied Statistics 46 (2019), p. 1–22.
[54] V. Zhelezniak et al. “Data-driven methods for material characterization”. In: Journal of Computational Materials Science 162 (2019), p. 19–30.
[55] V. B. Olmos et al. “Multimodal correlative microscopy for isotopically resolved sub-15 nm chemical imaging of phase-separated polymer blends”. In: Surfaces and Interfaces 84 (2026), p. 108606.
doi: 10.1016/j.surfin.2026.108606.
[56] L. Su et al. “Multiscale operando X-ray investigations provide insights into battery electrode behavior”. In: Journal of Power Sources 503 (2021), p. 230071. doi: 10.1016/j.jpowsour.2021.230071.
[57] S. Hajimiri, Amir Lotfi, and Mahdieh Soleymani Baghshah. “Semi-Supervised Disentanglement of Class-Related and Class-Independent Factors in VAE”. In: arXiv preprint arXiv:2102.00892 (2021). doi: 10.48550/arXiv.2102.00892.
[58] M. B. Rocha and R. A. Krohling. “VAE-GNA: a variational autoencoder with Gaussian neurons in the latent space and attention mechanisms”. In: Knowledge and Information Systems 66.10 (2024), p. 6415–6437. doi: 10.1007/s10115-024-02169-5.
[59] R. Belkhou et al. “Characterization of electrode materials using synchrotron techniques”. In: Journal of Synchrotron Radiation 22 (2015), p. 1019–1027. doi: 10.1107/S1600577515004690.

Supporting Information – AI-Driven Phase Identification from X-ray Hyperspectral Imaging of cycled Na-ion Cathode Materials

Fayçal Adrar 1,2,4, Nicolas Folastre 1,2,4, Chloé Pablos 1,2,6, Stefan Stanescu 3, Sufal Swaraj 3, Raghvender Raghvender 1,2, François Cadiou 1, Laurence Croguennec 2,4,6, Matthieu Bugnet 5,#, and Arnaud Demortière 1,2,4,#

1 Laboratoire de Réactivité et de Chimie des Solides (LRCS), CNRS UMR 7314, Université de Picardie Jules Verne, Hub de l’Energie, 15 Rue Baudelocque, Amiens, France
2 Réseau sur le Stockage Electrochimique de l’Energie (RS2E), CNRS FR 3459, Hub de l’Energie, 15 Rue Baudelocque, Amiens, France
3 Synchrotron SOLEIL, L’Orme des Merisiers, 91190 Saint-Aubin, France
4 ALISTORE-European Research Institute, CNRS FR 3104, Hub de l’Energie, Rue Baudelocque, Amiens, France
5 CNRS, INSA Lyon, Université Claude Bernard Lyon 1, MATEIS, UMR 5510, 69621 Villeurbanne, France
6 Univ.
Bordeaux, CNRS, Bordeaux INP, ICMCB, UMR 5026, F-33600 Pessac, France
# Corresponding authors: matthieu.bugnet@cnrs.fr, arnaud.demortiere@cnrs.fr

March 10, 2026

1 Supplementary Figures

Laboratory powder X-ray diffraction (PXRD) patterns were recorded on a PANalytical X’Pert 3 diffractometer in Debye–Scherrer θ–θ geometry. Routine acquisitions were performed on powders packed in 0.5 mm diameter capillaries over a 2θ range of 10–80° with a step size of 0.0099°. The diffractometer was equipped with a Cu Kα1,2 X-ray source. Full-pattern matching refinements were carried out using the Le Bail method as implemented in Jana2006 [1].

Figure S1: On the right, the NVPF electrochemical curves corresponding to each state of charge investigated in this study. On the left, the associated X-ray diffraction (XRD) patterns collected at the same states of charge, highlighting the structural evolution during (de)sodiation.

Figure S2: Le Bail refinements of (a) Na3V2(PO4)2F3 in the orthorhombic Amam space group, (b) Na2V2(PO4)2F3 in the tetragonal I4/m space group, (c) Na1V2(PO4)2F3 in the orthorhombic Cmc21 space group, (d) Na2.4V2(PO4)2F3, a mixture of phases belonging to the orthorhombic Amam space group and the tetragonal I4/m space group, and (e) Na1−yV2(PO4)2F3 in the orthorhombic Cmc21 space group.

Figure S3: a) Reliability maps. b) Combined reliability map with the phase map. c–g) Correlation maps obtained for Na2.0VPF. The scale bars correspond to 0.2 μm.

Figure S4: Trigonal diagram showing the correlation between the reference vectors using the Pearson correlation coefficient.

Figure S5: Gaussian Mixture Variational AutoEncoder (GMVAE) architecture.

Figure S6: Latent space distribution of ten NVPF particles at different states of charge. a) Latent space labeled according to the phase distribution, obtained from Pearson correlation comparison with the reference spectra.
b) The projection of the Na1VPF particle in Figure 4.

The Pearson correlation quantifies linear similarity between an observed signal and a reference pattern. This approach implicitly assumes that the data can be well described as a scaled and shifted version of the reference. However, in the presence of mixed signals, overlapping phases, or nonlinear spectral distortions, this assumption does not hold. A high correlation value may therefore reflect partial similarity in dominant features or shared variance driven by hidden factors, rather than true phase identity.

The projection of the same region into the GMVAE latent space, as shown in Figure S6, revealed that these pixels did not group with the cluster associated with the pristine phase. Unlike correlation analysis performed in the original observation space, the GMVAE learns a structured latent representation that captures multimodality and nonlinear relationships in the data. Phase assignment is then based on probabilistic clustering within this learned manifold, rather than on linear similarity to a single reference pattern. The latent space representation showed that the region occupied an intermediate position between clusters, indicating that the original classification resulted from a linear projection artifact. Consequently, we reclassified this area as an ambiguous region rather than a confidently assigned pristine phase. This ambiguity was subsequently resolved using the GMVAE latent space representation, which provides phase discrimination grounded in the full probabilistic structure of the data and reduces false-positive assignments arising from correlation-based analysis.

For completeness, the reliability maps derived from the Pearson correlation analysis are included here (Figure S7c, red region) to illustrate the origin of the initial misclassification, while the final resolved phase map obtained from the GMVAE framework is presented in the main Figure 5.
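The correlation-based assignment and its reliability measure (best minus second-best correlation, as in Algorithm 1) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `pearson_phase_map`, the array layout (pixels as rows, energies as columns), and the synthetic Gaussian "spectra" are assumptions made only for illustration. The sketch also assumes no pixel spectrum is constant, since the Pearson coefficient is undefined in that case.

```python
import numpy as np

def pearson_phase_map(X, R, delta=0.005):
    """Assign each pixel to its best-correlated reference.

    X: (p, N) array, one spectrum of N energies per pixel (assumed layout).
    R: (M, N) array, one reference spectrum per phase.
    Returns phase map P, reliability L, ambiguity mask A, ambiguous pairs.
    """
    Xc = X - X.mean(axis=1, keepdims=True)   # center every pixel spectrum
    Rc = R - R.mean(axis=1, keepdims=True)   # center every reference
    den = np.linalg.norm(Xc, axis=1)[:, None] * np.linalg.norm(Rc, axis=1)[None, :]
    C = (Xc @ Rc.T) / den                    # Pearson coefficients, shape (p, M)

    order = np.argsort(C, axis=1)            # ascending along the reference axis
    best, second = order[:, -1], order[:, -2]
    rows = np.arange(X.shape[0])
    L = C[rows, best] - C[rows, second]      # reliability: best minus second-best
    A = L < delta                            # ambiguity mask
    pairs = [(int(i), int(best[i]), int(second[i])) for i in np.flatnonzero(A)]
    return best, L, A, pairs

# Synthetic example: two references with peaks at different energies.
E = np.linspace(0.0, 1.0, 50)
ref_a = np.exp(-((E - 0.3) / 0.05) ** 2)
ref_b = np.exp(-((E - 0.7) / 0.05) ** 2)
R = np.stack([ref_a, ref_b])

X = np.stack([
    3.0 * ref_a + 1.0,   # scaled and shifted copy of ref_a
    0.5 * ref_b - 2.0,   # scaled and shifted copy of ref_b
    ref_a + ref_b,       # equal mixture of both references
])

P, L, A, pairs = pearson_phase_map(X, R)
print(P[:2], A, pairs)
```

In this toy case the scaled and shifted copies are assigned unambiguously, which is the situation in which the linear-similarity assumption holds. The equal mixture correlates almost identically with both references, so its reliability falls below δ and it is flagged as ambiguous; resolving such pixels is where the text turns to the GMVAE latent space.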
Figure S7: (a–d) Reliability maps corresponding to each sample represented in Figure 5, where pixel brightness encodes the confidence in the phase assignment.

Figure S8: Correlation maps of the individual sodium phases associated with the particles presented in Figure 5, from Na1−yVPF on the left to Na3VPF on the right (as indicated above). The expected composition is shown to the left of the corresponding particle.

Figure S9: The reconstructed X-ray absorption spectra obtained using the decoder-only architecture from the projected points in the latent space, spanning the Na3VPF phase to the Na1−yVPF phase.

Figure S10: Latent space dimension analysis: (a) K-means clustering of spectra, with each color representing a different cluster. (b) Latent space representation labeled by Pearson correlation, including ambiguous points. (c) Mean intensity of the original spectra along a trajectory following the −z₂ direction in the latent space. Original and reconstructed spectra for each identified phase: (d) Na3.0VPF region, (e) Na2.0VPF region, (f) Na1.0VPF region, (g) Na1−yVPF region.

The K-means clustering applied in the latent space reveals distinct clusters, which primarily reflect variations in spectral intensity. In Figure S10a, the clusters are distinguished by intensity, with each color grouping spectra of nearly the same intensity. By calculating the average intensity of each spectrum (each point represents a spectrum) from bottom to top along the −z₂ direction, we observe an almost linear increase (Figure S10c). According to the Beer–Lambert law, this trend is directly related to thickness: the thicker the material, the more it absorbs and the higher the spectral intensity. The different phases are encoded along the remaining latent space axes, as evidenced by the reconstructed spectra corresponding to each cluster (Figure S10d–g).
Each reconstructed spectrum closely matches the original experimental spectrum, demonstrating that the latent space effectively separates the phases while simultaneously capturing intensity (thickness) variations along z₂. This visualization highlights the dual functionality of the latent space: one axis (z₂) tracks intensity or thickness variations, whereas the other axes encode phase-specific spectral features.

Figure S11: Evolution of the mean squared error and Kullback–Leibler losses of the VAE for the training and validation sets as a function of the epoch, shown on a logarithmic scale.

2 Algorithms

Algorithm 1: Phase Mapping using Pearson Correlation with Ambiguity Tracking

Input: Reference set $R = \{r_m\}_{m=1}^{M}$, image set $I = \{I_n\}_{n=1}^{N}$, correlation threshold $\delta = 0.005$
Output: Phase map $P$, reliability map $L$, ambiguity mask $A$, ambiguous pairs $P_{\mathrm{amb}}$

for each image $I_n \in I$ do
    Convert $I_n$ to vector $v_n \in \mathbb{R}^{p}$
Construct $X \in \mathbb{R}^{p \times N}$ such that $X[i,n] = v_n[i]$
for each reference $r_m \in R$ do
    Compute correlations $C_m[i]$ using Pearson coefficient:
    $$C_m[i] = \frac{\sum_{n=1}^{N} (r_m[n] - \bar{r}_m)(X[i,n] - \bar{X}_i)}{\sqrt{\sum_{n=1}^{N} (r_m[n] - \bar{r}_m)^2}\,\sqrt{\sum_{n=1}^{N} (X[i,n] - \bar{X}_i)^2}}$$
for each pixel $i$ do
    Let $c_i = [C_1[i], \ldots, C_M[i]]$
    Find best and second-best correlations and their indices
    Compute reliability $L[i] = \text{best} - \text{second-best}$
    Mark ambiguity $A[i] = 1$ if $L[i] < \delta$
    Store ambiguous pair ($i$, best phase, second phase)
return $P, L, A, P_{\mathrm{amb}}$

Algorithm 2: Training GMVAE on XANES vectors using ADAM

Input: Data $X = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^{d}$, batch size $B = 128$, learning rate $\eta = 10^{-3}$, KL weight $\beta = 10^{-5}$, dropout $p_{\mathrm{drop}} = 0.1$, Gaussian components $K = 5$, number of training epochs $T = 400$
Output: Trained GMVAE parameters $(\phi, \theta)$

for epoch = 1 to $T$ do
    Sample a batch $X_B \subset X$ of size $B$
    for each $x_i \in X_B$ do
        Encode: $z_{\mathrm{mean}}, z_{\mathrm{logvar}} = f_\phi(x_i)$
        Reparameterize: $z_i = z_{\mathrm{mean}} + \exp(0.5\, z_{\mathrm{logvar}}) \odot \varepsilon$, $\varepsilon \sim \mathcal{N}(0, I)$
        Decode: $\hat{x}_i = g_\theta(z_i)$
    Compute batch loss:
    $$\mathcal{L}_B = \frac{1}{B} \sum_{i \in X_B} \underbrace{\lVert x_i - \hat{x}_i \rVert_2^2}_{\text{reconstruction}} + \beta\, \underbrace{D_{\mathrm{KL}}\!\left( q_\phi(z_i \mid x_i) \,\Big\|\, \sum_{k=1}^{K} \pi_k\, \mathcal{N}(z_i \mid \mu_k, \Sigma_k) \right)}_{\text{GM prior KL}}$$
    Update $(\phi, \theta)$ using ADAM optimizer with learning rate $\eta$
return Trained GMVAE $(\phi, \theta)$

References

[1] Václav Petříček, Michal Dušek, and Lukáš Palatinus. “Crystallographic Computing System JANA2006: General features”. In: Zeitschrift für Kristallographie – Crystalline Materials 229.5 (2014), pp. 345–352. doi: 10.1515/zkri-2014-1737.
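As an illustration of the batch loss in Algorithm 2, the following NumPy sketch evaluates the reconstruction term and the KL term against a Gaussian-mixture prior for one batch. Several choices here are assumptions, not details given in the text: the covariances Σ_k are taken as diagonal, the KL term (which has no closed form for a mixture prior) is estimated from the single reparameterized sample z_i, the KL term is averaged over the batch together with the reconstruction term, and the names `log_diag_gaussian` and `gmvae_batch_loss` are hypothetical. The encoder and decoder networks are omitted; their outputs are passed in as arrays.

```python
import numpy as np

def log_diag_gaussian(z, mean, logvar):
    """Log density of a diagonal Gaussian, summed over the latent axis."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + logvar + (z - mean) ** 2 / np.exp(logvar), axis=-1
    )

def gmvae_batch_loss(x, x_hat, z, z_mean, z_logvar, pi, mu, logvar_k, beta=1e-5):
    """Batch loss: squared error + beta * KL(q(z|x) || mixture prior).

    x, x_hat: (B, d); z, z_mean, z_logvar: (B, latent)
    pi: (K,) positive mixture weights; mu, logvar_k: (K, latent)
    """
    recon = np.sum((x - x_hat) ** 2, axis=1)              # ||x_i - x_hat_i||^2
    log_q = log_diag_gaussian(z, z_mean, z_logvar)        # log q(z_i | x_i)
    # log of each weighted component, shape (B, K)
    log_comp = log_diag_gaussian(z[:, None, :], mu[None], logvar_k[None]) + np.log(pi)[None]
    m = log_comp.max(axis=1, keepdims=True)               # stable log-sum-exp
    log_p = (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]
    kl = log_q - log_p                                    # one-sample KL estimate
    return float(np.mean(recon + beta * kl)), recon, kl

rng = np.random.default_rng(0)
B, d, latent = 4, 6, 2
x = rng.normal(size=(B, d))
x_hat = x.copy()                                  # stand-in for a perfect decoder
z_mean = np.zeros((B, latent))
z_logvar = np.zeros((B, latent))
eps = rng.normal(size=(B, latent))
z = z_mean + np.exp(0.5 * z_logvar) * eps         # reparameterization step

# Sanity check: a one-component prior equal to q gives zero KL.
loss, recon, kl = gmvae_batch_loss(
    x, x_hat, z, z_mean, z_logvar,
    pi=np.array([1.0]), mu=np.zeros((1, latent)), logvar_k=np.zeros((1, latent)),
)

# Same batch against a five-component prior (K = 5 as in Algorithm 2).
loss5, _, kl5 = gmvae_batch_loss(
    x, x_hat, z, z_mean, z_logvar,
    pi=np.full(5, 0.2), mu=rng.normal(size=(5, latent)), logvar_k=np.zeros((5, latent)),
)
print(loss, loss5)
```

A one-sample estimate of the KL term can be negative for individual points even though the true divergence is non-negative. In an actual training loop the same expression would be written in an automatic-differentiation framework so that the ADAM update can be applied to (φ, θ).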