
Paper deep dive

FedLECC: Cluster- and Loss-Guided Client Selection for Federated Learning under Non-IID Data

Daniel M. Jimenez-Gutierrez, Giovanni Giunta, Mehrdad Hassanzadeh, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti

Year: 2026 · Venue: arXiv preprint · Area: cs.DC · Type: Preprint · Embeddings: 35

Abstract

Federated Learning (FL) enables distributed Artificial Intelligence (AI) across cloud-edge environments by allowing collaborative model training without centralizing data. In cross-device deployments, FL systems face strict communication and participation constraints, as well as strong non-independent and identically distributed (non-IID) data that degrades convergence and model quality. Since only a subset of devices (a.k.a. clients) can participate per training round, intelligent client selection becomes a key systems challenge. This paper proposes FedLECC (Federated Learning with Enhanced Cluster Choice), a lightweight, cluster-aware, and loss-guided client selection strategy for cross-device FL. FedLECC groups clients by label-distribution similarity and prioritizes clusters and clients with higher local loss, enabling the selection of a small yet informative and diverse set of clients. Experimental results under severe label skew show that FedLECC improves test accuracy by up to 12%, while reducing communication rounds by approximately 22% and overall communication overhead by up to 50% compared to strong baselines. These results demonstrate that informed client selection improves the efficiency and scalability of FL workloads in cloud-edge systems.

Tags

ai-safety (imported, 100%) · csdc (suggested, 92%) · preprint (suggested, 88%)


Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 95%

Last extracted: 3/13/2026, 12:56:18 AM

Summary

FedLECC is a novel client selection strategy for Federated Learning in cloud-edge environments that addresses non-IID data challenges by combining label-distribution-based clustering with loss-guided prioritization. By selecting a diverse and informative subset of clients, FedLECC improves test accuracy by up to 12% and reduces communication overhead by up to 50% compared to existing baselines.

Entities (5)

FedLECC · algorithm · 100%
Federated Learning · field-of-study · 100%
Hellinger distance · metric · 95%
Non-IID Data · problem · 95%
OPTICS · clustering-algorithm · 95%

Relation Signals (3)

FedLECC addresses Non-IID Data

confidence 95% · FedLECC is a client selection strategy designed for cross-device FL under severe label skew.

FedLECC uses Hellinger distance

confidence 95% · FedLECC groups clients based on similarity, via Hellinger distance (HD)

FedLECC uses OPTICS

confidence 95% · We evaluate several clustering techniques... Among them, OPTICS consistently provides the best trade-off

Cypher Suggestions (2)

Find all algorithms that address the Non-IID data problem · confidence 90% · unvalidated

MATCH (a:Algorithm)-[:ADDRESSES]->(p:Problem {name: 'Non-IID Data'}) RETURN a.name

List metrics used by FedLECC · confidence 90% · unvalidated

MATCH (a:Algorithm {name: 'FedLECC'})-[:USES]->(m:Metric) RETURN m.name

Full Text

34,473 characters extracted from source content.


FedLECC: Cluster- and Loss-Guided Client Selection for Federated Learning under Non-IID Data

Daniel M. Jimenez-Gutierrez, Giovanni Giunta, Mehrdad Hassanzadeh, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
Sapienza University of Rome, Via Ariosto 25, Rome, 00185, Italy
jimenezgutierrez@diag.uniroma1.it, giovanni.giunta@corner.ch, hassanzadeh.1961575@studenti.uniroma1.it, aris@diag.uniroma1.it, ichatz@diag.uniroma1.it, vitaletti@diag.uniroma1.it

Abstract—Federated Learning (FL) enables distributed Artificial Intelligence (AI) across cloud-edge environments by allowing collaborative model training without centralizing data. In cross-device deployments, FL systems face strict communication and participation constraints, as well as strong non-independent and identically distributed (non-IID) data that degrades convergence and model quality. Since only a subset of devices (a.k.a. clients) can participate per training round, intelligent client selection becomes a key systems challenge. This paper proposes FedLECC (Federated Learning with Enhanced Cluster Choice), a lightweight, cluster-aware, and loss-guided client selection strategy for cross-device FL. FedLECC groups clients by label-distribution similarity and prioritizes clusters and clients with higher local loss, enabling the selection of a small yet informative and diverse set of clients. Experimental results under severe label skew show that FedLECC improves test accuracy by up to 12%, while reducing communication rounds by approximately 22% and overall communication overhead by up to 50% compared to strong baselines. These results demonstrate that informed client selection improves the efficiency and scalability of FL workloads in cloud-edge systems.

Index Terms—Federated Learning, Client Selection, Non-IID Data, Cloud-Edge Systems

I. INTRODUCTION

The proliferation of Internet of Things (IoT) and edge devices has enabled a new class of distributed Artificial Intelligence (AI) applications spanning cloud-edge environments [1]. These applications include predictive maintenance [2] and anomaly detection [3], where data is generated at the network edge and must be processed under strict latency, bandwidth, and privacy constraints. Centralizing such data in the cloud is often impractical due to communication overhead and regulatory concerns.

Federated learning (FL) has emerged as a key enabler for distributed AI across cloud-edge infrastructures, allowing collaborative model training without moving raw data off-device [4]. In practice, however, FL systems operate under significant systems and networking constraints: only a subset of edge devices, i.e., clients, can participate in each training round due to limited bandwidth, energy budgets, straggler effects, and heterogeneous device capabilities. As a result, intelligent client selection becomes a central challenge in deploying FL at scale.

A major complication in cross-device FL is non-independent and identically distributed (non-IID) data, also known as data or statistical heterogeneity, which causes client updates to diverge and may slow convergence or degrade model quality [5]. Among the different forms of non-IID data, label skew, where clients hold disjoint or highly imbalanced label distributions, is widely recognized as one of the most challenging scenarios, as it induces strong client drift and unstable aggregation [6], [7]. For this reason, this work focuses on label-based non-IID data. Such conditions naturally arise in cloud-edge deployments, where clients capture localized events or user-specific behaviors.
In these settings, naive client selection strategies may waste communication resources on redundant or low-impact updates, motivating the need for intelligent selection mechanisms tailored to heterogeneous data distributions. Beyond non-IID data considerations, these challenges highlight a broader cloud-edge selection problem: how to select a small set of clients that provides both informative and diverse updates while respecting communication and participation constraints. Prior work has shown that uniform random sampling is often suboptimal in such environments [8], motivating the need for intelligent, system-aware client selection mechanisms.

Personalized FL (PFL) further emphasizes the role of client selection under non-IID data. Tan et al. [9] distinguish between regularization-based methods and selection-based strategies, the latter explicitly controlling which clients participate in training. Our work falls into this selection-based category, with a focus on scalable selection for cross-device FL systems.

In this paper, we propose FedLECC (Federated Learning with Enhanced Cluster Choice), a lightweight, cluster-aware, loss-guided client selection strategy for cloud-edge FL. FedLECC groups clients by similarity in label distributions, via the Hellinger distance (HD) [10], and prioritizes clusters and clients associated with higher local loss, i.e., edge devices where the current global model underperforms. By jointly enforcing diversity (through clustering) and informativeness (through loss-guided selection), FedLECC improves learning efficiency while significantly reducing communication overhead in highly non-IID settings.

arXiv:2603.08911v1 [cs.DC] 9 Mar 2026

A. Motivation

To address these challenges in a systematic manner, we next formalize the motivation and research questions that guide the design of FedLECC. Deploying FL as a cloud-edge service requires balancing learning performance with system-level efficiency [11].
Under non-IID data, local updates are biased toward client-specific distributions, which can slow convergence and lead to suboptimal global models [12]. These effects are amplified in large-scale cross-device deployments, where only a fraction of clients can be selected per round. Consequently, effective client selection is essential to avoid wasting communication resources on redundant or low-impact updates.

This work investigates whether combining clustering and loss-based client selection can improve both model performance and communication efficiency for federated workloads in cloud-edge environments, addressing the following research questions:

1) RQ1: To what extent can FedLECC improve test accuracy under severe label skew in cross-device FL?
2) RQ2: How do clustering-based diversity control and loss-guided prioritization contribute to FedLECC's performance gains?
3) RQ3: How much can FedLECC reduce communication rounds and total communication overhead compared to state-of-the-art client selection strategies?

B. Contributions

The main contributions of this paper are:

1) We propose FedLECC, an intelligent, cluster-aware client selection strategy for FL in cloud-edge environments where clients observe highly non-IID data.
2) We show that selecting a very limited but suitably chosen set of informative edge devices can lead to significant improvements in learning efficiency while also drastically reducing communication costs.
3) Through extensive experiments under severe label skew, we demonstrate that FedLECC improves accuracy by up to 12% while reducing communication rounds by about 22% and overall communication overhead by up to 50% compared to strong baselines.

II. RELATED WORK

Non-IID data, particularly label skew, is a major challenge in FL, leading to slow convergence and degraded model accuracy when aggregating client updates [7], [12].
In PFL, two main classes of approaches address non-IID data: regularization-based methods, which modify local objectives, and selection-based methods, which prioritize specific clients during training [9].

A. Regularization-based Solutions

Several works mitigate non-IID data by regularizing local training objectives. FedProx [13] extends FedAvg by introducing a proximal term that constrains local updates, improving robustness under non-IID data. FedNova [14] normalizes local updates to account for heterogeneous computation and varying numbers of local steps, reducing bias introduced by stragglers. FedDyn [15] introduces a dynamic regularization term that corrects client drift over time, achieving improved convergence under non-IID data. While effective, these approaches do not explicitly address which clients should participate in each training round, nor do they exploit data diversity through structured client selection.

B. Selection-based Solutions

Client selection has emerged as a complementary strategy to address non-IID data and communication constraints. HACCS [16] clusters clients based on label histograms and selects latency-efficient clients from each cluster, improving robustness to stragglers. FedCLS [17] leverages group label information and Hamming distance to guide client selection, achieving faster convergence than random sampling. FedCor [18] models inter-client correlations using Gaussian Processes and selects clients based on their predicted contribution to global accuracy. Loss-based selection has also proven effective: Cho et al. [19] showed that prioritizing clients with higher local loss accelerates convergence and improves accuracy, leading to the Power-of-Choice (POC) strategy.

Our work differs from prior selection-based approaches by jointly combining clustering and loss-guided selection.
FedLECC first structures the client population by label-distribution similarity and then prioritizes high-loss participants within each group, jointly promoting informative updates and diversity across suitably selected clients. This enables more targeted client participation, improving learning performance and significantly reducing communication cost under severe label skew.

III. PROBLEM FORMULATION

We follow the standard FL setting introduced in prior work [19]. Consider a system with K clients, where each client i holds a local dataset B_i of size N_i, and datasets are disjoint across clients. Let N = Σ_{i=1}^{K} N_i denote the total number of data samples. Each client trains a local model parameterized by θ_i. Given a sample ξ, the per-sample loss is denoted by f(θ, ξ). The local empirical loss of client i is defined as:

ℓ_i(θ_i) = (1 / N_i) Σ_{ξ ∈ B_i} f(θ_i, ξ).   (1)

The objective of FL is to learn a global model θ that minimizes the aggregated empirical risk:

ℓ(θ) = Σ_{i=1}^{K} p_i ℓ_i(θ_i),   (2)

where p_i = N_i / N reflects the relative data contribution of client i.

[Fig. 1. High-level client selection process in FedLECC: (1) quantify label heterogeneity via the Hellinger distance (HD), (2) cluster clients with similar label distributions, (3) select clients from high-loss clusters.]

A. Biased Client Selection

FL proceeds in communication rounds coordinated by a central server. At each round, only a subset of m ≪ K clients is selected to participate, based on the energy and latency constraints of the clients, while aiming to maintain learning accuracy and reduce communication cost.
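The aggregated empirical risk in Eq. (2) is simply a sample-size-weighted average of the local losses. A minimal sketch in plain Python (the helper name and the numbers are illustrative, not from the paper):

```python
# Sketch: aggregated empirical risk l(theta) = sum_i p_i * l_i(theta_i),
# with weights p_i = N_i / N (relative data contribution of client i).
def global_loss(local_losses, dataset_sizes):
    """Weighted average of per-client losses, weights p_i = N_i / N."""
    total = sum(dataset_sizes)
    return sum(n / total * loss for loss, n in zip(local_losses, dataset_sizes))

losses = [0.9, 0.4, 0.7]   # l_i for K = 3 illustrative clients
sizes = [100, 300, 600]    # N_i, so p = [0.1, 0.3, 0.6]
print(global_loss(losses, sizes))  # 0.9*0.1 + 0.4*0.3 + 0.7*0.6 = 0.63
```

Clients with more data pull the global objective harder, which is why biased selection toward high-loss clients must be weighed against the p_i weighting at aggregation time.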
Rather than selecting clients uniformly at random, biased client selection assigns higher participation probability to clients that are expected to provide more informative updates [19]. Such strategies are particularly important under non-IID data, where client updates may vary significantly in usefulness.

B. Clustering-based Client Selection

In this work, we consider a clustering-based client selection strategy tailored to label-skewed FL settings, since label-based non-IID data is known to be the most detrimental form of non-IID data in FL. Clients are first grouped according to similarity in their label distributions. At each round, a subset of clusters is selected, and clients within these clusters are sampled based on their local loss values. Concretely, the selection process consists of:

1) Grouping clients into clusters based on label-distribution similarity.
2) Prioritizing clusters associated with higher average loss.
3) Selecting a fixed number of clients per chosen cluster.

This formulation enables controlling both diversity (via clustering) and informativeness (via loss-based selection), providing the foundation for the proposed FedLECC strategy. Note that although clients share information about their observed label distributions, this need not leak private information if existing, practical privacy-preserving techniques such as Differential Privacy [20] or secure multi-party computation [21] are integrated into the client selection pipeline.

IV. PROPOSED FEDLECC STRATEGY

FedLECC is a client selection strategy designed for cross-device FL under severe label skew. As illustrated in Fig. 1, FedLECC operates in three stages: (1) quantifying client non-IID data, (2) clustering clients with similar label distributions, and (3) selecting informative clients from representative clusters, guided by the local empirical loss of each client.
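Stage (1) compares clients by the Hellinger distance between their normalized label histograms. A minimal sketch (the two histograms are illustrative, not from the paper):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions:
    HD(p, q) = (1/sqrt(2)) * sqrt(sum_k (sqrt(p_k) - sqrt(q_k))^2).
    Bounded in [0, 1] and symmetric, which makes it convenient for
    comparing clients' label histograms."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)

# Normalized label histograms of two illustrative clients (3 labels):
# client A mostly holds label 0, client B mostly holds label 2.
client_a = [0.8, 0.1, 0.1]
client_b = [0.1, 0.1, 0.8]
print(hellinger(client_a, client_a))            # 0.0: identical distributions
print(round(hellinger(client_a, client_b), 3))  # ~0.578: strong label skew
```

The server only needs these per-class proportions, not raw samples, which is why the exchange scales with the number of labels rather than the dataset size.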
This design explicitly balances diversity and informativeness, which are both critical for effective learning under non-IID data. In practice, each client communicates its normalized label histogram to the server once, or whenever significant changes in the local data distribution occur. This information exchange is lightweight, as it scales with the number of labels rather than the dataset size, and can be amortized across many training rounds. No raw data or per-sample information is shared, preserving the decentralized nature of FL.

A. Non-IID Data Quantification via Label Distributions

FedLECC starts by characterizing non-IID data across clients using their label distributions. Each client i provides the server with an aggregated histogram of labels, which does not reveal individual samples. The server computes pairwise distances between clients using the HD, a bounded and symmetric metric well suited for comparing probability distributions. Alternative distance definitions based on feature distributions conditioned on labels were evaluated, but label-distribution distances consistently yielded better clustering quality and downstream performance. Recall that FedLECC focuses on label-based non-IID data, which is known to be the most detrimental form of non-IID data in FL.

B. Clustering Clients

Using the pairwise HD matrix, clients are grouped into clusters of similar label distributions. We evaluate several clustering techniques, including DBSCAN, k-medoids, and OPTICS. Among them, OPTICS consistently provides the best trade-off between cluster quality and robustness, as it does not require specifying the number of clusters in advance and adapts well to varying client densities.

[Fig. 2. FL architecture with FedLECC client selection: clients are grouped into clusters ranked by mean loss, and selected clients send their model parameters θ_i to the server.]

Clustering plays a critical role in FedLECC by preventing the repeated selection of clients with highly similar data distributions, which could otherwise lead to over-specialization of the global model: purely loss-based selection may repeatedly draw clients from a single difficult data mode. By enforcing cluster-level representation, FedLECC ensures that selected clients span diverse label distributions, preserving diversity while still prioritizing challenging regions of the data space.

C. Selecting Clusters and Clients

At each communication round, FedLECC selects a subset of clusters and clients based on local loss values. Two parameters control the selection process:

• J: number of clusters selected (J ≤ J_max),
• m: total number of participating clients, with z = ⌈m/J⌉ clients drawn per selected cluster.

Here, J_max is automatically determined by OPTICS based on the label-distribution similarities among clients, as measured by the HD. After local training, each client reports its computed local empirical loss to the server. The server computes the average loss for each cluster and ranks clusters accordingly. The top-J clusters are selected, and within each cluster, the z clients with the highest loss are chosen. If a cluster contains fewer than z clients, remaining slots are filled by high-loss clients from the next clusters, under the existing loss-based ordering.

To clarify the overall framework, FedLECC operates as a lightweight extension of the standard FL workflow. The server first acquires coarse-grained information about client non-IID data in the form of label histograms, which are used to compute inter-client similarities and derive clusters.
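The clustering step runs on the precomputed pairwise HD matrix. As a runnable stand-in for OPTICS (which the paper uses and which, in practice, could be `sklearn.cluster.OPTICS` with `metric='precomputed'`), the sketch below does a simple single-linkage threshold grouping over an illustrative distance matrix; the matrix values and the eps threshold are assumptions for the example:

```python
def threshold_clusters(dist, eps):
    """Group items whose pairwise distance is <= eps (single-linkage,
    via a small union-find). A stand-in for density-based clustering
    on a precomputed distance matrix; not the OPTICS algorithm itself."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Illustrative pairwise Hellinger distances for 4 clients:
# clients 0/1 hold similar labels, clients 2/3 hold similar labels.
D = [[0.0, 0.10, 0.90, 0.80],
     [0.10, 0.0, 0.85, 0.90],
     [0.90, 0.85, 0.0, 0.05],
     [0.80, 0.90, 0.05, 0.0]]
print(sorted(threshold_clusters(D, eps=0.3)))  # [[0, 1], [2, 3]]
```

Unlike this fixed-threshold sketch, OPTICS derives the number of clusters (J_max in the paper's notation) from the data's density structure, which is why FedLECC does not need J_max as a hyperparameter.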
During training, the clustering structure is treated as fixed, while client selection is performed dynamically at each communication round based on the reported local losses. Importantly, FedLECC does not modify the local training procedure or the aggregation rule, but only influences which clients participate in each round. The full selection procedure is summarized in Algorithm 1 and illustrated in Fig. 2.

Algorithm 1: Cluster- and Loss-Based Client Selection (FedLECC)
Require: Clusters C = {C_1, ..., C_{J_max}}, target clusters J, target clients m
Ensure: Set S of m selected clients
 1: z ← ⌈m/J⌉
 2: for each client i do
 3:   Compute local loss ℓ_i and send to server
 4: end for
 5: for each cluster C_k do
 6:   Compute mean loss ℓ̄_k
 7: end for
 8: Select top-J clusters with highest ℓ̄_k
 9: for each selected cluster do
10:   Select top-z clients with highest ℓ_i
11: end for
12: if |S| < m then
13:   Fill remaining slots with highest-loss clients from the following clusters, ordered by descending ℓ̄_k
14: end if
15: return S

TABLE I. CHARACTERISTICS OF THE DATASETS USED IN OUR EVALUATION.

Dataset | #train | #test  | #features | #classes
MNIST   | 60,000 | 10,000 | 784       | 10
FMNIST  | 60,000 | 10,000 | 784       | 10

D. Convergence Considerations

FedLECC modifies the standard FL workflow only through the client selection mechanism, while preserving the local training procedure and the server-side aggregation rule (e.g., weighted averaging as in FedAvg). As a result, FedLECC inherits the convergence properties of biased client selection schemes studied in prior work [19]. In particular, selecting clients with higher local loss increases the probability of sampling informative updates, which has been shown to accelerate convergence under non-IID data without destabilizing training. Moreover, the clustering step in FedLECC does not alter model updates, but enforces diversity among selected clients by preventing repeated sampling from highly similar data distributions.
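Algorithm 1 can be sketched directly in plain Python; the cluster assignments and client losses below are illustrative, not from the paper's experiments:

```python
import math

def fedlecc_select(clusters, losses, J, m):
    """Cluster- and loss-based selection (sketch of Algorithm 1).
    clusters: list of lists of client ids; losses: dict client id -> local loss.
    Picks the top-J clusters by mean loss, then the z = ceil(m/J) highest-loss
    clients per cluster; any shortfall is filled from the remaining clusters,
    visited in descending mean-loss order, highest-loss clients first."""
    z = math.ceil(m / J)
    ranked = sorted(clusters,
                    key=lambda c: sum(losses[i] for i in c) / len(c),
                    reverse=True)
    selected = []
    for cluster in ranked[:J]:
        selected += sorted(cluster, key=lambda i: losses[i], reverse=True)[:z]
    # Fill remaining slots (|S| < m) from the following clusters.
    for cluster in ranked[J:]:
        if len(selected) >= m:
            break
        for i in sorted(cluster, key=lambda i: losses[i], reverse=True):
            if len(selected) >= m:
                break
            selected.append(i)
    return selected[:m]

clusters = [[0, 1, 2], [3, 4], [5, 6, 7]]
losses = {0: 0.9, 1: 0.2, 2: 0.5, 3: 0.8, 4: 0.7, 5: 0.1, 6: 0.3, 7: 0.2}
# Cluster mean losses: 0.53, 0.75, 0.20 -> top-2 clusters are [3,4] and [0,1,2].
print(fedlecc_select(clusters, losses, J=2, m=4))  # [3, 4, 0, 2]
```

Because the server only reorders who participates, this plugs into a standard FedAvg loop without touching local training or aggregation, exactly as the convergence discussion above relies on.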
This mitigates client drift under severe label skew and empirically improves convergence stability, as observed in Section V. While a formal convergence proof is beyond the scope of this work, our experimental results indicate that FedLECC converges faster and more stably than both uniform sampling and single-factor selection baselines.

E. Design Rationale and Implicit Optimization Objective

FedLECC is guided by the objective of maximizing the utility of each communication round under a strict participation budget, i.e., selecting a small set of clients that is both informative and diverse under severe non-IID data. Loss-guided prioritization targets informativeness: clients (and clusters) with higher loss are more likely to yield updates that reduce the global objective, as also motivated by biased client selection studies [19]. Clustering based on label-distribution similarity acts as a diversity control mechanism that prevents repeatedly sampling highly similar clients, mitigating over-specialization and improving robustness under label skew. Thus, the two-stage selection in Algorithm 1 can be seen as a practical approximation to constrained utility maximization that balances loss reduction with coverage of heterogeneous data distributions.

V. EXPERIMENTS AND RESULTS

This section evaluates FedLECC in terms of (i) test accuracy and (ii) communication overhead under severe label skew. We compare against FedAvg and state-of-the-art baselines.

A. Experimental Setup

Datasets and non-IID partitioning. We evaluate on MNIST [22] and FMNIST [23] (see Table I). Each dataset is partitioned across K clients using FedArtML [24] with a Dirichlet(α) label split to simulate label skew. We focus on a high non-IID data regime (HD ≈ 0.9), where larger values indicate highly non-IID data.

Model and training protocol. For both datasets, we train a Multilayer Perceptron (MLP) with two hidden layers (200 neurons each), using cross-entropy loss and Stochastic Gradient Descent (SGD) with a learning rate of 0.005 and a batch size of 64. We use T = 150 communication rounds, and report results averaged over five random seeds.

[Fig. 3. Accuracy vs. communication rounds for FedAvg, FedProx, FedDyn, FedNova, HACCS, FedCLS, FedCor, POC, and FedLECC (ours) on FMNIST with K = 100 (best tuned hyperparameters).]

TABLE II. ACCURACY (MEAN ± STD) ON MNIST AND FMNIST UNDER HIGH NON-IID DATA.

Method          | MNIST K=100  | MNIST K=250  | FMNIST K=100 | FMNIST K=300
HD              | 0.90         | 0.86         | 0.90         | 0.86
Silhouette      | 0.641        | 0.502        | 0.723        | 0.409
FedAvg          | 0.681 ± 0.02 | 0.745 ± 0.03 | 0.565 ± 0.03 | 0.634 ± 0.03
FedProx         | 0.681 ± 0.03 | 0.745 ± 0.03 | 0.608 ± 0.02 | 0.636 ± 0.02
FedNova         | 0.696 ± 0.03 | 0.732 ± 0.03 | 0.554 ± 0.03 | 0.598 ± 0.04
FedDyn          | 0.654 ± 0.03 | 0.687 ± 0.02 | 0.567 ± 0.03 | 0.661 ± 0.03
HACCS           | 0.625 ± 0.03 | 0.535 ± 0.03 | 0.608 ± 0.03 | 0.658 ± 0.03
FedCLS          | 0.636 ± 0.02 | 0.644 ± 0.03 | 0.577 ± 0.03 | 0.639 ± 0.03
FedCor          | 0.678 ± 0.03 | 0.546 ± 0.02 | 0.592 ± 0.03 | 0.652 ± 0.03
POC             | 0.673 ± 0.03 | 0.719 ± 0.02 | 0.605 ± 0.03 | 0.668 ± 0.02
FedLECC (ours)  | 0.702 ± 0.03 | 0.772 ± 0.03 | 0.629 ± 0.03 | 0.675 ± 0.02

Baselines and tuning. All baselines are tuned following their original papers: FedProx [13], FedNova [14], FedDyn [15], HACCS [16], FedCLS [17], FedCor [18], and POC [19].

B. Accuracy Comparison

Figure 3 reports representative learning curves on FMNIST with K = 100 clients under severe label skew. FedLECC converges faster than all baselines and reaches a higher final accuracy, indicating that its cluster-aware and loss-guided selection enables more effective use of each communication round. In this setting, FedLECC reduces the number of communication rounds required to reach a given accuracy level by approximately 22% compared to FedAvg.
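The Dirichlet(α) label split used in the experimental setup can be sketched with the standard library alone; the paper uses the FedArtML toolkit [24] for this, so the function below, its class counts, and α = 0.1 are illustrative assumptions:

```python
import random

def dirichlet_label_split(labels_per_class, num_clients, alpha, seed=0):
    """Assign each class's sample indices to clients using Dirichlet(alpha)
    proportions; smaller alpha -> more severe label skew. Stand-in for a
    toolkit such as FedArtML. Proportions are drawn as normalized Gamma
    samples (the standard Dirichlet construction); rounding may leave a
    few samples unassigned, which is fine for a sketch."""
    rng = random.Random(seed)
    client_indices = [[] for _ in range(num_clients)]
    for indices in labels_per_class:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start = 0
        for c, d in enumerate(draws):
            take = round(d / total * len(indices))
            client_indices[c] += indices[start:start + take]
            start += take
    return client_indices

# Two classes with 100 samples each, split across 5 clients with alpha = 0.1:
# most clients end up dominated by one class (severe label skew).
classes = [list(range(100)), list(range(100, 200))]
split = dirichlet_label_split(classes, num_clients=5, alpha=0.1)
print([len(s) for s in split])  # heavily skewed per-client sizes
```

Sweeping α controls how far the partition sits from IID, which is how the high non-IID regime (HD ≈ 0.9) in the setup above is reached.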
In contrast, baselines relying on uniform sampling or single-factor selection exhibit slower convergence and larger performance fluctuations, which are symptomatic of client drift under non-IID data.

Table II reports final test accuracy on MNIST and FMNIST under highly non-IID data. FedLECC achieves the highest accuracy in most settings, especially with larger client populations, where non-IID data is more pronounced. Across these configurations, FedLECC improves model accuracy by up to 12% compared to FedAvg and other strong baselines. Compared to both regularization-based and selection-based approaches (e.g., HACCS and POC), FedLECC's combination of cluster-level diversity and loss-guided prioritization proves more effective than relying on either mechanism alone.

C. Communication Overhead

We measure the total communication exchanged between the server and clients over training, including model parameters, cluster information, and loss values. Table III reports the average communication overhead under high non-IID conditions. FedLECC consistently achieves lower overhead than FedAvg by limiting participation to a small set of informative clients, and remains competitive with other selection-based baselines. Across the evaluated configurations, FedLECC reduces the overall communication overhead by up to 50% compared to strong baselines, demonstrating that suitably selecting clients can significantly reduce bandwidth and coordination costs in cloud-edge FL systems without sacrificing accuracy.

TABLE III. AVERAGE COMMUNICATION OVERHEAD (MB) ON MNIST AND FMNIST (SMALLER IS BETTER).

Method          | MNIST K=100 | MNIST K=250 | FMNIST K=100 | FMNIST K=300
FedAvg          | 49.28       | 123.21      | 49.28        | 126.28
FedProx         | 21.12       | 49.28       | 44.73        | 119.73
FedNova         | 17.70       | 132.63      | 73.61        | 93.92
FedDyn          | 41.31       | 38.05       | 15.22        | 35.61
HACCS           | 3.42        | 3.23        | 3.42         | 55.29
FedCLS          | 35.26       | 40.21       | 35.20        | 114.30
FedCor          | 2.47        | 4.47        | 3.97         | 41.80
POC             | 4.78        | 5.65        | 2.52         | 34.31
FedLECC (ours)  | 2.11        | 1.93        | 2.24         | 33.55

VI. DISCUSSION

FedLECC consistently improves learning performance while reducing system-level costs, even when only a limited subset of clients is selected. This confirms that informed selection, rather than broad participation, is essential for scalable cross-device FL.

With respect to RQ1, FedLECC improves test accuracy under severe label skew by prioritizing informative and representative edge devices. Explicitly considering non-IID data during client selection helps mitigate client drift and improve the stability of the aggregation process. Regarding RQ2, FedLECC's gains stem from combining cluster-based diversity control with loss-guided prioritization, which encourages the selection of clients with diverse label distributions while focusing training on underperforming regions of the data distribution. RQ3 is addressed through the observed reductions in communication rounds and overall communication overhead. Selecting fewer yet more informative clients reduces bandwidth consumption and coordination costs, which are critical constraints in cloud-edge infrastructures. From a systems perspective, these results position FedLECC as an effective selection mechanism for resource-efficient distributed AI workloads.

Finally, from a scalability perspective, our experiments consider configurations with up to K = 300 clients across different datasets and severe non-IID scenarios. The results show that FedLECC preserves its accuracy advantages over strong baselines as the number of participating clients increases, indicating good scalability. Overall, the discussion highlights that FedLECC aligns with the goals of intelligent cloud computing and networking by addressing core challenges in selection and scalable FL under non-IID data.

VII. POTENTIAL CHALLENGES FOR FEDLECC

While effective, FedLECC's performance depends on configuration choices such as the number of selected clusters and clients.
Similar sensitivity to participation-related parameters is also present in other state-of-the-art client selection base- lines, such as POC, FedCor and HACCS. Inappropriate config- urations may trade off accuracy for communication efficiency or vice versa. Addressing this common challenge through adaptive parameter tuning and workload-aware configuration mechanisms remains an important direction for future work. VIII. CONCLUSION AND FUTURE WORK This paper presented FedLECC, an intelligent, cluster-aware client selection strategy for FL workloads in cloud–edge environments under highly non-IID data and more specifically severe label skew. By combining cluster-level diversity control with loss-guided prioritization, FedLECC enables the server to select a small set of informative edge devices at each communication round. Experimental results under highly non- IID label skew show that FedLECC improves model accuracy by up to 12%, while reducing communication rounds by ap- proximately 22% and overall communication overhead by up to 50%. These gains demonstrate that informed client selection can substantially improve the efficiency and scalability of cross-device FL systems operating under limited bandwidth and participation budgets. Several directions remain open for future work from a systems and networking perspective. In particular, exploring lightweight mechanisms to automatically adapt FedLECC’s configuration to workload dynamics and resource availability could further improve its robustness in cloud–edge environ- ments. In addition, integrating privacy-preserving techniques such as Differential Privacy [20] or secure multiparty com- putation [21] into the client selection pipeline remains an important direction to strengthen trust and security guarantees in practical federated cloud–edge systems. IX. ACKNOWLEDGMENTS Daniel M. 
Jimenez-Gutierrez was partially supported by PNRR351 TECHNOPOLE – NEXT GEN EU Roma Technopole – Digital Transition, FP2 – Energy transition and digital transition in urban regeneration and construction. Aris Anagnostopoulos was supported by the PNRR MUR project PE0000013-FAIR, the PNRR MUR project IR0000013- SoBigData.it, and the MUR PRIN project 2022EKNE5K “Learning in Markets and Society.” Ioannis Chatzigiannakis was supported by PE07- SERICS (Security and Rights in the Cyberspace) – European Union Next-Generation-EU- PE00000014 (Piano Nazionale di Ripresa e Re- silienza – PNRR). Andrea Vitaletti was supported by the project SER- ICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGener- ationEU. REFERENCES [1] M. Soori, B. Arezoo, and R. Dastres, “Internet of things for smart factories in industry 4.0, a review,” Internet of Things and Cyber- Physical Systems, 2023. [2] M. Arafat, M. Hossain, and M. M. Alam, “Machine learning scopes on microgrid predictive maintenance: Potential frameworks, challenges, and prospects,” Renewable and Sustainable Energy Reviews, vol. 190, p. 114088, 2024. [3] B. Wang, Y. Dong, J. Yao, H. Qin, and J. Wang, “Exploring anomaly detection and risk assessment in financial markets using deep neural networks,” International Journal of Innovative Research in Computer Science and Technology, vol. 12, no. 4, 2024. [4] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics. PMLR, 2017, p. 1273– 1282. [5] D. M. Jimenez Gutierrez, H. M. Hassan, L. Landi, A. Vitaletti, and I. Chatzigiannakis, “Application of federated learning techniques for arrhythmia classification using 12-lead ecg signals,” in International Symposium on Algorithmic Aspects of Cloud Computing.Springer, 2023, p. 38–65. [6] D. M. Jimenez-Gutierrez, M. Hassanzadeh, A. Anagnostopoulos, I. 
Chatzigiannakis, and A. Vitaletti, “A thorough assessment of the non-iid data impact in federated learning,” arXiv preprint arXiv:2503.17070, 2025.
[7] Q. Li, Y. Diao, Q. Chen, and B. He, “Federated learning on non-iid data silos: An experimental study,” in 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2022, pp. 965–978.
[8] L. Nagalapatti, R. S. Mittal, and R. Narayanam, “Is your data relevant?: Dynamic selection of relevant data for federated learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 7859–7867.
[9] A. Z. Tan, H. Yu, L. Cui, and Q. Yang, “Towards personalized federated learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 9587–9603, 2022.
[10] R. Goussakov, “Hellinger distance-based similarity measures for recommender systems,” Ph.D. dissertation, Umeå University, 2020.
[11] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” Foundations and Trends® in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021.
[12] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, “Federated learning with non-iid data,” arXiv preprint arXiv:1806.00582, 2018.
[13] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
[14] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, “A novel framework for the analysis and design of heterogeneous federated learning,” IEEE Transactions on Signal Processing, vol. 69, pp. 5234–5249, 2021.
[15] D. A. E. Acar, Y. Zhao, R. M. Navarro, M. Mattina, P. N. Whatmough, and V. Saligrama, “Federated learning based on dynamic regularization,” arXiv preprint arXiv:2111.04263, 2021.
[16] J. Wolfrath, N. Sreekumar, D. Kumar, Y. Wang, and A.
Chandra, “HACCS: Heterogeneity-aware clustered client selection for accelerated federated learning,” in 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2022, pp. 985–995.
[17] C. Li and H. Wu, “FedCLS: A federated learning client selection algorithm based on cluster label information,” in 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall). IEEE, 2022, pp. 1–5.
[18] M. Tang, X. Ning, Y. Wang, J. Sun, Y. Wang, H. Li, and Y. Chen, “FedCor: Correlation-based active client selection strategy for heterogeneous federated learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10102–10111.
[19] Y. Jee Cho, J. Wang, and G. Joshi, “Towards understanding biased client selection in federated learning,” in Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, G. Camps-Valls, F. J. R. Ruiz, and I. Valera, Eds., vol. 151. PMLR, 2022, pp. 10351–10375. [Online]. Available: https://proceedings.mlr.press/v151/jee-cho22a.html
[20] Ú. Erlingsson, V. Pihur, and A. Korolova, “RAPPOR: Randomized aggregatable privacy-preserving ordinal response,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014, pp. 1054–1067.
[21] J. Böhler and F. Kerschbaum, “Secure multi-party computation of differentially private heavy hitters,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 2361–2377.
[22] L. Deng, “The mnist database of handwritten digit images for machine learning research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
[23] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747, 2017.
[24] D. M. Jimenez, A. Anagnostopoulos, I. Chatzigiannakis, and A.
Vitaletti, “FedArtML: A tool to facilitate the generation of non-iid datasets in a controlled way to support federated learning research,” IEEE Access, vol. 12, pp. 81004–81016, 2024.
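As a complement to the summary above, the core idea of FedLECC — rank clusters of label-similar clients by their local loss, then pick a small, diverse, high-loss subset — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_clients`, the precomputed cluster assignment `cluster_of`, and the reported per-client losses `loss_of` are hypothetical; clustering by label-distribution similarity (e.g. via the Hellinger distance of [10]) is assumed to have happened upstream.

```python
from collections import defaultdict

def select_clients(cluster_of, loss_of, budget):
    """Pick up to `budget` clients: rank clusters by mean local loss,
    then round-robin over clusters in that order, taking each cluster's
    highest-loss unselected client (diversity across clusters,
    informativeness within each cluster)."""
    # Group client ids by their precomputed cluster assignment.
    clusters = defaultdict(list)
    for client, label in cluster_of.items():
        clusters[label].append(client)

    # Loss-guided cluster prioritization: highest mean loss first.
    ranked = sorted(
        clusters,
        key=lambda k: sum(loss_of[c] for c in clusters[k]) / len(clusters[k]),
        reverse=True,
    )

    budget = min(budget, len(cluster_of))  # cannot exceed the population
    selected = []
    while len(selected) < budget:
        for k in ranked:
            # Highest-loss client in this cluster not yet selected.
            pool = [c for c in clusters[k] if c not in selected]
            if pool:
                selected.append(max(pool, key=loss_of.__getitem__))
            if len(selected) == budget:
                break
    return selected
```

With two clusters and a budget of two, the sketch first visits the higher-loss cluster and takes its worst-loss client, then moves to the next cluster rather than draining the first — the cluster-level diversity control described in the conclusion.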