Paper deep dive
AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection
Hongwei Lin, Xun Huang, Chenglu Wen, Cheng Wang
Abstract
Robust 3D object detection under adverse weather conditions is crucial for autonomous driving. However, most existing methods simply combine all weather samples for training while overlooking data distribution discrepancies across different weather scenarios, leading to performance conflicts. To address this issue, we introduce AW-MoE, a framework that innovatively integrates Mixture of Experts (MoE) into weather-robust multi-modal 3D object detection. AW-MoE incorporates Image-guided Weather-aware Routing (IWR), which leverages the superior discriminability of image features across weather conditions and their invariance to scene variations for precise weather classification. Based on this accurate classification, IWR selects the top-K most relevant Weather-Specific Experts (WSE) that handle data discrepancies, ensuring optimal detection under all weather conditions. Additionally, we propose a Unified Dual-Modal Augmentation (UDMA) for synchronous LiDAR and 4D Radar dual-modal data augmentation while preserving the realism of scenes. Extensive experiments on a real-world dataset demonstrate that AW-MoE achieves ~15% improvement in adverse-weather performance over state-of-the-art methods, while incurring negligible inference overhead. Moreover, integrating AW-MoE into established baseline detectors yields performance improvements surpassing current state-of-the-art methods. These results show the effectiveness and strong scalability of our AW-MoE. We will release the code publicly at https://github.com/windlinsherlock/AW-MoE.
Links
- Source: https://arxiv.org/abs/2603.16261v1
- Canonical: https://arxiv.org/abs/2603.16261v1
Full Text
Hongwei Lin, Xun Huang, Chenglu Wen†, and Cheng Wang are with the Fujian Key Laboratory of Urban Intelligent Sensing and Computing and the School of Informatics, Xiamen University, Xiamen, FJ 361005, China (e-mail: {greatlin, huangxun}@stu.xmu.edu.cn; {clwen, cwang}@xmu.edu.cn). Xun Huang is also with Zhongguancun Academy, Beijing, China. † Corresponding author.
We will release the code publicly at https://github.com/windlinsherlock/AW-MoE.

I Introduction

Three-dimensional object detection, a fundamental task in 3D computer vision, has significantly advanced autonomous driving and other unmanned systems. Most existing methods rely on the stable performance of sensors, such as LiDAR [33, 17, 47, 32, 15] and cameras [38, 22]. However, under adverse weather conditions (e.g., rain, fog, snow), sensor performance degrades, leading to weakened system reliability [6]. Therefore, recent studies have explored developing robust 3D object detection techniques under adverse weather conditions. These works pursue robustness through two complementary approaches: the construction of simulation-augmented [6, 12, 9] or real-world datasets [24] at the data level, and the development of multi-modal fusion techniques [13, 3, 25] at the algorithmic level. However, existing methods simply combine all weather samples for training while overlooking the substantial distribution discrepancies across adverse weather conditions, which may lead to performance conflicts across various scenarios.

Figure 1: Comparison of weather-type discriminability between camera images and LiDAR point clouds. (a, b) Camera images exhibit distinct visual characteristics and robustness to scene variations, facilitating accurate weather classification. (c, d) In contrast, LiDAR point clouds suffer from ambiguous inter-class geometric distortions and scene-induced intra-class distribution shifts, which obscure the boundaries between different weather conditions.

To investigate this, we first explore the influence of weather sample bias by fine-tuning the state-of-the-art L4DR method [13] separately on each weather subset. As shown in Fig. 2 (a), models fine-tuned on a specific weather condition improve in that condition but suffer degraded performance in others.
This phenomenon indicates that significant distributional gaps exist across weather conditions, preventing a single model from maintaining optimal performance across all conditions. Moreover, due to the expensive collection of adverse weather data, real-world datasets such as K-Radar [24] contain far fewer adverse weather samples than normal-weather ones (see Fig. 2 (b)). This imbalanced distribution of weather samples tends to bias training toward normal weather conditions, thereby further overlooking adverse weather scenarios. To address these challenges, we propose the All-Weather Mixture of Experts (AW-MoE), the first approach that introduces the Mixture of Experts (MoE) technique to enhance the robustness of 3D object detection under adverse weather conditions. AW-MoE leverages the MoE mechanism [14, 31] to extend a single-branch detector into a specialized multi-branch architecture, in which each branch is explicitly tailored to a specific weather condition. This design enables robust adaptation to diverse adverse-weather scenarios while incurring negligible inference overhead. It is worth noting that the effectiveness of MoE in multi-scenario applications relies heavily on optimal expert routing. Standard MoE frameworks [31] typically employ Point-cloud Feature-based Routing (PFR), utilizing input point-cloud features to guide the routing process, as shown in Fig. 3 (a). However, PFR exhibits significant limitations in outdoor autonomous driving under adverse weather conditions. First, point clouds suffer from ambiguous inter-class geometric distortions, making it difficult to precisely differentiate weather conditions in the feature space (see Fig. 1 (c)). Furthermore, the highly dynamic nature of outdoor environments leads to scene-induced intra-class distribution shifts, where point clouds of the same weather exhibit large variations across different scenes (see Fig. 1 (d)).
In contrast, camera images demonstrate highly favorable properties for weather perception. First, images present distinct visual characteristics (see Fig. 1 (a)). For instance, normal weather offers clear vision and high definition, rain introduces windshield droplets and strong specular reflections, and snow presents significant snowflake accumulations. These prominent visual cues enable an Image Weather Classifier to easily distinguish weather conditions in the feature space. Second, images demonstrate strong robustness to scene variations (see Fig. 1 (b)). Motivated by these observations, we propose an Image-guided Weather-aware Routing (IWR) module, illustrated in Fig. 3 (b). IWR leverages an Image Weather Classifier to explicitly identify the weather condition, thereby routing the data to the most suitable weather expert to mitigate data distribution discrepancies. As shown in Table I, our IWR achieves a routing accuracy of nearly 99% across all weather conditions, whereas the baseline PFR struggles significantly to recognize severe weather environments. Guided by the accurate expert routing of IWR, a Weather-Specific Experts (WSE) module subsequently extracts weather-specific features for the corresponding conditions. Moreover, to mitigate the scarcity of adverse weather samples, we propose a Unified Dual-Modal Augmentation (UDMA) module that performs synchronized data augmentation on both LiDAR and 4D Radar point clouds. Furthermore, we introduce a variant termed AW-MoE-LRC, which integrates image features with the LiDAR and 4D Radar representations. This variant fully exploits the rich semantic information of cameras to achieve enhanced perception performance. Figure 2: (a) Performance changes of L4DR [13] after fine-tuning on a single weather condition under different weather scenarios. (b) Statistics of data volume across different weather conditions in the K-Radar dataset [24]. 
Figure 3: Method comparison between Point-cloud Feature-based Routing (PFR) and the proposed Image-guided Weather-aware Routing (IWR).

Extensive experiments on the real-world K-Radar [24] dataset demonstrate that AW-MoE outperforms state-of-the-art methods and confirm its extensibility. Our main contributions are summarized as follows:

• We propose the All-Weather Mixture of Experts (AW-MoE), which first introduces the MoE technique to enhance the robustness of 3D object detection in adverse weather scenarios. The effect is significant: AW-MoE achieves robust detection performance across all weather conditions.

• Leveraging the inherent advantages of images in distinguishing weather types, we design the Image-guided Weather-aware Routing (IWR) and Weather-Specific Experts (WSE) modules. This design overcomes the limitations of prior MoE routing approaches under varying weather conditions, thereby enhancing the overall effectiveness of the framework. Additionally, we introduce a tri-modal variant, AW-MoE-LRC, which further incorporates camera features into the LiDAR and 4D Radar modalities.

• Our AW-MoE is also a highly scalable framework that can be extended to various 3D object detection methods, yielding substantial performance gains under adverse weather conditions. Extensive experiments on real-world datasets demonstrate the superior performance and strong extensibility of our AW-MoE.

II Related Work

II-A 3D Object Detection.

3D object detection [46, 41, 20, 40, 39, 21, 35, 49] is a core task in 3D vision, predominantly relying on raw point clouds such as those from LiDAR. Existing methods are broadly categorized into three types based on data representation: point-based, voxel-based, and point-voxel-based. Point-based methods [33, 45, 34] directly sample and aggregate features from raw points. They classify foreground points and predict corresponding 3D bounding boxes. This preserves fine-grained geometric details but incurs high computational costs.
Conversely, voxel-based methods [43, 5, 51, 17, 47] partition point clouds into regular grids. They aggregate features within each voxel and apply 3D spatial convolutions. Many models [17, 5] further compress these features into Bird's Eye View (BEV) space for efficient 2D convolutions, significantly accelerating inference. Point-voxel-based methods [46, 32] integrate both representations to balance efficiency and geometric accuracy. While these approaches achieve impressive accuracy under normal conditions, their performance drops significantly in adverse weather. Environmental interference degrades LiDAR signals, severely compromising the reliability of these conventional methods.

II-B 3D Object Detection Under Adverse Weather.

Under adverse weather conditions, the perception capability of sensors such as LiDAR degrades, leading to reduced detection performance [6, 30]. Recent research has extensively explored 3D object detection [17, 4, 47, 48] under such conditions [12, 16, 1, 6, 26, 13, 42, 10]. Some works generate simulated adverse weather data (e.g., rain, snow, fog) to train robust detection models [11, 6, 12, 9]. In contrast, others focus on real-world datasets such as K-Radar [24], which provides multimodal data from LiDAR, 4D Radar, and cameras, and introduces RTNH [24] using 4D Radar for detection. Furthermore, sensor-fusion methods, including Bi-LRFusion [37], 3D-LRF [3], and L4DR [13], leverage complementary information from LiDAR and Radar to enhance robustness. Although these approaches outperform single-modal methods, they overlook the substantial distribution gaps across different adverse weather conditions. Our experiments reveal that training a single-branch model with mixed-weather data causes conflicting optimizations among weather scenarios, leading to unstable performance. Therefore, addressing weather-specific discrepancies is essential for maintaining robust and consistent detection across all conditions.

II-C Mixture of Experts (MoE).
MoE [19, 23, 50] has emerged as a powerful framework for scaling models while maintaining computational efficiency. Initially proposed by [14], MoE divides the model into specialized experts and uses a gating network to select the most relevant experts for each input. Sparsely-Gated MoE [31] further improves scalability by activating only a subset of experts, allowing models to scale to billions of parameters without significant computational overhead. GShard [18] optimized MoE training on distributed systems, enabling efficient large-scale training. Switch Transformer [8] simplified expert routing by adopting top-1 selection, enhancing both training stability and scalability. Later works, such as GLaM [7] and DeepSpeed-MoE [27], focused on improving MoE for multi-task learning and large-scale training. V-MoE [29] extended MoE to vision tasks by applying sparse activation to image patches in Vision Transformers [28], thereby improving computational efficiency. The MoE framework offers a promising solution to the challenges posed by diverse data distributions in tasks with varying conditions. Motivated by these advantages, we are the first to introduce MoE into 3D object detection under adverse weather conditions, effectively addressing inter-weather discrepancies and enabling robust performance across all conditions.

III Proposed Method

Figure 4: AW-MoE Framework. (a) Unified Dual-Modal Augmentation (UDMA): Synchronously augments LiDAR and 4D Radar point clouds. Its GT Sampling only selects ground truths matching the scene's weather. (b) Image-guided Weather-aware Routing (IWR): Uses an Image-based Weather Classifier to predict the scene weather and routes the feature to the top-K most relevant Weather-Specific Experts. (c) Weather-Specific Experts (WSE): Each expert is specialized for a weather condition, extracting robust weather-specific features and regressing bounding boxes with tailored sensitivity.
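The weather-matched GT sampling of Fig. 4 (a) amounts to filtering the ground-truth database by scene weather before drawing samples. A minimal sketch follows; `gt_database` and its `"weather"` key are illustrative assumptions, not the paper's actual data structures:

```python
import random

def weather_specific_gt_sampling(gt_database, scene_weather, num_samples, seed=None):
    """Sketch of Weather-Specific GT Sampling (WSGTS, Fig. 4a):
    draw ground-truth objects only from scenes whose weather label
    matches the current scene, avoiding cross-weather mismatches."""
    rng = random.Random(seed)
    # Keep only ground truths recorded under the same weather condition.
    candidates = [gt for gt in gt_database if gt["weather"] == scene_weather]
    # Sample without replacement, capped by the number of candidates.
    return rng.sample(candidates, min(num_samples, len(candidates)))

# Illustrative database: five fog objects and three rain objects.
db = ([{"weather": "fog", "label": i} for i in range(5)]
      + [{"weather": "rain", "label": i} for i in range(3)])
picked = weather_specific_gt_sampling(db, "rain", num_samples=10, seed=0)
```

In this toy call only the three rain objects are eligible, so at most three are returned regardless of `num_samples`.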
III-A Problem Formulation

In outdoor adverse weather scenarios, the sensory inputs, denoted as $\mathcal{I}$, are processed by a perception model $\mathcal{M}$ to extract deep representations $f=\mathcal{M}(\mathcal{I})$. For multi-modal settings [13, 3], features $f$ from different sensors are further integrated by a feature fusion module $\mathcal{G}$, producing the fused representation $f'=\mathcal{G}(\{f\})$. The fused features are then fed into the detection head to regress the final 3D bounding boxes $B=\{b_i\}_{i=1}^{N_b},\ B\in\mathbb{R}^{N_b\times 7}$, where $N_b$ denotes the number of detected bounding boxes. Our proposed AW-MoE builds upon the state-of-the-art LiDAR–4D Radar fusion framework L4DR [13] by integrating a Mixture of Experts (MoE) mechanism. The input consists of LiDAR point clouds $P^l$ and 4D Radar point clouds $P^r$, denoted collectively as $P^m=\{p_i^m\}_{i=1}^{N_m},\ m\in\{l,r\}$, where $p_i^m$ represents a 3D point in modality $m$.

III-B AW-MoE

The overall architecture of AW-MoE is illustrated in Fig. 4. AW-MoE consists of three main components: a Shared Backbone, an Image-guided Weather-aware Routing (IWR) module, and multiple Weather-Specific Experts (WSE). The Shared Backbone extracts general representations from the input data, while the IWR leverages discriminative visual cues from images under different weather conditions to dynamically route features to the most suitable WSE. Each WSE is specialized in processing features corresponding to a particular weather type, enabling AW-MoE to maintain robust and consistent detection performance across diverse adverse conditions. Moreover, the proposed Unified Dual-Modal Augmentation (UDMA) performs synchronized data augmentation for both LiDAR and 4D Radar modalities, ensuring sample authenticity and cross-modal consistency under various weather scenarios.

III-B1 Unified Dual-Modal Augmentation

Data augmentation [17, 47, 5] is widely used in deep learning but has been largely overlooked in LiDAR–4D Radar fusion [13, 2, 3].
In this work, we address this limitation by proposing Unified Dual-Modal Augmentation (UDMA), which performs synchronized augmentations on LiDAR and 4D Radar data, including flipping, rotation, scaling, and ground-truth (GT) sampling, to maintain cross-modal consistency. Unlike conventional GT sampling [32, 43], which indiscriminately mixes data from different weather conditions and thereby degrades scene realism, our proposed Weather-Specific GT Sampling (WSGTS) accounts for the substantial geometric and reflective variations of objects across diverse weather scenarios. As shown in Fig. 4 (a), WSGTS samples GTs exclusively from scenes with matching weather conditions, effectively avoiding cross-weather mismatches, preserving environmental authenticity, and improving detection performance, as reported in Table VI.

III-B2 Image-guided Weather-aware Routing

The key to the MoE framework's effectiveness in handling multi-task and multi-scenario problems lies in the ability of expert routing to accurately select the most suitable expert. As analyzed in the Introduction, Point-cloud Feature-based Routing (PFR) [14], which relies on point cloud features, performs poorly in outdoor scenarios due to the highly dynamic nature of environments and the difficulty of capturing point cloud differences under adverse weather such as fog, sleet, and light snow (see Figs. 1 (c, d) and Table I). Conversely, images offer superior clarity in distinguishing diverse weather patterns while remaining largely invariant to fluctuations in scene geometry (see Figs. 1 (a, b)). Based on this observation, we design an Image-guided Weather-aware Routing (IWR) to perform expert selection. First, we design a lightweight image-based Weather Classifier to categorize the captured scene images:

$z=\mathcal{C}(\mathcal{I}_{img})\in\mathbb{R}^{N_W}$, (1)

where $z$ denotes the classification result, $\mathcal{C}$ represents the Weather Classifier, $\mathcal{I}_{img}$ denotes the camera image, and $N_W$ is the number of weather categories.
Then, the classification result $z$ is normalized using a softmax function, where $P_w$ denotes the probability corresponding to the $w$-th weather category:

$P=\mathrm{softmax}(z),\quad P_w=\frac{\exp(z_w)}{\sum_{i=1}^{N_W}\exp(z_i)},\quad w=1,\dots,N_W$. (2)

Finally, we select the top-K weather categories with the highest probabilities in $P$ to determine the corresponding Weather-Specific Experts (WSE):

$S=\mathrm{TopK}(P,K)\subset\{1,\dots,N_W\},\quad |S|=K$, (3)

where $S$ denotes the set of selected WSE. Since the proposed lightweight image-based Weather Classifier achieves high accuracy in predicting scene weather types (over 99%, see Table I), our IWR can reliably select the most appropriate WSE.

Weather Classifier. The architecture of our Image-based Weather Classifier is illustrated in Fig. 6. It consists of an initial convolutional layer followed by a backbone composed of four consecutive Depthwise Separable Blocks. Each Depthwise Separable Block contains a depthwise convolution, a pointwise convolution, and two normalization layers, which collaboratively extract discriminative weather-related features from the input image. Despite its lightweight design, the proposed image-based Weather Classifier achieves both high efficiency and accuracy, providing precise and efficient routing for weather-specific experts.

Figure 5: The architecture of the AW-MoE-LRC framework. The pipeline comprises three stages: (i) LiDAR-Guided Image Feature Lifting, where sparse LiDAR depth assists in predicting 3D frustum features from images; (ii) 3D Geometry Transformation and BEV Pooling, which projects and aggregates these features into the ego-vehicle BEV space; and (iii) Multi-Modal Feature Fusion, which concatenates the aligned camera, LiDAR, and 4D Radar BEV features along the channel dimension for final convolution-based integration.

Figure 6: Architecture of the proposed Image-based Weather Classifier.
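The routing of Eqs. (1)–(3) reduces to a softmax over classifier logits followed by top-K selection. A minimal pure-Python sketch, with illustrative logits for the seven weather categories (normal, overcast, fog, rain, sleet, light snow, heavy snow):

```python
import math

def softmax(logits):
    """Normalize raw logits into weather probabilities (Eq. 2)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(logits, k):
    """Select the top-K weather categories (Eq. 3).

    Returns the indices of the selected Weather-Specific Experts
    together with their routing probabilities.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected = ranked[:k]
    return selected, [probs[i] for i in selected]

# Illustrative classifier logits; index 2 ("fog") dominates here.
logits = [0.2, 0.1, 3.5, 1.8, 0.0, -0.5, -1.0]
experts, weights = route_top_k(logits, k=2)
```

With these toy logits, the router selects experts 2 and 3; the retained probabilities later serve as the confidence weights in the loss and post-processing.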
III-B3 Weather-Specific Experts

After IWR selects the most suitable expert, the corresponding Weather-Specific Expert (WSE) is activated to handle the scenario under a specific weather condition. Each WSE consists of three components: a Weather-Specific Backbone, a Weather-Specific Feature Fusion module, and a Weather-Specific Detection Head. The Weather-Specific Backbone is responsible for extracting weather-specific features that cannot be captured by the shared backbone. The Weather-Specific Feature Fusion module performs weather-aware complementary fusion of LiDAR and 4D Radar features according to their quality differences under different weather conditions. The Weather-Specific Detection Head predicts and regresses 3D bounding boxes with varying sensitivities tailored to specific weather scenarios. The overall pipeline of WSE can be formulated as:

$B_w=\mathcal{H}_w(\mathcal{F}_w(\mathcal{E}_w(\{f\}))),\quad w\in S$, (4)

where $\mathcal{E}_w$, $\mathcal{F}_w$, and $\mathcal{H}_w$ denote the $w$-th Weather-Specific Backbone, Weather-Specific Feature Fusion module, and Weather-Specific Detection Head, respectively. In AW-MoE, all WSEs share the same structural design, with a total of $N_W=7$ experts corresponding to the number of weather categories.

TABLE I: Comparison of weather classification accuracy between Point-cloud Feature-based Routing (PFR) and the proposed Image-guided Weather-aware Routing (IWR).

| Method | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow |
|---|---|---|---|---|---|---|---|---|
| PFR | 71.3 | 98.9 | 12.0 | 51.9 | 76.7 | 57.6 | 2.1 | 58.3 |
| IWR | 99.0 | 99.8 | 100.0 | 97.3 | 99.0 | 97.1 | 99.3 | 98.7 |

III-C AW-MoE-LRC: Integrating Image Features

In the AW-MoE framework (Fig. 4), camera images are exclusively used within the Image-guided Weather-aware Routing (IWR) module to select Weather-Specific Experts. The image features are not directly utilized for 3D object detection. To explicitly integrate image semantics with LiDAR and 4D Radar features, we propose an extended pipeline, AW-MoE-LRC, as illustrated in Fig. 5.
Following [22], we adopt a LiDAR-guided Lift-Splat-Shoot (LSS) architecture to map 2D image features into a unified Bird's-Eye-View (BEV) space for multi-modal alignment. This process consists of three main stages:

III-C1 LiDAR-Guided Image Feature Lifting

Standard LSS architectures often lack precise geometric constraints for depth estimation. To address this, we leverage sparse LiDAR point clouds to guide image depth prediction. First, we project the LiDAR point clouds $P^l$ onto the camera image plane using the intrinsic matrix $A$ and extrinsic matrix $T_{ext}$ to generate a sparse depth map $D_{sparse}\in\mathbb{R}^{D_1\times H\times W}$. This depth map is convolved, concatenated with the backbone-extracted image features $f_{img}$, and fed into a DepthNet. The network outputs context features $f_{context}\in\mathbb{R}^{C_2\times H\times W}$ and a discrete depth probability distribution $D_{prob}\in\mathbb{R}^{D_2\times H\times W}$, where $D_2$ denotes the number of predefined depth bins. The 3D frustum feature $f_{frustum}$ is then computed via the outer product of the depth probabilities and context features:

$f_{frustum}(u,v,d)=D_{prob}(u,v,d)\otimes f_{context}(u,v)$, (5)

where $(u,v)$ represents the image pixel coordinates and $d$ is the discrete depth index.

III-C2 3D Geometry Transformation and BEV Pooling (Splatting)

To map the frustum features into the ego-vehicle coordinate system, we compute the 3D coordinate $P_{ego}$ for each feature point. Given the depth $d$ and pixel coordinate $(u,v)$, and accounting for data augmentations (e.g., image augmentation matrix $T_{img\_aug}$ and LiDAR augmentation matrix $T_{lidar\_aug}$), the coordinate transformation is formulated as:

$P_{ego}=T_{lidar\_aug}\left(T_{ext}\,A^{-1}\,T_{img\_aug}^{-1}\,[u\cdot d,\ v\cdot d,\ d,\ 1]^{T}\right)$. (6)

After obtaining the 3D coordinates for all frustum features, we apply an efficient BEV pooling operation to aggregate features that fall into the same 3D voxel grid. The features along the Z-axis are then flattened and concatenated across the channel dimension. Finally, a downsampling convolutional layer is applied to generate the spatial BEV features for the camera branch, denoted as $f^c$.

III-C3 Multi-Modal Feature Fusion

Once the image spatial features $f^c$ are extracted, they are fused with the LiDAR features $f^l$ and 4D Radar features $f^r$ within the unified BEV space. We concatenate the features along the channel dimension and apply several convolutional layers to learn cross-modal interactions and adaptive weight assignments. The final fused feature $f^f$ is obtained as follows:

$f^f=\mathrm{Convs}([f^c,f^l,f^r])$, (7)

where $[\cdot]$ denotes channel-wise concatenation. This fusion strategy effectively harnesses the rich semantic information from images, the precise geometric structure of LiDAR, and the robust, all-weather dynamic perception of 4D Radar.
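The back-projection in Eq. (6) can be illustrated in simplified form. This sketch assumes a pinhole intrinsic $(f_x, f_y, c_x, c_y)$ so that $A^{-1}$ can be written out explicitly, omits the augmentation matrices, and uses an illustrative identity extrinsic; it is not the paper's full transformation:

```python
def pixel_to_ego(u, v, d, intrinsics, extrinsic):
    """Back-project a pixel (u, v) at metric depth d into ego coordinates.

    Simplified Eq. (6): A^{-1} [u*d, v*d, d]^T recovers the camera-frame
    point, which the 4x4 extrinsic then maps into the ego frame.
    """
    fx, fy, cx, cy = intrinsics
    # Camera-frame coordinates from the inverse pinhole projection.
    x_cam = (u - cx) * d / fx
    y_cam = (v - cy) * d / fy
    z_cam = d
    # Apply the camera-to-ego extrinsic to the homogeneous point.
    p = (x_cam, y_cam, z_cam, 1.0)
    return tuple(sum(extrinsic[r][c] * p[c] for c in range(4)) for r in range(3))

# Illustrative identity extrinsic: camera and ego frames coincide.
IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
```

Repeating this for every frustum cell yields the 3D coordinates that the subsequent BEV pooling step aggregates into voxel grids.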
Algorithm 1: AW-MoE Training Strategy

Input: LiDAR point clouds $P^l$, 4D Radar point clouds $P^r$, camera images $\mathcal{I}_{img}$
Output: Trained AW-MoE model

Stage 1: Pretrain single-branch AW-MoE.
  Select a designated expert $\mathrm{WSE}_d$.
  for each batch in all-weather data $\{P^l,P^r\}$ do
    Forward: $B_d\leftarrow\mathcal{H}_d(\mathcal{F}_d(\mathcal{E}_d(\mathcal{E}_{shared}(P^l,P^r))))$
    Compute loss and update parameters of $\mathcal{E}_{shared}$ and $\mathrm{WSE}_d$.

Stage 2: Train the image-based Weather Classifier $\mathcal{C}$.
  for each batch of $\mathcal{I}_{img}$ with weather labels do
    Forward: $z\leftarrow\mathcal{C}(\mathcal{I}_{img})\in\mathbb{R}^{N_W}$
    Compute classification loss and update $\mathcal{C}$.

Stage 3: Initialize AW-MoE.
  Freeze parameters of $\mathcal{E}_{shared}$.
  Copy pretrained parameters to all WSE branches: $\mathrm{WSE}_w\leftarrow\mathrm{WSE}_d,\ w=1,\dots,N_W$.

Stage 4: Train AW-MoE with IWR.
  for each batch $\{P^l,P^r,\mathcal{I}_{img}\}$ do
    Extract shared features: $f\leftarrow\mathcal{E}_{shared}(P^l,P^r)$
    Compute weather probabilities: $P\leftarrow\mathrm{softmax}(\mathcal{C}(\mathcal{I}_{img}))\in\mathbb{R}^{N_W}$
    Select top-K experts: $S\leftarrow\mathrm{TopK}(P,K)$
    Predict 3D boxes: $B_w\leftarrow\mathcal{H}_w(\mathcal{F}_w(\mathcal{E}_w(\{f\}))),\ w\in S$
    Compute confidence-weighted loss: $\mathcal{L}_{CW}\leftarrow\sum_{w\in S}P_w\,\mathcal{L}_w(\mathrm{WSE}_w)$
    Update parameters of selected experts $\mathrm{WSE}_w,\ w\in S$.

III-D Loss Function and Post-Processing

In the AW-MoE framework, the IWR selects the top-K Weather-Specific Experts (WSE) to process the input data. During training, each selected WSE computes an individual loss, while during inference, each WSE regresses a dedicated set of 3D bounding boxes. However, the relevance between a WSE and the input data fluctuates based on weather conditions. To account for this varying contribution, a specialized loss function and post-processing strategy are required to aggregate the outputs. We thus propose the following formulations:

III-D1 Confidence-Weighted MoE Loss

To account for the varying relevance between data and experts, we introduce the Confidence-Weighted MoE Loss.
This objective function leverages the routing probabilities $P$, generated by the IWR, as dynamic confidence scores. The total loss is formulated as a weighted sum over the set of selected experts $S$:

$\mathcal{L}_{CW}=\sum_{w\in S}P_w\,\mathcal{L}_w(\mathrm{WSE}_w)$, (8)

where $\mathcal{L}_w$ denotes the individual loss computed by the $w$-th WSE. Scaling each expert's contribution proportionally to its routing probability $P_w$ prevents samples with low relevance from disproportionately affecting the optimization of specialized experts, thereby ensuring stable, weather-aware convergence.

III-D2 Confidence-Weighted Post-Processing

Complementing the weighted loss, we apply a consistent Confidence-Weighted Post-Processing during inference to aggregate the 3D bounding boxes $B=\{b_i\}_{i=1}^{N_b}$ regressed by the top-K experts. This process effectively integrates multi-expert predictions through two stages: Candidate Selection and Confidence-Weighted Aggregation.

Candidate Selection. We first evaluate the 3D Intersection over Union (IoU) among all predicted boxes. Candidates with an IoU below a predefined matching threshold are retained as independent detections.

Confidence-Weighted Aggregation. For overlapping boxes representing the same target, we perform a weighted aggregation. Let $\Omega$ denote a set of matched boxes, where each box $b_j\in\Omega$ is associated with its corresponding routing probability $p_j$. The fused bounding box $\hat{b}$ is

$\hat{b}=\sum_{b_j\in\Omega}p_j\cdot b_j,\quad b_j\in\mathbb{R}^7$. (9)

By sharing the same IWR-derived weights as the loss function, this post-processing module dynamically prioritizes predictions from experts most relevant to the current weather, ensuring robust and spatially consistent final detections.

TABLE II: Quantitative results of different 3D object detection methods on the K-Radar dataset. We present the modality of each method (L: LiDAR, 4DR: 4D Radar) and detailed performance for each weather condition. Best in bold, second in underline, and ∗ indicates results reproduced using open code.
| Method | Modality | IoU | Metric | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RTNH [24] (NeurIPS 2022) | 4DR | 0.3 | AP_BEV | 41.1 | 41.0 | 44.6 | 45.4 | 32.9 | 50.6 | 81.5 | 56.3 |
| | | | AP_3D | 37.4 | 37.6 | 42.0 | 41.2 | 29.2 | 49.1 | 63.9 | 43.1 |
| | | 0.5 | AP_BEV | 36.0 | 35.8 | 41.9 | 44.8 | 30.2 | 34.5 | 63.9 | 55.1 |
| | | | AP_3D | 14.1 | 19.7 | 20.5 | 15.9 | 13.0 | 13.5 | 21.0 | 6.36 |
| RTNH [24] (NeurIPS 2022) | L | 0.3 | AP_BEV | 76.5 | 76.5 | 88.2 | 86.3 | 77.3 | 55.3 | 81.1 | 59.5 |
| | | | AP_3D | 72.7 | 73.1 | 76.5 | 84.8 | 64.5 | 53.4 | 80.3 | 52.9 |
| | | 0.5 | AP_BEV | 66.3 | 65.4 | 87.4 | 83.8 | 73.7 | 48.8 | 78.5 | 48.1 |
| | | | AP_3D | 37.8 | 39.8 | 46.3 | 59.8 | 28.2 | 31.4 | 50.7 | 24.6 |
| InterFusion∗ [36] (IROS 2023) | L+4DR | 0.3 | AP_BEV | 69.5 | 76.6 | 84.9 | 84.3 | 70.2 | 35.1 | 63.1 | 46.3 |
| | | | AP_3D | 65.6 | 72.5 | 81.4 | 76.9 | 63.8 | 34.6 | 59.9 | 45.9 |
| | | 0.5 | AP_BEV | 66.1 | 70.5 | 82.0 | 81.8 | 67.2 | 33.9 | 62.9 | 46.0 |
| | | | AP_3D | 41.7 | 44.6 | 53.5 | 64.8 | 37.2 | 25.5 | 35.4 | 27.0 |
| 3D-LRF [3] (CVPR 2024) | L+4DR | 0.3 | AP_BEV | 84.0 | 83.7 | 89.2 | 95.4 | 78.3 | 60.7 | 88.9 | 74.9 |
| | | | AP_3D | 74.8 | 81.2 | 87.2 | 86.1 | 73.8 | 49.5 | 87.9 | 67.2 |
| | | 0.5 | AP_BEV | 73.6 | 72.3 | 88.4 | 86.6 | 76.6 | 47.5 | 79.6 | 64.1 |
| | | | AP_3D | 45.2 | 45.3 | 55.8 | 51.8 | 38.3 | 23.4 | 60.2 | 36.9 |
| L4DR [13] (AAAI 2025) | L+4DR | 0.3 | AP_BEV | 79.5 | 86.0 | 89.6 | 89.9 | 81.1 | 62.3 | 89.1 | 61.3 |
| | | | AP_3D | 78.0 | 77.7 | 80.0 | 88.6 | 79.2 | 60.1 | 78.9 | 51.9 |
| | | 0.5 | AP_BEV | 77.5 | 76.8 | 88.6 | 89.7 | 78.2 | 59.3 | 80.9 | 53.8 |
| | | | AP_3D | 53.5 | 53.0 | 64.1 | 73.2 | 53.8 | 46.2 | 52.4 | 37.0 |
| L4DR-DA3D [44] (M 2025) | L+4DR | 0.3 | AP_BEV | 80.4 | 86.5 | 89.8 | 90.1 | 81.0 | 62.6 | 89.9 | 61.9 |
| | | | AP_3D | 79.3 | 85.9 | 88.4 | 89.2 | 79.7 | 65.8 | 89.0 | 60.2 |
| | | 0.5 | AP_BEV | 78.5 | 77.4 | 89.1 | 90.1 | 79.3 | 58.8 | 88.9 | 60.6 |
| | | | AP_3D | 61.9 | 58.9 | 66.4 | 79.2 | 63.0 | 48.2 | 64.6 | 47.6 |
| AW-MoE (Ours) | L+4DR | 0.3 | AP_BEV | 88.2 | 87.7 | 94.5 | 96.7 | 88.8 | 81.0 | 95.4 | 70.2 |
| | | | AP_3D | 83.9 | 84.2 | 90.0 | 95.3 | 84.4 | 72.9 | 90.2 | 64.0 |
| | | 0.5 | AP_BEV | 84.2 | 82.8 | 91.6 | 96.3 | 85.3 | 75.0 | 94.7 | 66.4 |
| | | | AP_3D | 61.5 | 59.0 | 67.2 | 85.7 | 63.5 | 43.3 | 70.1 | 53.1 |

TABLE III: Performance ($AP_{3D}$) of AW-MoE and its camera-integrated variant, AW-MoE-LRC.
| Method | Modality | IoU | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow | FPS [Hz] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AW-MoE | L+4DR | 0.3 | 83.9 | 84.2 | 90.0 | 95.3 | 84.4 | 72.9 | 90.2 | 64.0 | 12.41 |
| | | 0.5 | 61.5 | 59.0 | 67.2 | 85.7 | 63.5 | 43.3 | 70.1 | 53.1 | |
| AW-MoE-LRC | L+4DR+C | 0.3 | 84.3 | 84.7 | 91.0 | 95.3 | 84.0 | 72.9 | 89.6 | 63.7 | 10.02 |
| | | 0.5 | 61.8 | 60.2 | 70.4 | 85.8 | 63.4 | 43.7 | 69.9 | 52.8 | |

TABLE IV: Performance (AP_3D) comparison of AW-MoE when extended to different 3D object detection baselines.

| Method | IoU | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow |
|---|---|---|---|---|---|---|---|---|---|
| RTNH (4DR) [24] | 0.3 | 37.4 | 37.6 | 42.0 | 41.2 | 29.2 | 49.1 | 63.9 | 43.1 |
| | 0.5 | 14.1 | 19.7 | 20.5 | 15.9 | 13.0 | 13.5 | 21.0 | 6.4 |
| RTNH (4DR) - AW-MoE | 0.3 | 65.7 | 64.4 | 72.2 | 88.7 | 58.3 | 64.0 | 71.2 | 65.0 |
| | 0.5 | 35.7 | 28.6 | 44.3 | 72.9 | 34.9 | 32.4 | 42.7 | 45.3 |
| Improvement | 0.3 | +28.3 | +26.8 | +30.2 | +47.5 | +29.1 | +14.9 | +7.3 | +21.9 |
| | 0.5 | +21.6 | +8.9 | +23.8 | +57.0 | +21.9 | +18.9 | +21.7 | +38.9 |
| RTNH (L) [24] | 0.3 | 72.7 | 73.1 | 76.5 | 84.8 | 64.5 | 53.4 | 80.3 | 52.9 |
| | 0.5 | 37.8 | 39.8 | 46.3 | 59.8 | 28.2 | 31.4 | 50.7 | 24.6 |
| RTNH (L) - AW-MoE | 0.3 | 81.1 | 84.0 | 89.6 | 93.5 | 83.4 | 56.2 | 89.4 | 57.3 |
| | 0.5 | 55.4 | 53.0 | 51.2 | 82.1 | 59.4 | 38.7 | 69.5 | 40.5 |
| Improvement | 0.3 | +8.4 | +10.9 | +13.1 | +8.7 | +18.9 | +2.8 | +9.1 | +4.4 |
| | 0.5 | +17.6 | +13.2 | +4.9 | +22.3 | +31.2 | +7.3 | +18.8 | +15.9 |
| InterFusion [36] | 0.3 | 65.6 | 72.5 | 81.4 | 76.9 | 63.8 | 34.6 | 59.9 | 45.9 |
| | 0.5 | 41.7 | 44.6 | 53.5 | 64.8 | 37.2 | 25.5 | 35.4 | 27.0 |
| InterFusion - AW-MoE | 0.3 | 81.7 | 83.8 | 89.7 | 92.7 | 80.4 | 71.2 | 82.5 | 60.7 |
| | 0.5 | 60.3 | 58.0 | 70.1 | 85.9 | 60.9 | 47.0 | 63.4 | 44.7 |
| Improvement | 0.3 | +16.1 | +11.3 | +8.3 | +15.8 | +16.6 | +36.6 | +22.6 | +14.8 |
| | 0.5 | +18.6 | +13.4 | +16.6 | +21.1 | +23.7 | +21.5 | +28.0 | +17.7 |

TABLE V: FPS and FLOPs comparison of detectors before and after applying AW-MoE (corresponding to Table IV).
| Method | Sensors | FPS [Hz] | FLOPs [G] | Params [M] |
|---|---|---|---|---|
| L4DR [13] | L+4DR | 13.94 | 142.65 | 59.73 |
| L4DR - AW-MoE | L+4DR | 12.41 | 143.41 | 143.46 |
| RTNH (4DR) [24] | 4DR | 14.77 | 502.57 | 17.35 |
| RTNH (4DR) - AW-MoE | 4DR | 14.50 | 503.25 | 61.41 |
| RTNH (L) [24] | L | 14.62 | 502.53 | 17.35 |
| RTNH (L) - AW-MoE | L | 14.20 | 503.22 | 61.41 |
| InterFusion [36] | L+4DR | 31.64 | 16.72 | 3.87 |
| InterFusion - AW-MoE | L+4DR | 29.94 | 17.48 | 20.71 |

I-E Training Strategy

As mentioned in the Introduction, collecting data under adverse weather conditions is challenging, resulting in significantly fewer samples for each adverse condition (see Fig. 2 (b)). Even with top-K expert routing, some Weather-Specific Experts may not receive sufficient training. To address this, we propose a training strategy tailored to AW-MoE, summarized in Algorithm 1. First, all weather data are used to train a single WSE branch, allowing the model to acquire basic 3D object detection capabilities. Next, the Shared Backbone is frozen, and the trained parameters of this WSE are copied into each branch for further training. Combined with top-K expert routing, this strategy effectively mitigates the training challenges caused by limited adverse-weather data.

IV Experiments

IV-A Dataset and Evaluation Metrics

The K-Radar dataset [24] contains 58 sequences with a total of 34,944 frames (17,486 for training and 17,458 for testing), collected with a 64-line LiDAR, cameras, and 4D Radar sensors. It covers not only normal conditions but also six types of adverse weather, such as fog, rain, and heavy snow. For evaluation, we adopt two standard metrics for 3D object detection: 3D Average Precision (AP_3D) and Bird's Eye View Average Precision (AP_BEV), measured on the "Sedan" class at IoU thresholds of 0.3 and 0.5.

IV-B Implementation Details

Our AW-MoE is designed as a general framework that can be extended to various 3D object detection algorithms. In this work, we extend the L4DR [13] baseline to develop AW-MoE.
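The two-stage training strategy described above (Sec. I-E) — train one WSE on all-weather data, then freeze the shared backbone, clone the trained weights into every expert branch, and fine-tune with the confidence-weighted loss of Eq. (8) — can be sketched in plain Python. This is a minimal, framework-agnostic illustration, not the released code; all function and variable names here are our own assumptions, and parameters are modeled as simple dicts.

```python
import copy

def train_stage1(wse_params, all_weather_batches, step):
    # Stage 1: "train" a single WSE branch on all-weather data by
    # applying one (placeholder) optimizer step per batch.
    for batch in all_weather_batches:
        wse_params = step(wse_params, batch)
    return wse_params

def init_experts(trained_wse, num_experts):
    # Stage 2 init: clone the stage-1 weights into every
    # Weather-Specific Expert branch (shared backbone stays frozen).
    return [copy.deepcopy(trained_wse) for _ in range(num_experts)]

def confidence_weighted_loss(expert_losses, routing_probs, selected):
    # Eq. (8): L_CW = sum over selected experts of P_w * L_w.
    return sum(routing_probs[w] * expert_losses[w] for w in selected)

# toy run: a dummy optimizer step that just shifts each parameter
step = lambda p, b: {k: v - 0.1 for k, v in p.items()}
wse = train_stage1({"w": 1.0}, all_weather_batches=[0, 1, 2], step=step)
experts = init_experts(wse, num_experts=7)  # one per K-Radar weather type
loss = confidence_weighted_loss(
    expert_losses={1: 0.8, 4: 1.2},
    routing_probs={1: 0.7, 4: 0.3},
    selected=[1, 4],
)
```

The deep copies keep the expert branches independent after initialization, which is the point of stage 2: each branch starts from the same all-weather capability and then specializes on its own weather slice.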
Furthermore, we propose AW-MoE-LRC, which integrates camera image features into the AW-MoE framework. To balance detection performance and inference efficiency, we set K = 1 in the Image-guided Weather-aware Routing. The model is trained on four RTX 3090 GPUs with a batch size of 3.

IV-C Results on K-Radar Adverse Weather Dataset

Following L4DR [13], we compare AW-MoE with several 3D object detection methods across modalities, including RTNH [24], InterFusion [36], 3D-LRF [3], L4DR [13], and L4DR-DA3D [44]. The results are reported in Table I. AW-MoE consistently outperforms the state-of-the-art methods under both normal and adverse weather conditions, with particularly notable gains under extreme weather such as fog, sleet, light snow, and heavy snow. Specifically, compared to its baseline L4DR, our extended AW-MoE achieves a 10% increase in AP_3D (IoU=0.3) under fog, a 12.5% increase in AP_3D (IoU=0.5) under rain, a 12.8% increase in AP_3D (IoU=0.3) under sleet, and approximately 15% improvements in AP_3D (IoU=0.3 and 0.5) under light snow and heavy snow. Furthermore, AW-MoE significantly outperforms L4DR-DA3D on most evaluation metrics. These improvements are attributed to AW-MoE's multi-branch Weather-Specific Expert design, which mitigates performance conflicts arising from large inter-weather variations, and to the precise expert selection enabled by the Image-guided Weather-aware Routing, which further enhances the model's robustness across diverse conditions.

IV-D Extensibility of AW-MoE to Other 3D Detectors

To evaluate the extensibility of AW-MoE to other 3D object detectors, we applied it to RTNH [24] and InterFusion [36], where RTNH includes both LiDAR and 4D Radar variants. As shown in Table IV, incorporating AW-MoE consistently improves detection performance across weather conditions and IoU thresholds, yielding improvements of over 15%.
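The routing-and-fusion pipeline behind these results can be sketched compactly: IWR's softmax probabilities over weather classes select the top-K Weather-Specific Experts (here K = 1 or 2), and matched boxes from the selected experts are fused by the confidence-weighted aggregation of Eq. (9). The following is a minimal stdlib sketch under our own assumptions — the function names, the toy logits, and the boxes are hypothetical, not taken from the paper's code.

```python
import math

def softmax(logits):
    # numerically stable softmax over weather-class logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(weather_logits, k):
    """Pick the top-k experts and renormalize their routing weights."""
    probs = softmax(weather_logits)
    idx = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in idx)
    return idx, [probs[i] / z for i in idx]

def fuse_boxes(boxes, weights):
    """Eq. (9): weighted sum of matched 7-DoF boxes (x, y, z, l, w, h, yaw).
    Note: averaging yaw directly is only an approximation for small angles."""
    return [sum(w * b[d] for w, b in zip(weights, boxes)) for d in range(7)]

# toy example: 7 weather classes, K = 2; classes 1 and 4 dominate
logits = [0.1, 2.5, 0.2, 0.0, 1.9, 0.3, 0.1]
experts, weights = route_top_k(logits, k=2)
# two IoU-matched boxes predicted by the two selected experts
b1 = [10.0, 2.0, 0.5, 4.5, 1.9, 1.6, 0.10]
b2 = [10.2, 2.1, 0.5, 4.4, 1.8, 1.6, 0.12]
fused = fuse_boxes([b1, b2], weights)
```

Because only the selected experts execute, the extra inference cost scales with K rather than with the total number of expert branches, which is the sparsity property behind the efficiency numbers in Table V.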
Notably, after integrating AW-MoE, RTNH (4DR) [24] and InterFusion [36] outperform the state-of-the-art methods listed in Table I, enabling previously inferior models to surpass them; for instance, InterFusion achieves 6.8% higher total AP_3D (IoU=0.5) than L4DR, with even larger gains under adverse weather. These results demonstrate that AW-MoE is highly compatible and effective across different detectors, further validating the robustness and generality of its design.

IV-E Performance of AW-MoE-LRC

Table I presents the evaluation results of our camera-integrated variant, AW-MoE-LRC, on the K-Radar dataset. Compared to AW-MoE, AW-MoE-LRC moderately improves detection accuracy under high-visibility conditions, such as normal and overcast weather. However, it yields negligible gains in severe weather such as fog, rain, and snow. This occurs because camera sensors require clear visibility to capture useful semantic information; in extreme weather, degraded visibility renders these features ineffective for detection. These results underscore the strategic design of our IWR: by leveraging the distinct visual characteristics of images to classify weather and route inputs to the appropriate expert, IWR provides a far more effective way to utilize camera data.

IV-F Computational Efficiency of AW-MoE

A key advantage of the MoE framework lies in its multi-branch architecture, which effectively handles diverse tasks and scenarios. Since expert routing activates only a subset of experts during inference, it incurs only a minimal increase in computational cost. Our AW-MoE inherits this property. As shown in Table V, AW-MoE has negligible impact on inference speed and FLOPs when extended to different baselines. This efficiency stems from the lightweight design of the Image-guided Weather-aware Routing module, which precisely selects the appropriate experts while adding only marginal computational overhead.
Furthermore, the table indicates that the parameter overhead introduced by this design remains within an acceptable range, ensuring its viability for practical deployment.

TABLE VI: Performance comparison between Weather-Specific GT Sampling (WSGTS) and Weather-Agnostic GT Sampling (WAGTS). Non-normal denotes the aggregate of all non-normal weather conditions.

| Method | IoU | Metric | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow | Non-normal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WAGTS | 0.3 | AP_BEV | 87.4 | 87.2 | 94.7 | 96.3 | 87.9 | 78.8 | 94.7 | 71.7 | 86.9 |
| | 0.3 | AP_3D | 82.5 | 83.6 | 87.3 | 93.1 | 83.6 | 69.7 | 89.2 | 65.5 | 81.6 |
| | 0.5 | AP_BEV | 83.2 | 82.1 | 90.3 | 96.1 | 84.4 | 70.4 | 93.8 | 67.6 | 83.4 |
| | 0.5 | AP_3D | 59.3 | 58.1 | 60.6 | 77.7 | 63.6 | 37.4 | 63.6 | 50.2 | 60.1 |
| WSGTS | 0.3 | AP_BEV | 88.2 | 87.7 | 94.4 | 96.5 | 88.3 | 79.5 | 95.4 | 72.8 | 87.9 |
| | 0.3 | AP_3D | 83.6 | 84.1 | 89.9 | 93.4 | 84.3 | 70.2 | 89.0 | 66.0 | 82.4 |
| | 0.5 | AP_BEV | 84.2 | 82.7 | 91.6 | 96.1 | 85.5 | 72.2 | 94.6 | 67.9 | 84.7 |
| | 0.5 | AP_3D | 60.0 | 59.0 | 67.1 | 73.4 | 63.3 | 41.9 | 67.3 | 50.8 | 60.8 |

Figure 7: Comparison of visualization results from our AW-MoE, L4DR [13], and InterFusion [36] under Normal, Overcast, Rainy, and Sleet weather conditions.

Figure 8: Comparison of visualization results from our AW-MoE, L4DR [13], and InterFusion [36] under Fog, Light Snow, and Heavy Snow weather conditions.

TABLE VII: Performance comparison of AW-MoE using different routing strategies: Point-cloud Feature-based Routing (PFR) and Image-guided Weather-aware Routing (IWR).
| Routing | IoU | Metric | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow |
|---|---|---|---|---|---|---|---|---|---|---|
| PFR | 0.3 | AP_BEV | 58.7 | 87.3 | 94.3 | 23.1 | 77.7 | 34.7 | 70.2 | 70.9 |
| | 0.3 | AP_3D | 52.8 | 84.0 | 89.3 | 11.4 | 73.0 | 29.1 | 64.8 | 63.6 |
| | 0.5 | AP_BEV | 55.6 | 82.1 | 91.4 | 22.9 | 74.4 | 31.4 | 69.6 | 66.9 |
| | 0.5 | AP_3D | 35.4 | 57.8 | 66.5 | 6.2 | 52.4 | 16.6 | 46.6 | 49.9 |
| IWR | 0.3 | AP_BEV | 88.2 | 87.7 | 94.5 | 96.7 | 88.8 | 81.0 | 95.4 | 70.2 |
| | 0.3 | AP_3D | 83.9 | 84.2 | 90.0 | 95.3 | 84.4 | 72.9 | 90.2 | 64.0 |
| | 0.5 | AP_BEV | 84.2 | 82.8 | 91.6 | 96.3 | 85.3 | 75.0 | 94.7 | 66.4 |
| | 0.5 | AP_3D | 61.5 | 59.0 | 67.2 | 85.7 | 63.5 | 43.3 | 70.1 | 53.1 |

TABLE VIII: Performance (AP_3D) comparison between the AW-MoE training strategy and direct end-to-end training. Non-normal denotes the aggregate of all non-normal weather conditions.

| Training Strategy | IoU | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow | Non-normal |
|---|---|---|---|---|---|---|---|---|---|---|
| Direct Training | 0.3 | 74.9 | 80.2 | 84.1 | 86.5 | 75.9 | 53.2 | 75.8 | 53.8 | 72.8 |
| | 0.5 | 54.2 | 54.5 | 63.3 | 64.3 | 53.5 | 38.8 | 58.8 | 41.1 | 52.3 |
| AW-MoE Training Strategy | 0.3 | 83.9 | 84.2 | 90.0 | 95.3 | 84.4 | 72.9 | 90.2 | 64.0 | 83.9 |
| | 0.5 | 61.5 | 59.0 | 67.2 | 85.7 | 63.5 | 43.3 | 70.1 | 53.1 | 61.5 |

TABLE IX: Effects of different top-K values in Image-guided Weather-aware Routing (IWR). Non-normal denotes the aggregate of all non-normal weather conditions.

| Parameter K | IoU | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow | Non-normal |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 1 | 0.3 | 83.9 | 84.2 | 90.0 | 95.3 | 84.4 | 72.9 | 90.2 | 64.0 | 83.9 |
| | 0.5 | 61.5 | 59.0 | 67.2 | 85.7 | 63.5 | 43.3 | 70.1 | 53.1 | 61.5 |
| K = 2 | 0.3 | 84.2 | 84.9 | 90.0 | 95.0 | 84.8 | 72.8 | 88.4 | 64.8 | 82.8 |
| | 0.5 | 61.1 | 59.3 | 67.2 | 81.5 | 63.7 | 44.2 | 67.8 | 52.7 | 62.5 |
| K = 3 | 0.3 | 83.2 | 84.1 | 89.0 | 94.1 | 84.7 | 69.9 | 88.2 | 64.4 | 82.2 |
| | 0.5 | 61.0 | 59.3 | 69.0 | 79.7 | 63.5 | 42.7 | 70.2 | 53.1 | 62.3 |

TABLE X: Robustness analysis of IWR under ambiguous weather conditions with varying top-K values. (0.3 / 0.5) indicates the IoU threshold.
| Parameter K | Metric | Total (0.3 / 0.5) | Non-normal (0.3 / 0.5) |
|---|---|---|---|
| K = 1 | AP_BEV | 82.8 / 79.1 | 86.0 / 82.0 |
| | AP_3D | 77.0 / 53.3 | 80.1 / 56.0 |
| K = 2 | AP_BEV | 84.3 / 79.9 | 87.8 / 82.7 |
| | AP_3D | 77.3 / 55.8 | 80.5 / 58.3 |

IV-G Ablation Study

IV-G1 Effectiveness Analysis of Weather-Specific GT Sampling

In this section, we compare the proposed Weather-Specific GT Sampling (WSGTS) with traditional Weather-Agnostic GT Sampling (WAGTS). As shown in Table VI, WSGTS consistently outperforms WAGTS under both normal and adverse weather conditions. This improvement stems from WSGTS sampling ground-truth data exclusively from scenes that match the current weather, which avoids inserting mismatched GT that would compromise scene authenticity while still enabling effective data augmentation.

IV-G2 Ablation on Expert Routing

Table I compares the weather classification capabilities of Point-cloud Feature-based Routing (PFR) and Image-guided Weather-aware Routing (IWR). IWR achieves approximately 99% accuracy across all weather categories. In contrast, PFR struggles significantly in conditions such as overcast, fog, and snow, owing to the inherent limitations of point clouds in capturing weather semantics. Furthermore, Table VII presents an ablation study on detection performance when integrating these routing methods into the MoE framework. IWR consistently achieves much higher detection accuracy than PFR across all weather conditions, a gain that stems directly from its ability to accurately classify the weather and route features to the optimal expert module. Conversely, PFR's poor routing accuracy severely degrades final detection performance. Together, these evaluations validate the effectiveness of the IWR design.

IV-G3 Analysis of AW-MoE Training Strategy

In Table VIII, we compare our AW-MoE training strategy with direct training.
The results demonstrate that our strategy significantly improves detection performance, particularly under adverse weather conditions. For example, under fog, AP_3D at IoU=0.5 increases by 21.4%. This improvement stems from pre-training each Weather-Specific Expert (WSE) on all-weather data, which allows the WSEs to acquire basic 3D object detection capabilities before further fine-tuning within AW-MoE and effectively mitigates the challenges posed by limited adverse-weather data.

IV-G4 Parameter K in Image-guided Weather-aware Routing

We conducted an ablation study on the parameter K in the IWR module, with results shown in Table IX. Overall, K = 1 and K = 2 yield similar performance, and both outperform K = 3. To investigate the minimal difference between K = 1 and K = 2, we evaluated their performance under ambiguous weather conditions, where IWR is prone to misclassification (Table X). In these scenarios, K = 2 performs better than K = 1: routing to multiple experts mitigates the impact of classification errors, thereby enhancing robustness. However, this advantage is negligible in the overall metrics (Table IX): because IWR achieves approximately 99% classification accuracy, misclassified cases are too infrequent to significantly affect global performance. Consequently, to achieve an optimal balance between computational efficiency and detection accuracy, we set K = 1.

IV-H Visualization Comparison

To provide a more intuitive understanding, we visually compare our AW-MoE against the L4DR [13] and InterFusion [36] baselines across various weather conditions (Fig. 7 and Fig. 8). The visualizations demonstrate two key improvements. First, AW-MoE effectively reduces missed detections caused by adverse weather (highlighted by red circles). Second, it regresses higher-quality 3D bounding boxes that align more closely with the ground truth (GT) (highlighted by blue circles).
These enhancements stem from the AW-MoE design, which successfully mitigates the distribution discrepancy across different weather conditions.

V Conclusion

In this paper, we propose AW-MoE, the first framework to incorporate Mixture of Experts (MoE) into 3D detection under adverse weather, effectively addressing the performance conflicts caused by large inter-weather data discrepancies in single-branch detectors. Specifically, the proposed Image-guided Weather-aware Routing (IWR) leverages the distinct visual characteristics of camera images to classify weather conditions, ensuring precise data routing to the optimal expert model and overcoming the inherent limitations of point-cloud-based routing. Extensive experiments on the K-Radar dataset demonstrate the superiority and strong generalizability of AW-MoE. Overall, AW-MoE provides an effective framework for 3D object detection under adverse weather, enabling various detection algorithms to achieve optimal performance across different conditions while incurring minimal impact on inference speed and computational cost.

References

[1] M. Bijelic, T. Gruber, F. Mannan, F. Kraus, W. Ritter, K. Dietmayer, and F. Heide (2020) Seeing through fog without seeing fog: deep multimodal sensor fusion in unseen adverse weather. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 11682–11692.
[2] Y. Chae, H. Kim, C. Oh, M. Kim, and K. Yoon (2024) Lidar-based all-weather 3d object detection via prompting and distilling 4d radar. In European Conference on Computer Vision, p. 368–385.
[3] Y. Chae, H. Kim, and K. Yoon (2024) Towards robust 3d object detection with lidar and 4d radar fusion in various weather conditions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 15162–15172.
[4] Z. Chen, Z. Chen, Z. Li, S. Zhang, L. Fang, Q. Jiang, F. Wu, and F. Zhao (2024) Graph-detr4d: spatio-temporal graph modeling for multi-view 3d object detection. IEEE Transactions on Image Processing 33, p. 4488–4500.
[5] J. Deng, S. Shi, P. Li, W. Zhou, Y. Zhang, and H. Li (2021) Voxel r-cnn: towards high performance voxel-based 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, p. 1201–1209.
[6] Y. Dong, C. Kang, J. Zhang, Z. Zhu, Y. Wang, X. Yang, H. Su, X. Wei, and J. Zhu (2023) Benchmarking robustness of 3d object detection to common corruptions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 1022–1032.
[7] N. Du, Y. Huang, A. M. Dai, S. Tong, D. Lepikhin, Y. Xu, M. Krikun, Y. Zhou, A. W. Yu, O. Firat, et al. (2022) Glam: efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning, p. 5547–5569.
[8] W. Fedus, B. Zoph, and N. Shazeer (2022) Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research 23 (120), p. 1–39.
[9] M. Hahner, C. Sakaridis, D. Dai, and L. Van Gool (2021) Fog simulation on real lidar point clouds for 3d object detection in adverse weather. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 15283–15292.
[10] W. Huang, G. Xu, W. Jia, S. Perry, and G. Gao (2025) Revivediff: a universal diffusion model for restoring images in adverse weather conditions. IEEE Transactions on Image Processing.
[11] X. Huang, J. Wang, Q. Xia, S. Chen, B. Yang, C. Wang, and C. Wen (2024) V2x-r: cooperative lidar-4d radar fusion for 3d object detection with denoising diffusion. arXiv e-prints, p. arXiv–2411.
[12] X. Huang, H. Wu, X. Li, X. Fan, C. Wen, and C. Wang (2024) Sunshine to rainstorm: cross-weather knowledge distillation for robust 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, p. 2409–2416.
[13] X. Huang, Z. Xu, H. Wu, J. Wang, Q. Xia, Y. Xia, J. Li, K. Gao, C. Wen, and C. Wang (2025) L4dr: lidar-4dradar fusion for weather-robust 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, p. 3806–3814.
[14] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton (1991) Adaptive mixtures of local experts. Neural Computation 3 (1), p. 79–87.
[15] H. Jing, A. Wang, Y. Zhang, D. Bu, and J. Hou (2026) Reflectance prediction-based knowledge distillation for robust 3d object detection in compressed point clouds. IEEE Transactions on Image Processing 35, p. 85–97.
[16] L. Kong, Y. Liu, X. Li, R. Chen, W. Zhang, J. Ren, L. Pan, K. Chen, and Z. Liu (2023) Robo3d: towards robust and reliable 3d perception against corruptions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 19994–20006.
[17] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom (2019) Pointpillars: fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 12697–12705.
[18] D. Lepikhin, H. Lee, Y. Xu, D. Chen, O. Firat, Y. Huang, M. Krikun, N. Shazeer, and Z. Chen (2020) Gshard: scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668.
[19] B. Lin, Z. Tang, Y. Ye, J. Cui, B. Zhu, P. Jin, J. Huang, J. Zhang, Y. Pang, M. Ning, et al. (2024) Moe-llava: mixture of experts for large vision-language models. arXiv preprint arXiv:2401.15947.
[20] H. Lin, D. Pan, Q. Xia, H. Wu, C. Wang, S. Shen, and C. Wen (2025) Pretend benign: a stealthy adversarial attack by exploiting vulnerabilities in cooperative perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 19947–19956.
[21] Z. Liu, Z. Wu, and R. Tóth (2020) Smoke: single-stage monocular 3d object detection via keypoint estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, p. 996–997.
[22] Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. L. Rus, and S. Han (2023) Bevfusion: multi-task multi-sensor fusion with unified bird's-eye view representation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), p. 2774–2781.
[23] S. Masoudnia and R. Ebrahimpour (2014) Mixture of experts: a literature survey. Artificial Intelligence Review 42 (2), p. 275–293.
[24] D. Paek, S. Kong, and K. T. Wijaya (2022) K-radar: 4d radar object detection for autonomous driving in various weather conditions. Advances in Neural Information Processing Systems 35, p. 3819–3829.
[25] D. Paek and S. Kong (2025) Availability-aware sensor fusion via unified canonical space for 4d radar, lidar, and camera. arXiv preprint arXiv:2503.07029.
[26] K. Qian, S. Zhu, X. Zhang, and L. E. Li (2021) Robust multimodal vehicle detection in foggy weather using complementary lidar and radar signals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 444–453.
[27] S. Rajbhandari, C. Li, Z. Yao, M. Zhang, R. Y. Aminabadi, A. A. Awan, J. Rasley, and Y. He (2022) Deepspeed-moe: advancing mixture-of-experts inference and training to power next-generation ai scale. In International Conference on Machine Learning, p. 18332–18346.
[28] R. Ranftl, A. Bochkovskiy, and V. Koltun (2021) Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 12179–12188.
[29] C. Riquelme, J. Puigcerver, B. Mustafa, M. Neumann, R. Jenatton, A. Susano Pinto, D. Keysers, and N. Houlsby (2021) Scaling vision with sparse mixture of experts. Advances in Neural Information Processing Systems 34, p. 8583–8595.
[30] J. Ryde and N. Hillier (2009) Performance of laser and radar ranging devices in adverse environmental conditions. Journal of Field Robotics 26 (9), p. 712–727.
[31] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean (2017) Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538.
[32] S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li (2020) Pv-rcnn: point-voxel feature set abstraction for 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 10529–10538.
[33] S. Shi, X. Wang, and H. Li (2019) Pointrcnn: 3d object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 770–779.
[34] W. Shi and R. Rajkumar (2020) Point-gnn: graph neural network for 3d object detection in a point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 1711–1719.
[35] J. Wang, F. Li, S. Lv, L. He, and C. Shen (2025) Physically realizable adversarial creating attack against vision-based bev space 3d object detection. IEEE Transactions on Image Processing 34, p. 538–551.
[36] L. Wang, X. Zhang, B. Xv, J. Zhang, R. Fu, X. Wang, L. Zhu, H. Ren, P. Lu, J. Li, et al. (2022) InterFusion: interaction-based 4d radar and lidar fusion for 3d object detection. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), p. 12247–12253.
[37] Y. Wang, J. Deng, Y. Li, J. Hu, C. Liu, Y. Zhang, J. Ji, W. Ouyang, and Y. Zhang (2023) Bi-lrfusion: bi-directional lidar-radar fusion for 3d dynamic object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 13394–13403.
[38] H. Wu, H. Lin, X. Guo, X. Li, M. Wang, C. Wang, and C. Wen (2025) Motal: unsupervised 3d object detection by modality and task-specific knowledge transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 6284–6293.
[39] H. Wu, S. Zhao, X. Huang, C. Wen, X. Li, and C. Wang (2024) Commonsense prototype for outdoor unsupervised 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 14968–14977.
[40] Q. Xia, J. Deng, C. Wen, H. Wu, S. Shi, X. Li, and C. Wang (2023) Coin: contrastive instance feature mining for outdoor 3d object detection with very limited annotations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 6254–6263.
[41] Q. Xu, Y. Zhong, and U. Neumann (2022) Behind the curtain: learning occluded shapes for 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, p. 2893–2901.
[42] Q. Xu, Y. Zhou, W. Wang, C. R. Qi, and D. Anguelov (2021) Spg: unsupervised domain adaptation for 3d object detection via semantic point generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 15446–15456.
[43] Y. Yan, Y. Mao, and B. Li (2018) Second: sparsely embedded convolutional detection. Sensors 18 (10), p. 3337.
[44] H. Yang, L. Li, J. Guo, B. Li, M. Qin, H. Yu, and T. Zhang (2025) DA3D: domain-aware dynamic adaptation for all-weather multimodal 3d detection. In Proceedings of the 33rd ACM International Conference on Multimedia, p. 2150–2158.
[45] Z. Yang, Y. Sun, S. Liu, and J. Jia (2020) 3DSSD: point-based 3d single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Z. Yang, Y. Sun, S. Liu, X. Shen, and J. Jia (2019) Std: sparse-to-dense 3d object detector for point cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 1951–1960.
[47] T. Yin, X. Zhou, and P. Krahenbuhl (2021) Center-based 3d object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 11784–11793.
[48] C. Zhang, W. Chen, W. Wang, and Z. Zhang (2024) MA-st3d: motion associated self-training for unsupervised domain adaptation on 3d object detection. IEEE Transactions on Image Processing 33, p. 6227–6240.
[49] N. Zhao, P. Qian, F. Wu, X. Xu, X. Yang, and G. H. Lee (2024) SDCoT++: improved static-dynamic co-teaching for class-incremental 3d object detection. IEEE Transactions on Image Processing 34, p. 4188–4202.
[50] Y. Zhou, T. Lei, H. Liu, N. Du, Y. Huang, V. Zhao, A. M. Dai, Q. V. Le, J. Laudon, et al. (2022) Mixture-of-experts with expert choice routing. Advances in Neural Information Processing Systems 35, p. 7103–7114.
[51] Y. Zhou and O. Tuzel (2018) Voxelnet: end-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p. 4490–4499.