
Paper deep dive

Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

Year: 2026 | Venue: arXiv preprint | Area: cs.LG | Type: Preprint | Embeddings: 214

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 94%

Last extracted: 3/26/2026, 1:33:40 AM

Summary

The paper identifies that naive multimodal fusion strategies (e.g., simple addition or concatenation) in time series (TS) forecasting often underperform unimodal models due to the integration of irrelevant auxiliary information. The authors propose 'Controlled Fusion Adapter' (CFA), a plug-in method that uses low-rank adapters to filter irrelevant textual information and injects it into TS backbones via residual connections. Extensive experiments across 20K configurations demonstrate that constrained fusion, particularly CFA, consistently outperforms naive methods.

Entities (5)

Controlled Fusion Adapter · method · 100%
Constrained Fusion · method · 95%
LoRA · technique · 95%
Naive Fusion · method · 95%
Time Series Forecasting · task · 95%

Relation Signals (3)

Controlled Fusion Adapter improves Time Series Forecasting

confidence 95% · CFA achieves the best performance, indicating that controlled integration of textual information is crucial for robust multimodal TS forecasting.

Controlled Fusion Adapter uses LoRA

confidence 95% · To implement this, we adopt a LoRA-style parameterization [9], enabling lightweight integration into diverse TS backbones.

Naive Fusion underperforms Unimodal TS models

confidence 90% · multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models

Cypher Suggestions (2)

Find all methods that improve time series forecasting performance. · confidence 90% · unvalidated

MATCH (m:Method)-[:IMPROVES]->(t:Task {name: 'Time Series Forecasting'}) RETURN m.name

Identify the relationship between fusion methods and their performance relative to unimodal baselines. · confidence 85% · unvalidated

MATCH (f:Method)-[r:PERFORMS_RELATIVE_TO]->(u:Method {name: 'Unimodal TS models'}) RETURN f.name, r.type

Abstract

Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models, which we attribute to the uncontrolled integration of auxiliary modalities, which may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled cross-modal interactions without modifying the TS backbone, integrating only relevant textual information aligned with TS dynamics. CFA employs low-rank adapters to filter irrelevant textual information before fusing it into temporal representations. We conduct over 20K experiments across various datasets and TS/text models, demonstrating the effectiveness of the constrained fusion methods including CFA. Code is publicly available at: https://github.com/seunghan96/cfa/.

Tags

ai-safety (imported, 100%)
cslg (suggested, 92%)
preprint (suggested, 88%)

Links


Full Text

213,949 characters extracted from source content.


Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn
LG AI Research, Seoul, South Korea

Abstract

Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models, which we attribute to the uncontrolled integration of auxiliary modalities, which may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled cross-modal interactions without modifying the TS backbone, integrating only relevant textual information aligned with TS dynamics. CFA employs low-rank adapters to filter irrelevant textual information before fusing it into temporal representations. We conduct over 20K experiments across various datasets and TS/text models, demonstrating the effectiveness of the constrained fusion methods including CFA. Code is publicly available at: https://github.com/seunghan96/cfa/.

1 Introduction

Time series (TS) forecasting is widely used across domains such as finance [30], traffic [5], and climate [1]. With the emergence of large language models (LLMs), multimodal TS forecasting has gained increasing attention [11].
Recent studies attempt to enhance TS forecasting by incorporating auxiliary modalities, including text [27, 38, 44, 25], vision [42, 8], and tabular data [2], under the assumption that such contextual information enhances the modeling of temporal dependencies.

Existing methods often adopt naive fusion strategies (e.g., simple addition or concatenation) [17, 13, 20] without carefully considering how modalities should interact, or rely on model-specific architectures [27, 38, 25] that are not easily integrated into unimodal TS models. Additionally, prior work shows that fusion does not consistently improve performance [13], attributing this to the difficulty of alignment among modalities. Moreover, even when improvements are reported, we find that they often depend on the dataset or the TS model, indicating limitations of naive fusion strategies.

In this paper, we conduct extensive experiments across diverse datasets and models (Table 3), evaluating fusion at the first, middle, and final layers with additive and concatenation operators. As shown in Figure 1, we find that multimodal models with naive fusion often underperform unimodal models: both concatenation-based (red) and additive (blue) fusion are frequently worse than the unimodal (gray) baseline. Note that each setting is tuned with 10 learning rates, selecting the best-performing configuration. This observation motivates the need for improved fusion mechanisms, suggesting that more careful and controlled fusion between TS and text is required.

Figure 1: Constrained vs. naive fusion. Normalized MSE of each fusion method (Constrained, Naive-Additive, Naive-Concat) against the unimodal baseline, averaged over 2K settings (14 TS models × 4 text models × 9 datasets × 4 horizons).

Preprint.
arXiv:2603.22372v1 [cs.LG] 23 Mar 2026

We attribute this phenomenon to the intrinsic nature of multimodal TS forecasting, where TS serves as the primary modality, while other modalities serve as auxiliary modalities that provide contextual guidance [31, 42, 18] and may introduce irrelevant or conflicting information [19] that is misaligned with TS. Therefore, indiscriminate fusion can degrade forecasting performance, highlighting the necessity of constrained fusion, by which we mean a fusion strategy that incorporates auxiliary signals in a controlled manner while preserving core temporal (TS) representations.

To this end, we explore various constrained fusion methods (Section 3.2), which consistently outperform naive methods, as shown in Figure 1 (green). Furthermore, we propose Controlled Fusion Adapter (CFA), a plug-in method that injects textual information via a residual connection constrained to a low-rank subspace, filtering irrelevant textual information and enabling incorporation of auxiliary signals while preserving temporal dynamics. CFA outperforms other constrained fusion methods and remains effective even when naive fusion degrades performance.

The main contributions are:
•We show that naive multimodal fusion often underperforms unimodal TS forecasting and demonstrate that constrained fusion consistently improves over naive strategies across diverse datasets and models, where we evaluate four constrained fusion methods, including our proposed method.
•We propose Controlled Fusion Adapter (CFA), a simple yet effective fusion method for multimodal TS forecasting, which injects auxiliary textual information into TS representations via a residual connection constrained to a low-rank subspace, enabling controlled integration without modifying the backbone. In contrast to prior multimodal TS forecasting works that rely on architecture-specific designs, our method is a plug-in module, making it generally applicable to any unimodal TS model.
•We conduct over 20K experiments across various settings (9 multimodal datasets, 4 forecasting horizons, 14 TS models, and 4 text models) with 10 fusion strategies, demonstrating the effectiveness of constrained fusion including CFA. Additionally, we provide detailed analyses explaining why constrained fusion strategies yield superior performance compared to naive fusion strategies.

2 Related Works

TS forecasting models. Recent TS forecasting methods employ Transformers [33] to capture temporal and channel dependencies. PatchTST [26] segments TS into patches and adopts channel-independent modeling. Crossformer [41] captures channel interactions via hierarchical attention. iTransformer [23] applies attention across features to model channel dependencies. Nonstationary Transformer [22] models non-stationarity within attention to address distribution shifts. Several works adopt lightweight architectures without attention: DLinear [40] uses linear decomposition to capture trend and seasonal components. TSMixer [3] employs MLP-based mixing across temporal and feature dimensions. TiDE [6] employs an MLP-based encoder-decoder with temporal embeddings. FiLM [45] introduces a Frequency-improved Legendre Memory model with Legendre projection and Fourier denoising. Koopa [24] learns latent linear dynamics through the Koopman operator.

Multimodal TS forecasting models. Recent work integrates external modalities with TS to enrich forecasting with contextual information [12, 37, 11]. UniCast [27] combines pretrained vision and text encoders with a frozen TS foundation model. CAPTime [38] aligns TS representations with LLM-derived textual context within a probabilistic forecasting framework. BALM-TSF [44] mitigates modality imbalance by aligning TS and textual embeddings before fusion. SpecTF [25] performs frequency-domain fusion by projecting textual embeddings into the spectral space and integrating them with TS components.
Multi-Modal Forecaster [13] jointly models TS and text through shared embeddings. Time-VLM [42] leverages pretrained vision-language models to construct multimodal representations. TimeCMA [20] performs cross-modality alignment with dual encoding branches. GPT4MTS [10] employs LLMs to generate task-aware textual prompts to guide TS forecasting. T3Time [4] employs temporal, spectral, and prompt features with an adaptive gating mechanism.

Table 1: Multimodal forecasting models.
Type | Fusion | Methods
Architecture-specific | - | [4, 20, 25, 38, 42, 44]
Plug-in | Naive | [10, 12, 13, 17, 21, 27]
Plug-in | Constrained | CFA (Ours)

As shown in Table 1, these works either adopt naive fusion strategies or focus on designing architecture-specific models for multimodal forecasting, which are not generally applicable to existing unimodal TS models. While TaTS [17] can be plugged into existing unimodal TS models, it adopts a naive first-layer additive fusion scheme, which is included as a baseline in our experiments. Additionally, although ContextFormer [2] can be applied to arbitrary TS encoders, its module accounts for over 75% of the total parameters due to multiple cross-attention layers, limiting general applicability.

Figure 2: Comparison of multimodal fusion strategies for TS. (Left) Naive fusion applies simple additive or concatenation operators at first, middle, or last stages without considering modality relevance. (Right) Constrained fusion incorporates textual information in a controlled manner by considering its relevance to TS.
CFA injects textual signals via a residual connection constrained to a low-rank subspace to filter irrelevant information while preserving TS representations.

3 Necessity of Constrained Fusion for Multimodal TS Forecasting

In multimodal forecasting, a model predicts future values $y = (x_{L+1}, \ldots, x_{L+H})$ given a lookback window $x = (x_1, \ldots, x_L)$ and a paired text sequence $t = (t_1, \ldots, t_L)$. Each $x_i \in \mathbb{R}^C$ denotes the observations at time step $i$, where $L$, $H$, and $C$ represent the lookback length, forecast horizon, and number of channels, respectively. Each text $t_i$ is encoded by a language model as $z_{\text{Text},i} = g_{\text{Text}}(t_i)$, forming a text embedding $Z_{\text{Text}} = (z_{\text{Text},1}, \ldots, z_{\text{Text},L})$, which is then fused with a TS embedding.

3.1 Naive Fusion

Algorithm 1 Multimodal TS forecasting
Input: $X = [X_1, \ldots, X_L]$, $T = [T_1, \ldots, T_L]$
Output: $\hat{Y} = [\hat{X}_{L+1}, \ldots, \hat{X}_{L+H}]$
  $Z_{\text{TS}} \leftarrow g_{\text{TS}}(X)$  // Input projection (TS)
  $Z_{\text{Text}} \leftarrow g_{\text{Text}}(T)$  // Input projection (Text)
  $Z_{\text{TS}} \leftarrow F(Z_{\text{TS}}, Z_{\text{Text}})$  // First fusion
  for m in encoder layers do
    $Z_{\text{TS}} \leftarrow f(Z_{\text{TS}})$
    $Z_{\text{TS}} \leftarrow F(Z_{\text{TS}}, Z_{\text{Text}})$  // Middle fusion
  end for
  $Z_{\text{TS}} \leftarrow F(Z_{\text{TS}}, Z_{\text{Text}})$  // Last fusion
  $\hat{Y} \leftarrow h(Z_{\text{TS}})$  // Output projection (TS)

Naive fusion integrates textual information using a fusion operator (e.g., addition or concatenation), where the text embeddings produced by the text encoder are directly fused with TS embeddings without any constraint. Fusion can be applied before the encoder (first fusion), within intermediate encoder layers (middle fusion), or after temporal encoding (last fusion), as shown in Algorithm 1. Many multimodal TS forecasting methods follow this approach [10, 12, 13, 17, 21, 27], whereas methods that do not follow this paradigm typically adopt architecture-specific designs to fuse different modalities [4, 20, 25, 38, 42, 44], which are not generally applicable to existing TS models.
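To make the three fusion positions in Algorithm 1 concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the toy encoder layers, the `w_mix` projection used after concatenation, the assumption that text embeddings already have width `D`, and the choice to apply middle fusion after every layer are all hypothetical.

```python
import numpy as np

def fuse_add(z_ts, z_text):
    """Naive additive fusion: element-wise sum of TS and text embeddings."""
    return z_ts + z_text

def make_fuse_concat(w_mix):
    """Naive concat fusion: concatenate on the feature axis, then project
    back to width D with w_mix (the projection is an assumption)."""
    def fuse(z_ts, z_text):
        return np.concatenate([z_ts, z_text], axis=-1) @ w_mix
    return fuse

def forecast(x, z_text, g_ts, layers, head, fuse, position):
    """Algorithm 1 with the operator F applied at a single position:
    'first', 'middle', 'last', or 'none' (unimodal)."""
    z = g_ts(x)                      # input projection (TS)
    if position == "first":
        z = fuse(z, z_text)
    for f in layers:
        z = f(z)                     # one encoder layer
        if position == "middle":
            z = fuse(z, z_text)      # assumed: fused after every layer
    if position == "last":
        z = fuse(z, z_text)
    return head(z)                   # output projection (TS)

# Toy setup: L steps, C channels, D hidden width, H horizon.
rng = np.random.default_rng(0)
L, C, D, H = 8, 3, 16, 4
x = rng.normal(size=(L, C))
z_text = rng.normal(size=(L, D))     # text embeddings, assumed width D
w_in = rng.normal(size=(C, D)) * 0.1
w_out = rng.normal(size=(L * D, H * C)) * 0.1
w_mix = rng.normal(size=(2 * D, D)) * 0.1

g_ts = lambda a: a @ w_in
layers = [np.tanh, np.tanh]          # stand-ins for real encoder layers
head = lambda z: (z.reshape(-1) @ w_out).reshape(H, C)

y_none = forecast(x, z_text, g_ts, layers, head, fuse_add, "none")
y_add = forecast(x, z_text, g_ts, layers, head, fuse_add, "first")
y_cat = forecast(x, z_text, g_ts, layers, head, make_fuse_concat(w_mix), "last")
```

In both operators the text embedding enters the TS representation at full strength, with nothing that could down-weight a step whose text is unrelated to the series.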
3.2 Constrained Fusion

In this paper, we argue that TS forecasting primarily relies on learning temporal representations, while auxiliary modalities provide contextual guidance that may contain information irrelevant to TS. However, naive fusion methods do not account for this misalignment, which may disrupt temporal representations and cause the multimodal model to underperform a unimodal (TS-only) model. To investigate this effect, we explore various constrained fusion strategies¹, which incorporate auxiliary information from other modalities in a constrained manner while preserving temporal representations as follows:
•[1] Gating mechanism determines the relevance of text information at each time step using a learned gate, allowing the model to utilize only the necessary textual information.
•[2] FiLM (Feature-wise Linear Modulation)² [28] modulates the scale and bias of TS embeddings based on text embeddings, preserving temporal structure while adjusting feature representations.
•[3] Orthogonal fusion projects the text embedding into the TS embedding space and injects only the orthogonal component, explicitly preserving temporal information without overwriting it.

Naive fusion vs. Constrained fusion. As shown in Figure 1, these constrained fusion strategies consistently outperform naive fusion strategies across various settings. Nonetheless, they can still underperform unimodal baselines under certain settings, as shown in Table 4. This observation motivates the need to more effectively filter irrelevant textual signals while preserving informative ones. A comparison of naive and constrained fusion is provided in Figure 2.

¹ Details of the constrained fusion strategies are discussed in Appendix F.
² Note that FiLM [28] is different from (TS model) FiLM [45], which is discussed in Section 3.2.
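The three strategies above can be sketched in NumPy as follows, using the fused-embedding forms the paper lists for them (a gate multiplying the text term, a text-conditioned scale and bias, and an added orthogonal component). The exact parameterizations are deferred to the paper's Appendix F, so the gate input, the `1 + ...` form of the FiLM scale, and the per-time-step projection below are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gating_fusion(z_ts, z_text, w_g):
    """z_ts + g * z_text, where g in (0, 1) is computed per time step.
    Assumed: the gate sees both embeddings; w_g has shape (2D, D)."""
    g = sigmoid(np.concatenate([z_ts, z_text], axis=-1) @ w_g)
    return z_ts + g * z_text

def film_fusion(z_ts, z_text, w_gamma, w_beta):
    """gamma * z_ts + beta, with scale and bias predicted from text.
    Assumed: gamma = 1 + linear(text), so zero weights give identity."""
    gamma = 1.0 + z_text @ w_gamma
    beta = z_text @ w_beta
    return gamma * z_ts + beta

def orthogonal_fusion(z_ts, z_text, eps=1e-8):
    """z_ts + z_perp, where z_perp is the part of z_text orthogonal
    to z_ts at each time step (assumes equal widths)."""
    coef = np.sum(z_text * z_ts, axis=-1, keepdims=True) / (
        np.sum(z_ts * z_ts, axis=-1, keepdims=True) + eps
    )
    z_perp = z_text - coef * z_ts
    return z_ts + z_perp

rng = np.random.default_rng(0)
L, D = 6, 8
z_ts = rng.normal(size=(L, D))
z_text = rng.normal(size=(L, D))

fused_gate = gating_fusion(z_ts, z_text, rng.normal(size=(2 * D, D)) * 0.1)
fused_film = film_fusion(z_ts, z_text, np.zeros((D, D)), np.zeros((D, D)))
fused_orth = orthogonal_fusion(z_ts, z_text)
```

Each variant limits the text term in a different way: the gate scales it, FiLM only rescales and shifts existing TS features, and the orthogonal variant removes whatever part of the text embedding points along the TS embedding.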
3.3 Controlled Fusion Adapter (CFA)

Building on these insights, we propose Controlled Fusion Adapter (CFA), a model-agnostic fusion method where text embeddings are projected through a low-dimensional bottleneck and added to the TS embedding as a small residual. It is important to note that our goal is not to develop a novel adapter but to design a controlled fusion formulation that constrains the textual signal effectively. To implement this, we adopt a LoRA-style parameterization [9], enabling lightweight integration into diverse TS backbones. The low-rank bottleneck limits textual capacity and encourages retention of information useful for TS forecasting while filtering irrelevant components. This design preserves the TS backbone and enables text-guided integration into any TS model as:

$z_{\text{Adapter},t} = W_{\text{down}} z_{\text{Text},t} \in \mathbb{R}^{D/r}$  (1a)
$z'_{\text{Adapter},t} = \mathrm{ReLU}(\mathrm{LayerNorm}(z_{\text{Adapter},t}))$  (1b)
$z''_{\text{Adapter},t} = W_{\text{up}} z'_{\text{Adapter},t} \in \mathbb{R}^{D}$  (1c)
$\tilde{z}_{\text{TS},t} = z_{\text{TS},t} + z''_{\text{Adapter},t}$  (1d)

Table 2: Comparison of constrained fusions.
Methods | Fused embedding
Gating | $z_{\text{TS},t} + g_t \odot z_{\text{Text},t}$
FiLM | $\gamma_t \odot z_{\text{TS},t} + \beta_t$
Orthogonal | $z_{\text{TS},t} + z^{\perp}_{\text{Text},t}$
CFA | $z_{\text{TS},t} + W_{\text{up}} \phi(W_{\text{down}} z_{\text{Text},t})$

Here, $\tilde{z}_{\text{TS},t}$ denotes the fused embedding at time step $t$, and the residual addition is applied at each encoder layer of any TS model. The bottleneck dimension $D/r$ controls parameter efficiency, where we set $r = 8$. Note that initializing $W_{\text{up}}$ near zero ensures that textual influence is minimal at the early stages of training. A comparison of how the four constrained fusion strategies construct the fused embedding is shown in Table 2, and robustness to the choice of $r$ is discussed in Appendix I.

Role of low-rank bottleneck. To understand the role of the bottleneck, we conduct a toy experiment examining how it affects fusion under 1) informative, 2) contradicting, and 3) irrelevant texts. The results show that the low-rank projection suppresses misleading textual signals while preserving useful guidance.
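Equations (1a) to (1d) can be sketched as a small NumPy module. This is a hedged illustration rather than the released code: the class name, the separate `d_text` input width, the LayerNorm without learnable scale and shift, and the exact-zero initialization of `w_up` (the text says "near zero") are assumptions.

```python
import numpy as np

def layer_norm(a, eps=1e-5):
    """LayerNorm over the last axis, without learnable affine (assumption)."""
    mu = a.mean(axis=-1, keepdims=True)
    var = a.var(axis=-1, keepdims=True)
    return (a - mu) / np.sqrt(var + eps)

class ControlledFusionAdapterSketch:
    """Hypothetical low-rank residual adapter following Eq. (1a)-(1d)."""

    def __init__(self, d_text, d_model, r=8, seed=0):
        rng = np.random.default_rng(seed)
        self.d_low = d_model // r                    # bottleneck width D/r
        self.w_down = rng.normal(scale=d_text ** -0.5,
                                 size=(d_text, self.d_low))
        self.w_up = np.zeros((self.d_low, d_model))  # zero init (assumption)

    def residual(self, z_text):
        z = z_text @ self.w_down                     # (1a) down-projection
        z = np.maximum(layer_norm(z), 0.0)           # (1b) LayerNorm + ReLU
        return z @ self.w_up                         # (1c) up-projection

    def __call__(self, z_ts, z_text):
        return z_ts + self.residual(z_text)          # (1d) residual addition

rng = np.random.default_rng(1)
L, d_text, d_model = 10, 64, 32
z_ts = rng.normal(size=(L, d_model))
z_text = rng.normal(size=(L, d_text))

adapter = ControlledFusionAdapterSketch(d_text, d_model, r=8)
fused_at_init = adapter(z_ts, z_text)   # equals z_ts while w_up is zero

# After training w_up is nonzero; simulate that with random weights.
adapter.w_up = rng.normal(size=adapter.w_up.shape)
delta = adapter.residual(z_text)
delta_rank = np.linalg.matrix_rank(delta)
```

With `d_model = 32` and `r = 8`, the text contribution across all time steps has rank at most 4, which is one way to read the claim that the bottleneck limits how much textual signal can reach the TS representation.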
Detailed analyses are provided in Appendix E, with three complementary analyses:
•a) Performance comparison (E.1). We compare forecasting performance w/ and w/o the low-rank bottleneck across matching, contradicting, and irrelevant text. The results show that the low-rank bottleneck consistently improves performance and provides the largest gain when text is irrelevant.
•b) Representation analysis (E.2). We measure the text-contribution ratio at the adapter to quantify how much textual signal survives the bottleneck. Matching text exhibits stronger contribution than contradicting text, indicating that the bottleneck selectively preserves useful information.
•c) TS visualization (E.3). Forecast trajectories show that the bottleneck prevents misleading predictions under contradicting text while preserving accurate forecasts when text is helpful.

A theoretical perspective showing how the low-rank bottleneck constrains the textual signals to a low-dimensional subspace during fusion is provided in Appendix G.

4 Experiments

Experimental settings. To evaluate the generality of the proposed method, we conduct experiments on 9 multimodal datasets [21], 14 TS backbones, and 4 language models across 4 forecasting horizons ($H$) defined by data frequency, as shown in Table 3. In all experiments, only the TS model is trained while the text model remains frozen. Each setting is evaluated over 10 learning rates, reporting the best result. Following Time-MMD [21], datasets are split into train, validation, and test sets with a ratio of 7:1:2, and the model with the lowest validation error is selected. Performance is evaluated using MSE and MAE.
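The evaluation protocol described above can be sketched in a few lines of plain Python. The split is assumed to be chronological (common for TS benchmarks, but not stated in this section), the best learning rate is assumed to be picked by validation error, and the learning rates and scores in the example are made-up placeholders.

```python
def chronological_split(n, ratios=(7, 1, 2)):
    """Split indices 0..n-1 into train/val/test ranges in time order.
    Chronological ordering is an assumption, not stated in the text."""
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n)
    return train, val, test

def select_by_validation(runs):
    """Pick the run with the lowest validation MSE.
    Each run is a dict with keys 'lr', 'val_mse', 'test_mse'."""
    return min(runs, key=lambda run: run["val_mse"])

train, val, test = chronological_split(100)

# Hypothetical results for three of the learning rates in a sweep.
runs = [
    {"lr": 1e-2, "val_mse": 0.31, "test_mse": 0.35},
    {"lr": 1e-3, "val_mse": 0.24, "test_mse": 0.27},
    {"lr": 1e-4, "val_mse": 0.28, "test_mse": 0.26},
]
best = select_by_validation(runs)
```

Selecting on validation error rather than test error means the third run, which has the lowest test MSE here, is not the one reported.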
Table 3: Experimental settings across 20K configurations (2K settings × 10 fusion methods).

Experimental settings ([1] × [2] × [3] > 2K)
[1] Datasets (Time-MMD [21]): Agriculture, Climate, Economy, Energy, Environment, Public Health, Security, Social Good, Traffic
[2] Models
  TS | Transformer | Nonstationary Transformer [22], PatchTST [26], iTransformer [23], Crossformer [41], FEDformer [46], Autoformer [36], Reformer [15], Informer [43], Transformer [33]
  TS | Linear/MLP | DLinear [40], TiDE [6], TSMixer [3]
  TS | Others | Koopa [24], FiLM [45]
  Text | BERT [7], GPT2 [29], Llama3 [32], Doc2Vec [16]
[3] Forecasting horizons (H): [Daily] 48, 96, 192, 336. [Weekly] 12, 24, 36, 48. [Monthly] 6, 8, 10, 12.
Fusion methods (10)
  Naive | Additive | First [17, 10], Middle, Last [21]
  Naive | Concat | First [27, 13, 12], Middle, Last
  Constrained | Orthogonal, FiLM [28], Gating [42, 4], CFA (Ours)

Table 4: Comparison of multimodal fusion strategies for TS forecasting. While naive fusion strategies often underperform the unimodal baseline, constrained fusion methods consistently perform better, with our CFA showing robust improvements across diverse settings. Red and blue denote improvements and degradations over the unimodal model, respectively, averaged across the four horizons H.
Columns: Dataset | Unimodal (w/o text) | Naive Additive: First [17, 10] | Naive Additive: Middle | Naive Additive: Last [21] | Naive Concat: First [27, 13, 12] | Naive Concat: Middle | Naive Concat: Last | Constrained: Orthogonal | Constrained: FiLM [28] | Constrained: Gating [42, 4] | Constrained: CFA (Ours)

(1) Transformer-based methods

Nonstationary Transformer [22]
Agriculture | 0.084 | 0.089 | 0.082 | 0.079 | 0.090 | 0.084 | 0.892 | 0.084 | 0.085 | 0.085 | 0.082
Climate | 1.248 | 1.287 | 1.229 | 1.210 | 1.215 | 1.230 | 1.153 | 1.264 | 1.208 | 1.232 | 1.170
Economy | 0.019 | 0.020 | 0.020 | 0.020 | 0.020 | 0.019 | 4.562 | 0.019 | 0.019 | 0.020 | 0.019
Energy | 0.247 | 0.268 | 0.256 | 0.257 | 0.252 | 0.252 | 0.572 | 0.240 | 0.243 | 0.243 | 0.244
Environment | 0.437 | 0.478 | 0.447 | 0.437 | 0.447 | 0.453 | 0.461 | 0.441 | 0.436 | 0.447 | 0.442
Public Health | 1.210 | 1.224 | 1.141 | 1.176 | 1.271 | 1.240 | 1.465 | 1.235 | 1.285 | 1.153 | 1.086
Security | 106.4 | 106.4 | 107.8 | 107.4 | 106.1 | 107.0 | 121.5 | 105.7 | 105.3 | 106.1 | 105.8
Social Good | 0.942 | 0.940 | 0.918 | 0.893 | 1.000 | 0.910 | 2.362 | 0.993 | 0.932 | 0.925 | 0.910
Traffic | 0.199 | 0.201 | 0.201 | 0.210 | 0.201 | 0.194 | 0.419 | 0.202 | 0.201 | 0.198 | 0.197
Win rate (vs. Unimodal) (%) | | 11.1 | 44.4 | 44.4 | 22.2 | 33.3 | 11.1 | 33.3 | 55.6 | 66.7 | 88.9

PatchTST [26]
Agriculture | 0.092 | 0.092 | 0.093 | 0.088 | 0.092 | 0.093 | 0.547 | 0.093 | 0.092 | 0.093 | 0.091
Climate | 1.259 | 1.254 | 1.243 | 1.260 | 1.254 | 1.259 | 1.169 | 1.243 | 1.265 | 1.236 | 1.252
Economy | 0.017 | 0.017 | 0.017 | 0.019 | 0.017 | 0.017 | Div.* | 0.017 | 0.018 | 0.018 | 0.017
Energy | 0.264 | 0.265 | 0.263 | 0.252 | 0.265 | 0.258 | 0.723 | 0.262 | 0.257 | 0.260 | 0.263
Environment | 0.502 | 0.502 | 0.447 | 0.492 | 0.502 | 0.471 | 0.583 | 0.456 | 0.451 | 0.455 | 0.442
Public Health | 1.390 | 1.388 | 1.413 | 1.194 | 1.388 | 1.418 | 1.472 | 1.400 | 1.384 | 1.394 | 1.390
Security | 109.5 | 108.8 | 106.7 | 108.0 | 108.8 | 107.5 | 128.7 | 106.5 | 108.5 | 108.0 | 109.1
Social Good | 0.955 | 0.966 | 0.943 | 0.890 | 0.966 | 0.948 | 2.195 | 0.943 | 0.939 | 0.954 | 0.920
Traffic | 0.199 | 0.198 | 0.201 | 0.210 | 0.198 | 0.197 | 0.744 | 0.198 | 0.195 | 0.195 | 0.201
Win rate (vs. Unimodal) (%) | | 44.4 | 66.7 | 66.7 | 44.4 | 55.6 | 11.1 | 66.7 | 66.7 | 66.7 | 88.9

(2) Linear/MLP-based methods

DLinear [40]
Agriculture | 0.205 | 0.183 | 0.188 | 0.191 | 0.651 | 0.746 | 0.609 | 0.169 | 0.259 | 0.188 | 0.124
Climate | 1.225 | 1.324 | 1.309 | 1.283 | 1.227 | 1.167 | 1.159 | 1.312 | 1.090 | 1.312 | 1.098
Economy | 0.129 | 0.116 | 0.124 | 0.126 | Div.* | Div.* | Div.* | 0.110 | 0.248 | 0.125 | 0.040
Energy | 0.263 | 0.326 | 0.314 | 0.291 | 0.670 | 0.382 | 0.368 | 0.298 | 0.313 | 0.299 | 0.243
Environment | 0.545 | 0.491 | 0.489 | 0.479 | 0.528 | 0.501 | 0.497 | 0.549 | 0.500 | 0.485 | 0.546
Public Health | 1.554 | 1.604 | 1.569 | 1.549 | 1.790 | 1.637 | 1.570 | 1.736 | 1.682 | 1.517 | 1.530
Security | 109.0 | 110.7 | 109.8 | 109.2 | 142.6 | 137.6 | 136.1 | 110.1 | 108.6 | 109.3 | 108.8
Social Good | 0.934 | 1.096 | 1.069 | 1.068 | 1.928 | 1.587 | 1.491 | 1.037 | 0.981 | 1.026 | 0.904
Traffic | 0.276 | 0.334 | 0.358 | 0.343 | 0.841 | 0.883 | 0.757 | 0.301 | 0.333 | 0.340 | 0.233
Win rate (vs. Unimodal) (%) | | 33.3 | 33.3 | 44.4 | 11.1 | 22.2 | 22.2 | 22.2 | 33.3 | 44.4 | 88.9

TiDE [6]
Agriculture | 0.108 | Div.* | 0.107 | 0.099 | Div.* | 0.107 | 0.824 | 0.096 | 0.096 | 0.095 | 0.095
Climate | 1.469 | Div.* | 1.487 | 1.506 | 2.954 | 1.505 | 1.231 | 1.261 | 1.267 | 1.255 | 1.264
Economy | 0.035 | Div.* | 0.034 | 0.042 | Div.* | 0.038 | Div.* | 0.017 | 0.017 | 0.018 | 0.017
Energy | 0.287 | Div.* | 0.290 | 0.291 | Div.* | 0.288 | 0.768 | 0.256 | 0.262 | 0.259 | 0.260
Environment | 0.534 | Div.* | 0.535 | 0.515 | Div.* | 0.534 | 0.555 | 0.534 | 0.533 | 0.533 | 0.533
Public Health | 1.568 | Div.* | 1.561 | 1.643 | Div.* | 1.588 | 1.752 | 1.459 | 1.476 | 1.476 | 1.463
Security | 138.3 | 143.9 | 135.4 | 134.5 | 140.1 | 132.2 | 146.1 | 109.7 | 109.7 | 111.7 | 110.1
Social Good | 1.185 | Div.* | 1.083 | 1.143 | 126.8 | 1.115 | 2.125 | 1.004 | 1.025 | 1.016 | 1.001
Traffic | 0.295 | Div.* | 0.279 | 0.325 | 6.016 | 0.302 | 0.472 | 0.245 | 0.247 | 0.245 | 0.244
Win rate (vs. Unimodal) (%) | | 0.0 | 66.7 | 44.4 | 0.0 | 44.4 | 11.1 | 100.0 | 100.0 | 100.0 | 100.0

(3) Others

Koopa [24]
Agriculture | 0.091 | 0.084 | 0.084 | 0.084 | Div.* | 0.699 | 0.699 | 0.092 | 0.123 | 0.085 | 0.090
Climate | 1.249 | 5.219 | 1.257 | 1.257 | 2.602 | 1.147 | 1.147 | 1.284 | 1.088 | 1.252 | 1.248
Economy | 0.018 | 0.028 | 0.024 | 0.024 | Div.* | Div.* | Div.* | 0.028 | 0.051 | 0.021 | 0.018
Energy | 0.253 | Div.* | 0.252 | 0.252 | Div.* | 0.534 | 0.534 | 0.270 | 0.262 | 0.247 | 0.253
Environment | 0.526 | Div.* | 0.497 | 0.497 | Div.* | 0.470 | 0.470 | 0.530 | 0.483 | 0.496 | 0.527
Public Health | 1.442 | Div.* | 1.184 | 1.184 | Div.* | 1.381 | 1.381 | 1.637 | 1.367 | 1.176 | 1.439
Security | 108.5 | 108.2 | 107.6 | 107.6 | 130.1 | 131.6 | 131.6 | 110.1 | 111.8 | 106.6 | 107.0
Social Good | 0.964 | 0.996 | 0.976 | 0.976 | 2.180 | 2.097 | 2.097 | 0.993 | 0.841 | 0.918 | 0.959
Traffic | 0.233 | 0.256 | 0.244 | 0.244 | 0.431 | 0.366 | 0.366 | 0.249 | 0.239 | 0.242 | 0.219
Win rate (vs. Unimodal) (%) | | 22.2 | 55.6 | 55.6 | 0.0 | 33.3 | 33.3 | 0.0 | 44.4 | 66.7 | 88.9

FiLM [45]
Agriculture | 0.098 | 5.460 | 0.095 | 0.092 | 1.313 | 0.099 | 0.896 | 0.096 | 0.097 | 0.095 | 0.094
Climate | 1.286 | 8.799 | 1.286 | 1.274 | 6.623 | 1.291 | 1.168 | 1.280 | 1.282 | 1.274 | 1.264
Economy | 0.018 | Div.* | 0.022 | 0.029 | Div.* | 0.022 | 9.140 | 0.018 | 0.018 | 0.018 | 0.018
Energy | 0.271 | Div.* | 0.285 | 0.294 | 0.807 | 0.307 | 0.737 | 0.260 | 0.268 | 0.275 | 0.259
Environment | 0.534 | Div.* | 0.504 | 0.512 | Div.* | 0.492 | 0.565 | 0.535 | 0.509 | 0.493 | 0.532
Public Health | 1.501 | Div.* | 1.531 | 1.615 | 2.589 | 1.646 | 1.726 | 1.505 | 1.529 | 1.493 | 1.490
Security | 118.5 | 122.3 | 106.7 | 114.7 | Div.* | 107.0 | 144.7 | 108.8 | 107.4 | 107.2 | 110.0
Social Good | 1.057 | Div.* | 0.993 | 1.032 | Div.* | 1.048 | 2.289 | 1.013 | 1.019 | 0.990 | 1.018
Traffic | 0.230 | 0.587 | 0.231 | 0.239 | 1.064 | 0.237 | 0.698 | 0.256 | 0.233 | 0.238 | 0.234
Win rate (vs. Unimodal) (%) | | 0.0 | 55.6 | 55.6 | 0.0 | 33.3 | 11.1 | 55.6 | 77.8 | 66.7 | 88.9

* Divergence (Div.*) indicates cases where the MSE exceeds that of the unimodal baseline by more than 10×.

4.1 Multimodal Time Series Forecasting

Table 4 presents the forecasting performance (MSE) of multimodal fusion strategies across TS models and datasets using BERT [7] as the language model. Due to space limitations, we report six TS models [22, 26, 40, 6, 24, 45], selecting the two representative methods with the highest performance from each model category. Overall, naive fusion methods, whether additive or concatenation-based, frequently fail to outperform the unimodal baseline and even lead to severe degradation or divergence. In contrast, constrained fusion methods consistently yield more stable and better performance across models and domains. Notably, CFA achieves the best performance, indicating that controlled integration of textual information is crucial for robust multimodal TS forecasting. Brief and full results for all TS/text models are presented in Section 4.2 and Appendix K, respectively. Additionally, comparisons with other (architecture-specific) methods are shown in Appendix J.
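As a worked example of the "Win rate (vs. Unimodal)" rows, the short sketch below recomputes the CFA win rate for DLinear from the MSE values reported in Table 4. Counting a win only when the fused MSE is strictly lower than the unimodal MSE is an assumption about how ties are handled.

```python
def win_rate(fused_mse, unimodal_mse):
    """Percentage of datasets where the fused model has strictly lower MSE
    than the unimodal model (strict inequality is an assumption)."""
    wins = sum(f < u for f, u in zip(fused_mse, unimodal_mse))
    return 100.0 * wins / len(unimodal_mse)

# DLinear rows of Table 4, in dataset order: Agriculture, Climate, Economy,
# Energy, Environment, Public Health, Security, Social Good, Traffic.
dlinear_unimodal = [0.205, 1.225, 0.129, 0.263, 0.545, 1.554, 109.0, 0.934, 0.276]
dlinear_cfa = [0.124, 1.098, 0.040, 0.243, 0.546, 1.530, 108.8, 0.904, 0.233]

cfa_win_rate = round(win_rate(dlinear_cfa, dlinear_unimodal), 1)
print(cfa_win_rate)  # 88.9
```

CFA loses only on Environment (0.546 vs. 0.545), giving 8 wins out of 9 datasets, which matches the 88.9% reported in the table.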
Figure 3: Performance across diverse settings. (a), (b), and (c) show the performance (normalized MSE) by Dataset, TS model, and Text model, respectively. (d) shows the overall average performance across all settings. CFA (⋆) consistently achieves the lowest MSE among fusion strategies.
(a) Performance by Dataset (average of 14 TS models × 4 text models × 4 horizons). Panels: Agriculture, Climate, Economy, Energy, Environment, Public Health, Security, Social Good, Traffic.
(b) Performance by TS models (average of 9 datasets × 4 text models × 4 horizons). Panels: Autoformer, Crossformer, DLinear, FEDformer, FiLM, Informer, Koopa, Nonstationary Transformer, PatchTST, Reformer, TSMixer, TiDE, Transformer, iTransformer.
(c) Performance by Text models (average of 9 datasets × 4 text models × 4 horizons, as labeled in the figure). Panels: BERT, GPT2, Llama3, Doc2Vec.
(d) Average performance.
Method | Normalized MSE
Concat - F/M/L | 0.484 / 0.364 / 0.621
Additive - F/M/L | 0.421 / 0.333 / 0.334
w/o fusion | 0.349
Orth./FiLM/Gating | 0.317 / 0.275 / 0.261
CFA (Ours) | 0.256

4.2 Performance with Various Settings

In this section, we evaluate the proposed method across diverse 1) datasets, 2) TS backbones, and 3) text encoders to verify its general effectiveness. To account for scale differences across datasets, we report normalized MSE averaged over 10 fusion strategies (6 naive & 4 constrained fusion strategies).

[1] Various datasets (Figure 3a). We conduct experiments on 9 real-world multimodal datasets [21] for TS forecasting. Across datasets, constrained fusion strategies (green) generally outperform the additive (blue) and concatenation-based (red) fusion. While many fusion methods exhibit dataset-dependent behavior (i.e., improving on some datasets and underperforming on others), CFA consistently outperforms the unimodal model on all datasets and ranks first on 7 of the 9 datasets.

[2] Various TS models (Figure 3b).
We evaluate the effectiveness of our method across diverse TS backbones (e.g., Transformer-based, linear/MLP-based). Similar to the dataset-level analysis, constrained fusion strategies (green) generally outperform the additive (blue) and concatenation-based (red) approaches. Specifically, CFA improves over the unimodal baseline on 13 out of 14 backbones, with the exception of Transformer [33]. We attribute this to the fact that the standard Transformer exhibits substantially lower performance than other TS backbones, indicating a limitation of the backbone itself rather than the fusion strategy. For Transformer variants specifically designed for TS forecasting [22, 23, 41], CFA consistently yields improvements over unimodal models.

[3] Various text models (Figure 3c). To verify robustness to text encoders, we compare four text models, including three LLM-based encoders [7, 29, 32] and Doc2Vec [16] to account for scenarios where LLM deployment is restricted. Consistent with previous observations, constrained fusion strategies (green) outperform additive (blue) and concatenation-based (red) methods. Among them, CFA consistently achieves the best performance, demonstrating robustness to the text model.

[4] Overall comparison (Figure 3d). Figure 3d reports the average normalized MSE across more than 2K settings (9 datasets × 14 TS models × 4 text models × 4 horizons) with 10 fusion methods. Among simple fusion strategies, only additive fusion at the middle or last layer improves over the unimodal baseline, whereas other naive approaches fail to do so. This highlights the necessity of constrained fusion for consistent performance gains. Furthermore, CFA achieves the best performance, confirming its superiority over various fusion strategies. Note that we tune each setting over 10 learning rates and compare the best performance to ensure that gains or losses are not due to optimization effects.

Figure 5: Toy experiment on low-rank bottleneck. (a) CFA with a bottleneck consistently outperforms the version without a bottleneck across all text types, with the largest gain observed for irrelevant text. (b) The text-contribution ratio at the adapter output shows that matching text is injected more strongly than contradicting text, indicating that it selectively suppresses conflicting signals.
(a) Per-type MSE: w/ vs. w/o low-rank bottleneck.
Text type | w/o bottleneck | w/ bottleneck | Improv. (%)
Matching | 0.1683 | 0.1477 | +12.19
Contradicting | 0.1635 | 0.1560 | +4.59
Irrelevant | 0.1851 | 0.1480 | +20.04
(b) Text-contribution ratio by text type (Matching vs. Contradicting; p < 0.001, d = 0.58).

5 Analysis

In this section, we analyze how and why the proposed method operates effectively from seven perspectives, using BERT [7] as the text model:
•[1] Performance with irrelevant text. We inject mismatched textual inputs from unrelated datasets and observe that CFA exhibits the smallest performance degradation relative to the unimodal baseline, demonstrating robustness to irrelevant information.
•[2] Effect of low-rank bottleneck. Using a synthetic dataset with matched, contradicting, and irrelevant text, we show that the bottleneck reduces MSE across text types. The largest gain occurs for irrelevant text, and matching text is injected more strongly than contradicting text.
•[3] Representation similarity. We compute cosine similarity between TS-only and TS+Text representations and observe that the best performance does not correspond to the largest representation shift, highlighting that effective fusion requires constrained modification.
•[4] Visualization of TS forecasting. We visualize the predicted values in TS forecasting and observe that CFA captures late-stage trends missed by the unimodal model.
•[5] Temporal attribution analysis.
We analyze temporal attribution to quantify how each input TS step contributes to the prediction and find that CFA yields a distinct temporal importance distribution, indicating that it selectively references different input time steps.
• [6] Efficiency analysis. We compare the number of training parameters and FLOPs across fusion methods and observe that CFA introduces only marginal overhead relative to the unimodal baseline, while several strategies significantly increase computational overhead.
• [7] Information distribution analysis. We compute the rank correlation between MAE and effective rank across diverse settings and find an overall positive trend, suggesting that models with more distributed representations tend to forecast more accurately.

[1] Performance with irrelevant text (Figure 4). To assess the effect of multimodal fusion, we conduct an irrelevant text experiment, where we replace the text aligned with each TS by text sampled from entirely different datasets. For instance, when evaluating on the Agriculture dataset, we randomly sample text from the remaining eight datasets, including Economy and Climate. This design examines how each fusion strategy responds when irrelevant textual information is injected.

[Figure 4: bar chart titled "Fusion with Irrelevant Text", Avg(14 TS models x 9 Datasets x 4 Horizons); y-axis: Normalized MSE (ticks 0.2 to 0.6); legend: Constrained, Naive-Additive, Naive-Concat. Bar labels as listed: w/o fusion, CFA, Gating, FiLM, Middle, Orthogonal, Last, Middle, First, First, Last. Bar values as listed: 0.179, 0.197, 0.225, 0.248, 0.251, 0.254, 0.271, 0.289, 0.381, 0.439, 0.574.]
Figure 4: Irrelevant text experiments.

A robust fusion strategy is expected to ignore unrelated information and rely primarily on TS representations, such that its performance does not substantially degrade compared to the unimodal setting. As shown in Figure 4, CFA exhibits performance most similar to the unimodal model and shows the smallest degradation.
In addition, constrained fusion strategies except Orthogonal effectively filter out unnecessary information and primarily utilize TS representations. These results confirm that CFA incorporates relevant textual signals while remaining robust to irrelevant inputs.

[2] Effect of low-rank bottleneck (Figure 5). To analyze the filtering behavior of CFA’s low-rank adapter, we conduct a controlled toy experiment on a synthetic dataset containing three text types: matching (reflecting the TS trend), contradicting (opposing the TS trend), and irrelevant (topically unrelated). We compare CFA w/ and w/o the bottleneck, with details in Appendix E. Table 5a shows that the bottleneck consistently reduces MSE across all text types. The largest improvement occurs for irrelevant text, indicating that the bottleneck suppresses uninformative texts. Without the bottleneck, contradicting text yields lower MSE than matching text, suggesting that all text is treated as undifferentiated noise. With the bottleneck, the expected ordering appears, where matching text yields the lowest MSE, followed by irrelevant text and contradicting text.

[Figure 6: cosine similarity across settings, on an axis running from "Similar with unimodal" to "Dissimilar with unimodal". Example of single setting (Setting # 7; Dataset: Environment; TS Model: PatchTST; Prediction Length: 336): CFA (MAE=0.502), Middle-Concat (MAE=0.518), Last-Additive (MAE=0.517).]
Figure 6: Cosine similarity between representations of TS-only and TS+Text models. We apply multiple fusion strategies across diverse settings, where each dot represents a single fusion method. The best-performing method (red) does not consistently induce the largest representation shift from the unimodal baseline, highlighting the importance of controlled integration of textual information into temporal representations rather than simply increasing the magnitude of representation shift.

[Figure 7: forecast plots; legend: Ground Truth, Prediction; groups: Unimodal / Naive-Additive / Naive-Concat / Constrained.]
Figure 7: [4] Visualization of multimodal TS forecasting.
Although the unimodal model captures the initial pattern, it fails to model the subsequent upward trend, whereas CFA accurately captures the rise in the later horizon. In addition, some fusion strategies even fail to converge (e.g., concat-first), indicating that naive fusion can hinder effective learning.

Figure 5b reports the text-contribution ratio (see Appendix E.2) measured at the adapter output. Matching text produces a higher mean ratio than contradicting text with strong statistical significance (p < 0.001, Cohen’s d = 0.58). This indicates that CFA injects helpful text more strongly while suppressing conflicting signals. Visualizations of forecast trajectories are provided in Appendix E.3.

[3] Representation similarity (Figure 6). To quantify how text fusion reshapes temporal representations, we measure the cosine similarity between the layer-wise representations $Z^{\text{TS}}_{\ell}$ and $Z^{\text{TS+Text}}_{\ell}$ at each layer $\ell \in \{1, \dots, L\}$ and obtain the final similarity score by averaging across layers:

$$S = \frac{1}{L} \sum_{\ell=1}^{L} \frac{\langle Z^{\text{TS}}_{\ell}, Z^{\text{TS+Text}}_{\ell} \rangle}{\| Z^{\text{TS}}_{\ell} \|_2 \, \| Z^{\text{TS+Text}}_{\ell} \|_2}. \quad (2)$$

We repeat this analysis across different settings and fusion strategies. As shown in Figure 6, the best-performing fusion method (red) does not consistently correspond to the largest deviation from the unimodal baseline. This indicates that effective fusion requires controlled modification of temporal representations with textual information rather than simply increasing representation shift.

[4] Visualization of TS forecasting (Figure 7). To assess how textual fusion influences temporal prediction, we visualize forecasting results on the Agriculture dataset [21] using TiDE [6], with both input and output horizons set to 8. While the unimodal model captures the initial pattern, it fails to model the subsequent upward trend, whereas CFA accurately captures the rise in the later horizon.
Moreover, most constrained fusion methods (green) successfully capture the late-stage upward trend that the unimodal model fails to capture. In contrast, several naive fusion strategies (red, blue) even underperform the unimodal baseline, indicating that improper fusion can hinder effective learning.

Table 5: Efficiency analysis of fusion strategies. The table shows the average training parameters and FLOPs, where CFA achieves negligible overhead compared to the unimodal baseline.

Method                     Train params. (1e+06)   vs. unimodal   FLOPs (1e+08)   vs. unimodal
Unimodal (w/o text)        9.382                   -              1.436           -
Naive, Additive, First     9.384                   +0.02%         1.436           +0.00%
Naive, Additive, Middle    9.408                   +0.28%         1.436           +0.02%
Naive, Additive, Last      9.384                   +0.02%         1.436           +0.00%
Naive, Concat, First       9.384                   +0.02%         1.436           +0.00%
Naive, Concat, Middle      10.539                  +12.33%        1.616           +12.53%
Naive, Concat, Last        9.384                   +0.02%         1.436           +0.00%
Constrained, Orthogonal    9.403                   +0.23%         1.436           +0.02%
Constrained, FiLM          9.423                   +0.46%         1.436           +0.03%
Constrained, Gating        12.067                  +28.62%        1.885           +31.26%
Constrained, CFA           9.439                   +0.61%         1.436           +0.04%

[Figure 9a: KDE titled "Distribution of Rank Correlation"; x-axis: rank correlation (MAE, Effective Rank), ticks from -1.0 to 1.0; y-axis ticks 0.0 to 1.0; Average (0.198).]
(a) Distribution of rank correlation.

Method              MAE↓              Effective Rank↑
CFA (Ours)          0.0929 (1/11)     24.05 (1/11)
Orthogonal          0.0967 (4/11)     19.37 (3/11)
w/o fusion          0.0970 (5/11)     18.97 (4/11)
Additive (First)    0.0983 (7/11)     16.79 (9/11)
Concat (Middle)     0.0996 (9/11)     16.10 (10/11)
Rank corr. (ρ)      0.6727 (p-value: 0.023)
(b) Example of rank correlation.

Figure 9: Rank correlation of MAE and effective rank. (a) Distribution of rank correlations across various settings, showing a positive relationship between MAE and effective rank. (b) Methods with higher effective rank generally achieve lower MAE, resulting in a positive correlation (ρ = 0.6727).
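As a rough illustration of the analysis summarized in Figure 9, the sketch below computes an entropy-based effective rank from a list of singular values and a Spearman rank correlation between two rankings. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (effective_rank, ranks, spearman) are hypothetical, ties are not handled, and only the five methods listed in Figure 9b are used, so the resulting value differs from the ρ = 0.6727 reported over all 11 methods.

```python
import math


def effective_rank(singular_values):
    """exp of the Shannon entropy of the normalized singular values."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]  # 0 * log 0 is treated as 0
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def ranks(values, descending=False):
    """1-based ranks; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(rank_a, rank_b):
    """Spearman correlation for tie-free ranks: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# The five rows shown in Figure 9b (a subset of the 11 methods).
mae = [0.0929, 0.0967, 0.0970, 0.0983, 0.0996]
erank = [24.05, 19.37, 18.97, 16.79, 16.10]

mae_rank = ranks(mae)                       # lower MAE is better, so it gets rank 1
erank_rank = ranks(erank, descending=True)  # higher effective rank gets rank 1
rho_subset = spearman(mae_rank, erank_rank)
print(rho_subset)
```

Ranking MAE in ascending order and effective rank in descending order makes "better on both" show up as a positive correlation, which matches the sign convention of the rank columns in Figure 9b.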
[Figure 8: heatmap titled "Temporal Importance Attribution"; x-axis: Input Horizon (t0 to t7); y-axis: Fusion Strategy (Additive-first, Additive-middle, Additive-last, Concat-first, Concat-middle, Concat-last, w/o fusion, Orthogonal, FiLM, Gating, CFA); color scale 0.05 to 0.35.]
Figure 8: Various fusion methods.

[5] Temporal attribution analysis (Figure 8). Under the same experimental setting as Analysis [4], we further analyze the importance scores that quantify how much each input TS step contributes to the prediction. Specifically, we compute a gradient×input attribution with respect to the encoder input and aggregate over channels to obtain one importance value per time step. As shown in Figure 8, CFA assigns a different importance distribution over input TS steps compared to w/o fusion and other methods, indicating that textual information influences which temporal regions are referenced during prediction. See Appendix H for details on the computation of temporal attribution.

[6] Efficiency analysis (Table 5). To demonstrate the efficiency of CFA, we compare the average number of training parameters and FLOPs across 10 fusion methods, aggregated over 14 TS backbones and 9 datasets under prediction lengths of 8, 36, and 96 for monthly, weekly, and daily frequencies. While certain strategies substantially increase computational overhead, most designs introduce marginal overhead. Notably, CFA increases parameters by only 0.61% and FLOPs by 0.04% relative to the unimodal model, indicating that it maintains efficiency comparable to the unimodal setting.

[7] Information distribution analysis (Figure 9).
We analyze fusion behavior using the effective rank of layer-wise representations. Given a hidden representation $H$, we compute its singular values $\{\sigma_i\}_{i=1}^{r}$ and define $p_i = \sigma_i / \sum_{j=1}^{r} \sigma_j$ and $\mathrm{erank}(H) = \exp\left(-\sum_{i=1}^{r} p_i \log p_i\right)$, where higher effective rank indicates more distributed representations. To assess its relevance, we compute the rank correlation between MAE and effective rank across fusion strategies under diverse settings, including four TS backbones [22, 26, 23, 45], nine datasets, and four forecast horizons. Figure 9a shows a positive relationship between the two metrics, as illustrated by the representative example in Table 9b.

6 Conclusion

In this paper, we show that naive fusion often underperforms unimodal baselines, suggesting that indiscriminate cross-modal fusion fails to preserve temporal representations. We demonstrate that constrained fusion consistently improves performance by filtering irrelevant auxiliary signals while preserving temporal representations effectively. Furthermore, we propose CFA, a constrained fusion method that suppresses irrelevant textual information in the low-rank subspace. Extensive experiments across various datasets and backbones validate the effectiveness and generality of our approach.

Limitations and Future Work. As our study focuses on textual modalities, extending constrained fusion to other modalities (e.g., vision and tabular) remains a promising direction for future research. In addition, although CFA shows consistent improvements across diverse settings, our analysis remains largely empirical, and deeper theoretical understanding would strengthen the framework.

References

[1] Rafal A Angryk, Petrus C Martens, Berkay Aydin, Dustin Kempton, Sushant S Mahajan, Sunitha Basodi, Azim Ahmadzadeh, Xumin Cai, Soukaina Filali Boubrahimi, Shah Muhammad Hamdi, et al. Multivariate time series dataset for space weather data analytics. Scientific data, 7(1):227, 2020.
[2]Sameep Chattopadhyay, Pulkit Paliwal, Sai Shankar Narasimhan, Shubhankar Agarwal, and Sandeep P Chinchali. Context matters: Leveraging contextual features for time series forecasting. arXiv preprint arXiv:2410.12672, 2024. [3]Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O Arik, and Tomas Pfister. Tsmixer: An all-mlp architecture for time series forecasting. TMLR, 2023. [4] Abdul Monaf Chowdhury, Rabeya Akter, and Safaeid Hossain Arib. T3time: Tri-modal time series forecasting via adaptive multi-head alignment and residual fusion. arXiv preprint arXiv:2508.04251, 2025. [5]Razvan-Gabriel Cirstea, Bin Yang, Chenjuan Guo, Tung Kieu, and Shirui Pan. Towards spatio- temporal aware traffic time series forecasting. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 2900–2913. IEEE, 2022. [6]A. Das, W. Kong, A. B. Leach, S. Mathur, R. Sen, and R. Yu. Long-term forecasting with tide: Time-series dense encoder. arXiv Preprint arXiv:2304.08424, 2023. [7]Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 2018. [8]Vijay Ekambaram, Kushagra Manglik, Sumanta Mukherjee, Surya Shravan Kumar Sajja, Satyam Dwivedi, and Vikas Raykar. Attention based multi-modal new product sales time-series forecasting. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pages 3110–3118, 2020. [9] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022. [10]Furong Jia, Kevin Wang, Yixiang Zheng, Defu Cao, and Yan Liu. Gpt4mts: Prompt-based large language model for multimodal time-series forecasting. In AAAI, 2024. [11] Yushan Jiang, Kanghui Ning, Zijie Pan, Xuyang Shen, Jingchao Ni, Wenchao Yu, Anderson Schneider, Haifeng Chen, Yuriy Nevmyvaka, and Dongjin Song. 
Multi-modal time series analysis: A tutorial and survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, pages 6043–6053, 2025. [12] Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. Time-llm: Time series forecasting by reprogramming large language models. In ICLR, 2024. [13]Kai Kim, Howard Tsai, Rajat Sen, Abhimanyu Das, Zihao Zhou, Abhishek Tanpure, Mathew Luo, and Rose Yu. Multi-modal forecaster: Jointly predicting time series and textual data. arXiv:2411.06735, 2024. Preprint. [14]Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. [15] Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451, 2020. [16]Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. In ICML, 2014. 10 [17]Zihao Li, Xiao Lin, Zhining Liu, Jiaru Zou, Ziwei Wu, Lecheng Zheng, Dongqi Fu, Yada Zhu, Hendrik Hamann, Hanghang Tong, et al. Language in the flow of time: Time-series-paired texts weaved into a unified temporal narrative. arXiv preprint arXiv:2502.08942, 2025. [18]Jiafeng Lin, Yuxuan Wang, Huakun Luo, Zhongyi Pei, and Jianmin Wang. Timi: Empower time series transformers with multimodal mixture of experts. arXiv preprint arXiv:2602.21693, 2026. [19]Chanjuan Liu, Shengzhi Wang, and Enqiang Zhu. Pa-rnet: Perturbation-aware reasoning network for multimodal time series forecasting. arXiv preprint arXiv:2508.04750, 2025. [20]Chenxi Liu, Qianxiong Xu, Hao Miao, Sun Yang, Lingzheng Zhang, Cheng Long, Ziyue Li, and Rui Zhao. Timecma: Towards llm-empowered time series forecasting via cross-modality alignment. AAAI, pages arXiv–2406, 2025. [21]Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Prabhakar Kamarthi, Aditya Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, et al. 
Time-mmd: Multi-domain multimodal dataset for time series analysis. Advances in Neural Information Processing Systems, 37:77888–77933, 2024. [22]Y. Liu et al. Non-stationary transformers for time series forecasting. In Advances in Neural Information Processing Systems (NeurIPS), 2022. [23]Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting. In ICLR, 2024. [24]Yong Liu, Chenyu Li, Jianmin Wang, and Mingsheng Long. Koopa: Learning non-stationary time series dynamics with koopman predictors. Advances in neural information processing systems, 36:12271–12290, 2023. [25]Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, and Hung Le. Spectral text fusion: A frequency-aware approach to multimodal time-series forecasting. arXiv:2602.01588, 2026. Preprint. [26] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam. Patchtst: A patch-based time series transformer with channel independence. arXiv Preprint arXiv:2211.14730, 2023. [27] Sehyuk Park, Soyeon Caren Han, and Eduard Hovy. Unicast: A unified multimodal prompting framework for time series forecasting. arXiv:2508.11954, 2025. Preprint. [28]Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In AAAI, 2018. [29]Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI, 2019. Technical report. [30]Hadi Rezaei, Hamidreza Faaljou, and Gholamreza Mansourfar. Stock price prediction using deep learning and frequency decomposition. Expert Systems with Applications, 169:114332, 2021. [31]Chen Su, Yuanhe Tian, and Yan Song. Multimodal conditioned diffusive time series forecasting. arXiv preprint arXiv:2504.19669, 2025. 
[32] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023. [33] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017. [34]Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, and Jianxin Liao. Chattime: A unified multimodal time series foundation model bridging numerical and textual data. In AAAI, 2025. 11 [35]Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Guo Qin, Haoran Zhang, Yong Liu, Yunzhong Qiu, Jianmin Wang, and Mingsheng Long. Timexer: Empowering transformers for time series forecasting with exogenous variables. Advances in Neural Information Processing Systems, 37:469–498, 2024. [36] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In NeurIPS, 2021. [37]Hao Xue and Flora D Salim. Promptcast: A new prompt-based learning paradigm for time series forecasting. IEEE Transactions on Knowledge and Data Engineering, 36(11):6851–6864, 2023. [38] Yueyang Yao, Jiajun Li, Xingyuan Dai, MengMeng Zhang, Xiaoyan Gong, Fei-Yue Wang, and Yisheng Lv. Context-aware probabilistic modeling with llm for multimodal time series forecasting. arXiv:2505.10774, 2025. Preprint. [39] Kun Yi, Qi Zhang, Wei Fan, Shoujin Wang, Pengyang Wang, Hui He, Ning An, Defu Lian, Longbing Cao, and Zhendong Niu. Frequency-domain mlps are more effective learners in time series forecasting. Advances in Neural Information Processing Systems, 36:76656–76679, 2023. [40]A. Zeng, S. Chen, L. Zhang, and Q. Xu. DLinear: Efficient linear models for time series forecasting. arXiv preprint arXiv:2205.13504, 2022. [41]Yunhao Zhang and Junchi Yan. 
Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In ICLR, 2023. [42] Siru Zhong, Weilin Ruan, Ming Jin, Huan Li, Qingsong Wen, and Yuxuan Liang. Time-vlm: Exploring multimodal vision-language models for augmented time series forecasting. In Proceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 of PMLR, pages 78478–78497, 2025. [43] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In AAAI, 2021. [44] Shiqiao Zhou, Holger Schöner, Huanbo Lyu, Edouard Fouché, and Shuo Wang. Balm-tsf: Balanced multimodal alignment for llm-based time series forecasting. arXiv:2509.00622, 2025. Preprint. [45] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin. Film: Frequency improved legendre memory model for long-term time series forecasting. Advances in Neural Information Processing Systems, 2022. [46] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In ICML, 2022.

A Details of Models

We use the mmtslib package provided by Time-MMD [21]. We adopt 14 TS models and 4 text models implemented in the library.

TS Models. We use 14 TS models from three categories as follows:
• Transformer-based Models
  – Transformer [33]: A standard sequence-to-sequence model based on multi-head self-attention.
  – Informer [43]: A sparse-attention Transformer designed for efficient long-term TS forecasting.
  – Reformer [15]: An efficient Transformer variant using locality-sensitive hashing and reversible layers.
  – Autoformer [36]: A decomposition-based Transformer that replaces attention with an auto-correlation mechanism.
  – FEDformer [46]: A frequency-enhanced Transformer leveraging Fourier decomposition for long-horizon prediction.
  – Crossformer [41]: A multivariate Transformer that models cross-dimension dependencies.
  – iTransformer [23]: An inverted Transformer that treats variates as tokens to model inter-variable relationships.
  – Nonstationary Transformer [22]: A Transformer tailored to capture non-stationary temporal patterns.
  – PatchTST [26]: A patch-based Transformer that segments TS into subseries tokens for scalable forecasting.
• Linear / MLP-based Models
  – DLinear [40]: A linear model that performs forecasting via direct regression on historical inputs.
  – TiDE [6]: An MLP-based encoder-decoder model designed for long-term TS forecasting.
  – TSMixer [3]: A fully-MLP architecture that mixes temporal and feature information for prediction.
• Other Architectures
  – Koopa [24]: A Koopman-operator-inspired model that decomposes TS into stable and dynamic components.
  – FiLM [45]: A frequency-based method that applies Fourier and Legendre projections for denoising and trend modeling.

Text Models. We use 4 text models, including three LLMs as follows:
• BERT [7]: A bidirectional Transformer pre-trained with masked language modeling.
• GPT-2 [29]: An autoregressive Transformer trained to predict next-token distributions.
• Llama-3 [32]: A recent large language model with improved text generation and comprehension capabilities.
• Doc2Vec [16]: A document-level embedding model that learns fixed-length vector representations of text.

B Details of Datasets

We use the nine multimodal datasets from various domains proposed in Time-MMD [21]. The input length (L) and forecasting horizons (H) are determined according to the frequency of each dataset (daily, weekly, or monthly). Detailed information about the datasets is provided in Table 6.

Table 6: Meta information of the nine Time-MMD [21] datasets. We report the domain, prediction target, dimensionality, data frequency, number of samples, timespan, input length (L), and forecasting horizons (H) for all nine multimodal datasets.
Domain        Target                          Dim.  Freq.  #Samples  Timespan       L    H
Agriculture   Retail Broiler Composite        1     M      496       1983–Present   24   6, 8, 10, 12
Climate       Drought Level                   5     M      496       1983–Present   24   6, 8, 10, 12
Economy       International Trade Balance     3     M      423       1989–Present   24   6, 8, 10, 12
Energy        Gasoline Prices                 9     W      1479      1996–Present   48   12, 24, 36, 48
Environment   Air Quality Index               4     D      11102     1982–2023      336  48, 96, 192, 336
Health        Influenza Patients Proportion   11    W      1389      1997–Present   48   12, 24, 36, 48
Security      Disaster & Emergency Grants     1     M      297       1999–Present   24   6, 8, 10, 12
Social Good   Unemployment Rate               1     M      900       1950–Present   24   6, 8, 10, 12
Traffic       Travel Volume                   1     M      531       1980–Present   24   6, 8, 10, 12

The qualitative descriptions of each dataset are as follows:
• Agriculture: Tracks U.S. retail broiler (chicken) composite prices, reflecting supply–demand dynamics and seasonal patterns in the agricultural market.
• Climate: Measures drought severity levels across regions, capturing long-term climate variability and extreme weather trends.
• Economy: Represents the U.S. international trade balance, indicating macroeconomic conditions and global trade fluctuations.
• Energy: Records U.S. gasoline prices, a key indicator of energy market volatility and consumer economic burden.
• Environment: Monitors daily air quality index (AQI), reflecting pollution dynamics and environmental risk levels.
• Health: Tracks weekly influenza-like illness (ILI) proportions, serving as a proxy for epidemic spread and public health trends.
• Security: Captures disaster and emergency grant allocations, reflecting the temporal impact of large-scale natural and societal crises.
• Social Good: Measures unemployment rates, highlighting labor market disparities and socioeconomic stability.
• Traffic: Represents travel volume statistics, indicating mobility trends and transportation demand dynamics.

C Experimental Setup

(1) Evaluation metrics. We evaluate forecasting performance using Mean Squared Error (MSE) and Mean Absolute Error (MAE), which are standard metrics in TS forecasting.
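For reference, the two metrics are MSE = (1/n) Σ (y_i − ŷ_i)² and MAE = (1/n) Σ |y_i − ŷ_i|. The following minimal Python sketch computes both over flat lists of values; the function names and the toy numbers are illustrative assumptions and are not taken from the paper's code.

```python
def mse(y_true, y_pred):
    """Mean squared error over paired values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error over paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy forecast (hypothetical numbers): only the last step is off, by 2.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))
```

Because MSE squares each error, a single large miss weighs more heavily in MSE than in MAE, which is one reason both are commonly reported together.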
(2) Input and output horizons. We consider different forecasting horizon settings depending on the reporting frequency of each dataset, as shown in Table 7.

Table 7: Input and output horizon settings.

Frequency   Lookback window (L)   Forecast horizons (H)
Daily       96                    [48, 96, 192, 336]
Weekly      36                    [12, 24, 36, 48]
Monthly     8                     [6, 8, 10, 12]

(3) Optimizer. We use the Adam optimizer [14] for all trainable modules with a batch size of 32.

(4) Learning rate. We assign separate optimizers and learning rates to different components. The default learning rates are defined as follows:

Table 8: Learning rate configuration.

Optimizer              Target module          Learning rate          Default value
Model Optimizer        Time Series Model      $\eta_{\text{TS}}$     $1 \times 10^{-4}$
MLP Optimizer          Text Embedding MLP     $\eta_{\text{MLP}}$    $1 \times 10^{-2}$
Projection Optimizer   Projection Layer       $\eta_{\text{Proj}}$   $1 \times 10^{-3}$

To identify the best configuration, we multiply each default learning rate by the following scaling factors: 0.05, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0, resulting in 10 different learning rate configurations per component.

(5) Epochs. We train the model for a maximum of 10 epochs with a patience of 5.

(6) Data split. All datasets are divided chronologically into train:validation:test = 7:1:2.

D Model Hyperparameters

Table 9 presents the hyperparameter settings of the 14 TS models used in our experiments. We follow the default configurations adopted in Time-MMD [21]. Here, $d_{\text{model}}$ denotes the model (hidden) dimension, $n_{\text{heads}}$ the number of attention heads, $L_{\text{enc}}$/$L_{\text{dec}}$ the number of encoder/decoder layers, $d_f$ the feed-forward dimension, and $p_{\text{drop}}$ the dropout rate.

Note that since DLinear [40] has no hidden representation, all three injection positions (first, middle, and last) operate directly on the channel dimension $C_{\text{in}}$ rather than a latent dimension $d_{\text{model}}$. Specifically, first fusion adds or concatenates the projected text embedding $W_{\text{proj}} e_t \in \mathbb{R}^{C_{\text{in}}}$ to the raw input $x \in \mathbb{R}^{L \times C_{\text{in}}}$ before decomposition.
Middle fusion injects the text signal into the seasonal and trend components (each in $\mathbb{R}^{C_{\text{in}} \times H}$) independently after the seq_len → pred_len linear projection but before their summation. Last fusion modifies the final output (in $\mathbb{R}^{H \times C_{\text{in}}}$) after the linear mapping. For CFA, the adapter bottleneck dimension is computed as $\lfloor C_{\text{in}} / r \rfloor$ rather than $\lfloor d_{\text{model}} / r \rfloor$. Thus, the low-rank text residual is injected into the variate space instead of a token- or hidden-level space.

Table 9: Default hyperparameters of the 14 TS models ("-" marks entries that do not apply).

Model                            d_model  n_heads  L_enc  L_dec  d_f   p_drop
Transformer-based methods
Transformer [33]                 512      8        2      1      2048  0.1
Informer [43]                    512      8        2      1      2048  0.1
Autoformer [36]                  512      8        2      1      2048  0.1
FEDformer [46]                   512      8        2      1      2048  0.1
Nonstationary Transformer [22]   512      8        2      1      2048  0.1
Reformer [15]                    512      8        2      -      2048  0.1
iTransformer [23]                512      8        2      -      2048  0.1
PatchTST [26]                    512      8        2      -      2048  0.1
Crossformer [41]                 512      8        2      -      2048  0.1
Linear/MLP-based methods
DLinear [40]                     -        -        -      -      -     0.1
TSMixer [3]                      512      -        2      -      -     0.1
TiDE [6]                         512      -        2      1      2048  0.1
Others
FiLM [45]                        512      -        2      -      -     0.1
Koopa [24]                       -        -        -      -      -     0.1

E Toy Experiment with Multimodal Fusion

We conduct a controlled toy experiment to analyze the effect of the low-rank bottleneck (low-rank adapter) in CFA. The low-rank adapter constrains the projection of text embeddings through a low-rank bottleneck before they are injected into the TS backbone, which is expected to suppress irrelevant or contradicting textual signals while preserving useful guidance. To examine this effect, we compare CFA with and without the low-rank constraint. Throughout this section, w/ bottleneck denotes CFA with the low-rank bottleneck, while w/o bottleneck denotes the middle-additive baseline that directly injects text embeddings without the low-rank constraint.

A. Toy dataset construction. We synthesize a univariate TS and pair each time step with a short natural-language description. Each description belongs to one of three categories:
• 1) Matching: The description correctly reflects the current trend of the TS (e.g., “The value is rising steadily”).
This text provides useful information and should be incorporated by the model.
• 2) Contradicting: The description contradicts the actual trend (e.g., “The value is declining” when the series is rising). Injecting such text without filtering is expected to degrade forecast quality.
• 3) Irrelevant: The description is unrelated to the TS (e.g., a sentence about an unrelated domain). This text contains no useful signal and should ideally be ignored.

We construct a dataset (N = 1,000) in which all three text types coexist, with 70% used for training, 10% for validation, and 20% for testing. To illustrate the dataset, Figure 10 shows example TS segments paired with each type of textual description.

[Figure 10: "Synthetic Dataset: Three Categories". Three panels, each showing an input window and an output window of the TS together with its paired text.
Matching Text: "Based on historical patterns and recent data, the value is projected to rise in the upcoming period. Upward momentum is expected to continue over the forecast horizon."
Contradicting Text: "Despite recent signals, external factors suggest the value will fall in the upcoming period. A downward correction is anticipated."
Irrelevant Text: "Culinary festivals in regional cities attracted visitors from neighboring provinces and districts."]
Figure 10: Illustration of the toy dataset. Each time step of the TS is paired with a text description, categorized as 1) matching, 2) contradicting, or 3) irrelevant. Matching descriptions align with the TS trend, contradicting ones oppose it, and irrelevant ones provide no useful signal.

B. Model and horizon. We use the Nonstationary Transformer [22] as the TS backbone with an input length of L = 8 and a prediction horizon of T = 8.

C. Analyses. We examine the effect of the low-rank bottleneck from three perspectives:
• Performance comparison (Section E.1): MSE per text type.
• Representation analysis (Section E.2): Bottleneck activation norms and text-contribution ratios.
• TS visualization (Section E.3): Predictions for matched vs. contradicting text.

E.1 Analysis 1: Performance Comparison

Table 10 reports the MSE on the test split with the bottleneck (CFA) and without it (middle-additive) across the three categories of text descriptions.

Table 10: Performance on the synthetic dataset. Results are reported for CFA w/ and w/o bottleneck across three categories of text descriptions.

| Text type | w/o bottleneck | w/ bottleneck | Improv. (%) |
| --- | --- | --- | --- |
| Matching | 0.1683 | 0.1477 | +12.19 |
| Contradicting | 0.1635 | 0.1560 | +4.59 |
| Irrelevant | 0.1851 | 0.1480 | +20.04 |

Three observations follow from Table 10:

(i) CFA improves performance across all text types. CFA (w/ bottleneck) achieves lower MSE than the middle-additive baseline (w/o bottleneck) for matching, contradicting, and irrelevant descriptions, indicating that the low-rank bottleneck consistently improves multimodal fusion.

(ii) CFA effectively suppresses irrelevant textual signals. For irrelevant descriptions, which contain no useful forecasting information, the middle-additive baseline still injects the text into the TS backbone, introducing noise into the representations. In contrast, the low-rank bottleneck largely filters out such inputs, yielding the largest MSE reduction (+20.04%).

(iii) CFA better distinguishes helpful and harmful text. Without the low-rank adapter, contradicting text surprisingly yields lower MSE than matching text, suggesting that the baseline fails to properly utilize the semantic content of the descriptions. With the low-rank adapter, the expected ordering emerges: matching descriptions are the most beneficial and contradicting descriptions remain the most challenging ($\text{MSE}_{\text{Matching}} < \text{MSE}_{\text{Irrelevant}} < \text{MSE}_{\text{Contradicting}}$).
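As a rough illustration of how per-category numbers such as those in Table 10 can be obtained, the Python sketch below groups test-set errors by text type and computes a relative improvement. The function names, array layout, and category labels are illustrative assumptions, not the authors' evaluation code. Applying it to the rounded table entries need not reproduce the reported percentages to the last digit.

```python
import numpy as np

def mse_by_category(y_true, y_pred, categories):
    """Mean squared error per text category.

    y_true, y_pred: arrays of shape (N, T) holding targets and forecasts.
    categories: length-N sequence of labels such as "matching".
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    categories = np.asarray(categories)
    # One MSE value per sample, averaged over the forecast horizon.
    per_sample = ((y_true - y_pred) ** 2).mean(axis=1)
    return {str(c): float(per_sample[categories == c].mean())
            for c in np.unique(categories)}

def relative_improvement(mse_baseline, mse_model):
    """Percentage MSE reduction of the model relative to the baseline."""
    return 100.0 * (mse_baseline - mse_model) / mse_baseline
```

For example, `relative_improvement(0.1851, 0.1480)` evaluates the Irrelevant row of Table 10 from its two MSE entries.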
E.2 Analysis 2: Representation Analysis

To investigate how CFA selectively filters text, we measure the text-contribution ratio at the output of the low-rank adapter. For each test sample, we compute $r = \|o\|_2 / \|\bar{e}\|_2$, where $o \in \mathbb{R}^{d_{\text{model}}}$ is the adapter output added to the TS representation, and $\bar{e}$ is the pooled text embedding before the adapter. Intuitively, $r$ quantifies how much of the original text signal survives the bottleneck: higher $r$ indicates stronger injection, while lower $r$ indicates suppression. In this analysis, we focus on the first encoder layer of the Nonstationary Transformer and report $r$ by text category (Matching vs. Contradicting).

Table 11: Text-contribution ratio. The table shows how much of the text signal survives the low-rank bottleneck for each text type. Matching text is injected more strongly than contradicting text, indicating that the adapter selectively suppresses conflicting information.

| | Matching | Contradicting |
| --- | --- | --- |
| Mean | 33.50 | 31.56 |
| Std | 3.56 | 3.07 |
| t, p | t = 4.03, p < 0.001 | |
| Cohen’s d | 0.58 | |

[Figure 11 plot, "Text-Contribution Ratio": distributions for Matching and Contradicting text; y-axis $(r - 0.26) \times 10^{6}$; annotation p < 0.001, d = 0.58.]

Figure 11: Text-contribution ratio.

Table 11 and Figure 11 summarize the distribution of $r$ for matching and contradicting text, where values are reported as $(r - 0.25745) \times 10^{6}$ for readability. Matching text yields a higher mean value (33.50) than contradicting text (31.56), and the difference is statistically significant (t = 4.03, p < 0.001, Cohen’s d = 0.58). This suggests that the low-rank bottleneck suppresses contradictory text while preserving matching information, providing controlled guidance to the TS backbone.

E.3 Analysis 3: TS Visualization

Figure 12 provides a qualitative comparison of the forecast trajectories for a representative matching (left) and contradicting (right) test sample from the AB dataset.
[Figure 12 panels, "Effect of CFA with Matching & Contradicting text". Each panel shows the input window, the output window, and two predictions. Matching (A), correct trend description: Pred w/ LoRA (MSE=0.01), Pred w/o LoRA (MSE=0.03); text: "Based on historical patterns and recent data, the value is projected to fall in the upcoming period. Downward pressure is expected to continue over the forecast horizon." Contradicting (B), wrong trend description: Pred w/ LoRA (MSE=0.01), Pred w/o LoRA (MSE=0.09); text: "Despite recent signals, external factors suggest the value will rise in the upcoming period. An upward correction is anticipated."]

Figure 12: Forecast comparison for a matched (left) and a contradicting (right) text sample. Green solid: w/ bottleneck. Red dashed: w/o bottleneck. Black solid: ground truth. The vertical dotted line separates the input window from the prediction horizon.

Matching text. When the injected text correctly describes the trend, both w/ bottleneck and w/o bottleneck produce reasonable forecasts that follow the ground truth. The CFA prediction (w/ bottleneck) tracks the ground truth slightly more closely, suggesting that the low-rank bottleneck preserves the useful signal carried by the text.

Contradicting text. The contrast is clearer for the contradicting-text sample, where the ground truth descends during the forecast horizon while the injected text suggests an upward movement. The baseline (w/o bottleneck) is misled by this conflicting signal: its prediction rises in the early forecast steps and deviates from the ground truth. CFA (w/ bottleneck), in contrast, disregards the misleading text and predicts a downward trajectory that closely follows the ground truth. This qualitative contrast illustrates the filtering capability of the low-rank bottleneck, as the low-rank constraint prevents harmful text from corrupting the forecast even when the text signal is strongly contradictory.
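Returning to the representation analysis of Section E.2, the text-contribution ratio and the reported effect size can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, and Cohen's d is computed with the pooled standard deviation of two equally sized groups, since the group sizes are not given in the text. Under that assumption, the summary statistics of Table 11 give a value of roughly 0.58, in line with the reported one.

```python
import numpy as np

def text_contribution_ratio(adapter_out, text_emb):
    """r = ||o||_2 / ||e_bar||_2, computed row-wise (one row per sample)."""
    o = np.asarray(adapter_out, dtype=float)
    e = np.asarray(text_emb, dtype=float)
    return np.linalg.norm(o, axis=-1) / np.linalg.norm(e, axis=-1)

def cohens_d_from_summary(mean_a, std_a, mean_b, std_b):
    """Cohen's d from group means and standard deviations.

    Assumes equal group sizes, so the pooled variance is the plain
    average of the two variances.
    """
    pooled_std = np.sqrt((std_a ** 2 + std_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_std

# Summary statistics taken from Table 11 (Matching vs. Contradicting).
d_table11 = cohens_d_from_summary(33.50, 3.56, 31.56, 3.07)
```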
F Details of Constrained Fusion Strategies

In this section, we provide formal descriptions and mathematical formulations for the three constrained fusion strategies introduced in Section 3.2. We adopt the notation defined in Section 3: TS embeddings $Z_{\text{TS}} = [z_{\text{TS},1}, \dots, z_{\text{TS},L}]$ and text embeddings $Z_{\text{Text}} = [z_{\text{Text},1}, \dots, z_{\text{Text},L}]$.

F.1 Gating Mechanism

The gating mechanism determines the relevance of textual information at each time step using a learned gate. Formally, let $z_{\text{TS},t} \in \mathbb{R}^{D}$ denote the TS embedding at time step $t$, and $z_{\text{Text},t} \in \mathbb{R}^{D}$ the text embedding. A gate $g_t \in [0, 1]^{D}$ is computed as:

$$g_t = \sigma\left(W_g [z_{\text{TS},t}; z_{\text{Text},t}] + b_g\right), \tag{3}$$

where $\sigma$ is the sigmoid function, $[\cdot;\cdot]$ denotes concatenation, and $W_g$, $b_g$ are learnable parameters. The fused embedding is then:

$$z_{\text{fused},t} = z_{\text{TS},t} + g_t \odot z_{\text{Text},t}. \tag{4}$$

F.2 FiLM (Feature-wise Linear Modulation)

FiLM modulates TS embeddings based on text embeddings by applying feature-wise scaling and shifting:

$$z_{\text{fused},t} = \gamma(z_{\text{Text},t}) \odot z_{\text{TS},t} + \beta(z_{\text{Text},t}), \tag{5}$$

where $\gamma(\cdot)$ and $\beta(\cdot)$ are learned functions (e.g., MLPs) that generate scaling and shifting parameters from the text embedding. This preserves the temporal structure of TS embeddings while modulating each feature dimension according to textual context.

F.3 Orthogonal Fusion

Orthogonal fusion projects the text embedding onto the orthogonal complement of the TS embedding to avoid overwriting temporal information:

$$z^{\perp}_{\text{Text},t} = z_{\text{Text},t} - \frac{z_{\text{TS},t}^{\top} z_{\text{Text},t}}{\|z_{\text{TS},t}\|^{2}} \, z_{\text{TS},t}, \tag{6}$$

$$z_{\text{fused},t} = z_{\text{TS},t} + z^{\perp}_{\text{Text},t}. \tag{7}$$

This ensures that only components of the text embedding orthogonal to the TS embedding are integrated, preserving the original temporal information.

F.4 Controlled Fusion Adapter (CFA)

Controlled Fusion Adapter (CFA) integrates textual information into TS embeddings through a lightweight residual adapter with a low-dimensional bottleneck.
Unlike direct additive or multiplicative fusion, CFA first transforms the text embedding into a compact latent space and then projects it back to the original dimension before residual addition. This design preserves the TS backbone representation while allowing text to provide directional guidance with minimal parameter overhead. Formally, for each time step $t$, the text embedding $z_{\text{Text},t} \in \mathbb{R}^{D}$ is first projected to a reduced bottleneck space:

$$z_{\text{Adapter},t} = W_{\text{down}} z_{\text{Text},t} \in \mathbb{R}^{D/r}, \tag{8}$$

where $r$ is the reduction ratio controlling parameter efficiency. The intermediate representation is then normalized and activated:

$$z'_{\text{Adapter},t} = \phi\left(\text{LayerNorm}(z_{\text{Adapter},t})\right), \tag{9}$$

where $\phi(\cdot)$ denotes a non-linear activation function (e.g., ReLU). Finally, the representation is projected back to the original dimension and added as a residual to the TS embedding:

$$z_{\text{fused},t} = z_{\text{TS},t} + W_{\text{up}} z'_{\text{Adapter},t}. \tag{10}$$

Here, $W_{\text{down}} \in \mathbb{R}^{\frac{D}{r} \times D}$ and $W_{\text{up}} \in \mathbb{R}^{D \times \frac{D}{r}}$ are learnable parameters. The residual structure ensures that the original TS embedding is preserved, while the bottleneck controls the magnitude and complexity of textual influence. In practice, initializing $W_{\text{up}}$ near zero helps stabilize early training by preventing excessive perturbation of the TS backbone.

G Theoretical Perspective on Low-Rank Text Fusion

CFA injects textual signals into TS embeddings through a low-rank adapter. Following the formulation in Section 3.3, the injected signal is given by

$$z'_{\text{Adapter},t} = W_{\text{up}} \, \phi(W_{\text{down}} z_{\text{Text},t}), \tag{11}$$

which is added to the TS embedding as

$$\tilde{z}_{\text{TS},t} = z_{\text{TS},t} + z'_{\text{Adapter},t}. \tag{12}$$

Since $W_{\text{down}} \in \mathbb{R}^{D/r \times D}$ and $W_{\text{up}} \in \mathbb{R}^{D \times D/r}$, we have $\text{rank}(W_{\text{down}}) \le D/r$ and $\text{rank}(W_{\text{up}}) \le D/r$. From $\text{rank}(AB) \le \min(\text{rank}(A), \text{rank}(B))$, the transformation $W_{\text{up}} W_{\text{down}}$ has rank at most $D/r$. The non-linearity $\phi$ between the two projections does not change this conclusion: the injected signal is $W_{\text{up}}$ applied to a $(D/r)$-dimensional vector, so it always lies in the column space of $W_{\text{up}}$. Therefore the injected textual signal lies in a subspace of $\mathbb{R}^{D}$ whose dimension is at most $D/r$. This low-rank constraint restricts the directions in which textual signals influence TS embeddings.
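A minimal NumPy sketch of the fusion rules in Eqs. (3)-(10) of Appendix F, together with a numerical check of the rank bound, is given here for concreteness. It is an illustration under stated assumptions rather than the authors' implementation: the weights are random, the sizes are toy values, LayerNorm has no learnable scale or shift, $\phi$ is taken to be ReLU, and $\gamma$, $\beta$ in FiLM are plain linear maps. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, r, L = 16, 4, 64        # embedding size, reduction ratio, sequence length (toy values)
k = D // r                 # bottleneck width D/r

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Normalizes over the feature axis; no learnable affine parameters (assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gating_fusion(z_ts, z_text, W_g, b_g):
    # Eqs. (3)-(4): gate from the concatenated embeddings, then gated residual.
    g = sigmoid(np.concatenate([z_ts, z_text], axis=-1) @ W_g.T + b_g)
    return z_ts + g * z_text

def film_fusion(z_ts, z_text, W_gamma, W_beta):
    # Eq. (5), with gamma and beta as linear maps of the text embedding (assumption).
    return (z_text @ W_gamma.T) * z_ts + z_text @ W_beta.T

def orthogonal_fusion(z_ts, z_text):
    # Eqs. (6)-(7): remove the component of z_text along z_ts, then add the rest.
    coeff = (z_ts * z_text).sum(-1, keepdims=True) / (z_ts ** 2).sum(-1, keepdims=True)
    z_perp = z_text - coeff * z_ts
    return z_ts + z_perp, z_perp

def cfa_fusion(z_ts, z_text, W_down, W_up):
    # Eqs. (8)-(10): down-project, LayerNorm + ReLU, up-project, residual add.
    h = np.maximum(layer_norm(z_text @ W_down.T), 0.0)
    delta = h @ W_up.T                      # injected text residual, shape (L, D)
    return z_ts + delta, delta

z_ts = rng.normal(size=(L, D))
z_text = rng.normal(size=(L, D))
W_down = rng.normal(size=(k, D))
W_up = rng.normal(size=(D, k))

fused, delta = cfa_fusion(z_ts, z_text, W_down, W_up)
# The L injected residuals, stacked as rows, span at most D/r directions.
rank_delta = np.linalg.matrix_rank(delta)
```

Setting `W_up` to zeros makes `cfa_fusion` return the TS embeddings unchanged, which mirrors the remark that initializing $W_{\text{up}}$ near zero avoids perturbing the backbone early in training.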
As a result, the textual information modifies TS representations only within a low-dimensional subspace. This reduces the capacity of textual signals and suppresses irrelevant information during fusion.

H Temporal Attribution for Multimodal TS Forecasting

To analyze which past time steps the model relies on when generating forecasts, we assign an importance score to each of the $L$ input time steps. This enables direct comparison of temporal focus across different fusion strategies.

Problem setup. Let the encoder input be $x \in \mathbb{R}^{B \times L \times D}$, where $B$ denotes the batch size, $L$ the input horizon, and $D$ the number of channels. The forecasting model predicts $\hat{y}$ over the output horizon, and we define the loss as the mean squared error $\mathcal{L} = \text{MSE}(\hat{y}, y)$. Our goal is to quantify the contribution of each input time step $t \in \{0, \dots, L-1\}$ to this loss.

Gradient-based temporal importance. We adopt a Gradient×Input attribution scheme. For a single sample (omitting the batch index), let $x_t^{(d)}$ denote the input value at time step $t$ and channel $d$. The gradient of the loss with respect to the input is

$$g_t^{(d)} = \frac{\partial \mathcal{L}}{\partial x_t^{(d)}}.$$

The element-wise attribution is defined as

$$a_t^{(d)} = g_t^{(d)} \cdot x_t^{(d)}.$$

We aggregate over channels to obtain one importance value per time step:

$$I_t = \sum_{d=1}^{D} a_t^{(d)} = \sum_{d=1}^{D} \frac{\partial \mathcal{L}}{\partial x_t^{(d)}} \cdot x_t^{(d)}.$$

This yields an importance vector $I = (I_0, \dots, I_{L-1}) \in \mathbb{R}^{L}$. When the input horizon is $L = 8$, we therefore obtain exactly 8 importance values, one for each input time step. For visualization and comparison across samples, we optionally normalize the scores:

$$\tilde{I}_t = \frac{I_t}{\sum_{t'=0}^{L-1} I_{t'} + \varepsilon},$$

where $\varepsilon > 0$ ensures numerical stability.

Implementation details. We compute gradients with respect to the encoder input by cloning the input tensor and enabling gradient tracking on the clone. After a forward pass and backpropagation of the MSE loss, we extract $\partial \mathcal{L} / \partial x$ and compute the element-wise product with the original input values.
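The procedure just described can be sketched in Python as follows. To keep the example self-contained, a toy linear forecaster with an analytic gradient stands in for the TS backbone; with a real backbone the gradient would instead come from automatic differentiation, as described above. The function name, the linear model, and the value of `eps` are assumptions made for illustration only.

```python
import numpy as np

def grad_x_input_importance(x, y, W, eps=1e-8):
    """Gradient x Input temporal importance for a toy linear forecaster.

    x: input window of shape (L, D).
    y: target of shape (T,).
    W: weights of the stand-in model y_hat = W @ x.ravel(), shape (T, L * D).
    Returns the raw scores I_t and the normalized scores, both of shape (L,).
    """
    L, D = x.shape
    y_hat = W @ x.ravel()
    # For loss = mean((y_hat - y)^2), d loss / d x = (2 / T) * W^T (y_hat - y).
    grad = (2.0 / y.size) * (W.T @ (y_hat - y)).reshape(L, D)
    attribution = grad * x                  # a_t^(d) = g_t^(d) * x_t^(d)
    importance = attribution.sum(axis=1)    # I_t: sum over the D channels
    # Scores are signed; the normalization follows the formula in the text.
    normalized = importance / (importance.sum() + eps)
    return importance, normalized
```

With an input window of length L = 8, the function returns 8 importance values, one per input time step.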
All attribution scores are computed under the same forecasting setting used for evaluation. For models that do not employ attention mechanisms, such as TiDE, the temporal attribution is derived solely from the gradient-based formulation above. Thus, the number of importance values always equals the input horizon L, ensuring consistent interpretation across backbones.

I Sensitivity Analysis

We examine the robustness of CFA to the reduction ratio r across datasets and TS models. The results in Table 12 show that performance remains stable for different values of r. Although r = 4 achieves the best average performance in several cases, we adopt r = 8 in our experiments considering both performance and computational efficiency.

Table 12: Average performance (MAE, MSE) by CFA reduction ratio r.

| Metric | r = 2 | r = 4 | r = 8 | r = 16 | r = 32 |
| --- | --- | --- | --- | --- | --- |
| MAE | 1.031 | 1.025 | 1.029 | 1.035 | 1.030 |
| MSE | 13.326 | 13.290 | 13.334 | 13.374 | 13.339 |

I.1 Performance by TS Model

Table 13: Average MAE per model by CFA reduction ratio r.

| Model | r = 2 | r = 4 | r = 8 | r = 16 | r = 32 |
| --- | --- | --- | --- | --- | --- |
| Autoformer [36] | 1.026 | 1.021 | 1.031 | 1.039 | 1.018 |
| Crossformer [41] | 1.125 | 1.134 | 1.127 | 1.137 | 1.118 |
| DLinear [40] | 0.969 | 0.969 | 0.969 | 0.969 | 0.970 |
| FEDformer [46] | 0.967 | 0.963 | 0.970 | 0.978 | 0.981 |
| FiLM [45] | 0.963 | 0.963 | 0.963 | 0.963 | 0.963 |
| Informer [43] | 1.217 | 1.190 | 1.211 | 1.217 | 1.208 |
| Koopa [24] | 0.944 | 0.944 | 0.944 | 0.944 | 0.945 |
| Nonstationary Transformer [22] | 0.904 | 0.909 | 0.899 | 0.916 | 0.910 |
| PatchTST [26] | 0.948 | 0.943 | 0.948 | 0.948 | 0.951 |
| Reformer [15] | 1.151 | 1.136 | 1.167 | 1.179 | 1.162 |
| TSMixer [3] | 1.123 | 1.123 | 1.123 | 1.123 | 1.123 |
| TiDE [6] | 0.988 | 0.979 | 0.974 | 0.981 | 0.986 |
| Transformer [33] | 1.149 | 1.134 | 1.128 | 1.162 | 1.145 |
| iTransformer [23] | 0.953 | 0.943 | 0.950 | 0.940 | 0.944 |

Table 14: Average MSE per model by CFA reduction ratio r.
| Model | r = 2 | r = 4 | r = 8 | r = 16 | r = 32 |
| --- | --- | --- | --- | --- | --- |
| Autoformer [36] | 12.947 | 12.744 | 13.107 | 13.203 | 12.805 |
| Crossformer [41] | 14.302 | 14.345 | 14.135 | 14.291 | 14.088 |
| DLinear [40] | 12.613 | 12.613 | 12.613 | 12.613 | 12.613 |
| FEDformer [46] | 12.597 | 12.732 | 12.748 | 12.955 | 12.887 |
| FiLM [45] | 12.766 | 12.766 | 12.766 | 12.766 | 12.766 |
| Informer [43] | 14.829 | 14.851 | 14.861 | 14.897 | 14.813 |
| Koopa [24] | 12.420 | 12.420 | 12.420 | 12.420 | 12.421 |
| Nonstationary Transformer [22] | 12.247 | 12.208 | 12.185 | 12.276 | 12.274 |
| PatchTST [26] | 12.615 | 12.580 | 12.632 | 12.613 | 12.584 |
| Reformer [15] | 14.038 | 14.165 | 14.387 | 14.447 | 14.394 |
| TSMixer [3] | 14.693 | 14.693 | 14.693 | 14.693 | 14.693 |
| TiDE [6] | 12.974 | 12.756 | 12.779 | 12.797 | 13.051 |
| Transformer [33] | 14.797 | 14.714 | 14.733 | 14.691 | 14.698 |
| iTransformer [23] | 12.727 | 12.471 | 12.621 | 12.568 | 12.654 |

I.2 Performance by Dataset

Table 15: Average MAE per dataset by CFA reduction ratio r.

| Dataset | r = 2 | r = 4 | r = 8 | r = 16 | r = 32 |
| --- | --- | --- | --- | --- | --- |
| Agriculture | 0.2534 | 0.2452 | 0.2508 | 0.2515 | 0.2560 |
| Climate | 0.8680 | 0.8655 | 0.8648 | 0.8669 | 0.8666 |
| Economy | 0.2679 | 0.2364 | 0.2610 | 0.2825 | 0.2518 |
| Energy | 0.3927 | 0.3894 | 0.3908 | 0.3896 | 0.3939 |
| Environment | 0.5156 | 0.5192 | 0.5190 | 0.5188 | 0.5190 |
| Public | 0.7976 | 0.7941 | 0.7931 | 0.8001 | 0.7946 |
| Security | 5.4168 | 5.4129 | 5.4218 | 5.4441 | 5.4309 |
| SocialGood | 0.4584 | 0.4577 | 0.4532 | 0.4642 | 0.4545 |
| Traffic | 0.3046 | 0.3054 | 0.3041 | 0.3019 | 0.3054 |

Table 16: Average MSE per dataset by CFA reduction ratio r.

| Dataset | r = 2 | r = 4 | r = 8 | r = 16 | r = 32 |
| --- | --- | --- | --- | --- | --- |
| Agriculture | 0.1483 | 0.1446 | 0.1473 | 0.1501 | 0.1526 |
| Climate | 1.1596 | 1.1520 | 1.1537 | 1.1576 | 1.1540 |
| Economy | 0.1602 | 0.1278 | 0.1540 | 0.1790 | 0.1390 |
| Energy | 0.2813 | 0.2781 | 0.2800 | 0.2794 | 0.2838 |
| Environment | 0.4836 | 0.4877 | 0.4875 | 0.4856 | 0.4848 |
| Public | 1.4135 | 1.4161 | 1.4055 | 1.4173 | 1.4062 |
| Security | 115.1568 | 114.8712 | 115.2570 | 115.5611 | 115.3047 |
| SocialGood | 0.9195 | 0.9183 | 0.9121 | 0.9215 | 0.9115 |
| Traffic | 0.2114 | 0.2125 | 0.2112 | 0.2108 | 0.2113 |

J Comparison with Other Methods

Table 17 presents a comparison of our method against other TS models, covering both unimodal TS models and architecture-specific multimodal TS models. We use eight Time-MMD multimodal datasets [21], excluding the Environment dataset due to reproducibility issues with the baseline methods.
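(See footnote 3.) The results are based on SpecTF [25], which is concurrent with our work (2026.02). We apply our approach to three representative TS backbones: a Transformer-based model (Nonstationary Transformer [22]), a Linear/MLP-based model (TiDE [6]), and other architectures (Koopa [24]).

Table 17: Comparison with other methods, including architecture-specific multimodal TS forecasting models. Red bold: best, blue underline: second best.

| Dataset | H | SpecTF [25] | TaTS [17] | M-TSF [21] | TimeXer [35] | FreTS [39] | Time-LLM [12] | ChatTime [34] | CFA (Ours): Transformer [22] | CFA (Ours): Linear/MLP [6] | CFA (Ours): Others [24] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agriculture | 6 | 0.064 | 0.075 | 0.084 | 0.088 | 0.082 | 0.092 | 0.193 | 0.053 | 0.060 | 0.056 |
| Agriculture | 8 | 0.084 | 0.097 | 0.112 | 0.109 | 0.101 | 0.151 | 0.264 | 0.069 | 0.082 | 0.076 |
| Agriculture | 10 | 0.110 | 0.127 | 0.133 | 0.144 | 0.125 | 0.176 | 0.345 | 0.091 | 0.103 | 0.100 |
| Agriculture | 12 | 0.152 | 0.167 | 0.165 | 0.187 | 0.172 | 0.210 | 0.418 | 0.116 | 0.135 | 0.128 |
| Agriculture | Avg | 0.103 | 0.112 | 0.123 | 0.132 | 0.120 | 0.157 | 0.305 | 0.082 | 0.095 | 0.090 |
| Climate | 6 | 0.904 | 0.990 | 1.074 | 1.027 | 0.286 | 1.126 | 1.490 | 1.073 | 1.258 | 1.253 |
| Climate | 8 | 0.927 | 0.998 | 1.067 | 1.059 | 0.283 | 1.165 | 1.530 | 1.250 | 1.267 | 1.258 |
| Climate | 10 | 0.965 | 1.010 | 1.100 | 1.067 | 0.294 | 1.200 | 1.519 | 1.176 | 1.259 | 1.247 |
| Climate | 12 | 0.960 | 1.012 | 1.101 | 1.058 | 0.293 | 1.210 | 1.570 | 1.163 | 1.274 | 1.232 |
| Climate | Avg | 0.938 | 1.002 | 1.053 | 1.053 | 1.289 | 1.175 | 1.528 | 1.166 | 1.265 | 1.248 |
| Economy | 6 | 0.0084 | 0.0080 | 0.0116 | 0.0133 | 0.0150 | 0.0307 | 0.478 | 0.017 | 0.017 | 0.017 |
| Economy | 8 | 0.0086 | 0.0083 | 0.0117 | 0.0140 | 0.0149 | 0.0310 | 0.056 | 0.020 | 0.018 | 0.018 |
| Economy | 10 | 0.0085 | 0.0083 | 0.0126 | 0.0143 | 0.0155 | 0.0323 | 0.056 | 0.019 | 0.017 | 0.017 |
| Economy | 12 | 0.0086 | 0.0084 | 0.0125 | 0.0148 | 0.0154 | 0.0336 | 0.059 | 0.020 | 0.017 | 0.018 |
| Economy | Avg | 0.0085 | 0.0083 | 0.0096 | 0.0096 | 0.0152 | 0.0319 | 0.0547 | 0.019 | 0.017 | 0.018 |
| Energy | 12 | 0.096 | 0.103 | 0.117 | 0.137 | 0.127 | 0.105 | 0.135 | 0.101 | 0.112 | 0.107 |
| Energy | 24 | 0.199 | 0.206 | 0.210 | 0.242 | 0.245 | 0.216 | 0.238 | 0.186 | 0.220 | 0.205 |
| Energy | 36 | 0.282 | 0.313 | 0.307 | 0.324 | 0.337 | 0.307 | 0.328 | 0.306 | 0.300 | 0.299 |
| Energy | 48 | 0.406 | 0.433 | 0.447 | 0.447 | 0.487 | 0.427 | 0.435 | 0.404 | 0.408 | 0.402 |
| Energy | Avg | 0.246 | 0.264 | 0.270 | 0.288 | 0.299 | 0.264 | 0.284 | 0.249 | 0.260 | 0.253 |
| Health | 12 | 0.985 | 0.979 | 1.196 | 1.181 | 1.141 | 1.308 | 1.382 | 0.724 | 1.106 | 1.008 |
| Health | 24 | 1.256 | 1.297 | 1.440 | 1.402 | 1.546 | 1.724 | 1.758 | 1.032 | 1.425 | 1.410 |
| Health | 36 | 1.395 | 1.442 | 1.669 | 1.455 | 1.691 | 1.869 | 1.904 | 1.371 | 1.602 | 1.625 |
| Health | 48 | 1.470 | 1.519 | 1.755 | 1.538 | 1.790 | 1.895 | 1.930 | 1.269 | 1.740 | 1.715 |
| Health | Avg | 1.276 | 1.340 | 1.508 | 1.394 | 1.542 | 1.699 | 1.730 | 1.099 | 1.468 | 1.440 |
| Security | 6 | 106.364 | 108.460 | 113.759 | 106.859 | 126.132 | 106.535 | 113.701 | 99.239 | 107.5 | 104.5 |
| Security | 8 | 106.947 | 110.180 | 115.563 | 109.134 | 129.167 | 108.205 | 117.083 | 105.3 | 109.4 | 105.5 |
| Security | 10 | 109.767 | 111.352 | 116.793 | 110.484 | 129.848 | 109.566 | 123.805 | 107.8 | 111.9 | 108.7 |
| Security | 12 | 110.567 | 112.405 | 117.897 | 111.411 | 130.346 | 110.865 | 190.503 | 109.7 | 111.7 | 109.4 |
| Security | Avg | 108.4 | 110.6 | 116.0 | 109.5 | 128.9 | 108.8 | 136.273 | 105.5 | 110.1 | 107.0 |
| Social Good | 6 | 0.907 | 0.943 | 1.088 | 0.924 | 1.008 | 0.888 | 1.193 | 0.793 | 0.843 | 0.849 |
| Social Good | 8 | 0.939 | 1.012 | 1.183 | 0.978 | 1.017 | 0.916 | 1.117 | 0.897 | 0.947 | 0.914 |
| Social Good | 10 | 0.962 | 0.996 | 1.198 | 1.018 | 1.145 | 1.039 | 1.241 | 0.928 | 1.068 | 0.994 |
| Social Good | 12 | 1.040 | 1.058 | 1.264 | 1.089 | 1.174 | 1.492 | 1.189 | 1.020 | 1.147 | 1.079 |
| Social Good | Avg | 0.962 | 1.012 | 1.183 | 1.002 | 1.086 | 1.084 | 1.185 | 0.910 | 1.001 | 0.959 |
| Traffic | 6 | 0.169 | 0.182 | 0.186 | 0.166 | 0.200 | 0.271 | 0.255 | 0.172 | 0.233 | 0.196 |
| Traffic | 8 | 0.169 | 0.184 | 0.190 | 0.167 | 0.199 | 0.276 | 0.257 | 0.180 | 0.224 | 0.207 |
| Traffic | 10 | 0.173 | 0.190 | 0.195 | 0.176 | 0.205 | 0.279 | 0.259 | 0.187 | 0.229 | 0.209 |
| Traffic | 12 | 0.175 | 0.213 | 0.217 | 0.197 | 0.215 | 0.333 | 0.261 | 0.243 | 0.289 | 0.269 |
| Traffic | Avg | 0.171 | 0.192 | 0.197 | 0.176 | 0.205 | 0.289 | 0.258 | 0.196 | 0.244 | 0.220 |

Footnote 3: The reproducibility issue with Environment arises in M-TSF [21], which corresponds to the last-additive setting in our experiments; the reported results differ from those obtained in our runs.

K Full Results of Main Experiments

K.1 Text Model: BERT

Table 18: Comparison of multimodal fusion strategies with Nonstationary Transformer (+ BERT).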
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Nonstationary Transformer Agriculture 60.0510.0530.0520.0520.0600.0553.6110.0510.0540.0560.053 80.0710.0690.0710.0670.0720.0711.2430.0720.0710.0730.069 100.0920.1000.0890.0880.0980.0900.3980.0910.0900.0910.091 120.1160.1210.1190.1090.1210.1200.3530.1180.1280.1190.116 Climate 61.2051.3051.3101.1721.1241.2351.2331.2891.1851.2721.073 81.2401.2251.1241.2191.2311.3001.0641.2891.2381.2021.250 101.2831.2961.2751.2401.2171.1721.0591.2451.2241.2961.176 121.2621.2331.1991.2051.2571.2391.0691.2191.2021.2031.163 Economy 60.0160.0210.0180.0170.0160.0168.4690.0170.0170.0170.017 80.0180.0200.0190.0200.0170.0188.9680.0180.0190.0190.020 100.0200.0200.0190.0230.0200.0191.1010.0210.0190.0230.019 120.0200.0220.0190.0220.0220.0201.0870.0210.0210.0190.020 Energy 120.1060.1060.1020.0990.0860.1000.1810.1020.0940.0990.101 240.1590.2050.2100.2060.2070.1880.5550.2080.1660.1790.186 360.3330.3120.3150.3240.3040.2820.4960.3110.3040.2870.306 480.3910.4610.3790.4080.4130.4120.4990.3650.3540.3380.404 Environment 48 0.4300.4280.4330.4330.4320.4360.4510.4270.4250.4280.439 960.4330.4820.4280.4340.4190.4550.4520.4570.4340.4570.439 1920.4430.4990.4380.4420.4200.4730.4550.4300.4320.4750.436 3360.4380.4720.4340.4310.4350.4500.4770.4540.4580.4300.408 Public Health 12 0.8410.8160.8280.9990.8170.9111.2090.8580.7960.7160.724 241.1601.2480.9481.2471.1021.2361.8661.1861.3511.2221.032 361.2471.4421.2051.2981.4321.4421.6191.3611.2011.3401.371 481.5931.6221.4081.3011.6141.2411.7491.5411.5311.3431.269 Security 6103.4102.3102.0100.3101.3101.3123.2100.6100.8102.399.239 8104.9106.7105.3105.6106.1105.0123.6104.5103.5104.3105.3 10107.8107.9108.5108.0107.5107.9122.1107.8108.3107.9107.8 12109.3109.7110.0110.2109.5109.4119.5109.2109.0109.4109.7 Social Good 60.7900.8110.8110.7630.8170.7714.3970.8100.8280.7990.793 80.9090.9510.8800.9120.9440.8832.8170.9510.9110.8990.897 
101.0180.9630.9790.9471.0450.9831.0951.0710.9330.9870.928 121.0731.1651.0380.9971.0751.1501.0711.1691.1021.0981.020 Traffic 60.1820.1830.1710.1800.1780.1710.8760.1860.1870.1740.172 80.1850.1840.1880.1940.2020.1910.6220.1900.1850.1850.180 100.1870.1910.1870.1910.1980.1890.2230.1890.1890.1910.187 120.2430.2540.2530.2510.2400.2400.2690.2510.2430.2420.243 26 Table 19: Comparison of multimodal fusion strategies with Koopa (+ BERT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Koopa Agriculture 60.0590.0560.0580.0585.6933.0443.0440.0600.0670.0550.056 80.0770.0780.0720.0720.1730.1080.1080.0760.0920.0730.076 100.0990.0910.0900.0900.4260.6240.6240.1000.1270.0910.100 120.1290.1190.1170.1171.0321.2331.2330.1310.1550.1170.128 Climate 61.2516.5931.2581.2584.1401.2191.2191.2771.0691.2531.253 81.2641.4991.2611.2611.1921.1221.1221.2911.0571.2621.258 101.2473.3551.2541.2541.4041.0451.0451.2961.1121.2431.247 121.2321.2541.2521.2521.1871.1911.1911.2601.0701.2421.232 Economy 60.0210.0330.0270.02713.55616.34616.3460.0280.0440.0260.017 80.0170.0180.0200.0200.6280.5400.5400.0240.0720.0180.018 100.0170.0220.0270.0273.7523.1523.1520.0270.0640.0180.017 120.0180.0280.0280.0280.30415.13915.1390.0270.0850.0220.018 Energy 120.1072140.30.1080.10856.9450.1560.1560.1170.1090.1100.107 240.2160.2560.2090.209283.80.3840.3840.2340.2160.1990.205 360.3000.2840.2960.2961.1990.5530.5530.3100.2960.2960.299 480.3884008.70.3970.3972.9011.6721.6720.4280.4100.3920.402 Environment 480.482271860512.00.4550.45553946020.00.4510.4510.4880.4490.4760.481 960.526488845344.00.4920.4923337.40.4840.4840.5380.4540.4900.536 1920.5671306404992.00.5240.524284732.70.4900.4900.5650.4830.5100.564 3360.52582270.40.5350.53580511.40.4960.4960.5300.4900.5230.528 Public Health 121.0291297.50.9950.99536.9181.0151.0151.2691.2330.9151.008 241.4021.1451.1521.1521.9831.2421.2421.5461.3791.1881.410 361.6141.2871.2591.2592.0831.5321.5321.7621.3111.2831.625 
481.7281.4781.3781.3781.7931.9501.9501.8711.4551.3621.715 Security 6105.197.736103.4103.4136.4140.4140.4104.9109.3103.5104.5 8109.1106.4111.6111.6110.4110.4110.4113.7113.0107.2105.5 10108.8100.8108.8108.8117.7118.3118.3110.6111.6108.8108.7 12111.3109.4110.3110.3158.4149.1149.1113.0116.9110.2109.4 Social Good 60.8490.8540.8690.8693.8633.7573.7570.8470.7390.7930.849 80.9311.1950.9110.9110.9290.9620.9620.9390.8490.8770.914 100.9851.1451.0051.0051.2911.2451.2450.9980.8590.8950.994 121.0781.0451.0591.0593.1362.7382.7381.0600.8891.0201.079 Traffic 60.2260.2800.2210.2211.6951.8631.8630.2450.2380.2320.196 80.2140.2130.2250.2250.2040.2130.2130.2270.2190.2110.207 100.2160.2230.2240.2240.6530.4980.4980.2240.2180.2210.209 120.2740.3370.2850.2852.7421.3861.3860.2800.2770.2830.269 27 Table 20: Comparison of multimodal fusion strategies with DLinear (+ BERT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA DLinear Agriculture 60.1140.1270.1340.1320.5750.3200.4540.1350.2330.1330.085 80.3240.1820.1850.1810.9732.0191.3900.2060.2170.2230.099 100.1630.2030.2040.2040.4810.4190.1990.1780.2080.2050.127 120.2170.2390.2400.2420.4280.3500.3260.1620.4100.1800.183 Climate 6 1.1981.2771.2291.2031.3081.2521.2191.3061.0841.2411.090 81.3641.3181.2671.2471.2501.0641.1621.3281.0711.2411.090 101.1661.3121.3331.2851.1811.1011.1051.2891.1051.3051.116 121.1741.3431.3211.2891.1561.1171.1221.2901.0961.2571.098 Economy 60.0900.0920.0980.09718.03313.74913.1210.1070.3450.1090.024 80.2330.1290.1430.14312.67212.21112.8120.1310.2740.1530.025 100.0860.1230.1300.1312.6832.4072.4950.1020.2600.1510.051 120.1090.1280.1300.1291.8841.5552.2580.0640.5760.0910.060 Energy 120.1150.1930.1550.1400.2240.1700.1540.1320.1130.1310.096 240.2190.2830.2590.2590.7930.2870.2800.2620.2990.2250.201 360.3030.3470.3220.3220.5850.3460.3410.3220.3650.3690.284 480.4150.4740.4380.4580.6740.6930.6790.4390.4600.4580.391 Environment 
480.4900.4330.4380.4390.4660.4470.4440.4950.4420.4430.489 960.5680.5060.4920.4730.5390.4930.4940.5700.4610.4780.572 1920.5830.5160.5010.4950.5710.5890.6140.5850.5280.5020.587 3360.5380.5110.4880.4840.5180.4830.4870.5360.4900.4810.538 Public Health 12 1.3301.3601.4031.3401.5271.5521.4821.5451.3341.4301.228 241.5781.5051.6541.5712.0051.7091.6631.6771.6901.5271.589 361.6191.6371.6761.6572.1121.6891.6441.7401.6911.6621.622 481.6911.7141.7491.7341.9631.8221.7791.7651.7271.6121.674 Security 6104.4105.1104.9104.3189.4169.6167.0105.7105.9104.6104.8 8109.2109.7107.8107.8153.0151.1147.8110.6107.9108.1109.0 10110.5113.3111.8111.3113.2112.8113.0110.6111.2110.4109.7 12112.0114.9113.3112.5113.3113.3113.2113.0114.6112.8111.7 Social Good 60.8170.8950.9070.8742.9901.9691.9030.8930.9280.8530.779 80.8851.1461.0091.1422.4032.1382.0320.9420.9460.9130.870 101.0261.0501.0901.0121.0740.9540.9591.1410.9831.1210.966 121.0101.1361.1331.0711.1441.0031.0311.0681.0401.0821.003 Traffic 60.2540.2890.3130.2942.9522.1582.6590.2850.3670.2900.227 80.3220.3220.3490.3272.2782.0122.1470.2700.3240.3390.224 100.2430.3230.3450.3310.6460.3160.5680.2640.3150.3410.219 120.2870.3420.3810.3680.5490.5450.3640.3070.3650.3190.261 28 Table 21: Comparison of multimodal fusion strategies with iTransformer (+ BERT). 
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA iTransformer Agriculture 60.0590.0600.0610.0580.0620.0581.2730.0590.0600.0590.059 80.0760.0780.0810.0720.0760.0771.0970.0810.0800.0790.076 100.0970.1030.1010.0900.0970.0990.3470.1010.1030.0980.098 120.1280.1290.1290.1180.1280.1310.3900.1280.1310.1290.127 Climate 61.1001.1891.1401.1281.1301.1411.3261.1411.1301.1501.115 81.1781.2041.2031.1931.1621.1931.1581.2021.2311.1971.227 101.1861.2001.2011.1881.1741.2481.0381.2001.2041.2261.206 121.1761.1741.1631.1811.1971.1821.0351.1541.1911.1981.183 Economy 6 0.0150.0150.0150.0200.0150.01515.7740.0150.0160.0150.014 80.0150.0160.0150.0210.0150.01610.4790.0150.0160.0160.015 100.0150.0160.0150.0170.0150.0161.1910.0150.0150.0160.015 120.0160.0160.0160.0180.0150.0161.1920.0160.0160.0160.016 Energy 120.1110.1080.1020.1110.1070.1110.2140.1020.1070.1070.111 240.2280.2270.2300.2210.2270.2210.6720.2050.2160.2280.231 360.3020.2830.3180.2780.2740.3111.3010.2940.3170.2950.307 480.4080.4020.4050.4030.3740.4140.7090.4030.3950.3880.380 Environment 480.4110.4010.4080.4250.4040.4080.4510.4100.4070.4050.410 960.4180.4060.4290.4460.4180.4220.5100.4320.4250.4130.420 1920.4350.4320.4220.4630.4210.4370.5360.4380.4360.4300.430 3360.4180.4160.4270.4410.4130.4220.4150.4270.4250.4190.424 Public Health 121.0230.9661.0440.9651.0710.9810.9911.0480.9870.9401.043 241.4381.4651.4551.2261.4451.4331.5991.4531.4641.4481.437 361.6371.6471.6251.2911.6761.6201.8831.6231.6351.6271.628 481.7381.7491.7371.4571.7901.7111.9061.7351.7491.7511.747 Security 6 104.9104.2102.5112.7102.7103.2134.4100.8104.0102.8105.1 8108.7106.8106.3113.1105.8106.1134.6107.4106.5106.4108.9 10111.4112.4107.8109.6109.3109.0122.4108.2109.1108.4110.8 12114.0111.1110.0119.3111.1110.6118.6111.2112.6111.7111.1 Social Good 60.8300.8280.8320.7820.8320.8594.2100.8320.8270.8290.811 80.9610.9510.9700.8930.9240.9602.4050.9700.9390.9670.945 101.0611.0661.0620.9871.0631.0561.1281.0620.9461.0531.060 
121.1441.1621.1681.0761.1301.1711.1231.1691.1391.1391.125 Traffic 60.1790.1870.1800.1870.2020.1881.6680.1820.1760.1790.176 80.1850.1880.1870.1970.1890.2021.9460.1870.1840.1780.189 100.1950.1950.1940.1990.1940.1960.4110.1880.1910.1870.197 120.2490.2480.2510.2550.2500.2510.2650.2520.2520.2320.238 29 Table 22: Comparison of multimodal fusion strategies with PatchTST (+ BERT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA PatchTST Agriculture 60.0580.0590.0600.0560.0590.0601.3570.0590.0600.0600.058 80.0780.0780.0800.0740.0780.0790.8160.0780.0790.0790.077 100.1000.0980.1010.0930.0980.1010.3820.1010.1010.1010.100 120.1320.1290.1290.1220.1290.1280.3710.1290.1310.1270.129 Climate 61.2571.2341.2381.2351.2341.2531.3321.2381.2481.2151.230 81.2651.2551.2241.2571.2551.2491.1831.2241.2521.2331.236 101.2741.2761.2561.2641.2761.2671.0921.2561.2621.2481.276 121.2391.2481.2421.2541.2481.2551.0901.2421.2641.2401.251 Economy 60.0160.0160.0170.0200.0160.01717.7350.0160.0170.0170.016 80.0170.0170.0190.0190.0170.01710.1800.0190.0170.0170.017 100.0170.0170.0170.0210.0170.0171.3510.0170.0170.0170.017 120.0170.0180.0180.0190.0180.0171.2500.0180.0180.0180.018 Energy 120.1050.1020.1100.1090.1020.1080.1930.1100.1120.1080.113 240.2260.2270.2220.2190.2270.2230.6060.2220.2240.2150.213 360.3100.3020.3060.2990.3020.3171.0740.3050.3190.3160.305 480.4140.4300.4240.4090.4300.4150.5570.4240.4170.4190.415 Environment 480.4590.4610.4570.4510.4610.4590.4690.4620.4560.4540.462 960.5040.4960.5030.4860.4960.5080.5630.4590.4930.4710.475 1920.5420.5410.5310.4940.5410.5400.6040.5390.5390.5190.507 3360.5090.5000.5110.5070.5000.5120.4760.5070.5130.5030.479 Public Health 12 0.9130.8910.9140.8490.8910.9191.0890.9140.9080.8810.901 241.3911.3161.2891.1581.3161.3491.7201.3791.4001.3301.364 361.6231.6201.6061.3621.6201.6241.4631.6051.6111.6161.607 481.7241.7271.7301.4731.7271.7181.7991.7301.7241.7271.743 Security 
6104.5104.3102.9105.5104.3103.3142.8103.2104.6104.7104.5 8110.1109.6105.8104.7109.6106.3137.9106.8108.1107.3107.2 10113.3109.9108.8110.5109.9108.8119.7108.1109.8108.9110.8 12109.8111.4110.1111.0111.4111.7119.3110.1110.6112.1113.7 Social Good 60.8000.8020.7870.7640.8020.7854.0510.7880.8020.7920.770 80.8940.8910.8860.8400.8910.8812.8000.8850.9090.8990.880 101.0301.0311.0510.9601.0311.0511.0901.0511.0141.0461.010 121.1261.1181.0871.0141.1181.0771.0891.0851.1051.0761.059 Traffic 60.1840.1810.1840.1790.1810.1881.8190.1830.1880.1720.171 80.1810.1780.1850.1930.1780.1901.9370.1780.1850.1810.177 100.1890.1880.1840.2010.1880.2010.4430.1910.1880.1830.182 120.2380.2440.2530.2530.2440.2450.4340.2350.2410.2420.257 30 Table 23: Comparison of multimodal fusion strategies with FEDformer (+ BERT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA FEDformer Agriculture 60.0600.0630.0640.0580.0620.0581.3140.0630.0570.0610.054 80.0720.0750.0740.0760.0790.0681.0260.0720.0730.0710.069 100.0990.0960.0900.0910.1040.0960.2780.0890.0890.0890.090 120.1170.1130.1200.1180.1240.1190.4350.1180.1200.1230.117 Climate 6 1.1161.1551.0931.1411.0711.0851.2421.0931.0661.0551.058 81.0821.0711.0571.0781.0771.0501.0521.0571.0491.0761.044 101.1051.1091.1241.1021.0541.1120.8981.1241.1131.0941.068 121.0841.1161.1041.1061.1341.0610.9241.1061.1141.0941.092 Economy 6 0.0390.0600.0470.0270.0410.0457.6230.0260.0300.0330.046 80.0470.0530.0510.0550.0290.0497.8120.0450.0450.0460.040 100.0370.0510.0470.0390.0520.0341.6190.0470.0400.0530.044 120.0440.0460.0400.0540.0450.0381.4550.0400.0420.0340.041 Energy 120.0970.1030.0910.0970.1110.0950.1310.0910.0970.0920.093 240.1770.1840.1810.1770.1830.1791.1300.1810.1750.1750.179 360.2540.2960.2630.2510.3460.2540.7260.2560.2530.2540.254 480.3670.3840.4040.3630.4160.3571.2250.4510.3660.3650.356 Environment 480.4840.4710.4720.4630.4810.4300.4730.4830.4750.4500.459 960.4880.5140.5170.5010.4980.4820.4190.5160.5360.5180.493 
1920.4820.5130.5280.4970.5280.5490.4760.5280.5050.5050.502 3360.4660.4700.4660.4810.4630.4640.4380.4690.4770.4770.445 Public Health 12 1.0501.0681.0841.0891.1681.0241.0711.0871.0891.0631.063 241.4881.4491.3111.3481.5291.3671.4491.3791.3831.4091.385 361.4191.5681.5541.5181.4921.4921.9271.4351.4211.4931.466 481.4991.5991.5821.6021.7361.5211.5641.5431.6171.5121.481 Security 6105.9110.0106.1102.3106.1107.9170.2104.4103.8110.1107.3 8109.1108.0109.6108.7110.3109.8142.9109.7110.0107.4106.8 10113.8112.3111.6110.9112.4116.2113.7111.6113.3108.2113.3 12117.2116.9114.8114.8114.7117.4116.7114.8115.6111.2114.3 Social Good 60.8740.8860.8070.8220.8610.8621.7830.8060.7730.7930.839 80.9190.8570.8840.8570.9000.8551.9620.8830.8730.8830.853 100.9140.9310.8950.9191.0440.9340.9610.8950.9030.9190.927 121.0150.9770.9940.9601.1081.0001.1170.9930.9730.9860.977 Traffic 60.1490.1560.1510.1580.1460.1532.0070.1610.1550.1500.153 80.1520.1580.1510.1600.1440.1581.3510.1510.1540.1540.159 100.1660.1730.1630.1660.1460.1600.2920.1640.1620.1590.157 120.2240.2290.2290.2400.2130.2130.4280.2320.2230.2210.230 31 Table 24: Comparison of multimodal fusion strategies with FiLM (+ BERT). 
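| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FiLM | Agriculture | 6 | 0.061 | 0.103 | 0.062 | 0.064 | 2.776 | 0.062 | 4.712 | 0.064 | 0.064 | 0.061 | 0.063 |
| | | 8 | 0.082 | 0.200 | 0.081 | 0.080 | 2.550 | 0.087 | 1.179 | 0.080 | 0.080 | 0.080 | 0.078 |
| | | 10 | 0.105 | 0.182 | 0.105 | 0.100 | 0.149 | 0.109 | 0.337 | 0.105 | 0.107 | 0.103 | 0.103 |
| | | 12 | 0.143 | 0.142 | 0.133 | 0.123 | 1.022 | 0.136 | 0.465 | 0.133 | 0.136 | 0.136 | 0.132 |
| | Climate | 6 | 1.305 | 1.274 | 1.282 | 1.268 | 1.427 | 1.266 | 1.320 | 1.270 | 1.287 | 1.233 | 1.254 |
| | | 8 | 1.282 | 1.273 | 1.268 | 1.268 | 1.069 | 1.292 | 1.099 | 1.313 | 1.263 | 1.255 | 1.264 |
| | | 10 | 1.278 | 9.623 | 1.274 | 1.280 | 7.707 | 1.290 | 1.097 | 1.286 | 1.285 | 1.272 | 1.273 |
| | | 12 | 1.280 | 10.115 | 1.280 | 1.289 | 1.882 | 1.291 | 1.110 | 1.301 | 1.281 | 1.311 | 1.264 |
| | Economy | 6 | 0.017 | 0.065 | 0.026 | 0.035 | 23.248 | 0.021 | 20.595 | 0.019 | 0.018 | 0.019 | 0.017 |
| | | 8 | 0.018 | 0.048 | 0.020 | 0.020 | 14.450 | 0.020 | 13.853 | 0.017 | 0.017 | 0.018 | 0.018 |
| | | 10 | 0.019 | 0.039 | 0.020 | 0.019 | 0.039 | 0.020 | 2.648 | 0.019 | 0.018 | 0.018 | 0.018 |
| | | 12 | 0.018 | 7.453 | 0.023 | 0.045 | 1.776 | 0.021 | 2.441 | 0.018 | 0.018 | 0.018 | 0.018 |
| | Energy | 12 | 0.119 | 2575.9 | 0.117 | 0.121 | 0.402 | 0.133 | 0.178 | 0.104 | 0.103 | 0.108 | 0.102 |
| | | 24 | 0.229 | 0.340 | 0.250 | 0.259 | 0.747 | 0.252 | 0.791 | 0.211 | 0.216 | 0.228 | 0.206 |
| | | 36 | 0.302 | 0.382 | 0.333 | 0.328 | 1.555 | 0.339 | 1.447 | 0.303 | 0.318 | 0.315 | 0.300 |
| | | 48 | 0.435 | 0.483 | 0.431 | 0.453 | 0.673 | 0.481 | 0.645 | 0.419 | 0.430 | 0.437 | 0.429 |
| | Environment | 48 | 0.488 | 177523232.0 | 0.479 | 0.471 | 0.498 | 0.477 | 0.515 | 0.490 | 0.444 | 0.448 | 0.490 |
| | | 96 | 0.546 | 36514.0 | 0.547 | 0.490 | 0.582 | 0.537 | 0.608 | 0.546 | 0.542 | 0.480 | 0.540 |
| | | 192 | 0.574 | 985353600.0 | 0.576 | 0.542 | 0.587 | 0.561 | 0.615 | 0.573 | 0.569 | 0.573 | 0.570 |
| | | 336 | 0.529 | 0.599 | 0.514 | 0.519 | 0.539 | 0.531 | 0.481 | 0.526 | 0.500 | 0.528 | 0.527 |
| | Public Health | 12 | 1.155 | 2890.2 | 1.191 | 1.142 | 2.007 | 1.271 | 1.219 | 1.117 | 1.130 | 1.122 | 1.106 |
| | | 24 | 1.492 | 1.435 | 1.496 | 1.339 | 2.784 | 1.643 | 1.841 | 1.453 | 1.494 | 1.474 | 1.432 |
| | | 36 | 1.635 | 1.779 | 1.632 | 1.838 | 2.019 | 1.681 | 2.346 | 1.873 | 1.658 | 1.660 | 1.685 |
| | | 48 | 1.722 | 2.053 | 1.796 | 2.035 | 1.998 | 1.780 | 1.824 | 1.722 | 1.746 | 1.761 | 1.739 |
| | Security | 6 | 124.2 | 105.9 | 102.1 | 100.9 | 208.1 | 103.5 | 191.0 | 102.8 | 101.6 | 102.3 | 107.1 |
| | | 8 | 113.8 | 116.7 | 106.5 | 116.5 | 151.0 | 105.7 | 163.2 | 109.3 | 105.1 | 106.3 | 105.1 |
| | | 10 | 117.0 | 135.0 | 109.3 | 133.5 | 97.995 | 108.1 | 111.9 | 109.4 | 108.3 | 109.1 | 114.0 |
| | | 12 | 119.0 | 134.6 | 110.0 | 108.1 | 96.776 | 110.3 | 113.2 | 132.4 | 113.1 | 110.9 | 113.8 |
| | Social Good | 6 | 0.900 | 49.855 | 0.831 | 0.822 | 3.632 | 0.840 | 3.854 | 0.820 | 0.798 | 0.835 | 0.818 |
| | | 8 | 0.994 | 1.101 | 0.998 | 1.049 | 2.161 | 0.971 | 2.346 | 1.000 | 0.953 | 0.945 | 0.967 |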
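| | | 10 | 1.119 | 114.0 | 1.031 | 1.191 | 123.0 | 1.107 | 1.115 | 1.014 | 1.137 | 1.059 | 1.123 |
| | | 12 | 1.216 | 280.3 | 1.087 | 1.090 | 90.555 | 1.185 | 1.080 | 1.248 | 1.237 | 1.159 | 1.165 |
| | Traffic | 6 | 0.229 | 0.285 | 0.234 | 0.252 | 4.318 | 0.235 | 3.520 | 0.256 | 0.233 | 0.234 | 0.236 |
| | | 8 | 0.223 | 0.226 | 0.220 | 0.221 | 0.682 | 0.230 | 2.414 | 0.230 | 0.228 | 0.219 | 0.233 |
| | | 10 | 0.216 | 0.224 | 0.217 | 0.215 | 0.396 | 0.219 | 0.557 | 0.241 | 0.228 | 0.217 | 0.215 |
| | | 12 | 0.253 | 0.515 | 0.261 | 0.270 | 0.269 | 0.267 | 0.526 | 0.283 | 0.262 | 0.271 | 0.251 |

Table 25: Comparison of multimodal fusion strategies with TiDE (+ BERT).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TiDE | Agriculture | 6 | 0.071 | 0.069 | 0.082 | 0.069 | 1.152 | 0.065 | 1.159 | 0.063 | 0.063 | 0.062 | 0.060 |
| | | 8 | 0.091 | 0.090 | 0.092 | 0.091 | 1.226 | 0.093 | 1.179 | 0.082 | 0.082 | 0.080 | 0.082 |
| | | 10 | 0.112 | 7.192 | 0.110 | 0.112 | 11.723 | 0.120 | 0.435 | 0.103 | 0.104 | 0.104 | 0.103 |
| | | 12 | 0.160 | 8.039 | 0.144 | 0.135 | 14.564 | 0.150 | 0.405 | 0.135 | 0.134 | 0.134 | 0.135 |
| | Climate | 6 | 1.456 | 1.577 | 1.606 | 1.576 | 1.491 | 1.418 | 1.363 | 1.252 | 1.244 | 1.239 | 1.258 |
| | | 8 | 1.469 | 1.555 | 1.481 | 1.446 | 1.048 | 1.532 | 1.172 | 1.264 | 1.274 | 1.258 | 1.267 |
| | | 10 | 1.419 | 11.502 | 1.467 | 1.477 | 2.938 | 1.561 | 1.184 | 1.274 | 1.272 | 1.259 | 1.259 |
| | | 12 | 1.467 | 3.924 | 1.395 | 1.449 | 2.043 | 1.508 | 1.154 | 1.255 | 1.278 | 1.264 | 1.274 |
| | Economy | 6 | 0.034 | 0.051 | 0.039 | 0.049 | 20.604 | 0.044 | 20.206 | 0.017 | 0.017 | 0.017 | 0.017 |
| | | 8 | 0.049 | 0.040 | 0.035 | 0.041 | 13.386 | 0.031 | 13.664 | 0.017 | 0.018 | 0.017 | 0.018 |
| | | 10 | 0.029 | 5.048 | 0.031 | 0.040 | 10.335 | 0.040 | 2.467 | 0.018 | 0.017 | 0.017 | 0.017 |
| | | 12 | 0.033 | 5.311 | 0.031 | 0.035 | 4.536 | 0.038 | 2.391 | 0.018 | 0.018 | 0.019 | 0.017 |
| | Energy | 12 | 0.135 | 2110.5 | 0.139 | 0.133 | 740.5 | 0.134 | 0.199 | 0.115 | 0.115 | 0.108 | 0.112 |
| | | 24 | 0.240 | 0.353 | 0.244 | 0.262 | 0.725 | 0.243 | 0.752 | 0.212 | 0.210 | 0.217 | 0.220 |
| | | 36 | 0.323 | 5843.1 | 0.333 | 0.321 | 2234.7 | 0.332 | 1.525 | 0.294 | 0.307 | 0.305 | 0.300 |
| | | 48 | 0.442 | 0.468 | 0.443 | 0.430 | 0.624 | 0.442 | 0.631 | 0.403 | 0.414 | 0.407 | 0.408 |
| | Environment | 48 | 0.483 | 0.955 | 0.485 | 0.442 | 0.502 | 0.485 | 0.519 | 0.485 | 0.483 | 0.485 | 0.484 |
| | | 96 | 0.543 | 0.513 | 0.542 | 0.512 | 0.576 | 0.539 | 0.636 | 0.541 | 0.539 | 0.539 | 0.540 |
| | | 192 | 0.576 | 13991.8 | 0.578 | 0.586 | 13866.6 | 0.577 | 0.689 | 0.577 | 0.577 | 0.577 | 0.576 |
| | | 336 | 0.533 | 22627.6 | 0.534 | 0.507 | 4743.0 | 0.533 | 0.482 | 0.534 | 0.533 | 0.533 | 0.534 |
| | Public Health | 12 | 1.300 | 182.9 | 1.320 | 1.295 | 240.6 | 1.329 | 1.396 | 1.076 | 1.128 | 1.113 | 1.106 |
| | | 24 | 1.520 | 2.213 | 1.499 | 1.605 | 1.908 | 1.567 | 1.901 | 1.432 | 1.434 | 1.448 | 1.425 |
| | | 36 | 1.670 | 2374.3 | 1.668 | 1.731 | 1646.7 | 1.669 | 2.276 | 1.605 | 1.619 | 1.617 | 1.602 |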
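| | | 48 | 1.787 | 1.970 | 1.756 | 1.822 | 2.018 | 1.788 | 1.910 | 1.724 | 1.723 | 1.726 | 1.740 |
| | Security | 6 | 146.1 | 123.9 | 134.9 | 124.6 | 201.4 | 122.2 | 191.0 | 105.2 | 107.2 | 107.6 | 107.5 |
| | | 8 | 129.7 | 137.7 | 132.5 | 137.5 | 154.3 | 128.9 | 156.5 | 108.3 | 108.9 | 112.4 | 109.4 |
| | | 10 | 132.3 | 147.6 | 135.2 | 133.5 | 96.384 | 135.4 | 117.7 | 112.7 | 110.6 | 114.6 | 111.9 |
| | | 12 | 145.3 | 144.8 | 139.2 | 138.9 | 107.5 | 142.4 | 120.8 | 112.7 | 112.3 | 112.3 | 111.7 |
| | Social Good | 6 | 0.955 | 0.928 | 1.036 | 0.943 | 3.995 | 0.970 | 3.955 | 0.835 | 0.836 | 0.823 | 0.843 |
| | | 8 | 1.037 | 1.100 | 1.013 | 1.098 | 2.476 | 1.044 | 2.711 | 0.950 | 0.940 | 0.950 | 0.947 |
| | | 10 | 1.210 | 134.7 | 1.097 | 1.168 | 70.622 | 1.243 | 0.982 | 1.071 | 1.070 | 1.109 | 1.068 |
| | | 12 | 1.415 | 156.6 | 1.186 | 1.298 | 57.431 | 1.205 | 1.075 | 1.162 | 1.252 | 1.183 | 1.147 |
| | Traffic | 6 | 0.293 | 0.317 | 0.285 | 0.317 | 3.595 | 0.301 | 3.130 | 0.236 | 0.235 | 0.236 | 0.233 |
| | | 8 | 0.309 | 0.306 | 0.277 | 0.301 | 0.962 | 0.272 | 2.299 | 0.236 | 0.236 | 0.234 | 0.224 |
| | | 10 | 0.251 | 11.396 | 0.245 | 0.320 | 6.953 | 0.302 | 0.641 | 0.224 | 0.231 | 0.225 | 0.229 |
| | | 12 | 0.319 | 13.832 | 0.308 | 0.328 | 8.802 | 0.334 | 0.585 | 0.288 | 0.290 | 0.284 | 0.289 |

Table 26: Comparison of multimodal fusion strategies with Autoformer (+ BERT).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Autoformer | Agriculture | 6 | 0.072 | 0.072 | 0.065 | 0.063 | 0.071 | 0.062 | 1.291 | 0.065 | 0.068 | 0.066 | 0.061 |
| | | 8 | 0.083 | 0.082 | 0.094 | 0.080 | 0.089 | 0.070 | 1.244 | 0.082 | 0.094 | 0.079 | 0.089 |
| | | 10 | 0.112 | 0.106 | 0.105 | 0.097 | 0.109 | 0.096 | 0.393 | 0.104 | 0.110 | 0.100 | 0.097 |
| | | 12 | 0.136 | 0.140 | 0.139 | 0.129 | 0.141 | 0.134 | 0.505 | 0.125 | 0.132 | 0.128 | 0.130 |
| | Climate | 6 | 1.240 | 1.278 | 1.248 | 1.222 | 1.168 | 1.192 | 1.266 | 1.244 | 1.248 | 1.206 | 1.225 |
| | | 8 | 1.200 | 1.234 | 1.265 | 1.199 | 1.180 | 1.168 | 1.122 | 1.241 | 1.162 | 1.132 | 1.179 |
| | | 10 | 1.247 | 1.250 | 1.218 | 1.249 | 1.291 | 1.169 | 1.025 | 1.210 | 1.208 | 1.211 | 1.211 |
| | | 12 | 1.177 | 1.208 | 1.199 | 1.265 | 1.134 | 1.188 | 0.965 | 1.249 | 1.201 | 1.195 | 1.178 |
| | Economy | 6 | 0.060 | 0.069 | 0.061 | 0.058 | 0.080 | 0.055 | 14.606 | 0.066 | 0.053 | 0.071 | 0.066 |
| | | 8 | 0.070 | 0.066 | 0.059 | 0.055 | 0.067 | 0.063 | 10.478 | 0.059 | 0.064 | 0.051 | 0.040 |
| | | 10 | 0.060 | 0.064 | 0.056 | 0.033 | 0.075 | 0.047 | 1.743 | 0.043 | 0.058 | 0.062 | 0.062 |
| | | 12 | 0.061 | 0.067 | 0.053 | 0.058 | 0.072 | 0.060 | 1.573 | 0.061 | 0.058 | 0.052 | 0.045 |
| | Energy | 12 | 0.165 | 0.145 | 0.142 | 0.148 | 0.181 | 0.125 | 0.370 | 0.153 | 0.163 | 0.170 | 0.159 |
| | | 24 | 0.292 | 0.299 | 0.308 | 0.302 | 0.301 | 0.317 | 1.273 | 0.309 | 0.316 | 0.309 | 0.323 |
| | | 36 | 0.356 | 0.390 | 0.389 | 0.379 | 0.399 | 0.337 | 1.634 | 0.364 | 0.394 | 0.358 | 0.349 |
| | | 48 | 0.485 | 0.478 | 0.434 | 0.495 | 0.495 | 0.479 | 1.258 | 0.483 | 0.462 | 0.467 | 0.490 |
| | Environment | 48 | 0.524 | 0.534 | 0.522 | 0.538 | 0.528 | 0.510 | 0.482 | 0.521 | 0.498 | 0.524 | 0.472 |
| | | 96 | 0.549 | 0.554 | 0.511 | 0.564 | 0.560 | 0.540 | 0.515 | 0.500 | 0.538 | 0.531 | 0.509 |
| | | 192 | 0.588 | 0.618 | 0.535 | 0.570 | 0.562 | 0.568 | 0.574 | 0.551 | 0.535 | 0.593 | 0.523 |
| | | 336 | 0.572 | 0.514 | 0.531 | 0.525 | 0.474 | 0.498 | 0.457 | 0.555 | 0.554 | 0.479 | 0.516 |
| | Public Health | 12 | 1.668 | 1.661 | 1.471 | 1.740 | 1.363 | 1.477 | 1.505 | 1.571 | 1.666 | 1.473 | 1.668 |
| | | 24 | 1.987 | 1.953 | 1.795 | 1.990 | 1.945 | 1.854 | 1.693 | 1.852 | 1.921 | 1.850 | 1.787 |
| | | 36 | 1.920 | 1.977 | 2.004 | 1.836 | 2.123 | 1.975 | 1.747 | 1.900 | 2.041 | 1.920 | 1.961 |
| | | 48 | 1.856 | 1.987 | 2.054 | 1.925 | 1.875 | 1.775 | 1.930 | 1.966 | 1.983 | 1.938 | 1.955 |
| | Security | 6 | 106.1 | 107.9 | 117.2 | 106.8 | 105.7 | 111.6 | 148.2 | 117.1 | 107.7 | 108.3 | 111.6 |
| | | 8 | 110.7 | 112.1 | 101.7 | 112.5 | 110.9 | 112.5 | 137.2 | 110.1 | 113.4 | 109.3 | 106.7 |
| | | 10 | 115.7 | 115.2 | 116.1 | 115.2 | 111.7 | 111.3 | 116.2 | 116.1 | 112.2 | 111.7 | 113.0 |
| | | 12 | 117.8 | 117.2 | 112.3 | 115.3 | 115.5 | 114.0 | 118.3 | 112.3 | 113.9 | 116.4 | 112.5 |
| | Social Good | 6 | 0.838 | 0.888 | 0.835 | 0.853 | 0.838 | 0.863 | 4.334 | 0.835 | 0.874 | 0.905 | 0.842 |
| | | 8 | 0.953 | 0.960 | 1.004 | 0.949 | 0.990 | 0.982 | 2.646 | 0.957 | 0.938 | 0.947 | 0.909 |
| | | 10 | 1.058 | 1.039 | 1.058 | 0.933 | 1.097 | 1.043 | 1.123 | 1.047 | 1.062 | 1.024 | 0.972 |
| | | 12 | 1.052 | 1.089 | 1.130 | 1.058 | 1.184 | 1.060 | 1.174 | 1.101 | 1.086 | 1.129 | 1.104 |
| | Traffic | 6 | 0.172 | 0.193 | 0.170 | 0.183 | 0.194 | 0.168 | 1.256 | 0.174 | 0.174 | 0.180 | 0.183 |
| | | 8 | 0.206 | 0.200 | 0.198 | 0.197 | 0.219 | 0.179 | 1.718 | 0.211 | 0.175 | 0.186 | 0.190 |
| | | 10 | 0.191 | 0.214 | 0.201 | 0.202 | 0.184 | 0.176 | 0.448 | 0.187 | 0.189 | 0.203 | 0.201 |
| | | 12 | 0.239 | 0.266 | 0.236 | 0.251 | 0.267 | 0.226 | 0.414 | 0.238 | 0.246 | 0.226 | 0.230 |

Table 27: Comparison of multimodal fusion strategies with Crossformer (+ BERT).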
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Crossformer Agriculture 60.0880.1320.1460.1150.0880.1520.2100.1590.1840.2130.166 80.2070.1940.2270.2351.2120.2650.3180.1710.2380.1670.247 100.2520.3010.2150.3760.1860.3000.4700.2290.1860.2550.260 120.3110.3250.2250.3660.4040.4110.3650.3870.3010.3180.393 Climate 6 1.0391.0541.0831.0651.0571.0421.0511.0621.0631.0121.119 81.0801.0701.0471.0661.1121.0721.1021.0951.0741.0801.090 101.0881.1171.0901.0831.0841.0721.0991.0621.0801.0991.113 121.0901.1181.0821.1081.1341.0761.0861.0831.0851.0861.105 Economy 6 0.4680.3150.3200.1660.2820.5080.4990.2480.3400.2890.157 80.1230.6020.3800.6426.1450.4920.7610.6380.3490.2850.588 100.1220.7000.3820.6730.5540.8660.6860.6930.4240.4450.365 120.3230.9680.2990.8760.6031.0411.2690.7460.4480.4720.406 Energy 120.1450.1330.1360.1350.1690.1650.1440.1220.1300.1400.155 240.2520.2560.2560.2390.2810.2250.2570.2340.2430.2530.261 360.3450.3570.3400.3370.3380.3540.3220.3260.3450.3340.338 480.4450.4530.4290.4460.4190.4300.3830.4600.4520.4480.427 Environment 480.4920.4990.4730.5000.4760.4740.4710.5000.4850.4830.448 960.5890.5360.5290.5290.5840.4920.5570.5610.5210.5350.513 1920.5510.5850.5710.5740.5320.5870.6000.4970.5720.5780.508 3360.5270.5390.5560.4940.4990.5390.5380.5220.5330.5270.511 Public Health 12 1.0891.1021.0801.1381.1741.0531.1031.1081.0821.1311.044 241.3051.3201.3521.3751.3941.4151.4401.4241.3281.2861.333 361.3211.3661.3831.3361.3741.3611.3931.3641.3741.3301.378 481.4401.4791.4391.4471.4561.4521.4761.5281.4731.4701.497 Security 6118.2122.5120.1120.2120.9118.4124.9119.6121.7119.7116.4 8121.3126.7123.8123.5133.2122.5127.1123.5123.3123.5122.2 10125.3126.2125.2125.5123.9128.6125.6125.1126.2125.6125.0 12123.3127.3127.2127.6130.2129.3129.1128.0127.0126.8125.6 Social Good 60.7550.7430.7430.7450.7320.7450.7270.7440.7340.7700.744 80.8130.8270.7970.8131.8680.8080.8170.8140.8090.8430.842 100.8850.8900.9140.8900.9270.8980.8740.8710.8630.8780.872 
120.9410.9420.8940.9570.9700.9050.9140.9610.9110.9100.899 Traffic 60.2150.2110.2190.2100.2290.2130.2250.2130.2230.2030.217 80.2080.2150.2050.2081.0900.2200.2020.2150.2130.2370.234 100.2110.2270.2250.2280.2260.2240.2280.2200.2200.2190.209 120.2540.2470.2500.2440.2800.2440.2520.2510.2420.2480.246 35 Table 28: Comparison of multimodal fusion strategies with Reformer (+ BERT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Reformer Agriculture 60.1940.2220.3190.2030.2310.1820.3080.1320.1420.1980.183 80.2540.3830.3230.2440.2340.3750.4060.3250.2180.3910.231 100.2150.5190.3230.3390.5290.2990.7010.3220.3630.3790.295 120.4670.4900.5180.4350.6440.5880.3840.5510.4370.4790.242 Climate 61.0251.0361.0121.0921.0561.0600.9671.0101.0471.0281.074 81.0661.0701.0421.0511.0371.1011.0531.0411.0701.0881.079 101.0571.0981.1271.0661.1221.0970.9131.1341.0851.1201.073 121.0751.1091.0611.0331.1031.1031.0481.0631.0951.0711.108 Economy 60.1990.4430.5070.2710.7370.1510.7860.3430.3410.6320.477 80.3370.7200.3420.3070.8020.8140.9930.3370.4090.3720.146 100.1761.1550.7210.5960.8441.0781.6960.6800.2921.0830.558 120.1441.2491.3420.7481.3241.3180.8541.3410.9720.7280.483 Energy 120.2050.2210.2500.2360.2530.2810.2030.1350.1380.1770.190 240.4230.3870.3830.3850.3460.4580.3360.3660.3560.4380.351 360.5090.4750.4930.4840.4370.4900.4430.3870.5220.4610.419 480.5740.5330.5700.5680.5960.5290.5080.5560.5300.5720.532 Environment 480.4070.4040.4450.4200.4350.4300.4290.4330.4010.4180.421 960.4470.4090.4610.4330.4080.4580.4500.4360.4100.3860.436 1920.4190.4110.4580.4430.4110.4830.4320.4230.3900.4380.431 3360.4730.4110.4350.4370.3990.4330.4720.4110.4180.3830.413 Public Health 121.1571.1111.1421.1481.1371.0961.0851.1311.0481.1061.071 241.2311.4171.3201.2931.4451.3951.5001.3351.3391.3501.288 361.3071.3311.2881.4921.4071.2921.4091.2891.2851.3661.365 481.3561.4311.4161.4301.6091.4041.4751.2671.3311.4061.330 Security 6 
118.3120.0123.7124.1118.5118.6122.4123.7125.2125.4123.6 8121.6121.1121.8124.5122.7123.8125.0121.8123.1124.0123.4 10126.2125.7124.5125.4129.0122.6126.7124.5125.0125.5125.3 12128.0122.1125.6125.9126.7124.7126.0125.6126.1125.1125.9 Social Good 60.8020.8190.8360.7750.8740.8230.7710.8360.8610.8000.787 80.8460.9180.9110.8520.9090.8930.8490.9000.8620.8870.865 100.9480.9290.9680.9141.0211.0170.9030.9670.9680.9050.989 121.0231.0971.0520.9631.0161.0231.0290.9210.9661.0130.973 Traffic 60.2080.2680.2290.2550.2500.3030.2590.2290.2330.2570.242 80.2240.2370.2250.2250.2260.2790.2300.2250.2290.2370.253 100.2610.2730.3090.2690.2420.2970.2250.2980.2350.2780.246 120.2450.2530.2560.2750.2520.2800.2490.2590.2660.2740.252 36 Table 29: Comparison of multimodal fusion strategies with Transformer (+ BERT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Transformer Agriculture 60.1330.1190.1770.1370.1590.1190.2170.1880.1650.1400.153 80.1470.3340.1430.2430.2100.1740.4330.1630.1610.2110.182 100.1770.5240.2820.2730.1330.2100.6660.2030.1720.1780.158 120.2470.3120.2540.3250.1740.2860.4850.2640.2400.2540.244 Climate 60.9851.0320.9891.0220.8750.9900.9830.9891.0241.0550.984 81.0581.0550.9891.0551.0861.0910.9900.9891.0321.0541.053 101.0681.0911.0561.0721.0661.0031.0161.0561.0211.0831.085 121.1020.9911.0291.0941.1221.0781.0731.0291.0561.0631.018 Economy 6 0.2670.2450.0590.2870.2390.2670.6100.1430.0610.0920.275 80.2870.5660.2830.1540.2820.3200.5710.3900.5060.5150.327 100.2581.0630.3150.6760.0690.1101.2940.4170.2620.2500.247 120.3621.0670.4200.1130.0500.5440.7060.0520.2370.2790.044 Energy 120.1220.1110.0910.1300.1090.1370.1010.1000.1050.0810.121 240.2890.2950.1790.2470.2250.1490.3180.2370.2400.2280.209 360.2610.4140.3260.3500.3800.2720.3100.3190.3180.2930.373 480.4500.4450.3480.5250.3900.4090.3900.3520.4800.3670.402 Environment 480.4290.4610.4280.4260.4120.4050.4360.4050.4200.4130.419 960.4620.4820.4500.4490.4450.4490.4670.4470.4320.4230.439 
1920.4500.5350.4740.4460.4540.4370.5110.4740.4450.4270.466 3360.4510.4850.4590.4390.4650.4450.4520.4240.4360.4210.469 Public Health 12 1.1241.1371.2011.1211.0261.0371.0541.1020.8591.1041.021 241.3701.2781.2631.3881.1211.2201.3311.2631.2561.3401.286 361.3171.3301.2951.3081.4381.2841.3511.2951.2931.2581.330 481.3891.4501.3161.4791.4201.3781.4431.4151.4381.3091.482 Security 6124.0127.2126.3123.0122.4127.0123.1126.3125.1124.9125.9 8124.8130.8127.0128.7125.9129.5127.9127.1128.8128.6128.1 10129.7131.4130.6131.5125.0128.5129.5128.6132.8129.7129.0 12130.3117.9131.2128.6127.3132.0131.5131.2130.4130.6127.1 Social Good 60.7310.7840.7740.7630.7870.7180.7690.7580.7920.7440.740 80.7630.7920.8940.8280.8840.8960.8070.8890.8360.8260.791 100.8280.7880.9250.8450.9650.9180.8060.9520.9130.8840.864 120.8320.8540.9030.8280.9460.9150.9260.9030.9340.9070.921 Traffic 60.1670.1720.1570.1620.1750.1700.1530.1570.1560.1720.153 80.1590.1560.1750.1740.1920.1750.1750.1750.1580.1640.187 100.1810.2170.1820.1710.2020.2060.1820.1810.1830.1780.194 120.2140.2430.2070.2120.2460.2390.2060.2070.2110.1990.221 37 Table 30: Comparison of multimodal fusion strategies with TSMixer (+ BERT). 
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA TSMixer Agriculture 60.0830.1050.1420.1170.1540.1420.1110.1030.1580.1130.092 80.1250.2200.2170.1800.2780.1820.1650.1850.2170.1740.136 100.4280.5151.1350.3900.3801.0770.2400.3231.3090.6190.204 120.8341.3521.0801.0801.4960.4580.3860.3971.3981.0120.281 Climate 61.0541.0611.0611.0801.0851.0831.1011.0651.0541.0691.070 81.0881.0731.0971.0841.1001.0851.0961.1111.1081.0911.095 101.1051.1161.1771.1431.1061.1061.0971.1151.1231.1241.113 121.1061.1681.1321.1531.1481.1061.1121.1411.1361.1321.106 Economy 60.0740.1120.0990.1080.0650.0910.0820.0960.0670.0750.050 80.0630.4110.4050.1580.6220.2450.1540.2010.2190.1170.080 100.1091.4451.7480.7090.1071.9000.1360.3732.0451.2390.155 120.8742.4452.2922.2042.5450.6030.5231.2252.0472.1540.208 Energy 120.1420.1610.1640.1400.1570.1490.1460.1620.1520.1630.115 240.2860.2740.2780.3030.2550.2370.2820.2380.2720.2610.264 360.3600.3470.3620.3600.3690.3590.3410.3150.3560.3470.360 480.4560.4360.5040.4700.4490.4700.4180.4160.4880.4460.445 Environment 480.5420.5050.5090.4870.4610.4790.4930.5250.4380.4880.502 960.6060.4950.5180.4890.5150.4490.5490.5920.4780.4740.556 1920.6600.6450.5370.5000.6070.5830.5310.6410.5040.4980.619 3360.5810.5760.5800.5550.5610.5590.5300.5870.5340.5600.558 Public Health 121.1500.9911.0991.1371.1601.1021.0651.0901.1951.0951.086 241.3931.5511.4071.4151.3721.4131.4621.3081.4801.4521.444 361.5001.7211.5871.4721.5361.5621.5031.3621.5051.5251.466 481.6871.6081.7381.6271.6181.7421.5991.5861.7001.6351.625 Security 6 128.0127.8128.0128.9125.4127.4129.6126.7128.6127.6124.4 8127.4124.2128.3128.8125.6119.5133.6128.0126.0127.5126.4 10128.2131.5130.5132.6128.6128.5128.7134.2128.8129.7128.2 12130.7133.2133.3133.5132.4127.2130.7133.0132.3132.5131.3 Social Good 60.7780.8120.7640.8270.7830.7560.7510.8050.8260.7740.742 80.8620.9391.0340.9400.7910.9290.8840.8830.9000.8760.819 100.8670.9451.1030.9350.8810.9240.9550.9931.1121.0450.832 
120.9020.9771.1340.9690.9730.9390.8901.0251.0641.1000.873 Traffic 60.2030.2260.2300.2050.2210.2220.2330.2210.2370.2310.190 80.2160.2480.2280.2340.2760.2510.2550.2870.2490.2280.185 100.2290.3260.4240.3200.2380.4500.2380.2690.3880.2550.207 120.3030.5270.5440.3960.5120.2830.2800.3620.4770.5060.266 38 Table 31: Comparison of multimodal fusion strategies with Informer (+ BERT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Informer Agriculture 60.2120.3730.2830.3190.3080.1940.2280.1890.3950.2930.198 80.3150.4150.2510.3460.2040.2800.3220.1930.3920.3520.348 100.4170.2360.3860.4400.5780.3530.3810.3210.3520.3110.403 120.5650.4050.6210.5930.3200.6480.5970.5810.4360.3650.453 Climate 6 1.0701.0661.1111.0581.0191.0341.0221.0461.0311.0611.040 81.0811.0651.0831.0730.9161.0871.0731.0471.0731.0491.108 101.0601.0741.0811.0670.9881.0721.0601.0741.1061.0751.104 121.0501.0861.0971.0730.9161.0361.0671.0951.0851.0951.096 Economy 6 0.4970.7650.2720.7240.5970.3720.7370.4060.5320.3680.691 80.5291.0030.5960.8480.3371.1531.0260.5410.4610.4320.467 100.7611.0110.7091.0011.1350.2310.8040.3740.8330.6241.208 121.1131.2160.5951.0360.8031.5011.1410.0620.7770.9140.768 Energy 120.1370.1540.2120.1690.1850.1720.1710.2170.1550.1890.155 240.3160.2960.3020.3160.2910.3400.3380.2490.2990.3080.300 360.4110.3720.4440.3750.4300.3740.3550.4300.3170.3870.367 480.4830.4640.4880.5460.4660.4950.4420.4980.4780.4910.528 Environment 480.4240.4220.4280.4380.4400.4150.4420.4470.4370.4400.409 960.4370.4370.4480.4460.4460.4370.4430.4080.4120.4050.416 1920.4490.5080.3990.4630.4270.4680.4190.3980.4280.4140.410 3360.4640.4640.4730.4690.4350.4660.4850.4430.4100.4010.468 Public Health 12 1.2361.0541.0751.0661.0110.9971.0561.0161.0841.0021.131 241.3731.1751.3831.3761.2671.3211.2861.3391.2551.4261.252 361.5401.3491.4491.4421.2001.4311.4731.4341.4411.5441.406 481.6431.5511.6591.6371.3101.5961.5691.5241.6251.6701.633 Security 
6124.8127.2126.3128.0124.0127.6126.3124.3126.4127.4127.6 8126.7130.4128.9129.9126.4127.2126.7125.5125.4126.3125.6 10127.8126.6126.7126.1130.8130.0128.3129.2130.8131.2129.1 12131.1132.6131.2131.8128.0131.9130.5132.1130.2128.0131.3 Social Good 60.7260.7060.7290.7230.7790.7500.7330.7470.7300.7260.753 80.7870.8030.7810.8080.8670.8160.8250.8170.7920.7950.778 100.8250.8470.8330.8660.8200.8780.8350.9090.8490.8150.873 120.8330.8790.8300.8580.9290.8770.8580.9390.8680.9180.872 Traffic 60.1640.1720.1730.1490.1580.1630.1500.1780.1520.1500.170 80.1610.1910.1580.1690.1870.1630.1630.1810.1690.1620.167 100.1980.1870.1950.1820.2050.1810.1790.1770.1740.1730.182 120.2250.2280.2070.2200.2030.2150.2190.2110.2050.2030.222 39 K.2 Text Model: GPT2 Table 32: Comparison of multimodal fusion strategies with Nonstationary Transformer (+ GPT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Nonstationary Transformer Agriculture 60.0550.0610.0500.0520.0620.0541.7460.0510.0530.0580.053 80.0710.0730.0700.0680.0730.0701.0260.0700.0700.0740.067 100.0920.1020.0900.0880.1000.0910.3670.0920.0920.0900.092 120.1160.1210.1190.1100.1270.1200.4280.1210.1260.1180.116 Climate 61.2061.3161.3021.1921.1061.2371.2961.3041.1791.2261.084 81.2401.3011.1281.2321.2481.3021.1541.2861.2381.2031.246 101.2831.2961.2811.2241.2481.1501.0781.2461.2111.2971.192 121.2621.2371.2051.1931.2571.2311.0831.2211.2031.2031.156 Economy 60.0160.0180.0180.0170.0170.0167.8500.0180.0170.0180.017 80.0180.0190.0190.0200.0200.0198.3420.0180.0190.0190.019 100.0200.0200.0190.0230.0210.0211.0540.0210.0200.0230.019 120.0200.0220.0220.0200.0230.0201.0010.0200.0200.0210.020 Energy 120.1060.1150.1040.0930.0850.0940.1780.1020.1040.0980.105 240.1590.2020.1770.2090.2070.2200.5950.1950.2090.2060.231 360.3330.2970.3030.3130.2970.2920.9270.2980.2950.2730.299 480.3910.4590.4410.4130.4200.4020.5860.3660.3620.3980.342 Environment 480.4300.4360.4460.4370.4480.4320.4480.4360.4260.4310.430 
960.4330.4770.4570.4410.4220.4610.4510.4550.4460.4410.438 1920.4450.5000.4540.4440.4620.4880.4610.4340.4310.4790.423 3360.4380.4990.4300.4270.4550.4310.4840.4360.4400.4370.476 Public Health 120.8410.7930.8190.9400.8210.8911.1430.8300.8000.7140.714 241.1601.1740.9541.2141.1081.2941.6621.1811.3741.2121.030 361.2471.2081.2411.2921.4561.4611.4731.3801.5151.3391.276 481.5931.7201.5521.2581.6971.3121.5841.5491.4491.3471.323 Security 6 103.4102.4105.9105.6101.3106.0121.3101.4100.7102.6100.3 8104.9105.7107.5106.9106.0104.7124.7104.5103.5104.1104.8 10107.8108.0107.9108.2107.5108.0122.5107.7108.4108.3108.5 12109.3109.5109.8109.0109.5109.5117.3109.2108.7109.6109.6 Social Good 60.7900.8060.7910.7500.8320.7704.4290.8010.8290.7930.798 80.9080.9450.8840.9111.0210.8712.8700.9530.9050.8840.893 100.9960.9660.9690.9251.0600.9731.0921.0720.9280.9520.926 121.0731.0451.0270.9881.0891.0251.0561.1471.0681.0721.022 Traffic 60.1820.1730.1820.1910.1860.1720.6160.1850.1880.1750.173 80.1850.1900.1880.2010.1950.1860.5460.1910.1860.1840.185 100.1870.1860.1820.1920.1900.1770.2520.1900.1820.1920.188 120.2430.2550.2510.2570.2350.2400.2620.2440.2480.2420.242 40 Table 33: Comparison of multimodal fusion strategies with Koopa (+ GPT). 
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Koopa Agriculture 60.0590.0570.0580.0583.5811.5391.5390.0610.0720.0570.056 80.0770.0700.0710.0710.1880.1870.1870.0770.1290.0740.076 100.0990.0920.0900.0901.2800.4060.4060.1000.1180.0910.100 120.1290.1180.1180.1180.7990.6620.6620.1300.1740.1170.129 Climate 61.25114.4691.2551.2556.5921.2191.2191.2681.0791.2461.253 81.2641.5411.2671.2671.1951.1141.1141.3201.0721.2681.258 101.2473.6191.2541.2541.4261.0531.0531.2991.1151.2461.247 121.2321.2481.2511.2511.1951.2021.2021.2491.0841.2491.232 Economy 60.0210.0370.0260.02611.63411.95711.9570.0300.0540.0230.017 80.0170.0190.0190.0190.7390.7960.7960.0230.0340.0180.018 100.0170.0250.0210.0214.0703.3923.3920.0280.0530.0170.017 120.0180.0300.0290.02913.55814.69414.6940.0290.0630.0250.018 Energy 12 0.1077203.50.1070.107195.30.1620.1620.1150.1110.1100.107 240.2160.2960.2050.205988.70.3940.3940.2180.2130.1960.205 360.3000.2990.2930.2930.6810.5040.5040.3140.3090.2970.299 480.38822512.60.4040.40431.8151.0751.0750.4340.4140.3850.402 Environment 480.4821124061696.00.4570.457237790768.00.4420.4420.4870.4620.4640.481 960.5261906434688.00.4730.4730.9540.4440.4440.5390.4560.4840.533 1920.5674849368576.00.5260.52639476484.00.4980.4980.5670.4980.5260.564 3360.5281709358592.00.5310.531593.90.4950.4950.5290.5150.5100.528 Public Health 121.0291956.10.9400.940113.80.9410.9411.3431.3010.9081.008 241.3951.1901.1391.1391.8271.2211.2211.5821.3801.1641.410 361.6141.2431.2851.2852.0721.6261.6261.7381.3021.2721.625 481.7291.5301.3741.3742.0991.7361.7361.8851.4871.3621.715 Security 6104.997.303103.8103.8136.5138.2138.2103.0108.8103.6104.5 8109.1120.2108.8108.8108.2110.5110.5112.9114.3104.6105.5 10108.8105.3107.6107.6117.7116.7116.7109.8111.0108.7108.7 12111.3109.8110.1110.1158.0160.8160.8114.6113.1109.7109.4 Social Good 60.8470.8480.8710.8713.1683.4273.4270.8810.7530.8300.851 80.9311.0190.9390.9390.9750.9620.9621.0110.8590.8760.912 
100.9981.0581.0151.0151.3681.2431.2430.9960.8620.9380.994 121.0811.0571.0771.0773.2082.7542.7541.0850.8891.0271.079 Traffic 60.2260.2780.2460.2460.5700.4860.4860.2510.2390.2420.196 80.2140.2250.2230.2230.2000.2040.2040.2240.2180.2210.207 100.2160.2290.2270.2270.2750.2410.2410.2290.2260.2210.209 120.2740.2930.2830.2830.6790.5340.5340.2930.2710.2860.264 41 Table 34: Comparison of multimodal fusion strategies with iTransformer (+ GPT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA iTransformer Agriculture 60.0590.0600.0620.0570.0600.0581.0360.0600.0600.0600.059 80.0770.0790.0820.0720.0770.0801.0570.0810.0790.0810.076 100.0960.1040.1030.0900.1010.1020.3010.1030.1010.1030.099 120.1280.1310.1340.1170.1310.1330.3030.1270.1320.1290.125 Climate 61.0961.1851.1741.1311.1221.2121.3261.1231.1681.1511.121 81.1781.2071.2321.2021.1691.2151.1871.2311.2581.2031.236 101.1861.2111.2231.1881.1821.2491.0501.2251.2231.2311.212 121.1761.1751.1551.1791.1551.1801.0371.1491.2111.1771.187 Economy 60.0150.0150.0150.0180.0150.01515.9140.0150.0160.0150.014 80.0150.0150.0150.0190.0140.01610.0600.0150.0160.0160.016 100.0150.0160.0150.0160.0150.0160.8840.0150.0160.0160.015 120.0160.0160.0160.0190.0150.0161.0160.0160.0160.0160.016 Energy 120.1110.1040.1030.1150.1020.1100.2240.1030.1080.1080.112 240.2280.2170.2260.2070.1940.2170.7320.2250.2270.2030.222 360.3020.2820.3110.3060.2570.3051.2410.3120.2950.3010.295 480.4080.3980.3690.4000.4010.3840.6630.3940.4050.3930.393 Environment 480.4110.4270.4110.4220.4120.4300.5100.4110.4110.4050.416 960.4180.4260.4260.4450.4440.4310.6470.4330.4250.4210.438 1920.4350.4340.4200.4600.4370.4510.5700.4330.4430.4320.439 3360.4180.4200.4220.4420.4170.4220.4140.4250.4220.4200.428 Public Health 12 1.0230.9691.0360.9421.0580.9761.0071.0390.9720.9530.975 241.4381.4651.4521.2091.4461.4351.5501.4491.4621.4401.460 361.6371.6671.6191.2901.6771.6171.8771.6171.6311.6261.628 
481.7381.7471.7351.4681.7951.7211.7331.7331.7491.7481.752 Security 6104.9104.2102.5102.1103.0103.6133.0102.7104.7102.9104.0 8108.7106.8106.3114.0106.4105.7134.6107.0106.1106.3108.9 10111.4111.4107.9109.5109.3108.4120.8108.1109.5108.3110.6 12114.0111.4109.8119.7110.9110.7118.2111.1111.9111.1111.8 Social Good 60.8250.8260.8230.7820.8410.8204.2690.8230.8170.8230.804 80.9610.9580.9660.8950.9230.9482.4820.9680.9410.9470.921 101.0611.0651.0580.9861.0771.0421.1281.0571.0511.0391.047 121.1441.1521.1661.1031.1271.1661.1241.0461.1341.1391.048 Traffic 60.1790.1910.1700.1900.1860.1630.5740.1770.1730.1630.173 80.1850.1820.1920.1980.1950.1811.3580.1860.1800.1750.189 100.1950.1960.1830.2030.1980.1910.2520.1850.1810.1850.186 120.2490.2380.2400.2550.2510.2540.2510.2420.2360.2410.235 42 Table 35: Comparison of multimodal fusion strategies with DLinear (+ GPT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA DLinear Agriculture 6 0.1140.1200.1320.1320.4550.8020.3920.1120.2350.1330.085 80.3240.1730.1750.1831.1681.3721.1820.2060.2320.2210.099 100.1630.1980.2030.2090.5320.4450.4360.1970.2210.2160.127 120.2170.2410.2430.2420.4460.3640.4260.1620.3480.1800.183 Climate 61.1981.2921.2551.2261.3121.2411.2261.3121.0801.2851.090 81.3641.3291.2881.2641.2401.1721.1601.3181.0841.3681.090 101.1661.3261.3361.3061.1931.1251.1221.2811.1091.3141.116 121.1741.3501.3561.3371.1641.1311.1291.3361.0871.2811.098 Economy 60.0900.0920.0970.09818.05015.14913.8410.1070.4310.1090.024 80.2330.1330.1410.14311.97310.79211.6500.1310.1840.1540.025 100.0860.1200.1300.1302.4882.1692.2650.1390.1480.1450.051 120.1090.1190.1300.1311.5431.6131.7590.0650.2300.0920.060 Energy 12 0.1150.2110.1650.1500.2280.1910.1660.1400.1600.1380.096 240.2190.2970.2770.2580.7100.2840.2860.2700.2900.2400.201 360.3030.3570.3440.3281.0650.3490.3450.3380.3550.3440.284 480.4150.4400.4710.4290.6790.7050.6730.4430.4450.4730.391 Environment 
480.4900.4620.4600.4520.4990.4720.4620.5070.4350.4610.489 960.5680.4670.4730.4590.5370.4910.4760.5650.4740.4660.572 1920.5830.5260.5190.5100.5740.5670.5630.5870.5820.5180.587 3360.5380.5070.5050.4970.5020.4750.4860.5390.5080.4950.538 Public Health 121.3301.4571.4651.3811.2861.3091.1861.6641.4141.3641.228 241.5781.5391.3971.4451.7841.6531.6051.6881.6951.4521.589 361.6191.6711.6881.6482.2471.7501.6791.7871.8431.6351.629 481.6911.7491.7251.7241.8421.8371.8091.8041.7751.6161.674 Security 6 104.4105.8105.0104.5192.5176.2170.7105.8110.3104.6104.8 8109.2110.0108.8108.3151.3149.4147.6110.2107.0108.6109.0 10110.5112.5110.9110.8113.0112.2112.9111.0106.1110.8109.7 12112.0114.5114.4113.3113.4112.8113.4113.3111.2113.2111.7 Social Good 60.8170.9410.9490.9183.0332.0721.8690.9220.9310.8780.779 80.8851.0551.0291.0042.5222.3042.1390.9580.9600.9660.870 101.0261.1581.1421.1081.1030.9620.9511.1800.9871.1310.966 121.0101.2321.1551.2401.0541.0111.0041.0891.0451.1281.003 Traffic 60.2540.3020.3280.3090.6730.6770.6180.3130.3710.3130.227 80.3220.3370.3590.3411.6172.1831.5280.2690.3320.3590.224 100.2430.3340.3540.3410.5110.2970.3580.2940.3140.3570.219 120.2870.3630.3900.3790.5620.3750.5250.3270.3160.3320.261 43 Table 36: Comparison of multimodal fusion strategies with PatchTST (+ GPT). 
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA PatchTST Agriculture 60.0590.0600.0620.0570.0600.0610.6390.0620.0590.0600.059 80.0780.0790.0810.0760.0790.0800.8140.0810.0790.0790.077 100.1000.0990.1010.0970.0990.1020.3420.1010.1020.1010.100 120.1290.1290.1290.1200.1290.1290.3940.1280.1290.1300.129 Climate 61.2571.2341.2401.2371.2341.2591.3521.2401.2631.2241.225 81.2651.2551.2261.2631.2551.2501.1351.2261.2571.2301.253 101.2741.2761.2611.2811.2761.2721.0931.2611.2711.2501.276 121.2391.2481.2461.2601.2481.2571.0951.2461.2691.2421.253 Economy 60.0160.0160.0160.0170.0160.01816.3000.0170.0180.0170.016 80.0170.0170.0180.0190.0170.0179.7310.0180.0180.0180.017 100.0170.0170.0170.0190.0170.0171.0140.0170.0180.0170.017 120.0170.0180.0170.0220.0180.0181.0120.0180.0180.0180.018 Energy 120.1050.1020.1110.1120.1020.1070.2260.1110.1060.1110.113 240.2260.2270.2130.2130.2270.2160.6390.2130.2080.2130.215 360.3100.3020.3100.2900.3020.2901.3820.3100.3090.3010.306 480.4140.4300.4170.3920.4300.4190.6440.4160.4050.4160.418 Environment 480.4590.4610.4580.4550.4610.4510.5140.4580.4520.4620.447 960.5040.5110.4500.4760.5110.4860.6900.4650.4390.4570.449 1920.5480.5360.4430.5190.5360.4940.6570.4450.4560.4520.438 3360.4970.5000.4370.5190.5000.4520.4710.4570.4580.4470.433 Public Health 12 0.9130.8910.9260.7740.8910.9481.0110.9320.8690.9120.907 241.2991.3161.3891.1631.3161.3811.6891.3311.3341.3261.301 361.6231.6201.6071.3661.6201.6181.4801.6071.6051.6131.596 481.7241.7271.7311.4741.7271.7241.7091.7311.7271.7241.754 Security 6104.5104.3102.5104.2104.3103.3140.4102.4105.0102.9104.5 8110.1109.6105.3107.9109.6106.3138.0105.1108.1107.3107.4 10112.5109.9108.9109.7109.9108.3118.2107.6110.2109.3110.7 12110.8111.4110.2110.3111.4112.2118.4110.7110.7112.7113.7 Social Good 60.7940.8020.7820.7640.8020.7743.8670.7820.7860.7930.772 80.9030.8960.8810.8450.8960.9002.7180.8810.8980.9010.881 100.9911.0511.0460.9641.0511.0491.1021.0461.0171.0320.980 
121.1331.1131.0630.9881.1131.0701.0911.0621.0571.0881.047 Traffic 60.1830.1810.1820.1850.1810.1790.3630.1790.1770.1800.173 80.1850.1780.1940.1970.1780.1801.9200.1880.1820.1840.192 100.1890.1880.1830.2030.1880.1940.3060.1860.1850.1800.183 120.2400.2440.2460.2550.2440.2360.3850.2410.2360.2350.257 44 Table 37: Comparison of multimodal fusion strategies with FiLM (+ GPT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA FiLM Agriculture 60.0610.0830.0630.0662.7060.0651.9550.0630.0640.0610.063 80.0820.0870.0810.0821.4230.0880.8160.0800.0830.0810.078 100.10521.5290.1040.0990.1490.1060.3620.1070.1060.1020.103 120.1430.1390.1330.1220.9730.1360.4490.1330.1370.1370.132 Climate 61.3051.2631.2801.2601.5591.2641.2991.2191.2931.2371.254 81.2821.2731.2771.2691.0341.3171.1551.3101.2691.2701.264 101.27810.9761.3051.28014.5181.2971.1061.2841.2871.2721.273 121.28021.6821.2801.2889.3791.2851.1121.3081.2811.3181.264 Economy 60.0170.0370.0270.03321.3520.02118.8020.0190.0180.0180.017 80.0180.0470.0210.02111.2800.02313.1570.0180.0170.0180.018 100.0190.0400.0190.0180.5530.0222.3640.0180.0180.0180.018 120.01836.5340.0200.0431.6850.0212.2380.0180.0180.0180.018 Energy 120.1199874.30.1200.1320.3210.1480.1990.1050.1070.1090.102 240.2290.3580.2560.2660.7250.2520.7250.2150.2190.2330.206 360.30234795.40.3200.3351.4810.3231.3510.3000.3150.3090.300 480.4350.4860.4430.4430.6990.5060.6730.4190.4300.4480.429 Environment 480.488218219.00.4820.4860.4990.4880.5000.4910.4770.4660.490 960.54635537.80.5120.4940.5790.4830.6040.5460.5310.4940.540 1920.5743251248128.00.5420.5460.6040.4980.6710.5750.5490.5140.570 3360.5292551701760.00.4780.52258696568.00.4990.4860.5270.4810.5000.527 Public Health 121.1553605.61.1951.1842.0731.4811.2491.1211.1381.1291.106 241.4921.4171.4961.3713.4811.6461.4501.4501.4821.4251.432 361.6351.6861.6581.8532.6021.6682.3571.7191.6921.6421.685 481.7222.0661.7752.0522.2021.7901.8501.7281.8061.7761.739 Security 
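| | | 6 | 124.2 | 106.8 | 102.1 | 100.9 | 210.4 | 103.4 | 189.5 | 102.8 | 101.6 | 102.0 | 107.1 |
| | | 8 | 113.8 | 116.8 | 106.0 | 116.4 | 148.3 | 105.5 | 164.3 | 110.8 | 106.3 | 106.2 | 105.1 |
| | | 10 | 117.0 | 136.7 | 109.0 | 133.4 | 95.991 | 107.9 | 111.8 | 109.4 | 108.2 | 108.8 | 114.0 |
| | | 12 | 119.0 | 128.9 | 109.8 | 108.2 | 98.285 | 111.2 | 113.1 | 112.2 | 113.6 | 111.6 | 113.8 |
| | Social Good | 6 | 0.900 | 193.9 | 0.854 | 0.827 | 4.318 | 0.848 | 4.221 | 0.819 | 0.810 | 0.815 | 0.818 |
| | | 8 | 0.994 | 1.363 | 1.005 | 1.027 | 2.158 | 1.055 | 2.760 | 1.002 | 0.964 | 0.944 | 0.967 |
| | | 10 | 1.119 | 397.3 | 1.040 | 1.202 | 423.6 | 1.106 | 1.096 | 1.014 | 1.112 | 1.046 | 1.123 |
| | | 12 | 1.216 | 1045.6 | 1.074 | 1.071 | 320.6 | 1.182 | 1.081 | 1.218 | 1.191 | 1.154 | 1.165 |
| | Traffic | 6 | 0.229 | 0.298 | 0.234 | 0.252 | 1.395 | 0.235 | 0.606 | 0.268 | 0.229 | 0.239 | 0.236 |
| | | 8 | 0.223 | 0.218 | 0.220 | 0.221 | 2.319 | 0.229 | 1.438 | 0.230 | 0.226 | 0.220 | 0.233 |
| | | 10 | 0.216 | 0.227 | 0.217 | 0.215 | 0.300 | 0.218 | 0.423 | 0.240 | 0.220 | 0.217 | 0.215 |
| | | 12 | 0.253 | 1.605 | 0.252 | 0.270 | 0.242 | 0.268 | 0.326 | 0.285 | 0.258 | 0.274 | 0.251 |

Table 38: Comparison of multimodal fusion strategies with TiDE (+ GPT).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TiDE | Agriculture | 6 | 0.073 | 0.068 | 0.082 | 0.068 | 1.720 | 0.065 | 1.699 | 0.063 | 0.063 | 0.062 | 0.060 |
| | | 8 | 0.088 | 0.091 | 0.092 | 0.092 | 1.005 | 0.093 | 0.826 | 0.082 | 0.082 | 0.080 | 0.082 |
| | | 10 | 0.112 | 17.163 | 0.110 | 0.112 | 19.212 | 0.120 | 0.419 | 0.103 | 0.104 | 0.104 | 0.103 |
| | | 12 | 0.160 | 20.607 | 0.144 | 0.124 | 29.733 | 0.150 | 0.353 | 0.135 | 0.134 | 0.134 | 0.135 |
| | Climate | 6 | 1.515 | 1.601 | 1.606 | 1.600 | 1.496 | 1.418 | 1.379 | 1.252 | 1.244 | 1.239 | 1.258 |
| | | 8 | 1.486 | 1.567 | 1.481 | 1.458 | 1.081 | 1.532 | 1.184 | 1.264 | 1.274 | 1.258 | 1.267 |
| | | 10 | 1.408 | 14.350 | 1.467 | 1.486 | 5.718 | 1.561 | 1.197 | 1.274 | 1.272 | 1.259 | 1.259 |
| | | 12 | 1.468 | 10.739 | 1.395 | 1.480 | 3.521 | 1.508 | 1.164 | 1.255 | 1.278 | 1.264 | 1.271 |
| | Economy | 6 | 0.033 | 0.052 | 0.039 | 0.053 | 19.510 | 0.044 | 19.047 | 0.017 | 0.017 | 0.017 | 0.017 |
| | | 8 | 0.043 | 0.042 | 0.035 | 0.041 | 12.450 | 0.031 | 12.775 | 0.017 | 0.018 | 0.017 | 0.018 |
| | | 10 | 0.030 | 24.102 | 0.031 | 0.039 | 47.478 | 0.040 | 1.857 | 0.018 | 0.017 | 0.017 | 0.017 |
| | | 12 | 0.033 | 27.894 | 0.031 | 0.037 | 57.638 | 0.038 | 1.946 | 0.018 | 0.018 | 0.019 | 0.017 |
| | Energy | 12 | 0.135 | 8167.2 | 0.139 | 0.135 | 2692.9 | 0.134 | 0.214 | 0.115 | 0.115 | 0.108 | 0.112 |
| | | 24 | 0.241 | 0.389 | 0.244 | 0.262 | 0.779 | 0.243 | 0.784 | 0.212 | 0.210 | 0.217 | 0.220 |
| | | 36 | 0.330 | 26003.6 | 0.333 | 0.328 | 8011.0 | 0.332 | 1.382 | 0.294 | 0.307 | 0.305 | 0.300 |
| | | 48 | 0.443 | 0.467 | 0.443 | 0.439 | 0.684 | 0.442 | 0.692 | 0.403 | 0.414 | 0.407 | 0.408 |
| | Environment | 48 | 0.484 | 0.808 | 0.485 | 0.475 | 0.472 | 0.485 | 0.483 | 0.485 | 0.483 | 0.485 | 0.484 |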
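| | | 96 | 0.541 | 0.512 | 0.542 | 0.513 | 0.570 | 0.539 | 0.630 | 0.541 | 0.539 | 0.539 | 0.540 |
| | | 192 | 0.577 | 8866.8 | 0.578 | 0.561 | 5217.4 | 0.577 | 0.612 | 0.577 | 0.577 | 0.577 | 0.576 |
| | | 336 | 0.533 | 7686.4 | 0.534 | 0.510 | 977.9 | 0.533 | 0.494 | 0.534 | 0.533 | 0.533 | 0.534 |
| | Public Health | 12 | 1.299 | 1837.7 | 1.320 | 1.328 | 1131.5 | 1.329 | 1.244 | 1.076 | 1.128 | 1.113 | 1.106 |
| | | 24 | 1.513 | 2.180 | 1.499 | 1.646 | 1.673 | 1.567 | 1.685 | 1.432 | 1.434 | 1.448 | 1.425 |
| | | 36 | 1.676 | 11842.9 | 1.668 | 1.755 | 5477.2 | 1.669 | 2.308 | 1.605 | 1.619 | 1.617 | 1.602 |
| | | 48 | 1.787 | 1.977 | 1.756 | 1.844 | 1.845 | 1.788 | 1.773 | 1.724 | 1.723 | 1.726 | 1.721 |
| | Security | 6 | 146.1 | 125.0 | 134.9 | 125.0 | 202.6 | 122.2 | 192.6 | 105.2 | 107.2 | 107.6 | 107.5 |
| | | 8 | 129.7 | 138.8 | 132.5 | 138.7 | 154.2 | 128.9 | 155.8 | 108.3 | 108.9 | 112.4 | 109.4 |
| | | 10 | 132.3 | 162.6 | 135.2 | 134.7 | 101.0 | 135.4 | 116.3 | 112.7 | 110.6 | 114.6 | 111.9 |
| | | 12 | 145.2 | 149.2 | 139.2 | 139.4 | 102.7 | 142.4 | 119.8 | 112.7 | 112.3 | 112.3 | 111.7 |
| | Social Good | 6 | 1.012 | 0.974 | 1.036 | 0.961 | 3.988 | 0.970 | 3.950 | 0.835 | 0.836 | 0.823 | 0.843 |
| | | 8 | 1.062 | 1.129 | 1.013 | 1.129 | 2.673 | 1.044 | 2.480 | 0.950 | 0.940 | 0.950 | 0.947 |
| | | 10 | 1.250 | 571.2 | 1.097 | 1.185 | 269.0 | 1.243 | 1.011 | 1.068 | 1.070 | 1.109 | 1.068 |
| | | 12 | 1.415 | 737.3 | 1.186 | 1.298 | 231.7 | 1.205 | 1.059 | 1.162 | 1.252 | 1.183 | 1.147 |
| | Traffic | 6 | 0.304 | 0.326 | 0.285 | 0.326 | 0.439 | 0.301 | 0.438 | 0.236 | 0.235 | 0.236 | 0.233 |
| | | 8 | 0.309 | 0.312 | 0.277 | 0.310 | 0.469 | 0.272 | 0.424 | 0.233 | 0.230 | 0.234 | 0.224 |
| | | 10 | 0.249 | 31.614 | 0.245 | 0.328 | 9.288 | 0.302 | 0.623 | 0.224 | 0.231 | 0.225 | 0.229 |
| | | 12 | 0.319 | 39.803 | 0.308 | 0.336 | 13.868 | 0.334 | 0.404 | 0.288 | 0.290 | 0.284 | 0.289 |

Table 39: Comparison of multimodal fusion strategies with FEDformer (+ GPT).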
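| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FEDformer | Agriculture | 6 | 0.062 | 0.063 | 0.062 | 0.057 | 0.061 | 0.062 | 0.604 | 0.064 | 0.062 | 0.055 | 0.059 |
| | | 8 | 0.071 | 0.076 | 0.074 | 0.071 | 0.083 | 0.073 | 0.726 | 0.075 | 0.078 | 0.069 | 0.069 |
| | | 10 | 0.092 | 0.103 | 0.090 | 0.094 | 0.104 | 0.096 | 0.362 | 0.095 | 0.093 | 0.092 | 0.093 |
| | | 12 | 0.115 | 0.118 | 0.116 | 0.123 | 0.131 | 0.122 | 0.316 | 0.116 | 0.122 | 0.122 | 0.117 |
| | Climate | 6 | 1.116 | 1.061 | 1.098 | 1.182 | 1.071 | 1.086 | 1.261 | 1.098 | 1.095 | 1.077 | 1.060 |
| | | 8 | 1.082 | 1.091 | 1.060 | 1.091 | 1.030 | 1.063 | 1.051 | 1.055 | 1.093 | 1.075 | 1.101 |
| | | 10 | 1.105 | 1.102 | 1.109 | 1.108 | 1.053 | 1.115 | 0.896 | 1.111 | 1.095 | 1.098 | 1.066 |
| | | 12 | 1.084 | 1.116 | 1.100 | 1.092 | 1.118 | 1.054 | 0.931 | 1.100 | 1.120 | 1.112 | 1.093 |
| | Economy | 6 | 0.039 | 0.047 | 0.049 | 0.036 | 0.041 | 0.045 | 10.229 | 0.049 | 0.046 | 0.029 | 0.048 |
| | | 8 | 0.047 | 0.053 | 0.053 | 0.050 | 0.029 | 0.036 | 7.842 | 0.052 | 0.036 | 0.053 | 0.041 |
| | | 10 | 0.040 | 0.051 | 0.048 | 0.045 | 0.051 | 0.035 | 0.900 | 0.048 | 0.041 | 0.036 | 0.044 |
| | | 12 | 0.046 | 0.045 | 0.041 | 0.056 | 0.046 | 0.032 | 1.095 | 0.040 | 0.037 | 0.047 | 0.044 |
| | Energy | 12 | 0.098 | 0.111 | 0.094 | 0.096 | 0.113 | 0.093 | 0.129 | 0.094 | 0.097 | 0.091 | 0.092 |
| | | 24 | 0.183 | 0.184 | 0.184 | 0.175 | 0.184 | 0.179 | 0.790 | 0.184 | 0.176 | 0.178 | 0.179 |
| | | 36 | 0.254 | 0.332 | 0.268 | 0.251 | 0.304 | 0.257 | 0.792 | 0.261 | 0.253 | 0.253 | 0.256 |
| | | 48 | 0.375 | 0.403 | 0.370 | 0.367 | 0.474 | 0.359 | 0.776 | 0.370 | 0.366 | 0.358 | 0.386 |
| | Environment | 48 | 0.443 | 0.502 | 0.478 | 0.486 | 0.473 | 0.467 | 0.503 | 0.479 | 0.482 | 0.458 | 0.471 |
| | | 96 | 0.487 | 0.511 | 0.486 | 0.527 | 0.513 | 0.518 | 0.445 | 0.480 | 0.480 | 0.491 | 0.457 |
| | | 192 | 0.508 | 0.508 | 0.505 | 0.505 | 0.509 | 0.499 | 0.469 | 0.508 | 0.495 | 0.513 | 0.503 |
| | | 336 | 0.449 | 0.485 | 0.458 | 0.486 | 0.463 | 0.484 | 0.454 | 0.460 | 0.466 | 0.470 | 0.449 |
| | Public Health | 12 | 1.050 | 1.051 | 1.059 | 1.056 | 1.154 | 1.057 | 1.058 | 1.051 | 1.054 | 1.098 | 1.052 |
| | | 24 | 1.488 | 1.370 | 1.353 | 1.370 | 1.531 | 1.422 | 1.580 | 1.348 | 1.407 | 1.374 | 1.350 |
| | | 36 | 1.422 | 1.552 | 1.579 | 1.554 | 1.504 | 1.493 | 1.924 | 1.544 | 1.487 | 1.537 | 1.502 |
| | | 48 | 1.621 | 1.568 | 1.601 | 1.625 | 1.738 | 1.501 | 1.504 | 1.541 | 1.622 | 1.610 | 1.502 |
| | Security | 6 | 105.9 | 110.1 | 103.6 | 102.9 | 106.2 | 105.7 | 162.7 | 105.9 | 115.0 | 104.6 | 107.4 |
| | | 8 | 109.1 | 107.8 | 109.8 | 108.7 | 110.3 | 109.6 | 142.9 | 109.4 | 110.1 | 107.4 | 109.6 |
| | | 10 | 113.8 | 112.9 | 111.6 | 111.4 | 112.3 | 114.9 | 114.0 | 111.6 | 113.0 | 110.0 | 113.3 |
| | | 12 | 117.2 | 117.5 | 115.0 | 114.8 | 114.2 | 112.9 | 115.8 | 111.8 | 115.2 | 115.5 | 114.4 |
| | Social Good | 6 | 0.879 | 0.853 | 0.808 | 0.873 | 0.860 | 0.819 | 1.825 | 0.808 | 0.771 | 0.802 | 0.836 |
| | | 8 | 0.885 | 0.873 | 0.897 | 0.848 | 0.905 | 0.852 | 2.121 | 0.896 | 0.855 | 0.848 | 0.848 |
| | | 10 | 0.914 | 0.918 | 0.892 | 0.920 | 0.982 | 0.940 | 0.984 | 0.892 | 0.896 | 0.911 | 0.908 |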
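| | | 12 | 1.015 | 0.987 | 0.983 | 0.954 | 1.114 | 1.010 | 1.108 | 0.981 | 0.987 | 1.001 | 0.971 |
| | Traffic | 6 | 0.152 | 0.163 | 0.153 | 0.160 | 0.144 | 0.149 | 0.351 | 0.154 | 0.164 | 0.150 | 0.153 |
| | | 8 | 0.153 | 0.160 | 0.156 | 0.169 | 0.144 | 0.158 | 0.227 | 0.156 | 0.154 | 0.156 | 0.159 |
| | | 10 | 0.166 | 0.175 | 0.167 | 0.168 | 0.147 | 0.161 | 0.344 | 0.168 | 0.163 | 0.161 | 0.161 |
| | | 12 | 0.228 | 0.233 | 0.232 | 0.240 | 0.217 | 0.213 | 0.376 | 0.232 | 0.233 | 0.221 | 0.228 |

Table 40: Comparison of multimodal fusion strategies with Autoformer (+ GPT).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Autoformer | Agriculture | 6 | 0.070 | 0.072 | 0.066 | 0.075 | 0.066 | 0.066 | 1.226 | 0.069 | 0.069 | 0.062 | 0.068 |
| | | 8 | 0.090 | 0.086 | 0.080 | 0.086 | 0.091 | 0.078 | 0.483 | 0.094 | 0.097 | 0.077 | 0.093 |
| | | 10 | 0.110 | 0.117 | 0.106 | 0.107 | 0.115 | 0.112 | 0.348 | 0.111 | 0.102 | 0.113 | 0.094 |
| | | 12 | 0.128 | 0.144 | 0.135 | 0.139 | 0.141 | 0.134 | 0.446 | 0.131 | 0.139 | 0.126 | 0.129 |
| | Climate | 6 | 1.240 | 1.292 | 1.243 | 1.222 | 1.182 | 1.206 | 1.319 | 1.228 | 1.239 | 1.206 | 1.204 |
| | | 8 | 1.200 | 1.224 | 1.252 | 1.203 | 1.293 | 1.176 | 1.103 | 1.296 | 1.178 | 1.221 | 1.106 |
| | | 10 | 1.247 | 1.275 | 1.228 | 1.211 | 1.239 | 1.190 | 1.035 | 1.256 | 1.206 | 1.168 | 1.197 |
| | | 12 | 1.177 | 1.181 | 1.216 | 1.247 | 1.207 | 1.192 | 0.991 | 1.219 | 1.206 | 1.194 | 1.133 |
| | Economy | 6 | 0.051 | 0.077 | 0.068 | 0.067 | 0.058 | 0.058 | 11.732 | 0.069 | 0.074 | 0.067 | 0.050 |
| | | 8 | 0.058 | 0.069 | 0.058 | 0.058 | 0.065 | 0.065 | 9.985 | 0.058 | 0.066 | 0.061 | 0.074 |
| | | 10 | 0.060 | 0.066 | 0.057 | 0.049 | 0.072 | 0.049 | 1.110 | 0.057 | 0.067 | 0.055 | 0.055 |
| | | 12 | 0.061 | 0.074 | 0.044 | 0.059 | 0.074 | 0.056 | 1.045 | 0.056 | 0.052 | 0.051 | 0.045 |
| | Energy | 12 | 0.165 | 0.208 | 0.158 | 0.158 | 0.171 | 0.145 | 0.339 | 0.140 | 0.144 | 0.166 | 0.171 |
| | | 24 | 0.292 | 0.300 | 0.314 | 0.296 | 0.329 | 0.296 | 0.710 | 0.312 | 0.305 | 0.329 | 0.291 |
| | | 36 | 0.352 | 0.383 | 0.378 | 0.399 | 0.403 | 0.387 | 1.380 | 0.366 | 0.385 | 0.370 | 0.374 |
| | | 48 | 0.485 | 0.468 | 0.479 | 0.452 | 0.500 | 0.476 | 0.702 | 0.446 | 0.491 | 0.456 | 0.464 |
| | Environment | 48 | 0.550 | 0.507 | 0.502 | 0.558 | 0.532 | 0.515 | 0.481 | 0.498 | 0.541 | 0.492 | 0.485 |
| | | 96 | 0.520 | 0.567 | 0.524 | 0.491 | 0.559 | 0.511 | 0.536 | 0.518 | 0.544 | 0.508 | 0.496 |
| | | 192 | 0.589 | 0.550 | 0.547 | 0.560 | 0.614 | 0.531 | 0.505 | 0.541 | 0.555 | 0.553 | 0.505 |
| | | 336 | 0.565 | 0.525 | 0.523 | 0.536 | 0.548 | 0.508 | 0.483 | 0.504 | 0.549 | 0.535 | 0.502 |
| | Public Health | 12 | 1.668 | 1.632 | 1.513 | 1.596 | 1.425 | 1.655 | 1.449 | 1.626 | 1.522 | 1.609 | 1.567 |
| | | 24 | 1.940 | 2.128 | 1.847 | 1.941 | 1.773 | 1.950 | 1.639 | 1.845 | 1.958 | 1.989 | 1.860 |
| | | 36 | 1.931 | 1.983 | 1.878 | 1.851 | 2.105 | 1.961 | 2.163 | 1.916 | 2.008 | 2.005 | 1.883 |
| | | 48 | 1.767 | 2.137 | 1.814 | 1.883 | 2.048 | 1.905 | 1.801 | 2.026 | 1.887 | 1.985 | 1.926 |
| | Security |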
6106.1107.9111.8107.9105.7111.6151.7113.4107.9107.1108.8 8110.7112.2109.0112.5111.0112.7140.4109.8111.4110.1112.8 10115.7115.2116.1115.1111.3110.4115.3114.6112.3111.6113.0 12117.8117.2112.1115.4116.0118.8117.3112.2113.8116.2112.5 Social Good 60.8910.9000.7930.8440.8680.8414.4170.8390.8840.8950.814 80.9290.9210.9520.9611.0641.0202.7500.9790.9310.9380.967 101.0521.0411.0880.9961.1201.0031.1251.0601.0591.0051.036 121.1941.1171.0621.0471.1751.0771.0951.1041.2241.1151.151 Traffic 60.1720.2070.1680.1830.1910.1660.3860.1820.1780.1730.186 80.2070.1970.2070.1930.2220.1670.8990.2160.1890.1950.198 100.1910.1890.2020.2050.1970.1880.3930.2030.1930.2010.201 120.2390.2800.2360.2490.2650.2320.3530.2340.2340.2260.232 48 Table 41: Comparison of multimodal fusion strategies with Crossformer (+ GPT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Crossformer Agriculture 60.0880.1360.2180.1520.1530.1250.2210.1140.1790.1300.176 80.1960.2190.2230.2060.8910.1760.3470.2480.1030.2070.224 100.2480.3180.2180.3850.2660.4130.4120.3000.2720.2760.265 120.3110.3660.2530.4110.6020.4100.4840.2420.3150.3640.323 Climate 6 1.0391.0181.0871.0911.0561.0611.0551.0801.0631.0611.089 81.0801.0701.0311.0971.1131.0831.0841.1111.0921.0921.120 101.0881.1171.0921.1011.0831.0751.1101.1071.0821.0981.087 121.0901.1181.0861.1341.1341.0801.0871.0911.0911.0881.117 Economy 6 0.4680.4440.2600.4090.3600.6710.4720.4790.1200.0890.149 80.1230.6120.2990.6416.3510.4520.9110.5540.0960.4030.488 100.1220.6830.3780.4630.5351.0440.7960.8190.3560.6540.270 120.3231.0840.7270.7730.6410.9660.6150.3390.5350.8510.379 Energy 120.1450.1300.1330.1480.1460.1380.1520.1390.1550.1300.143 240.2520.2500.2580.2350.3200.2260.2700.2560.2470.2520.247 360.3450.3340.3370.3360.3460.3180.3280.3220.3280.3340.337 480.4450.4550.4460.4410.4130.4400.3820.4250.4540.4500.435 Environment 480.4920.4440.4570.4980.4440.4450.4420.4500.4560.4620.439 960.5890.4610.4940.5200.4720.4730.5190.4730.4860.4930.475 
1920.5510.4820.5730.5020.5260.5930.6140.5900.5470.5600.497 3360.5270.5530.5550.5090.4930.5880.5420.5260.5470.5650.503 Public Health 12 1.0891.1131.0531.1251.2231.0481.0951.1001.1501.1161.050 241.3051.3301.3431.3111.4271.4101.4081.4011.3241.3051.330 361.3211.2801.3971.4111.3811.3651.3821.3691.3671.3701.321 481.4401.5111.4511.4901.4481.4521.5551.5211.4861.4721.525 Security 6118.2121.5119.0120.6120.9122.2125.1119.9124.1118.9117.3 8121.3124.4125.2124.2130.9120.7126.1123.7123.8123.5120.5 10125.3126.1125.1125.9123.9124.5127.6126.7126.2125.3124.0 12123.3128.1126.9127.6129.9129.5129.3128.1127.5127.6125.9 Social Good 60.7280.7430.7490.7390.7320.7280.7470.7470.7300.7650.750 80.8160.8340.8000.8131.8740.8050.8120.8100.8190.8320.816 100.8720.9140.9150.8870.8590.8810.8720.8630.8650.8830.880 120.9360.9530.8970.9470.9160.8930.9490.9090.9130.9080.933 Traffic 60.2150.2100.2190.2130.2290.2140.2110.2130.2200.2100.220 80.2080.2150.2090.2100.3530.2210.2030.2160.2130.2360.232 100.2110.2260.2250.2260.2250.2250.2230.2230.2210.2160.222 120.2540.2500.2440.2450.3010.2420.2500.2500.2420.2500.249 49 Table 42: Comparison of multimodal fusion strategies with Reformer (+ GPT). 
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Reformer Agriculture 60.2020.2310.1470.1520.2550.1740.3170.1500.2310.2200.163 80.2210.4110.3020.2390.2420.3690.4160.3040.1870.1910.233 100.3190.5250.4520.4160.5230.3140.7140.3280.2510.4570.246 120.5330.5140.3920.4960.6150.7210.3770.3160.2770.2120.241 Climate 61.0251.1021.0281.0911.0641.1141.0801.0381.0761.0111.082 81.0661.0891.0801.0691.0421.0991.0751.0401.0471.0851.087 101.0581.1071.1281.0681.0921.0860.8921.1501.0781.0811.078 121.0761.1221.0681.0361.1081.1031.0500.9811.0681.0641.063 Economy 60.1990.3900.5110.3860.6070.1110.7640.4120.5110.1130.482 80.3370.4600.2980.1720.5610.3051.0270.2860.6790.2100.254 100.1760.4630.6890.3310.9221.1351.7620.6710.5490.9250.558 120.1440.7511.1510.4891.1091.3471.1291.0971.1620.8020.394 Energy 120.2070.2170.2640.2470.2620.2780.2430.1740.1680.2520.218 240.4230.4090.4170.3880.5820.4080.3640.3250.3110.3970.356 360.5100.5060.4780.4060.4570.4710.4400.3790.4230.4160.463 480.5750.5790.5650.5390.8770.5280.4930.4980.4270.5420.539 Environment 480.4150.4340.4160.4240.4180.4220.4210.4160.4150.4200.397 960.4600.4280.4790.4370.4850.3990.4750.4320.4000.3920.413 1920.4300.4560.4330.4430.4150.4300.4420.4230.4030.4130.422 3360.4510.4110.4410.4380.4020.4310.4340.4080.4020.4090.429 Public Health 121.2001.0801.1191.1291.1441.1021.0891.1161.0401.2351.131 241.2581.2971.3121.3711.3751.5331.3741.3011.3301.3511.243 361.3211.2871.2891.4801.3651.2921.3321.2861.3161.3621.286 481.3631.4821.4631.3461.4391.5291.5161.4421.3991.4011.406 Security 6 118.3120.8123.9124.2118.1120.4122.5123.9124.9124.4123.8 8121.6119.8121.8124.0122.0125.3124.9121.8123.6124.1122.3 10126.2125.0124.6123.7129.0124.3126.8124.6124.9125.6125.4 12128.0120.8125.7126.5126.8124.0124.8125.7126.3125.1126.0 Social Good 60.7750.8430.8530.7930.8360.8700.7960.8800.8870.7730.822 80.8640.8660.9140.8600.9210.8980.8500.9100.8570.8890.868 100.9490.9621.0160.9121.0290.9710.9060.9890.9430.9090.970 
121.0061.1380.9760.9531.0020.9880.9800.9740.9580.9610.985 Traffic 60.2080.2600.2260.3040.2320.2890.2450.2260.2870.2510.240 80.2240.2620.2240.2180.2330.2960.2340.2240.2310.2370.234 100.2610.2650.2750.2460.2460.2920.2240.2580.2190.2480.263 120.2440.2570.2640.2600.2530.2830.2490.2620.2670.2730.256 50 Table 43: Comparison of multimodal fusion strategies with TSMixer (+ GPT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA TSMixer Agriculture 60.0830.1350.1410.0890.1050.1810.0980.1390.1080.1140.108 80.1290.2350.2980.2360.4200.2760.1430.2020.2300.1820.122 100.4280.9341.3010.4860.2591.3250.2380.2381.5140.8620.253 120.6601.5931.3800.9781.6250.5880.5940.5901.8711.6540.280 Climate 6 1.0541.0671.0651.0801.0851.0931.1331.0591.0661.0661.070 81.0881.0941.0981.0801.1001.0861.0991.1071.0961.0801.095 101.1051.1151.2061.1461.1081.1151.1361.1171.1181.1241.113 121.1061.1931.1361.1761.1561.1051.1171.1491.1541.1361.106 Economy 6 0.0740.1150.1420.1270.1110.0990.0810.2060.1400.0800.050 80.0630.3270.5510.2860.9800.3170.1420.3210.5440.1490.044 100.1091.8532.2520.9290.2602.3390.1720.2031.3491.8550.155 120.8742.7402.8633.0372.8080.8631.0670.5042.8032.7580.208 Energy 120.1420.1340.1460.1540.1580.1440.1530.1290.1540.1420.115 240.2860.2880.2760.2950.2770.2450.2840.2750.2710.2590.264 360.3600.3470.3710.3690.3800.4080.3830.3490.4090.3400.360 480.4560.4560.4830.5810.4050.4850.4580.4560.5080.4690.445 Environment 480.5420.4400.4510.4590.4450.4490.4820.5260.4430.4370.502 960.6060.4600.4790.4830.4370.4460.4780.5940.4720.4690.556 1920.6600.5960.5050.4990.6080.5270.4880.5970.4850.4640.619 3360.5810.5920.5290.5340.5740.5440.4990.5820.5260.5020.558 Public Health 12 1.1500.9961.1061.1421.2231.1051.1451.1151.2181.0871.086 241.3931.5581.4021.4101.3501.3831.4901.3261.4991.4451.382 361.5001.7321.6541.4781.5001.5771.4771.4201.6141.5391.466 481.6871.6261.7361.6161.6181.6911.6011.5181.6711.6481.625 Security 
6128.0128.4129.3129.5125.6125.5129.2124.0129.9128.2124.4 8127.4124.1128.6127.2126.8117.2132.8129.4127.0125.6126.4 10128.2131.4131.4132.0128.5130.8130.4133.5129.4130.2128.2 12130.7133.0133.1134.1132.8127.4128.9133.1131.7132.5131.3 Social Good 60.7690.8430.8720.8540.7810.7640.7300.7720.8260.8110.739 80.8760.9190.9740.9450.7920.9440.8660.8090.9460.9380.855 100.8760.9611.1360.9240.8830.9570.9400.9101.1991.0960.861 120.9421.5271.1921.0361.0440.9580.9160.9201.1861.1530.839 Traffic 60.2030.2330.2320.2070.2340.2230.2260.2300.2980.2340.190 80.2160.2580.2330.2510.2680.2480.2490.2790.2800.2270.185 100.2290.4180.5510.3310.2430.5480.2300.2500.5610.2860.207 120.3030.7990.6320.3980.7130.2890.2880.3240.7760.5780.266 51 Table 44: Comparison of multimodal fusion strategies with Transformer (+ GPT). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Transformer Agriculture 60.1330.1330.1650.2230.1100.1300.2750.1740.1300.1520.146 80.1470.3260.1520.1970.2030.2010.3760.1960.1620.2070.191 100.1770.5050.2260.2660.1510.1760.7030.2340.1770.1850.158 120.2330.3730.2470.2460.1760.3190.3200.2470.2440.2070.356 Climate 60.9851.0860.9881.0430.9481.0311.0250.9881.0311.0770.982 81.0581.0560.9901.0491.0941.0931.0360.9901.0351.0411.037 101.0681.0931.0521.0421.0871.0051.0601.0521.0081.0731.085 121.1020.9791.0321.1171.1121.0841.0631.0321.0501.0641.026 Economy 60.2670.3940.0670.6360.0730.1190.7180.1450.0750.0770.101 80.3100.3750.2430.1870.2170.3830.5780.4210.4710.5840.245 100.2580.6780.2730.4940.0950.1021.3040.2780.4290.3120.112 120.3620.4910.4920.2720.0720.4510.6760.5420.3620.3300.086 Energy 120.1220.1230.1040.1300.1050.1430.1010.1010.0900.0760.134 240.2890.3490.2510.2800.2000.1610.1680.2060.2040.1440.201 360.2610.3500.3120.3530.3750.2330.3190.3060.2640.2830.342 480.4500.4800.4370.4630.4390.3890.5350.3920.4470.4190.432 Environment 480.4290.4410.4160.4320.4200.3990.4320.4120.4060.4140.419 960.4620.4700.4340.5180.4380.4450.4440.4070.4410.4380.425 
1920.4500.5030.4730.4810.4620.4770.5110.4630.3970.4100.432 3360.4510.4870.4420.4450.4640.4550.4570.3990.4540.4090.419 Public Health 121.1241.0841.2141.2480.9651.1401.1631.0990.8401.1031.022 241.3701.2891.2191.4001.1391.2331.4771.2671.2971.4051.267 361.3171.3281.3841.3751.3071.2881.3611.3841.2831.3641.324 481.3891.3591.3181.5301.4271.3641.3511.3651.4441.3111.473 Security 6 124.0124.7126.2123.5122.9126.9124.6126.2127.6126.2128.2 8124.8128.8126.7129.8121.1128.5127.8125.9130.2128.9128.4 10129.7124.5130.5130.7126.4128.8129.3128.6131.4130.6131.4 12130.3127.0131.3128.6127.5133.0131.4131.3130.4130.5126.3 Social Good 60.7580.6870.7550.7370.7970.7590.7740.7790.7720.7330.728 80.7670.7770.8670.8030.8610.8460.7790.8760.8640.8390.792 100.8590.8870.9120.8520.8830.9240.8210.9210.8920.9140.870 120.8290.8920.9280.8620.9420.9570.8550.9320.9160.9290.934 Traffic 60.1670.1780.1570.1840.1600.1680.1540.1570.1550.1650.157 80.1590.1610.1780.1770.1890.1680.1730.1750.1620.1640.168 100.1810.2340.1920.1750.1940.2110.1910.1920.1830.1810.213 120.2140.2560.2110.2090.2420.2320.2050.2110.2210.2000.219 52 Table 45: Comparison of multimodal fusion strategies with Informer (+ GPT). 
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA Informer Agriculture 60.2120.4590.1680.3060.2830.1390.3530.2520.4110.3290.264 80.3470.4110.4410.2650.2220.2580.3040.1900.3240.2140.416 100.4640.2260.4000.4850.6110.2690.4660.2730.3240.2850.320 120.5790.6340.4400.5670.2790.4220.5590.4980.5370.3080.457 Climate 6 1.0701.1401.1321.0971.0161.0961.0321.0561.0441.0371.040 81.0811.0721.1121.0920.9261.0861.0831.0731.0861.0571.055 101.0601.0831.1051.0820.9301.0641.0741.0801.0931.1071.110 121.0501.0691.0551.0750.9181.0671.0571.1241.0871.0761.111 Economy 6 0.4970.8940.3350.4350.4800.2150.4490.3290.4790.3350.879 80.5290.9450.5170.5720.3170.8540.6990.5350.4860.5600.585 100.7610.0710.6240.9221.1461.0640.6420.4130.7240.6621.112 121.1131.0560.6411.0250.8171.3100.6460.4830.7940.8200.279 Energy 120.1370.1970.1730.1890.2050.2220.1460.1870.1490.1570.156 240.3160.2870.2970.3190.2590.3230.3210.3020.3240.3500.337 360.4110.3670.4450.3930.3740.3680.3860.3640.4230.3920.370 480.4830.4920.4950.5610.4650.4380.4250.4440.4700.4150.483 Environment 480.4240.4490.4260.4430.4240.4450.4540.4280.4250.4430.420 960.4370.4910.4860.4640.4580.4680.4580.4060.4310.4360.417 1920.4600.4690.4340.4860.4440.5010.4300.4270.4490.4310.412 3360.4600.4920.4760.4830.4450.4600.4850.4880.4170.4500.459 Public Health 12 1.2361.0791.0141.0691.1861.0721.0521.0531.1760.9731.153 241.3731.2691.3871.3671.3541.2231.2131.4031.2501.2331.412 361.5401.4121.5721.4581.2301.4651.5021.5081.4481.5591.406 481.6431.6061.6641.6261.3331.4931.5901.5231.6601.6701.675 Security 6124.8127.3126.1127.4124.8122.7127.0126.3127.4128.4129.3 8126.7128.4127.6129.2128.0128.0126.5124.0125.1126.4125.0 10127.8130.2128.3127.4130.6130.0129.0129.1130.3131.5130.2 12131.1132.4131.1131.8128.4131.9129.6132.1130.4128.6129.6 Social Good 60.7260.6990.7460.7320.7600.7410.7310.7650.7310.7270.750 80.7830.7910.7820.7860.8690.8130.7970.8180.8010.8270.787 100.8250.8140.8660.8720.8190.8170.8390.8860.8600.8440.900 
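| | | 12 | 0.841 | 0.863 | 0.831 | 0.840 | 0.966 | 0.903 | 0.856 | 0.954 | 0.920 | 0.943 | 0.859 |
| | Traffic | 6 | 0.164 | 0.175 | 0.177 | 0.154 | 0.165 | 0.160 | 0.149 | 0.173 | 0.172 | 0.152 | 0.164 |
| | | 8 | 0.161 | 0.195 | 0.186 | 0.166 | 0.176 | 0.167 | 0.167 | 0.176 | 0.184 | 0.170 | 0.179 |
| | | 10 | 0.198 | 0.185 | 0.202 | 0.190 | 0.215 | 0.183 | 0.187 | 0.176 | 0.175 | 0.176 | 0.186 |
| | | 12 | 0.225 | 0.227 | 0.215 | 0.221 | 0.202 | 0.214 | 0.213 | 0.221 | 0.204 | 0.207 | 0.215 |

K.3 Text Model: Llama3

Table 46: Comparison of multimodal fusion strategies with Nonstationary Transformer (+ LLAMA3).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Nonstationary Transformer | Agriculture | 6 | 0.049 | 0.050 | 0.051 | 0.054 | 0.060 | 0.055 | 1.435 | 0.052 | 0.053 | 0.051 | 0.054 |
| | | 8 | 0.071 | 0.072 | 0.072 | 0.070 | 0.074 | 0.071 | 0.501 | 0.070 | 0.072 | 0.072 | 0.069 |
| | | 10 | 0.092 | 0.096 | 0.091 | 0.094 | 0.095 | 0.095 | 0.333 | 0.092 | 0.091 | 0.090 | 0.090 |
| | | 12 | 0.116 | 0.118 | 0.120 | 0.118 | 0.120 | 0.119 | 0.359 | 0.116 | 0.116 | 0.118 | 0.115 |
| | Climate | 6 | 1.239 | 1.297 | 1.254 | 1.195 | 1.067 | 1.242 | 1.132 | 1.243 | 1.258 | 1.252 | 0.998 |
| | | 8 | 1.244 | 1.314 | 1.292 | 1.239 | 1.139 | 1.210 | 1.194 | 1.141 | 1.201 | 1.209 | 1.297 |
| | | 10 | 1.222 | 1.291 | 1.256 | 1.315 | 1.294 | 1.192 | 1.061 | 1.249 | 1.280 | 1.254 | 1.243 |
| | | 12 | 1.273 | 1.299 | 1.202 | 1.240 | 1.242 | 1.179 | 1.054 | 1.245 | 1.229 | 1.207 | 1.214 |
| | Economy | 6 | 0.017 | 0.017 | 0.017 | 0.019 | 0.017 | 0.016 | 9.652 | 0.017 | 0.019 | 0.016 | 0.018 |
| | | 8 | 0.018 | 0.019 | 0.019 | 0.019 | 0.021 | 0.018 | 6.299 | 0.018 | 0.019 | 0.017 | 0.020 |
| | | 10 | 0.020 | 0.021 | 0.019 | 0.020 | 0.019 | 0.020 | 0.895 | 0.021 | 0.019 | 0.020 | 0.019 |
| | | 12 | 0.021 | 0.021 | 0.020 | 0.024 | 0.020 | 0.022 | 0.882 | 0.020 | 0.020 | 0.020 | 0.021 |
| | Energy | 12 | 0.104 | 0.096 | 0.093 | 0.106 | 0.090 | 0.109 | 0.187 | 0.103 | 0.104 | 0.083 | 0.092 |
| | | 24 | 0.190 | 0.244 | 0.197 | 0.209 | 0.174 | 0.199 | 0.663 | 0.197 | 0.209 | 0.192 | 0.206 |
| | | 36 | 0.293 | 0.308 | 0.282 | 0.303 | 0.233 | 0.292 | 1.211 | 0.281 | 0.329 | 0.289 | 0.244 |
| | | 48 | 0.425 | 0.451 | 0.439 | 0.411 | 0.392 | 0.435 | 0.761 | 0.397 | 0.343 | 0.312 | 0.417 |
| | Environment | 48 | 0.433 | 0.434 | 0.441 | 0.432 | 0.449 | 0.438 | 0.486 | 0.436 | 0.433 | 0.429 | 0.430 |
| | | 96 | 0.440 | 0.466 | 0.467 | 0.452 | 0.449 | 0.433 | 0.470 | 0.443 | 0.456 | 0.449 | 0.450 |
| | | 192 | 0.434 | 0.477 | 0.442 | 0.479 | 0.449 | 0.456 | 0.511 | 0.417 | 0.457 | 0.422 | 0.434 |
| | | 336 | 0.431 | 0.502 | 0.431 | 0.450 | 0.438 | 0.449 | 0.463 | 0.444 | 0.438 | 0.438 | 0.435 |
| | Public Health | 12 | 0.781 | 0.873 | 0.782 | 1.050 | 0.927 | 0.904 | 1.104 | 0.887 | 0.817 | 0.748 | 0.807 |
| | | 24 | 1.165 | 1.177 | 1.104 | 1.329 | 1.299 | 1.151 | 1.647 | 1.245 | 0.913 | 1.067 | 1.091 |
| | | 36 | 1.214 | 1.389 | 1.207 | 1.215 | 1.400 | 1.300 | 1.750 | 1.371 | 1.244 | 1.192 | 1.244 |
| | | 48 | 1.829 | 1.995 | 1.294 | 1.431 | 1.796 | 1.358 | 1.502 | 1.482 | 1.392 | 1.337 | 1.287 |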
Security 6103.7101.7101.3108.6101.3106.1125.5101.6100.7102.3101.8 8105.2105.5111.3107.4105.7101.2125.4105.1105.0103.8104.4 10108.9107.9111.3107.9107.8107.1120.0107.8107.7107.2107.5 12109.1109.5110.7109.5109.6107.4119.9109.8109.5109.5109.1 Social Good 60.8280.7860.7580.7310.7570.7704.3420.7690.7530.7690.785 80.9000.9720.8510.8741.0350.8792.8180.8970.8930.8680.938 101.0310.9710.9900.9581.0991.0131.1090.9850.9771.0120.959 120.9831.0790.9330.9871.0831.1041.0731.0501.0551.0391.040 Traffic 60.1780.1830.1850.1780.1840.1700.4050.1810.1770.1700.179 80.1850.1900.1820.1880.1940.1920.3190.1870.1760.1830.185 100.1860.2040.1820.1930.1930.1810.2080.1960.1880.1840.185 120.2340.2460.2560.2610.2470.2530.2650.2530.2580.2460.246 54 Table 47: Comparison of multimodal fusion strategies with PatchTST (+ LLAMA3). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA PatchTST Agriculture 60.0600.0600.0590.0570.0600.0611.3630.0560.0580.0590.061 80.0770.0780.0800.0750.0780.0780.7190.0800.0800.0790.078 100.1000.1000.1020.1000.1000.1010.3040.1020.1020.1010.101 120.1300.1290.1310.1290.1290.1300.3920.1310.1310.1300.129 Climate 61.2571.2291.2351.2321.2291.2511.2201.2351.2411.2671.240 81.2481.2521.2211.2511.2521.2661.3171.2211.2571.2421.244 101.2731.2781.2571.2841.2781.2651.1091.2571.2501.2601.262 121.2411.2511.2501.2631.2511.2571.1001.2501.2631.2401.249 Economy 6 0.0170.0160.0170.0180.0160.01713.8090.0170.0160.0170.017 80.0170.0170.0170.0190.0170.0186.8100.0170.0170.0160.017 100.0170.0160.0170.0200.0160.0181.7520.0170.0180.0170.017 120.0180.0170.0180.0210.0170.0181.6640.0180.0180.0170.017 Energy 120.1090.1060.1110.1250.1060.1080.2600.1110.1120.1110.108 240.2110.2080.2170.2420.2080.2190.7300.2220.2200.2190.212 360.3070.3180.3040.3230.3180.3011.2140.3100.3100.2860.312 480.4160.4150.3930.4460.4150.4160.6870.3940.4040.4190.423 Environment 480.4620.4630.4580.5060.4630.4510.4890.4550.4540.4580.453 
960.5080.5010.4610.4720.5010.4550.5140.4550.4560.4660.452 1920.5400.5390.4500.5030.5390.4470.5910.4580.4390.4650.436 3360.5110.5080.4410.5240.5080.4530.4820.4460.4410.4420.426 Public Health 120.9260.9470.9370.8830.9470.9660.9090.9491.0070.9450.890 241.4031.2831.4231.1911.2831.2981.4951.2641.4181.2871.312 361.6021.6141.6201.3911.6141.6282.0301.6211.6311.6191.662 481.7211.7431.7601.5471.7431.7521.6021.7611.7631.7401.785 Security 6 105.0104.3102.8105.8104.3102.6141.8103.0104.2103.6103.7 8107.8110.1105.7104.4110.1106.1141.6105.7106.4107.6107.1 10110.0111.5108.6113.1111.5109.2119.1109.0110.6109.8108.1 12111.1111.0110.7111.2111.0110.3118.9111.3111.5111.0113.1 Social Good 60.7900.7940.7950.7740.7940.7924.2790.7950.8080.7870.813 80.9140.9070.8910.8450.9070.8932.7960.8910.9120.8690.933 101.0221.0411.0120.9661.0411.0231.0831.0120.9661.0271.043 121.1191.0751.0521.0451.0951.0871.0571.0141.0991.1121.088 Traffic 60.1700.1730.1780.1910.1730.1810.4230.1770.1750.1690.177 80.1830.1800.1830.1890.1800.1900.2940.1840.1830.1890.186 100.1960.1950.1990.2020.1950.1960.2140.1960.1890.1840.190 120.2420.2430.2300.2490.2430.2530.2420.2420.2370.2290.245 55 Table 48: Comparison of multimodal fusion strategies with iTransformer (+ LLAMA3). 
TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA iTransformer Agriculture 60.0600.0590.0600.0580.0640.0591.2860.0600.0600.0600.060 80.0770.0770.0800.0740.0780.0790.9610.0800.0790.0800.078 100.0960.1000.1020.0980.1010.1000.1870.1010.1020.1010.099 120.1290.1280.1310.1260.1310.1310.3090.1300.1300.1290.128 Climate 61.1161.1161.1131.1241.1311.1601.3211.1131.1171.1451.125 81.1791.1731.1971.2341.0781.1811.3341.1971.1791.1921.196 101.2331.2121.1971.2401.1741.1821.0691.1971.2151.1881.189 121.1811.1801.1831.2001.1921.1981.0551.1821.1851.1841.188 Economy 60.0150.0150.0150.0190.0150.01510.9380.0150.0150.0150.015 80.0150.0150.0150.0160.0150.0159.1760.0150.0150.0160.015 100.0160.0160.0160.0160.0150.0161.4730.0160.0150.0160.016 120.0160.0160.0160.0170.0160.0161.1250.0160.0160.0160.016 Energy 120.1120.1080.0990.1060.0960.1170.2480.1060.1080.1140.114 240.2320.2150.2070.2370.1770.2080.8370.2100.2250.2200.218 360.2860.3230.2980.3260.2730.2941.1810.2910.3020.2840.286 480.4060.3890.3760.4180.3840.4140.6710.3760.3790.4070.398 Environment 480.4070.4200.4150.4290.4040.4180.4950.4230.4210.4110.416 960.4170.4200.4310.4610.4090.4300.5470.4490.4270.4220.436 1920.4250.4400.4490.4540.4300.4290.4920.4430.4370.4420.445 3360.4210.4220.4320.4430.4210.4270.4230.4260.4270.4140.433 Public Health 12 0.9981.0050.8920.9270.9310.9880.9970.9490.9561.0140.991 241.4471.4581.4831.1671.4681.4471.4511.4791.5161.4481.448 361.6291.6821.6301.3331.6771.6431.7511.6281.6411.6241.659 481.7491.7591.7541.4811.7891.7471.6891.7541.7511.7431.777 Security 6104.9103.2102.3113.6103.6103.1141.9102.1104.6102.7104.4 8109.0107.6105.3113.8105.9106.0138.4105.1106.8107.7106.1 10111.8110.1107.8116.9110.2108.7121.6106.7110.4108.7111.4 12114.5110.1110.0114.7110.9110.4118.3109.9110.5110.8111.6 Social Good 60.8360.8300.8200.7840.8450.8384.2970.8200.8220.8220.827 80.9580.9670.9490.9290.9320.9402.8140.9490.9300.9050.964 101.0571.0601.0431.0531.0721.0471.1051.0431.0730.9931.066 
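| | | 12 | 1.138 | 1.165 | 1.130 | 1.164 | 1.098 | 1.127 | 1.136 | 1.109 | 1.149 | 1.157 | 1.099 |
| | Traffic | 6 | 0.177 | 0.176 | 0.181 | 0.186 | 0.173 | 0.186 | 0.444 | 0.181 | 0.173 | 0.179 | 0.174 |
| | | 8 | 0.184 | 0.194 | 0.196 | 0.197 | 0.201 | 0.188 | 0.376 | 0.191 | 0.179 | 0.179 | 0.183 |
| | | 10 | 0.196 | 0.198 | 0.187 | 0.199 | 0.203 | 0.197 | 0.212 | 0.187 | 0.180 | 0.178 | 0.197 |
| | | 12 | 0.245 | 0.252 | 0.246 | 0.266 | 0.251 | 0.233 | 0.243 | 0.248 | 0.240 | 0.229 | 0.247 |

Table 49: Comparison of multimodal fusion strategies with Koopa (+ LLAMA3).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Koopa | Agriculture | 6 | 0.058 | 0.068 | 0.065 | 0.065 | 1.282 | 1.889 | 1.889 | 0.058 | 0.069 | 0.059 | 0.057 |
| | | 8 | 0.077 | 0.100 | 0.092 | 0.092 | 0.141 | 0.095 | 0.095 | 0.078 | 0.121 | 0.078 | 0.076 |
| | | 10 | 0.099 | 0.093 | 0.091 | 0.091 | 1.348 | 0.508 | 0.508 | 0.100 | 0.135 | 0.097 | 0.099 |
| | | 12 | 0.128 | 0.137 | 0.133 | 0.133 | 0.981 | 1.731 | 1.731 | 0.129 | 0.134 | 0.137 | 0.128 |
| | Climate | 6 | 1.250 | 14.014 | 1.277 | 1.277 | 14.658 | 1.264 | 1.264 | 1.292 | 1.104 | 1.250 | 1.248 |
| | | 8 | 1.290 | 1.492 | 1.281 | 1.281 | 1.207 | 1.112 | 1.112 | 1.292 | 1.074 | 1.254 | 1.263 |
| | | 10 | 1.247 | 1.424 | 1.277 | 1.277 | 1.215 | 1.087 | 1.087 | 1.259 | 1.112 | 1.257 | 1.246 |
| | | 12 | 1.236 | 1.250 | 1.275 | 1.275 | 1.178 | 1.370 | 1.370 | 1.256 | 1.081 | 1.262 | 1.232 |
| | Economy | 6 | 0.017 | 0.028 | 0.029 | 0.029 | 14.050 | 12.126 | 12.126 | 0.029 | 0.038 | 0.035 | 0.017 |
| | | 8 | 0.018 | 0.020 | 0.021 | 0.021 | 0.771 | 0.302 | 0.302 | 0.019 | 0.095 | 0.019 | 0.017 |
| | | 10 | 0.018 | 0.029 | 0.034 | 0.034 | 2.712 | 2.162 | 2.162 | 0.028 | 0.102 | 0.021 | 0.018 |
| | | 12 | 0.018 | 0.031 | 0.025 | 0.025 | 15.161 | 8.553 | 8.553 | 0.025 | 0.053 | 0.032 | 0.017 |
| | Energy | 12 | 0.107 | 10830.7 | 0.124 | 0.124 | 448.7 | 0.147 | 0.147 | 0.118 | 0.106 | 0.116 | 0.107 |
| | | 24 | 0.214 | 1093.2 | 0.238 | 0.238 | 2.765 | 0.386 | 0.386 | 0.234 | 0.228 | 0.221 | 0.204 |
| | | 36 | 0.298 | 0.281 | 0.329 | 0.329 | 0.611 | 0.611 | 0.611 | 0.315 | 0.323 | 0.310 | 0.296 |
| | | 48 | 0.406 | 0.497 | 0.413 | 0.413 | 2.152 | 0.904 | 0.904 | 0.424 | 0.402 | 0.434 | 0.405 |
| | Environment | 48 | 0.484 | 1961952000.0 | 0.478 | 0.478 | 405995840.0 | 0.453 | 0.453 | 0.488 | 0.440 | 0.441 | 0.484 |
| | | 96 | 0.528 | 2833084928.0 | 0.462 | 0.462 | 960262208.0 | 0.466 | 0.466 | 0.543 | 0.454 | 0.469 | 0.533 |
| | | 192 | 0.567 | 6696846336.0 | 0.542 | 0.542 | 45984964.0 | 0.499 | 0.499 | 0.572 | 0.506 | 0.539 | 0.564 |
| | | 336 | 0.531 | 13381508096.0 | 0.533 | 0.533 | 138449984.0 | 0.495 | 0.495 | 0.534 | 0.507 | 0.529 | 0.527 |
| | Public Health | 12 | 1.045 | 5541.8 | 0.961 | 0.961 | 234.6 | 1.026 | 1.026 | 1.239 | 1.011 | 0.964 | 1.027 |
| | | 24 | 1.396 | 1.158 | 1.224 | 1.224 | 1.838 | 1.307 | 1.307 | 1.517 | 1.779 | 1.177 | 1.400 |
| | | 36 | 1.629 | 1.319 | 1.347 | 1.347 | 2.208 | 1.528 | 1.528 | 1.798 | 1.457 | 1.306 | 1.618 |
| | | 48 | 1.732 | 2.800 | 1.446 | 1.446 | 1.726 | 1.803 | 1.803 | 1.869 | 1.760 | 1.414 | 1.714 |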
Security 6104.7105.4102.3102.3135.7144.0144.0104.3111.3103.7104.7 8108.6108.7111.3111.3108.8107.4107.4113.9114.7110.5108.5 10109.1107.3107.9107.9117.8118.1118.1110.8117.9109.1108.6 12111.4110.3109.8109.8157.0144.3144.3114.5117.3108.3111.7 Social Good 60.8480.9230.8800.8804.1574.1214.1210.8520.8610.8420.848 80.9251.2070.9470.9470.9680.9310.9310.9460.8680.8950.924 100.9871.1111.0921.0921.3031.1581.1580.9850.8560.9400.992 121.0851.2031.0681.0683.1653.2053.2051.0750.9351.1271.054 Traffic 60.2240.2310.2180.2180.3550.4490.4490.2360.2010.2210.211 80.2110.2140.2090.2090.2030.2040.2040.2500.2160.2070.202 100.2160.4060.2230.2230.2310.2420.2420.2290.2190.2150.206 120.2790.3320.2730.2730.4060.3300.3300.2840.2680.2680.268 57 Table 50: Comparison of multimodal fusion strategies with DLinear (+ LLAMA3). TS Model DatasetHUnimodal AdditiveConcatConstrained FirstMiddleLastFirstMiddleLastOrthogonalFiLMGatingCFA DLinear Agriculture 60.1120.1320.1330.1330.8501.3521.1880.1360.2440.1360.067 80.3230.1870.1870.1871.3710.8841.3360.2100.3320.2280.081 100.1630.2080.2080.2080.4580.3280.2400.1730.1910.1980.133 120.2160.2460.2500.2460.4070.2890.3480.1620.5400.1920.173 Climate 6 1.3081.3071.2741.2311.3591.2951.3441.3321.1601.2151.075 81.2221.3451.3051.2651.3101.2771.2481.2681.0901.3231.108 101.1571.3561.3481.2711.1961.1321.1261.3111.0831.2901.112 121.2021.3421.3161.3031.1721.1241.1331.2061.1521.2581.096 Economy 60.0710.0860.0860.09013.4279.36311.5530.1070.5010.0980.027 80.2330.1200.1420.1419.7248.2409.1210.0960.4120.1500.032 100.0850.1190.1290.1301.8401.1760.8100.1330.2950.1360.047 120.1090.1290.1290.1291.5820.8940.6940.0670.6050.0910.066 Energy 120.1140.2030.2010.1640.2380.2280.1880.1170.2400.1800.107 240.2180.3120.3050.2800.8090.7890.6780.2460.3150.2580.199 360.3010.3440.3690.3491.2600.9360.7740.3140.3440.3650.282 480.4100.4270.4200.4050.7210.6760.6550.4440.4390.4300.386 Environment 480.4890.4490.4390.4390.5030.4760.4820.5120.4590.4460.494 
960.5730.4660.4820.4790.5390.4990.4790.5680.4650.4630.572 1920.5910.5070.5020.5080.5420.5540.6160.5860.5760.5150.582 3360.5370.5150.4980.4950.5120.5020.4940.5370.5060.4970.537 Public Health 12 1.3371.4681.3191.2391.3411.3201.2901.6451.8571.2251.198 241.5771.5531.3751.3451.6241.5571.5681.6501.9971.4691.535 361.6221.7631.6271.4772.0871.9531.8871.8281.7751.6411.558 481.6941.7711.6871.6011.8301.8321.8201.7661.8791.6181.664 Security 6104.3106.1105.4105.0184.5169.1171.9106.1109.1104.6104.8 8109.2109.9107.6107.7150.5148.0147.8110.4117.9108.5108.9 10110.3112.9111.2110.5112.6112.1112.5110.9118.0110.4109.8 12112.0114.5113.3112.6113.4112.9113.4112.7119.9112.6111.6 Social Good 60.9990.8470.9200.8154.1912.6362.6420.8660.8720.8870.780 80.9090.9891.0971.0282.7342.4932.1290.9060.9571.0580.863 100.9601.2161.1241.1801.2061.0701.0441.0781.0301.1170.968 121.0801.1971.2551.2641.1771.0811.0391.1421.1121.1081.000 Traffic 60.2530.2750.2850.2820.5050.3460.5160.3080.2840.2770.222 80.3220.3350.3440.3400.4670.4190.4540.2680.3120.3390.221 100.2440.3280.3520.3330.3450.2870.2700.2610.3150.3590.219 120.2890.3500.3710.3600.3150.3090.3010.2930.2990.3170.261 58 Table 51: Comparison of multimodal fusion strategies with TiDE (+ LLAMA3). 
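TS Model: TiDE

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.074 | 0.074 | 0.080 | 0.075 | 1.699 | 0.069 | 1.747 | 0.061 | 0.065 | 0.062 | 0.061 |
| | 8 | 0.094 | 0.099 | 0.090 | 0.091 | 0.901 | 0.100 | 1.004 | 0.083 | 0.081 | 0.081 | 0.081 |
| | 10 | 0.113 | 18.711 | 0.113 | 0.132 | 50.930 | 0.117 | 0.349 | 0.103 | 0.103 | 0.103 | 0.104 |
| | 12 | 0.156 | 23.792 | 0.142 | 0.147 | 61.286 | 0.148 | 0.432 | 0.134 | 0.135 | 0.133 | 0.134 |
| Climate | 6 | 1.592 | 1.589 | 1.602 | 1.649 | 1.697 | 1.454 | 1.475 | 1.257 | 1.244 | 1.235 | 1.249 |
| | 8 | 1.468 | 1.536 | 1.382 | 1.560 | 1.133 | 1.580 | 1.311 | 1.258 | 1.278 | 1.253 | 1.259 |
| | 10 | 1.398 | 40.949 | 1.403 | 1.532 | 8.241 | 1.565 | 1.179 | 1.275 | 1.270 | 1.256 | 1.267 |
| | 12 | 1.500 | 11.305 | 1.407 | 1.498 | 3.436 | 1.472 | 1.186 | 1.254 | 1.274 | 1.263 | 1.270 |
| Economy | 6 | 0.037 | 0.044 | 0.037 | 0.037 | 14.080 | 0.045 | 13.600 | 0.017 | 0.017 | 0.017 | 0.017 |
| | 8 | 0.039 | 0.046 | 0.036 | 0.040 | 8.610 | 0.031 | 9.778 | 0.018 | 0.017 | 0.018 | 0.020 |
| | 10 | 0.030 | 3.359 | 0.032 | 0.040 | 12.426 | 0.040 | 1.512 | 0.018 | 0.017 | 0.017 | 0.018 |
| | 12 | 0.033 | 4.545 | 0.031 | 0.037 | 1.918 | 0.038 | 2.290 | 0.018 | 0.018 | 0.017 | 0.017 |
| Energy | 12 | 0.136 | 10055.3 | 0.140 | 0.176 | 3650.8 | 0.135 | 0.291 | 0.106 | 0.108 | 0.109 | 0.106 |
| | 24 | 0.241 | 0.409 | 0.236 | 0.306 | 0.747 | 0.238 | 0.747 | 0.218 | 0.201 | 0.216 | 0.219 |
| | 36 | 0.328 | 26053.8 | 0.332 | 0.398 | 14910.8 | 0.327 | 1.292 | 0.295 | 0.296 | 0.296 | 0.294 |
| | 48 | 0.443 | 0.483 | 0.439 | 0.452 | 0.684 | 0.441 | 0.688 | 0.399 | 0.406 | 0.404 | 0.405 |
| Environment | 48 | 0.485 | 0.733 | 0.485 | 0.490 | 0.486 | 0.486 | 0.477 | 0.487 | 0.485 | 0.485 | 0.484 |
| | 96 | 0.543 | 0.698 | 0.540 | 0.489 | 0.563 | 0.547 | 0.493 | 0.540 | 0.542 | 0.543 | 0.539 |
| | 192 | 0.578 | 14630.4 | 0.576 | 0.537 | 26545.1 | 0.577 | 0.621 | 0.575 | 0.577 | 0.576 | 0.577 |
| | 336 | 0.533 | 40955.4 | 0.534 | 0.517 | 17652.0 | 0.534 | 0.489 | 0.533 | 0.535 | 0.534 | 0.533 |
| Public Health | 12 | 1.297 | 1640.6 | 1.311 | 1.427 | 1600.4 | 1.286 | 1.290 | 1.087 | 1.107 | 1.085 | 1.124 |
| | 24 | 1.512 | 2.232 | 1.532 | 1.742 | 1.500 | 1.538 | 1.635 | 1.424 | 1.427 | 1.456 | 1.466 |
| | 36 | 1.688 | 26231.0 | 1.673 | 1.911 | 9720.8 | 1.677 | 2.093 | 1.619 | 1.619 | 1.599 | 1.625 |
| | 48 | 1.800 | 2.321 | 1.763 | 1.982 | 1.981 | 1.804 | 1.893 | 1.732 | 1.741 | 1.700 | 1.719 |
| Security | 6 | 146.0 | 125.6 | 135.0 | 125.4 | 191.8 | 122.1 | 185.7 | 104.8 | 106.9 | 107.0 | 108.0 |
| | 8 | 129.6 | 138.3 | 132.5 | 138.6 | 150.1 | 128.9 | 155.2 | 107.8 | 109.4 | 112.2 | 108.5 |
| | 10 | 132.2 | 151.2 | 134.8 | 135.0 | 104.8 | 135.2 | 117.6 | 113.0 | 110.2 | 113.4 | 111.6 |
| | 12 | 144.9 | 153.1 | 139.3 | 139.0 | 106.1 | 142.2 | 120.3 | 112.9 | 112.5 | 113.1 | 112.2 |
| Social Good | 6 | 0.926 | 0.979 | 1.002 | 1.010 | 4.407 | 0.940 | 4.410 | 0.836 | 0.835 | 0.814 | 0.823 |
| | 8 | 1.098 | 1.199 | 1.032 | 1.217 | 2.748 | 1.076 | 2.770 | 0.965 | 0.952 | 0.953 | 0.956 |
| | 10 | 1.129 | 389.2 | 1.183 | 1.257 | 215.8 | 1.290 | 1.067 | 1.112 | 1.063 | 1.085 | 1.072 |
| | 12 | 1.224 | 536.6 | 1.218 | 1.256 | 155.4 | 1.265 | 1.161 | 1.150 | 1.143 | 1.156 | 1.143 |
| Traffic | 6 | 0.289 | 0.298 | 0.295 | 0.275 | 0.559 | 0.301 | 0.461 | 0.237 | 0.236 | 0.236 | 0.233 |
| | 8 | 0.303 | 0.340 | 0.277 | 0.310 | 0.451 | 0.272 | 0.371 | 0.231 | 0.230 | 0.233 | 0.223 |
| | 10 | 0.251 | 42.214 | 0.249 | 0.321 | 14.641 | 0.296 | 0.273 | 0.224 | 0.231 | 0.225 | 0.228 |
| | 12 | 0.319 | 58.262 | 0.305 | 0.333 | 49.699 | 0.329 | 0.293 | 0.288 | 0.290 | 0.284 | 0.289 |

Table 52: Comparison of multimodal fusion strategies with FiLM (+ LLAMA3).

TS Model: FiLM

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.060 | 0.084 | 0.068 | 0.068 | 1.740 | 0.063 | 1.854 | 0.060 | 0.061 | 0.062 | 0.062 |
| | 8 | 0.082 | 0.108 | 0.091 | 0.087 | 3.002 | 0.089 | 1.176 | 0.080 | 0.082 | 0.080 | 0.079 |
| | 10 | 0.106 | 22.421 | 0.104 | 0.115 | 1.026 | 0.109 | 0.321 | 0.112 | 0.104 | 0.103 | 0.105 |
| | 12 | 0.138 | 20.331 | 0.141 | 0.131 | 0.307 | 0.144 | 0.418 | 0.135 | 0.136 | 0.136 | 0.141 |
| Climate | 6 | 1.354 | 1.262 | 1.306 | 1.313 | 1.313 | 1.269 | 1.286 | 1.241 | 1.288 | 1.248 | 1.231 |
| | 8 | 1.283 | 1.456 | 1.286 | 1.292 | 1.039 | 1.345 | 1.226 | 1.283 | 1.296 | 1.265 | 1.266 |
| | 10 | 1.284 | 14.615 | 1.294 | 1.296 | 1.168 | 1.307 | 1.092 | 1.276 | 1.293 | 1.270 | 1.273 |
| | 12 | 1.276 | 11.715 | 1.284 | 1.329 | 7.656 | 1.287 | 1.106 | 1.298 | 1.293 | 1.277 | 1.268 |
| Economy | 6 | 0.017 | 0.049 | 0.019 | 0.037 | 21.726 | 0.019 | 17.804 | 0.019 | 0.016 | 0.019 | 0.017 |
| | 8 | 0.018 | 0.033 | 0.021 | 0.020 | 11.414 | 0.020 | 10.334 | 0.017 | 0.018 | 0.017 | 0.017 |
| | 10 | 0.019 | 0.024 | 0.020 | 0.019 | 1.269 | 0.019 | 2.252 | 0.019 | 0.018 | 0.018 | 0.018 |
| | 12 | 0.018 | 3.131 | 0.019 | 0.029 | 0.717 | 0.020 | 2.128 | 0.018 | 0.018 | 0.018 | 0.017 |
| Energy | 12 | 0.122 | 8155.3 | 0.131 | 0.169 | 0.326 | 0.132 | 0.217 | 0.109 | 0.111 | 0.128 | 0.110 |
| | 24 | 0.230 | 1.468 | 0.237 | 0.304 | 0.822 | 0.257 | 0.756 | 0.217 | 0.222 | 0.228 | 0.218 |
| | 36 | 0.312 | 0.399 | 0.357 | 0.446 | 1.315 | 0.387 | 1.195 | 0.315 | 0.318 | 0.328 | 0.311 |
| | 48 | 0.428 | 0.495 | 0.460 | 0.523 | 0.743 | 0.510 | 0.690 | 0.419 | 0.432 | 0.477 | 0.427 |
| Environment | 48 | 0.491 | 949097920.0 | 0.486 | 0.475 | 0.518 | 0.470 | 0.530 | 0.487 | 0.467 | 0.485 | 0.488 |
| | 96 | 0.547 | 556371.0 | 0.483 | 0.479 | 0.604 | 0.491 | 0.566 | 0.546 | 0.509 | 0.509 | 0.544 |
| | 192 | 0.574 | 2790298624.0 | 0.518 | 0.521 | 1.243 | 0.525 | 0.642 | 0.574 | 0.550 | 0.522 | 0.566 |
| | 336 | 0.529 | 1012101632.0 | 0.540 | 0.524 | 81147368.0 | 0.491 | 0.488 | 0.527 | 0.503 | 0.527 | 0.527 |
| Public Health | 12 | 1.157 | 2.965 | 1.402 | 1.296 | 2.071 | 1.401 | 1.080 | 1.127 | 1.138 | 1.311 | 1.107 |
| | 24 | 1.606 | 1.525 | 1.737 | 1.354 | 2.231 | 1.585 | 1.599 | 1.453 | 1.488 | 1.573 | 1.453 |
| | 36 | 1.618 | 1.774 | 1.810 | 1.908 | 2.315 | 1.698 | 1.838 | 1.894 | 1.704 | 1.671 | 1.644 |
| | 48 | 1.706 | 2.329 | 1.748 | 2.137 | 1.771 | 1.892 | 1.659 | 1.736 | 1.784 | 1.712 | 1.735 |
| Security | 6 | 127.4 | 110.9 | 102.2 | 101.1 | 207.3 | 103.0 | 189.1 | 102.8 | 103.1 | 104.6 | 107.0 |
| | 8 | 113.8 | 117.5 | 105.7 | 116.5 | 136.6 | 105.6 | 161.8 | 115.9 | 104.4 | 106.5 | 105.1 |
| | 10 | 116.9 | 136.1 | 107.9 | 133.4 | 101.7 | 108.0 | 111.7 | 109.4 | 111.1 | 108.9 | 115.0 |
| | 12 | 119.1 | 137.6 | 110.1 | 108.2 | 103.3 | 109.9 | 112.9 | 113.2 | 112.3 | 109.7 | 113.8 |
| Social Good | 6 | 0.867 | 136.5 | 0.866 | 0.860 | 4.180 | 0.873 | 4.340 | 0.828 | 0.828 | 0.849 | 0.814 |
| | 8 | 1.003 | 1.684 | 0.937 | 1.044 | 2.679 | 0.951 | 2.863 | 0.991 | 0.915 | 0.956 | 0.960 |
| | 10 | 1.139 | 599.9 | 1.042 | 1.222 | 266.7 | 1.041 | 1.122 | 1.024 | 1.059 | 1.019 | 1.116 |
| | 12 | 1.230 | 723.8 | 1.115 | 1.148 | 201.7 | 1.120 | 1.118 | 1.223 | 1.189 | 1.066 | 1.166 |
| Traffic | 6 | 0.228 | 0.269 | 0.227 | 0.239 | 0.961 | 0.233 | 0.514 | 0.243 | 0.238 | 0.244 | 0.241 |
| | 8 | 0.223 | 0.211 | 0.212 | 0.215 | 2.486 | 0.228 | 0.397 | 0.228 | 0.222 | 0.219 | 0.235 |
| | 10 | 0.216 | 0.221 | 0.216 | 0.209 | 0.820 | 0.218 | 0.256 | 0.235 | 0.221 | 0.214 | 0.216 |
| | 12 | 0.253 | 3.208 | 0.249 | 0.263 | 1.060 | 0.262 | 0.246 | 0.283 | 0.260 | 0.273 | 0.249 |

Table 53: Comparison of multimodal fusion strategies with FEDformer (+ LLAMA3).

TS Model: FEDformer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.056 | 0.060 | 0.064 | 0.061 | 0.063 | 0.061 | 0.932 | 0.065 | 0.067 | 0.056 | 0.057 |
| | 8 | 0.073 | 0.078 | 0.073 | 0.094 | 0.087 | 0.073 | 0.969 | 0.072 | 0.069 | 0.082 | 0.070 |
| | 10 | 0.087 | 0.107 | 0.100 | 0.092 | 0.104 | 0.102 | 0.327 | 0.107 | 0.096 | 0.091 | 0.088 |
| | 12 | 0.111 | 0.131 | 0.127 | 0.143 | 0.133 | 0.121 | 0.460 | 0.127 | 0.122 | 0.134 | 0.113 |
| Climate | 6 | 1.102 | 1.141 | 1.122 | 1.057 | 1.055 | 1.069 | 1.279 | 1.052 | 1.096 | 1.083 | 1.108 |
| | 8 | 1.091 | 1.047 | 1.091 | 1.123 | 1.042 | 1.023 | 1.105 | 1.092 | 1.069 | 1.063 | 1.034 |
| | 10 | 1.085 | 1.087 | 1.163 | 1.072 | 1.049 | 1.086 | 0.915 | 1.156 | 1.109 | 1.085 | 1.073 |
| | 12 | 1.106 | 1.104 | 1.115 | 1.101 | 1.096 | 1.151 | 0.950 | 1.114 | 1.089 | 1.089 | 1.088 |
| Economy | 6 | 0.039 | 0.049 | 0.050 | 0.036 | 0.045 | 0.045 | 9.237 | 0.050 | 0.051 | 0.033 | 0.049 |
| | 8 | 0.047 | 0.040 | 0.042 | 0.056 | 0.027 | 0.051 | 7.025 | 0.042 | 0.036 | 0.047 | 0.050 |
| | 10 | 0.039 | 0.045 | 0.043 | 0.041 | 0.060 | 0.032 | 1.445 | 0.043 | 0.042 | 0.036 | 0.045 |
| | 12 | 0.055 | 0.059 | 0.039 | 0.062 | 0.046 | 0.033 | 1.301 | 0.039 | 0.039 | 0.032 | 0.048 |
| Energy | 12 | 0.102 | 0.133 | 0.111 | 0.141 | 0.118 | 0.118 | 0.260 | 0.111 | 0.114 | 0.098 | 0.094 |
| | 24 | 0.182 | 0.199 | 0.237 | 0.303 | 0.248 | 0.187 | 0.767 | 0.227 | 0.244 | 0.190 | 0.181 |
| | 36 | 0.258 | 0.351 | 0.379 | 0.403 | 0.371 | 0.305 | 1.302 | 0.305 | 0.348 | 0.335 | 0.254 |
| | 48 | 0.366 | 0.440 | 0.367 | 0.446 | 0.468 | 0.458 | 0.705 | 0.502 | 0.466 | 0.466 | 0.409 |
| Environment | 48 | 0.463 | 0.493 | 0.496 | 0.501 | 0.514 | 0.489 | 0.491 | 0.491 | 0.514 | 0.501 | 0.481 |
| | 96 | 0.489 | 0.476 | 0.474 | 0.498 | 0.479 | 0.478 | 0.459 | 0.507 | 0.497 | 0.504 | 0.480 |
| | 192 | 0.493 | 0.535 | 0.493 | 0.481 | 0.527 | 0.502 | 0.478 | 0.485 | 0.499 | 0.495 | 0.484 |
| | 336 | 0.461 | 0.504 | 0.475 | 0.483 | 0.490 | 0.472 | 0.455 | 0.480 | 0.482 | 0.489 | 0.481 |
| Public Health | 12 | 1.142 | 1.037 | 1.059 | 1.101 | 1.175 | 1.062 | 1.055 | 1.109 | 1.040 | 1.028 | 1.092 |
| | 24 | 1.379 | 1.436 | 1.422 | 1.413 | 1.545 | 1.474 | 1.401 | 1.440 | 1.486 | 1.431 | 1.405 |
| | 36 | 1.438 | 1.660 | 1.558 | 1.530 | 1.710 | 1.474 | 1.639 | 1.464 | 1.491 | 1.551 | 1.452 |
| | 48 | 1.572 | 1.690 | 1.699 | 1.621 | 1.770 | 1.698 | 1.496 | 1.716 | 1.447 | 1.672 | 1.542 |
| Security | 6 | 106.0 | 110.1 | 106.5 | 113.3 | 106.1 | 104.8 | 166.4 | 106.5 | 114.8 | 101.2 | 107.2 |
| | 8 | 111.6 | 108.8 | 107.9 | 108.4 | 110.4 | 109.0 | 141.3 | 108.0 | 111.1 | 109.5 | 109.8 |
| | 10 | 111.5 | 112.8 | 111.7 | 107.6 | 112.8 | 117.3 | 116.3 | 111.7 | 112.4 | 107.8 | 115.3 |
| | 12 | 117.8 | 115.8 | 115.4 | 113.6 | 114.3 | 116.0 | 118.1 | 115.4 | 112.6 | 106.2 | 114.4 |
| Social Good | 6 | 0.883 | 0.805 | 0.837 | 0.837 | 0.873 | 0.861 | 4.382 | 0.798 | 0.843 | 0.863 | 0.879 |
| | 8 | 0.905 | 0.875 | 0.874 | 0.916 | 0.911 | 0.854 | 2.811 | 0.890 | 0.863 | 0.888 | 0.869 |
| | 10 | 0.954 | 0.967 | 0.903 | 0.895 | 1.142 | 0.919 | 1.140 | 0.905 | 0.944 | 0.903 | 0.936 |
| | 12 | 1.073 | 0.991 | 1.025 | 1.061 | 1.161 | 0.973 | 1.147 | 1.028 | 1.026 | 0.963 | 1.067 |
| Traffic | 6 | 0.152 | 0.158 | 0.153 | 0.151 | 0.157 | 0.151 | 0.394 | 0.156 | 0.154 | 0.159 | 0.149 |
| | 8 | 0.156 | 0.163 | 0.158 | 0.162 | 0.150 | 0.157 | 0.318 | 0.158 | 0.157 | 0.155 | 0.158 |
| | 10 | 0.162 | 0.176 | 0.168 | 0.164 | 0.148 | 0.165 | 0.198 | 0.168 | 0.166 | 0.158 | 0.165 |
| | 12 | 0.228 | 0.235 | 0.234 | 0.239 | 0.208 | 0.224 | 0.212 | 0.230 | 0.224 | 0.220 | 0.224 |

Table 54: Comparison of multimodal fusion strategies with Autoformer (+ LLAMA3).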
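TS Model: Autoformer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.071 | 0.076 | 0.083 | 0.094 | 0.072 | 0.073 | 0.877 | 0.076 | 0.079 | 0.071 | 0.063 |
| | 8 | 0.088 | 0.086 | 0.089 | 0.091 | 0.083 | 0.090 | 1.316 | 0.095 | 0.100 | 0.083 | 0.084 |
| | 10 | 0.102 | 0.113 | 0.123 | 0.121 | 0.110 | 0.099 | 0.281 | 0.124 | 0.121 | 0.109 | 0.101 |
| | 12 | 0.132 | 0.137 | 0.163 | 0.151 | 0.136 | 0.144 | 0.201 | 0.163 | 0.148 | 0.145 | 0.134 |
| Climate | 6 | 1.249 | 1.282 | 1.251 | 1.295 | 1.190 | 1.229 | 1.365 | 1.250 | 1.247 | 1.199 | 1.204 |
| | 8 | 1.186 | 1.239 | 1.303 | 1.217 | 1.252 | 1.171 | 1.263 | 1.233 | 1.199 | 1.179 | 1.185 |
| | 10 | 1.193 | 1.307 | 1.219 | 1.280 | 1.280 | 1.232 | 0.998 | 1.235 | 1.260 | 1.237 | 1.193 |
| | 12 | 1.161 | 1.222 | 1.207 | 1.246 | 1.205 | 1.242 | 1.007 | 1.174 | 1.246 | 1.231 | 1.135 |
| Economy | 6 | 0.061 | 0.072 | 0.050 | 0.059 | 0.065 | 0.050 | 10.519 | 0.050 | 0.060 | 0.054 | 0.044 |
| | 8 | 0.063 | 0.082 | 0.064 | 0.054 | 0.066 | 0.073 | 8.158 | 0.064 | 0.075 | 0.057 | 0.064 |
| | 10 | 0.071 | 0.067 | 0.052 | 0.064 | 0.091 | 0.052 | 1.633 | 0.048 | 0.065 | 0.052 | 0.039 |
| | 12 | 0.076 | 0.065 | 0.053 | 0.057 | 0.069 | 0.053 | 1.204 | 0.049 | 0.055 | 0.051 | 0.053 |
| Energy | 12 | 0.106 | 0.124 | 0.139 | 0.148 | 0.183 | 0.136 | 0.312 | 0.155 | 0.138 | 0.147 | 0.144 |
| | 24 | 0.278 | 0.322 | 0.298 | 0.318 | 0.330 | 0.274 | 0.827 | 0.313 | 0.313 | 0.290 | 0.290 |
| | 36 | 0.378 | 0.391 | 0.391 | 0.398 | 0.386 | 0.373 | 1.231 | 0.402 | 0.383 | 0.387 | 0.377 |
| | 48 | 0.481 | 0.484 | 0.485 | 0.471 | 0.479 | 0.487 | 0.711 | 0.494 | 0.476 | 0.492 | 0.468 |
| Environment | 48 | 0.540 | 0.523 | 0.492 | 0.507 | 0.516 | 0.499 | 0.498 | 0.510 | 0.500 | 0.507 | 0.463 |
| | 96 | 0.544 | 0.545 | 0.496 | 0.497 | 0.515 | 0.513 | 0.517 | 0.509 | 0.523 | 0.505 | 0.473 |
| | 192 | 0.595 | 0.565 | 0.535 | 0.524 | 0.571 | 0.516 | 0.604 | 0.534 | 0.537 | 0.504 | 0.501 |
| | 336 | 0.512 | 0.548 | 0.534 | 0.526 | 0.539 | 0.542 | 0.500 | 0.539 | 0.528 | 0.541 | 0.515 |
| Public Health | 12 | 1.583 | 1.416 | 1.444 | 1.581 | 1.768 | 1.704 | 1.445 | 1.468 | 1.448 | 1.615 | 1.387 |
| | 24 | 1.919 | 2.096 | 2.022 | 1.896 | 1.841 | 1.832 | 1.451 | 1.862 | 1.810 | 1.800 | 1.790 |
| | 36 | 1.920 | 2.003 | 1.839 | 1.974 | 2.121 | 1.925 | 2.019 | 1.940 | 1.804 | 2.048 | 1.942 |
| | 48 | 1.885 | 2.070 | 1.925 | 1.968 | 1.997 | 1.892 | 1.621 | 2.053 | 1.855 | 2.006 | 1.831 |
| Security | 6 | 106.7 | 107.0 | 108.4 | 105.8 | 104.9 | 109.7 | 147.0 | 108.4 | 108.6 | 111.0 | 110.2 |
| | 8 | 111.0 | 108.1 | 110.5 | 112.0 | 110.6 | 112.3 | 139.0 | 110.5 | 112.9 | 105.0 | 110.1 |
| | 10 | 116.2 | 116.5 | 116.3 | 113.4 | 111.5 | 110.8 | 117.0 | 116.3 | 115.4 | 113.0 | 112.1 |
| | 12 | 117.2 | 117.1 | 115.4 | 114.7 | 115.4 | 118.0 | 118.0 | 113.6 | 115.2 | 116.3 | 112.8 |
| Social Good | 6 | 0.861 | 0.905 | 0.818 | 0.839 | 0.829 | 0.883 | 4.404 | 0.846 | 0.868 | 0.815 | 0.873 |
| | 8 | 1.001 | 1.008 | 0.982 | 0.991 | 1.064 | 1.006 | 2.742 | 0.994 | 0.973 | 0.955 | 0.970 |
| | 10 | 1.018 | 1.114 | 1.121 | 1.081 | 1.046 | 1.035 | 1.132 | 1.123 | 1.073 | 1.032 | 1.007 |
| | 12 | 1.059 | 1.258 | 1.142 | 1.127 | 1.160 | 1.173 | 1.148 | 1.128 | 1.214 | 1.026 | 1.067 |
| Traffic | 6 | 0.179 | 0.191 | 0.179 | 0.177 | 0.191 | 0.184 | 0.370 | 0.185 | 0.161 | 0.176 | 0.190 |
| | 8 | 0.186 | 0.183 | 0.193 | 0.178 | 0.236 | 0.171 | 0.298 | 0.195 | 0.202 | 0.184 | 0.204 |
| | 10 | 0.199 | 0.205 | 0.191 | 0.206 | 0.217 | 0.188 | 0.204 | 0.191 | 0.186 | 0.195 | 0.188 |
| | 12 | 0.261 | 0.253 | 0.216 | 0.253 | 0.251 | 0.220 | 0.251 | 0.230 | 0.243 | 0.220 | 0.264 |

Table 55: Comparison of multimodal fusion strategies with Crossformer (+ LLAMA3).

TS Model: Crossformer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.175 | 0.154 | 0.183 | 0.115 | 0.141 | 0.245 | 0.158 | 0.180 | 0.187 | 0.118 | 0.153 |
| | 8 | 0.214 | 0.195 | 0.212 | 0.222 | 0.659 | 0.305 | 0.172 | 0.245 | 0.245 | 0.142 | 0.263 |
| | 10 | 0.322 | 0.260 | 0.248 | 0.392 | 0.212 | 0.334 | 0.316 | 0.253 | 0.304 | 0.273 | 0.350 |
| | 12 | 0.332 | 0.414 | 0.323 | 0.420 | 0.486 | 0.455 | 0.527 | 0.414 | 0.269 | 0.390 | 0.323 |
| Climate | 6 | 1.041 | 1.091 | 1.138 | 1.042 | 1.067 | 1.049 | 1.068 | 1.047 | 1.064 | 1.020 | 1.045 |
| | 8 | 1.066 | 1.081 | 1.095 | 1.081 | 1.052 | 1.086 | 1.086 | 1.080 | 1.091 | 1.068 | 1.058 |
| | 10 | 1.095 | 1.090 | 1.096 | 1.094 | 1.100 | 1.084 | 1.100 | 1.103 | 1.095 | 1.091 | 1.095 |
| | 12 | 1.099 | 1.109 | 1.095 | 1.081 | 1.146 | 1.075 | 1.100 | 1.088 | 1.106 | 1.063 | 1.109 |
| Economy | 6 | 0.367 | 0.386 | 0.293 | 0.290 | 0.211 | 0.298 | 0.550 | 0.407 | 0.452 | 0.208 | 0.234 |
| | 8 | 0.402 | 0.690 | 0.464 | 0.554 | 5.684 | 0.662 | 0.987 | 0.518 | 0.163 | 0.151 | 0.199 |
| | 10 | 0.468 | 0.807 | 0.447 | 0.751 | 0.389 | 0.660 | 0.755 | 0.358 | 0.227 | 0.675 | 0.492 |
| | 12 | 0.344 | 0.843 | 0.984 | 0.841 | 0.498 | 0.971 | 0.802 | 0.857 | 0.720 | 0.521 | 0.189 |
| Energy | 12 | 0.139 | 0.139 | 0.173 | 0.137 | 0.160 | 0.145 | 0.131 | 0.137 | 0.150 | 0.153 | 0.144 |
| | 24 | 0.225 | 0.226 | 0.254 | 0.276 | 0.272 | 0.277 | 0.264 | 0.255 | 0.264 | 0.258 | 0.250 |
| | 36 | 0.333 | 0.375 | 0.361 | 0.325 | 0.355 | 0.348 | 0.351 | 0.308 | 0.332 | 0.340 | 0.326 |
| | 48 | 0.435 | 0.444 | 0.398 | 0.410 | 0.427 | 0.417 | 0.389 | 0.428 | 0.411 | 0.438 | 0.432 |
| Environment | 48 | 0.489 | 0.453 | 0.466 | 0.473 | 0.436 | 0.442 | 0.483 | 0.457 | 0.443 | 0.468 | 0.448 |
| | 96 | 0.536 | 0.446 | 0.477 | 0.483 | 0.449 | 0.457 | 0.485 | 0.455 | 0.462 | 0.467 | 0.463 |
| | 192 | 0.550 | 0.509 | 0.552 | 0.567 | 0.561 | 0.545 | 0.537 | 0.515 | 0.491 | 0.554 | 0.489 |
| | 336 | 0.538 | 0.491 | 0.521 | 0.510 | 0.507 | 0.548 | 0.551 | 0.526 | 0.515 | 0.581 | 0.502 |
| Public Health | 12 | 1.070 | 1.136 | 1.071 | 1.074 | 1.465 | 1.132 | 1.086 | 1.066 | 1.050 | 1.097 | 0.975 |
| | 24 | 1.336 | 1.426 | 1.306 | 1.352 | 1.352 | 1.394 | 1.379 | 1.350 | 1.400 | 1.314 | 1.420 |
| | 36 | 1.390 | 1.411 | 1.356 | 1.345 | 1.409 | 1.455 | 1.332 | 1.379 | 1.329 | 1.364 | 1.357 |
| | 48 | 1.509 | 1.462 | 1.451 | 1.433 | 1.505 | 1.458 | 1.569 | 1.438 | 1.452 | 1.421 | 1.397 |
| Security | 6 | 121.3 | 123.3 | 123.3 | 122.3 | 121.3 | 119.0 | 124.0 | 123.3 | 121.7 | 121.3 | 121.0 |
| | 8 | 120.7 | 126.5 | 124.5 | 125.9 | 132.4 | 127.0 | 126.8 | 123.6 | 123.3 | 125.2 | 122.1 |
| | 10 | 123.8 | 128.0 | 126.1 | 125.9 | 122.2 | 128.8 | 127.6 | 125.6 | 124.0 | 126.3 | 124.6 |
| | 12 | 127.3 | 127.8 | 128.6 | 127.3 | 129.4 | 128.5 | 129.3 | 128.5 | 126.3 | 127.6 | 127.2 |
| Social Good | 6 | 0.754 | 0.730 | 0.741 | 0.752 | 0.766 | 0.752 | 0.744 | 0.757 | 0.732 | 0.743 | 0.750 |
| | 8 | 0.810 | 0.821 | 0.823 | 0.811 | 1.895 | 0.803 | 0.833 | 0.822 | 0.841 | 0.837 | 0.828 |
| | 10 | 0.898 | 0.964 | 0.884 | 0.860 | 0.933 | 0.904 | 0.870 | 0.882 | 0.874 | 0.877 | 0.869 |
| | 12 | 0.950 | 0.978 | 0.919 | 0.937 | 0.967 | 0.914 | 0.947 | 0.941 | 0.938 | 0.916 | 0.943 |
| Traffic | 6 | 0.213 | 0.215 | 0.217 | 0.213 | 0.209 | 0.217 | 0.218 | 0.220 | 0.232 | 0.215 | 0.216 |
| | 8 | 0.203 | 0.214 | 0.220 | 0.223 | 0.311 | 0.223 | 0.212 | 0.209 | 0.218 | 0.213 | 0.218 |
| | 10 | 0.216 | 0.213 | 0.235 | 0.214 | 0.217 | 0.200 | 0.217 | 0.224 | 0.218 | 0.226 | 0.222 |
| | 12 | 0.239 | 0.240 | 0.242 | 0.261 | 0.271 | 0.234 | 0.249 | 0.253 | 0.249 | 0.241 | 0.249 |

Table 56: Comparison of multimodal fusion strategies with Reformer (+ LLAMA3).

TS Model: Reformer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.193 | 0.188 | 0.251 | 0.174 | 0.273 | 0.154 | 0.359 | 0.185 | 0.248 | 0.260 | 0.193 |
| | 8 | 0.248 | 0.338 | 0.328 | 0.280 | 0.347 | 0.415 | 0.245 | 0.273 | 0.161 | 0.214 | 0.266 |
| | 10 | 0.179 | 0.674 | 0.281 | 0.314 | 0.544 | 0.413 | 0.744 | 0.259 | 0.268 | 0.493 | 0.419 |
| | 12 | 0.474 | 0.333 | 0.430 | 0.358 | 0.634 | 0.283 | 0.295 | 0.244 | 0.599 | 0.305 | 0.241 |
| Climate | 6 | 1.013 | 1.013 | 1.062 | 1.043 | 1.038 | 1.008 | 1.068 | 1.026 | 1.030 | 1.000 | 1.071 |
| | 8 | 1.077 | 1.076 | 1.061 | 1.106 | 1.059 | 1.084 | 0.926 | 1.037 | 1.066 | 1.044 | 1.079 |
| | 10 | 1.063 | 1.089 | 1.109 | 1.079 | 1.078 | 1.078 | 0.993 | 1.098 | 1.087 | 1.093 | 1.098 |
| | 12 | 1.098 | 1.067 | 1.098 | 1.046 | 1.124 | 1.089 | 1.040 | 1.099 | 1.104 | 1.086 | 1.101 |
| Economy | 6 | 0.348 | 0.226 | 0.293 | 0.389 | 0.677 | 0.331 | 0.948 | 0.292 | 0.336 | 0.190 | 0.264 |
| | 8 | 0.212 | 0.207 | 0.492 | 0.212 | 0.420 | 0.836 | 1.061 | 0.493 | 0.445 | 0.575 | 0.289 |
| | 10 | 0.562 | 1.376 | 0.690 | 0.120 | 1.962 | 0.596 | 1.686 | 0.234 | 0.388 | 1.085 | 0.043 |
| | 12 | 0.766 | 1.163 | 0.672 | 0.180 | 1.671 | 1.125 | 0.748 | 1.113 | 1.135 | 0.364 | 0.969 |
| Energy | 12 | 0.235 | 0.215 | 0.221 | 0.303 | 0.227 | 0.251 | 0.226 | 0.150 | 0.167 | 0.222 | 0.217 |
| | 24 | 0.401 | 0.424 | 0.409 | 0.381 | 0.434 | 0.395 | 0.351 | 0.351 | 0.263 | 0.303 | 0.353 |
| | 36 | 0.462 | 0.485 | 0.456 | 0.459 | 0.501 | 0.475 | 0.423 | 0.493 | 0.442 | 0.412 | 0.408 |
| | 48 | 0.485 | 0.564 | 0.536 | 0.525 | 0.740 | 0.509 | 0.584 | 0.538 | 0.495 | 0.532 | 0.507 |
| Environment | 48 | 0.411 | 0.424 | 0.424 | 0.428 | 0.413 | 0.422 | 0.411 | 0.407 | 0.408 | 0.411 | 0.409 |
| | 96 | 0.473 | 0.444 | 0.412 | 0.434 | 0.469 | 0.418 | 0.430 | 0.405 | 0.395 | 0.437 | 0.412 |
| | 192 | 0.458 | 0.439 | 0.420 | 0.451 | 0.403 | 0.444 | 0.430 | 0.433 | 0.395 | 0.422 | 0.407 |
| | 336 | 0.451 | 0.423 | 0.421 | 0.433 | 0.450 | 0.433 | 0.435 | 0.413 | 0.404 | 0.419 | 0.421 |
| Public Health | 12 | 1.244 | 1.119 | 1.185 | 1.189 | 1.420 | 1.097 | 1.203 | 1.020 | 1.062 | 1.083 | 1.082 |
| | 24 | 1.233 | 1.482 | 1.337 | 1.380 | 1.330 | 1.382 | 1.308 | 1.317 | 1.240 | 1.294 | 1.267 |
| | 36 | 1.366 | 1.601 | 1.394 | 1.500 | 1.350 | 1.436 | 1.210 | 1.296 | 1.322 | 1.319 | 1.250 |
| | 48 | 1.427 | 1.525 | 1.364 | 1.324 | 1.416 | 1.584 | 1.498 | 1.364 | 1.463 | 1.585 | 1.471 |
| Security | 6 | 124.6 | 123.6 | 123.6 | 124.2 | 118.5 | 121.7 | 122.1 | 123.6 | 124.1 | 123.7 | 123.4 |
| | 8 | 123.9 | 118.0 | 122.5 | 124.2 | 119.0 | 123.0 | 125.0 | 122.5 | 122.6 | 124.2 | 125.5 |
| | 10 | 127.2 | 124.4 | 124.6 | 122.4 | 130.0 | 125.5 | 127.1 | 125.8 | 123.8 | 125.1 | 125.2 |
| | 12 | 123.4 | 126.1 | 126.0 | 126.5 | 125.5 | 125.1 | 126.8 | 126.0 | 125.8 | 126.0 | 126.6 |
| Social Good | 6 | 0.805 | 0.787 | 0.799 | 0.855 | 0.840 | 0.835 | 0.869 | 0.795 | 0.831 | 0.818 | 0.824 |
| | 8 | 0.877 | 0.842 | 0.892 | 0.889 | 0.876 | 0.878 | 0.938 | 0.889 | 0.899 | 0.849 | 0.889 |
| | 10 | 0.936 | 0.960 | 0.990 | 0.956 | 1.018 | 0.918 | 0.970 | 0.990 | 0.959 | 0.938 | 0.935 |
| | 12 | 1.052 | 0.955 | 1.039 | 1.004 | 1.089 | 1.033 | 0.989 | 1.012 | 0.954 | 1.039 | 0.988 |
| Traffic | 6 | 0.208 | 0.241 | 0.247 | 0.222 | 0.235 | 0.281 | 0.257 | 0.248 | 0.243 | 0.250 | 0.237 |
| | 8 | 0.236 | 0.288 | 0.269 | 0.247 | 0.217 | 0.259 | 0.217 | 0.268 | 0.234 | 0.257 | 0.231 |
| | 10 | 0.253 | 0.260 | 0.257 | 0.231 | 0.238 | 0.263 | 0.236 | 0.258 | 0.257 | 0.248 | 0.260 |
| | 12 | 0.248 | 0.268 | 0.285 | 0.281 | 0.258 | 0.322 | 0.244 | 0.282 | 0.275 | 0.282 | 0.284 |

Table 57: Comparison of multimodal fusion strategies with Informer (+ LLAMA3).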
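TS Model: Informer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.264 | 0.362 | 0.327 | 0.304 | 0.261 | 0.289 | 0.270 | 0.212 | 0.216 | 0.234 | 0.243 |
| | 8 | 0.450 | 0.386 | 0.361 | 0.376 | 0.171 | 0.414 | 0.269 | 0.272 | 0.388 | 0.195 | 0.241 |
| | 10 | 0.402 | 0.412 | 0.440 | 0.323 | 0.574 | 0.251 | 0.339 | 0.299 | 0.343 | 0.305 | 0.434 |
| | 12 | 0.507 | 0.648 | 0.520 | 0.530 | 0.471 | 0.639 | 0.494 | 0.583 | 0.574 | 0.420 | 0.543 |
| Climate | 6 | 0.994 | 1.084 | 1.068 | 1.072 | 1.035 | 1.114 | 1.027 | 1.064 | 1.080 | 1.043 | 1.078 |
| | 8 | 1.058 | 0.982 | 1.105 | 1.081 | 0.923 | 1.068 | 1.112 | 1.074 | 1.093 | 1.067 | 1.083 |
| | 10 | 1.084 | 1.101 | 1.100 | 1.060 | 1.042 | 1.066 | 1.089 | 1.076 | 1.048 | 1.091 | 1.092 |
| | 12 | 1.068 | 1.091 | 1.083 | 1.106 | 0.889 | 1.014 | 1.096 | 1.097 | 1.071 | 1.111 | 1.092 |
| Economy | 6 | 0.707 | 0.869 | 0.396 | 0.567 | 0.544 | 0.624 | 0.583 | 0.530 | 0.642 | 0.628 | 0.254 |
| | 8 | 0.815 | 0.812 | 0.756 | 0.520 | 0.275 | 0.275 | 0.781 | 0.261 | 0.143 | 0.452 | 0.648 |
| | 10 | 0.807 | 0.824 | 0.999 | 0.714 | 1.228 | 1.153 | 1.049 | 0.596 | 0.619 | 0.633 | 0.832 |
| | 12 | 1.093 | 1.445 | 1.031 | 1.181 | 0.666 | 0.179 | 1.205 | 0.950 | 0.616 | 1.429 | 0.967 |
| Energy | 12 | 0.146 | 0.158 | 0.218 | 0.190 | 0.261 | 0.182 | 0.208 | 0.162 | 0.161 | 0.158 | 0.168 |
| | 24 | 0.273 | 0.269 | 0.320 | 0.320 | 0.256 | 0.275 | 0.327 | 0.304 | 0.263 | 0.297 | 0.304 |
| | 36 | 0.373 | 0.358 | 0.357 | 0.408 | 0.389 | 0.351 | 0.425 | 0.328 | 0.351 | 0.306 | 0.369 |
| | 48 | 0.499 | 0.489 | 0.522 | 0.494 | 0.410 | 0.490 | 0.429 | 0.398 | 0.402 | 0.466 | 0.433 |
| Environment | 48 | 0.430 | 0.418 | 0.469 | 0.446 | 0.443 | 0.464 | 0.474 | 0.442 | 0.438 | 0.430 | 0.450 |
| | 96 | 0.421 | 0.484 | 0.449 | 0.444 | 0.444 | 0.442 | 0.439 | 0.430 | 0.440 | 0.452 | 0.444 |
| | 192 | 0.467 | 0.489 | 0.435 | 0.508 | 0.437 | 0.509 | 0.478 | 0.498 | 0.493 | 0.465 | 0.459 |
| | 336 | 0.457 | 0.477 | 0.470 | 0.482 | 0.427 | 0.469 | 0.487 | 0.513 | 0.486 | 0.478 | 0.426 |
| Public Health | 12 | 1.114 | 1.270 | 1.252 | 0.982 | 1.545 | 1.100 | 1.010 | 1.003 | 1.103 | 1.287 | 1.141 |
| | 24 | 1.272 | 1.621 | 1.303 | 1.200 | 1.314 | 1.344 | 1.228 | 1.382 | 1.278 | 1.369 | 1.302 |
| | 36 | 1.490 | 1.465 | 1.493 | 1.454 | 1.313 | 1.414 | 1.438 | 1.314 | 1.551 | 1.552 | 1.517 |
| | 48 | 1.619 | 1.403 | 1.755 | 1.659 | 1.326 | 1.726 | 1.570 | 1.599 | 1.701 | 1.592 | 1.606 |
| Security | 6 | 122.5 | 126.0 | 122.3 | 123.9 | 125.0 | 126.7 | 122.7 | 125.6 | 122.0 | 127.5 | 124.8 |
| | 8 | 130.2 | 129.3 | 129.2 | 126.2 | 129.7 | 127.0 | 128.1 | 128.9 | 128.4 | 126.8 | 126.9 |
| | 10 | 130.4 | 130.8 | 129.6 | 127.9 | 130.5 | 124.0 | 129.0 | 128.9 | 129.4 | 128.0 | 130.1 |
| | 12 | 131.4 | 132.0 | 127.9 | 130.8 | 128.4 | 130.0 | 129.8 | 131.1 | 130.2 | 132.1 | 130.3 |
| Social Good | 6 | 0.749 | 0.746 | 0.764 | 0.728 | 0.764 | 0.795 | 0.747 | 0.812 | 0.748 | 0.789 | 0.751 |
| | 8 | 0.793 | 0.776 | 0.830 | 0.862 | 0.842 | 0.838 | 0.808 | 0.845 | 0.835 | 0.858 | 0.810 |
| | 10 | 0.828 | 0.890 | 0.897 | 0.874 | 0.864 | 0.882 | 0.918 | 0.886 | 0.894 | 0.895 | 0.872 |
| | 12 | 0.980 | 0.888 | 0.915 | 0.866 | 1.062 | 0.893 | 0.919 | 0.913 | 0.896 | 0.946 | 0.948 |
| Traffic | 6 | 0.172 | 0.187 | 0.165 | 0.163 | 0.154 | 0.162 | 0.144 | 0.169 | 0.161 | 0.166 | 0.169 |
| | 8 | 0.171 | 0.161 | 0.162 | 0.160 | 0.162 | 0.167 | 0.158 | 0.180 | 0.189 | 0.163 | 0.160 |
| | 10 | 0.187 | 0.189 | 0.217 | 0.200 | 0.175 | 0.189 | 0.185 | 0.164 | 0.204 | 0.173 | 0.197 |
| | 12 | 0.208 | 0.210 | 0.218 | 0.213 | 0.202 | 0.221 | 0.204 | 0.219 | 0.210 | 0.224 | 0.215 |

Table 58: Comparison of multimodal fusion strategies with TSMixer (+ LLAMA3).

TS Model: TSMixer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.102 | 0.086 | 0.119 | 0.104 | 0.121 | 0.094 | 0.100 | 0.112 | 0.162 | 0.098 | 0.085 |
| | 8 | 0.103 | 0.150 | 0.171 | 0.330 | 0.242 | 0.316 | 0.180 | 0.103 | 0.264 | 0.238 | 0.105 |
| | 10 | 0.395 | 0.486 | 0.892 | 0.469 | 0.294 | 0.937 | 0.497 | 0.481 | 0.778 | 0.422 | 0.229 |
| | 12 | 0.617 | 1.109 | 1.245 | 0.606 | 1.374 | 0.562 | 0.494 | 0.674 | 1.386 | 1.206 | 0.337 |
| Climate | 6 | 1.069 | 1.060 | 1.079 | 1.138 | 1.077 | 1.088 | 1.187 | 1.063 | 1.067 | 1.062 | 1.067 |
| | 8 | 1.094 | 1.086 | 1.080 | 1.232 | 1.077 | 1.156 | 1.162 | 1.089 | 1.097 | 1.082 | 1.083 |
| | 10 | 1.102 | 1.158 | 1.186 | 1.204 | 1.108 | 1.127 | 1.099 | 1.147 | 1.112 | 1.132 | 1.101 |
| | 12 | 1.109 | 1.124 | 1.140 | 1.174 | 1.154 | 1.094 | 1.146 | 1.144 | 1.158 | 1.140 | 1.106 |
| Economy | 6 | 0.054 | 0.062 | 0.073 | 0.093 | 0.106 | 0.143 | 0.126 | 0.153 | 0.141 | 0.094 | 0.063 |
| | 8 | 0.078 | 0.268 | 0.219 | 0.276 | 0.807 | 0.348 | 0.141 | 0.227 | 0.215 | 0.100 | 0.068 |
| | 10 | 0.263 | 1.509 | 1.802 | 1.775 | 0.266 | 1.992 | 0.318 | 0.593 | 1.003 | 1.229 | 0.085 |
| | 12 | 0.917 | 2.240 | 1.976 | 1.931 | 2.472 | 0.672 | 1.146 | 0.590 | 2.820 | 2.374 | 0.173 |
| Energy | 12 | 0.142 | 0.138 | 0.135 | 0.159 | 0.151 | 0.142 | 0.161 | 0.151 | 0.137 | 0.141 | 0.148 |
| | 24 | 0.261 | 0.287 | 0.297 | 0.329 | 0.266 | 0.236 | 0.303 | 0.250 | 0.283 | 0.287 | 0.268 |
| | 36 | 0.355 | 0.330 | 0.410 | 0.479 | 0.315 | 0.360 | 0.363 | 0.349 | 0.380 | 0.331 | 0.372 |
| | 48 | 0.451 | 0.412 | 0.435 | 0.695 | 0.428 | 0.583 | 0.448 | 0.416 | 0.410 | 0.444 | 0.460 |
| Environment | 48 | 0.542 | 0.432 | 0.450 | 0.454 | 0.436 | 0.442 | 0.453 | 0.520 | 0.455 | 0.437 | 0.509 |
| | 96 | 0.606 | 0.456 | 0.466 | 0.485 | 0.456 | 0.457 | 0.486 | 0.581 | 0.439 | 0.471 | 0.559 |
| | 192 | 0.638 | 0.531 | 0.472 | 0.539 | 0.599 | 0.523 | 0.511 | 0.591 | 0.448 | 0.495 | 0.656 |
| | 336 | 0.588 | 0.652 | 0.452 | 0.543 | 0.522 | 0.538 | 0.512 | 0.582 | 0.496 | 0.484 | 0.580 |
| Public Health | 12 | 1.155 | 1.117 | 1.173 | 1.392 | 1.296 | 1.251 | 1.209 | 1.155 | 1.178 | 1.158 | 1.129 |
| | 24 | 1.428 | 1.649 | 1.388 | 1.483 | 1.347 | 1.471 | 1.450 | 1.315 | 1.667 | 1.345 | 1.377 |
| | 36 | 1.479 | 1.749 | 1.613 | 1.581 | 1.501 | 1.503 | 1.549 | 1.474 | 1.693 | 1.556 | 1.439 |
| | 48 | 1.599 | 1.543 | 1.777 | 1.633 | 1.607 | 1.687 | 1.673 | 1.625 | 1.584 | 1.725 | 1.529 |
| Security | 6 | 128.3 | 127.8 | 127.8 | 126.4 | 125.9 | 123.9 | 130.0 | 124.0 | 127.8 | 128.0 | 128.0 |
| | 8 | 127.9 | 125.8 | 127.3 | 128.6 | 125.2 | 122.6 | 131.8 | 129.1 | 126.3 | 127.1 | 127.0 |
| | 10 | 129.3 | 129.1 | 132.8 | 131.0 | 126.8 | 132.4 | 129.3 | 129.1 | 131.4 | 131.9 | 128.8 |
| | 12 | 131.8 | 133.6 | 133.3 | 132.3 | 133.2 | 127.9 | 131.3 | 133.9 | 134.5 | 132.4 | 131.6 |
| Social Good | 6 | 0.749 | 0.795 | 0.861 | 0.779 | 0.732 | 0.731 | 1.043 | 0.842 | 0.835 | 0.792 | 0.741 |
| | 8 | 0.827 | 0.875 | 0.886 | 0.903 | 0.883 | 0.850 | 0.845 | 0.843 | 0.945 | 0.810 | 0.824 |
| | 10 | 0.906 | 0.921 | 0.878 | 1.057 | 1.004 | 0.921 | 0.942 | 0.959 | 1.141 | 0.878 | 0.873 |
| | 12 | 1.091 | 1.041 | 1.014 | 0.975 | 1.053 | 1.016 | 0.979 | 0.952 | 1.265 | 0.924 | 0.847 |
| Traffic | 6 | 0.203 | 0.219 | 0.222 | 0.239 | 0.221 | 0.216 | 0.224 | 0.259 | 0.255 | 0.252 | 0.188 |
| | 8 | 0.216 | 0.260 | 0.266 | 0.253 | 0.246 | 0.251 | 0.220 | 0.259 | 0.307 | 0.239 | 0.195 |
| | 10 | 0.228 | 0.331 | 0.570 | 0.291 | 0.247 | 0.532 | 0.237 | 0.260 | 0.290 | 0.344 | 0.207 |
| | 12 | 0.293 | 0.640 | 0.692 | 0.312 | 0.713 | 0.281 | 0.291 | 0.352 | 0.779 | 0.481 | 0.249 |

Table 59: Comparison of multimodal fusion strategies with Transformer (+ LLAMA3).

TS Model: Transformer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.128 | 0.090 | 0.127 | 0.179 | 0.132 | 0.211 | 0.267 | 0.144 | 0.170 | 0.181 | 0.131 |
| | 8 | 0.173 | 0.230 | 0.278 | 0.137 | 0.157 | 0.266 | 0.285 | 0.277 | 0.193 | 0.181 | 0.148 |
| | 10 | 0.182 | 0.418 | 0.279 | 0.224 | 0.130 | 0.234 | 0.571 | 0.291 | 0.205 | 0.325 | 0.214 |
| | 12 | 0.238 | 0.293 | 0.218 | 0.437 | 0.158 | 0.198 | 0.382 | 0.219 | 0.221 | 0.514 | 0.272 |
| Climate | 6 | 1.032 | 1.037 | 1.080 | 1.001 | 0.955 | 1.081 | 0.980 | 1.080 | 1.009 | 1.018 | 1.047 |
| | 8 | 1.059 | 1.064 | 1.080 | 1.063 | 1.031 | 1.040 | 1.012 | 1.079 | 1.039 | 1.034 | 0.982 |
| | 10 | 1.066 | 1.041 | 1.064 | 1.071 | 1.098 | 1.083 | 0.994 | 1.064 | 1.062 | 1.030 | 1.071 |
| | 12 | 1.035 | 1.103 | 1.081 | 1.071 | 1.098 | 1.062 | 1.042 | 1.082 | 1.102 | 1.067 | 1.081 |
| Economy | 6 | 0.401 | 0.283 | 0.059 | 0.338 | 0.140 | 0.226 | 0.592 | 0.061 | 0.202 | 0.145 | 0.127 |
| | 8 | 0.076 | 0.413 | 0.403 | 0.103 | 0.061 | 0.252 | 0.882 | 0.087 | 0.061 | 0.050 | 0.163 |
| | 10 | 0.075 | 0.357 | 0.252 | 0.737 | 0.070 | 0.160 | 1.317 | 0.123 | 0.568 | 0.278 | 0.228 |
| | 12 | 0.319 | 0.143 | 0.432 | 0.196 | 0.140 | 0.274 | 0.670 | 0.303 | 0.416 | 0.116 | 0.043 |
| Energy | 12 | 0.145 | 0.108 | 0.080 | 0.142 | 0.110 | 0.148 | 0.112 | 0.086 | 0.114 | 0.091 | 0.100 |
| | 24 | 0.259 | 0.299 | 0.261 | 0.244 | 0.245 | 0.186 | 0.286 | 0.198 | 0.221 | 0.180 | 0.195 |
| | 36 | 0.403 | 0.357 | 0.314 | 0.371 | 0.351 | 0.292 | 0.323 | 0.271 | 0.352 | 0.238 | 0.333 |
| | 48 | 0.479 | 0.397 | 0.404 | 0.427 | 0.378 | 0.401 | 0.548 | 0.389 | 0.463 | 0.323 | 0.421 |
| Environment | 48 | 0.444 | 0.417 | 0.414 | 0.420 | 0.425 | 0.430 | 0.430 | 0.414 | 0.435 | 0.410 | 0.404 |
| | 96 | 0.472 | 0.444 | 0.393 | 0.485 | 0.436 | 0.438 | 0.434 | 0.410 | 0.408 | 0.427 | 0.402 |
| | 192 | 0.463 | 0.541 | 0.471 | 0.499 | 0.469 | 0.411 | 0.535 | 0.407 | 0.436 | 0.471 | 0.409 |
| | 336 | 0.458 | 0.502 | 0.470 | 0.430 | 0.460 | 0.454 | 0.472 | 0.460 | 0.421 | 0.444 | 0.424 |
| Public Health | 12 | 0.904 | 1.054 | 1.069 | 1.121 | 0.999 | 1.017 | 1.087 | 1.112 | 1.040 | 1.102 | 1.022 |
| | 24 | 1.281 | 1.261 | 1.290 | 1.403 | 1.192 | 1.264 | 1.397 | 1.201 | 1.182 | 1.310 | 1.296 |
| | 36 | 1.394 | 1.408 | 1.359 | 1.299 | 1.281 | 1.201 | 1.211 | 1.443 | 1.317 | 1.284 | 1.346 |
| | 48 | 1.338 | 1.470 | 1.440 | 1.432 | 1.600 | 1.529 | 1.448 | 1.438 | 1.396 | 1.434 | 1.325 |
| Security | 6 | 123.7 | 127.0 | 127.5 | 124.6 | 122.5 | 126.7 | 122.6 | 127.5 | 126.7 | 127.0 | 127.8 |
| | 8 | 126.6 | 129.5 | 129.7 | 129.2 | 123.8 | 128.0 | 126.6 | 129.5 | 125.8 | 129.1 | 127.3 |
| | 10 | 129.6 | 131.3 | 129.8 | 129.9 | 124.6 | 130.5 | 129.5 | 131.5 | 129.3 | 130.6 | 131.7 |
| | 12 | 127.2 | 126.7 | 132.7 | 130.4 | 127.3 | 129.7 | 130.6 | 131.9 | 130.8 | 130.2 | 130.7 |
| Social Good | 6 | 0.743 | 0.757 | 0.766 | 0.771 | 0.826 | 0.753 | 0.806 | 0.766 | 0.752 | 0.748 | 0.788 |
| | 8 | 0.805 | 0.850 | 0.875 | 0.811 | 0.915 | 0.821 | 0.810 | 0.859 | 0.860 | 0.775 | 0.818 |
| | 10 | 0.900 | 0.818 | 0.934 | 0.909 | 0.933 | 0.816 | 0.908 | 0.907 | 0.859 | 0.916 | 0.909 |
| | 12 | 0.922 | 0.884 | 0.946 | 0.967 | 0.905 | 0.951 | 0.878 | 0.951 | 0.946 | 0.942 | 0.882 |
| Traffic | 6 | 0.176 | 0.183 | 0.172 | 0.147 | 0.172 | 0.164 | 0.158 | 0.172 | 0.161 | 0.151 | 0.152 |
| | 8 | 0.186 | 0.164 | 0.184 | 0.219 | 0.182 | 0.159 | 0.162 | 0.184 | 0.162 | 0.174 | 0.168 |
| | 10 | 0.175 | 0.204 | 0.191 | 0.177 | 0.205 | 0.182 | 0.179 | 0.191 | 0.206 | 0.172 | 0.181 |
| | 12 | 0.212 | 0.252 | 0.233 | 0.219 | 0.240 | 0.224 | 0.207 | 0.233 | 0.214 | 0.203 | 0.233 |

K.4 Text Model: Doc2Vec

Table 60: Comparison of multimodal fusion strategies with Nonstationary Transformer (+ Doc2Vec).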
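TS Model: Nonstationary Transformer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.051 | 0.051 | 0.052 | 0.052 | 0.056 | 0.053 | 4.078 | 0.052 | 0.056 | 0.054 | 0.051 |
| | 8 | 0.068 | 0.067 | 0.070 | 0.068 | 0.068 | 0.071 | 2.467 | 0.068 | 0.072 | 0.072 | 0.072 |
| | 10 | 0.092 | 0.090 | 0.091 | 0.085 | 0.099 | 0.092 | 0.628 | 0.092 | 0.089 | 0.089 | 0.093 |
| | 12 | 0.120 | 0.117 | 0.116 | 0.109 | 0.118 | 0.118 | 0.722 | 0.116 | 0.119 | 0.118 | 0.118 |
| Climate | 6 | 1.241 | 1.291 | 1.185 | 1.309 | 1.162 | 1.236 | 1.237 | 1.309 | 1.175 | 1.293 | 1.089 |
| | 8 | 1.300 | 1.323 | 1.122 | 1.256 | 1.294 | 1.281 | 1.073 | 1.212 | 1.298 | 1.167 | 1.270 |
| | 10 | 1.211 | 1.244 | 1.244 | 1.301 | 1.200 | 1.189 | 1.127 | 1.219 | 1.238 | 1.189 | 1.221 |
| | 12 | 1.219 | 1.246 | 1.232 | 1.232 | 1.263 | 1.153 | 1.078 | 1.255 | 1.212 | 1.220 | 1.149 |
| Economy | 6 | 0.017 | 0.016 | 0.017 | 0.018 | 0.017 | 0.016 | 13.766 | 0.017 | 0.017 | 0.016 | 0.018 |
| | 8 | 0.018 | 0.018 | 0.018 | 0.019 | 0.020 | 0.022 | 6.984 | 0.018 | 0.018 | 0.021 | 0.018 |
| | 10 | 0.019 | 0.021 | 0.020 | 0.022 | 0.018 | 0.021 | 1.125 | 0.019 | 0.020 | 0.020 | 0.019 |
| | 12 | 0.021 | 0.022 | 0.021 | 0.026 | 0.022 | 0.021 | 1.040 | 0.020 | 0.020 | 0.021 | 0.020 |
| Energy | 12 | 0.107 | 0.117 | 0.104 | 0.114 | 0.095 | 0.111 | 0.188 | 0.101 | 0.108 | 0.111 | 0.115 |
| | 24 | 0.198 | 0.234 | 0.186 | 0.229 | 0.197 | 0.220 | 0.627 | 0.233 | 0.198 | 0.196 | 0.203 |
| | 36 | 0.280 | 0.327 | 0.298 | 0.301 | 0.370 | 0.325 | 0.974 | 0.324 | 0.364 | 0.338 | 0.328 |
| | 48 | 0.368 | 0.451 | 0.422 | 0.366 | 0.443 | 0.455 | 0.712 | 0.385 | 0.401 | 0.358 | 0.371 |
| Environment | 48 | 0.419 | 0.424 | 0.437 | 0.467 | 0.478 | 0.429 | 0.464 | 0.439 | 0.441 | 0.441 | 0.416 |
| | 96 | 0.444 | 0.488 | 0.497 | 0.471 | 0.462 | 0.453 | 0.434 | 0.462 | 0.458 | 0.460 | 0.465 |
| | 192 | 0.444 | 0.486 | 0.466 | 0.509 | 0.446 | 0.481 | 0.464 | 0.436 | 0.437 | 0.459 | 0.444 |
| | 336 | 0.437 | 0.458 | 0.431 | 0.500 | 0.441 | 0.449 | 0.514 | 0.432 | 0.435 | 0.443 | 0.453 |
| Public Health | 12 | 0.745 | 0.887 | 0.887 | 0.897 | 0.727 | 0.890 | 1.062 | 0.969 | 0.843 | 0.784 | 0.858 |
| | 24 | 1.137 | 1.162 | 1.103 | 1.312 | 1.153 | 1.118 | 1.979 | 1.179 | 1.065 | 1.216 | 1.162 |
| | 36 | 1.418 | 1.360 | 1.246 | 1.230 | 1.280 | 1.313 | 1.535 | 1.326 | 1.467 | 1.517 | 1.366 |
| | 48 | 1.364 | 1.655 | 1.412 | 1.302 | 1.633 | 1.331 | 1.605 | 1.553 | 1.300 | 1.482 | 1.530 |
| Security | 6 | 103.0 | 99.768 | 105.2 | 101.7 | 101.3 | 101.6 | 135.7 | 101.7 | 100.6 | 101.7 | 100.3 |
| | 8 | 105.3 | 105.7 | 108.7 | 105.6 | 105.6 | 104.9 | 126.3 | 104.3 | 105.7 | 104.2 | 104.2 |
| | 10 | 110.2 | 108.3 | 109.6 | 110.5 | 107.6 | 107.6 | 120.0 | 106.7 | 107.3 | 107.4 | 108.1 |
| | 12 | 110.8 | 109.8 | 111.3 | 108.1 | 110.0 | 109.1 | 113.4 | 109.5 | 109.7 | 109.5 | 109.5 |
| Social Good | 6 | 0.803 | 0.798 | 0.835 | 0.836 | 0.784 | 0.761 | 4.209 | 0.772 | 0.778 | 0.887 | 0.762 |
| | 8 | 0.940 | 0.921 | 0.859 | 0.973 | 0.925 | 0.871 | 2.671 | 0.918 | 0.955 | 0.878 | 0.868 |
| | 10 | 1.024 | 1.092 | 1.039 | 0.934 | 1.004 | 0.993 | 1.251 | 0.977 | 0.946 | 1.028 | 0.939 |
| | 12 | 1.049 | 1.062 | 1.091 | 1.040 | 1.104 | 1.122 | 0.976 | 1.107 | 1.135 | 1.086 | 1.048 |
| Traffic | 6 | 0.179 | 0.177 | 0.175 | 0.171 | 0.184 | 0.168 | 0.615 | 0.180 | 0.170 | 0.178 | 0.179 |
| | 8 | 0.187 | 0.183 | 0.184 | 0.198 | 0.201 | 0.181 | 0.523 | 0.185 | 0.195 | 0.181 | 0.186 |
| | 10 | 0.189 | 0.190 | 0.191 | 0.184 | 0.199 | 0.184 | 0.220 | 0.193 | 0.185 | 0.190 | 0.182 |
| | 12 | 0.251 | 0.248 | 0.252 | 0.253 | 0.232 | 0.238 | 0.251 | 0.236 | 0.246 | 0.240 | 0.247 |

Table 61: Comparison of multimodal fusion strategies with iTransformer (+ Doc2Vec).

TS Model: iTransformer

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.059 | 0.058 | 0.060 | 0.055 | 0.061 | 0.060 | 4.716 | 0.060 | 0.060 | 0.059 | 0.060 |
| | 8 | 0.077 | 0.076 | 0.077 | 0.071 | 0.076 | 0.078 | 2.397 | 0.077 | 0.076 | 0.076 | 0.077 |
| | 10 | 0.098 | 0.098 | 0.095 | 0.088 | 0.099 | 0.100 | 0.906 | 0.094 | 0.097 | 0.098 | 0.099 |
| | 12 | 0.127 | 0.127 | 0.129 | 0.116 | 0.130 | 0.127 | 0.607 | 0.129 | 0.130 | 0.126 | 0.129 |
| Climate | 6 | 1.130 | 1.140 | 1.119 | 1.144 | 1.151 | 1.114 | 1.238 | 1.115 | 1.148 | 1.149 | 1.135 |
| | 8 | 1.193 | 1.218 | 1.202 | 1.163 | 1.173 | 1.173 | 1.145 | 1.204 | 1.203 | 1.192 | 1.179 |
| | 10 | 1.231 | 1.183 | 1.209 | 1.195 | 1.151 | 1.203 | 1.081 | 1.207 | 1.204 | 1.203 | 1.208 |
| | 12 | 1.184 | 1.187 | 1.188 | 1.194 | 1.179 | 1.165 | 1.045 | 1.186 | 1.219 | 1.180 | 1.184 |
| Economy | 6 | 0.015 | 0.015 | 0.015 | 0.022 | 0.015 | 0.015 | 12.723 | 0.015 | 0.015 | 0.015 | 0.015 |
| | 8 | 0.015 | 0.015 | 0.015 | 0.016 | 0.015 | 0.015 | 8.123 | 0.015 | 0.015 | 0.015 | 0.016 |
| | 10 | 0.016 | 0.015 | 0.016 | 0.017 | 0.015 | 0.016 | 1.576 | 0.016 | 0.015 | 0.015 | 0.016 |
| | 12 | 0.016 | 0.016 | 0.016 | 0.018 | 0.016 | 0.016 | 0.905 | 0.016 | 0.016 | 0.016 | 0.016 |
| Energy | 12 | 0.116 | 0.119 | 0.124 | 0.096 | 0.112 | 0.113 | 0.206 | 0.122 | 0.121 | 0.124 | 0.131 |
| | 24 | 0.230 | 0.230 | 0.232 | 0.214 | 0.224 | 0.250 | 0.834 | 0.234 | 0.226 | 0.241 | 0.243 |
| | 36 | 0.315 | 0.309 | 0.315 | 0.314 | 0.305 | 0.333 | 1.312 | 0.330 | 0.320 | 0.318 | 0.328 |
| | 48 | 0.399 | 0.428 | 0.425 | 0.406 | 0.495 | 0.424 | 0.693 | 0.434 | 0.436 | 0.420 | 0.422 |
| Environment | 48 | 0.413 | 0.427 | 0.416 | 0.512 | 0.425 | 0.418 | 0.490 | 0.422 | 0.421 | 0.422 | 0.421 |
| | 96 | 0.417 | 0.425 | 0.422 | 0.470 | 0.415 | 0.417 | 0.694 | 0.429 | 0.423 | 0.431 | 0.419 |
| | 192 | 0.432 | 0.442 | 0.446 | 0.494 | 0.429 | 0.425 | 0.627 | 0.426 | 0.418 | 0.419 | 0.436 |
| | 336 | 0.422 | 0.423 | 0.424 | 0.462 | 0.419 | 0.423 | 0.985 | 0.431 | 0.418 | 0.416 | 0.427 |
| Public Health | 12 | 0.976 | 1.011 | 1.071 | 0.939 | 0.996 | 0.990 | 1.136 | 1.061 | 0.950 | 1.017 | 0.941 |
| | 24 | 1.446 | 1.478 | 1.451 | 1.242 | 1.499 | 1.438 | 1.636 | 1.445 | 1.473 | 1.448 | 1.441 |
| | 36 | 1.629 | 1.655 | 1.660 | 1.379 | 1.669 | 1.620 | 2.037 | 1.659 | 1.650 | 1.633 | 1.638 |
| | 48 | 1.741 | 1.747 | 1.757 | 1.518 | 1.809 | 1.780 | 1.741 | 1.756 | 1.768 | 1.759 | 1.758 |
| Security | 6 | 104.7 | 104.2 | 103.1 | 110.6 | 103.4 | 102.2 | 139.8 | 102.6 | 104.2 | 102.5 | 102.7 |
| | 8 | 108.8 | 105.8 | 105.2 | 112.8 | 108.1 | 106.1 | 141.6 | 105.2 | 106.5 | 105.5 | 109.2 |
| | 10 | 111.9 | 109.8 | 108.1 | 114.6 | 110.1 | 110.7 | 122.7 | 108.7 | 110.3 | 109.3 | 108.2 |
| | 12 | 114.3 | 112.6 | 111.5 | 117.2 | 111.5 | 109.7 | 121.3 | 110.7 | 111.5 | 109.2 | 111.5 |
| Social Good | 6 | 0.821 | 0.831 | 0.830 | 0.775 | 0.852 | 0.856 | 4.174 | 0.831 | 0.838 | 0.831 | 0.835 |
| | 8 | 0.960 | 0.948 | 0.954 | 0.943 | 0.937 | 0.963 | 2.762 | 0.953 | 0.941 | 0.961 | 0.955 |
| | 10 | 1.056 | 1.064 | 1.069 | 1.001 | 1.054 | 1.058 | 1.084 | 1.068 | 1.063 | 1.050 | 1.061 |
| | 12 | 1.150 | 1.145 | 1.153 | 1.068 | 1.102 | 1.134 | 1.051 | 1.153 | 1.149 | 1.133 | 1.130 |
| Traffic | 6 | 0.182 | 0.182 | 0.176 | 0.188 | 0.184 | 0.172 | 0.458 | 0.172 | 0.168 | 0.164 | 0.173 |
| | 8 | 0.186 | 0.192 | 0.183 | 0.195 | 0.187 | 0.188 | 0.789 | 0.184 | 0.179 | 0.188 | 0.183 |
| | 10 | 0.195 | 0.191 | 0.181 | 0.196 | 0.190 | 0.181 | 0.304 | 0.181 | 0.184 | 0.184 | 0.190 |
| | 12 | 0.254 | 0.234 | 0.228 | 0.256 | 0.250 | 0.251 | 0.276 | 0.226 | 0.225 | 0.226 | 0.242 |

Table 62: Comparison of multimodal fusion strategies with Koopa (+ Doc2Vec).

TS Model: Koopa

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.056 | 0.056 | 0.056 | 0.056 | 2.967 | 3.376 | 2.346 | 0.059 | 0.062 | 0.054 | 0.056 |
| | 8 | 0.078 | 0.090 | 0.071 | 0.070 | 0.177 | 0.331 | 0.297 | 0.076 | 0.083 | 0.076 | 0.075 |
| | 10 | 0.100 | 0.130 | 0.090 | 0.090 | 2.205 | 0.738 | 0.714 | 0.101 | 0.101 | 0.091 | 0.100 |
| | 12 | 0.129 | 0.133 | 0.116 | 0.115 | 5.842 | 2.913 | 2.581 | 0.127 | 0.146 | 0.116 | 0.129 |
| Climate | 6 | 1.249 | 1.783 | 1.263 | 1.263 | 1.834 | 1.216 | 1.215 | 1.254 | 1.017 | 1.259 | 1.243 |
| | 8 | 1.272 | 1.255 | 1.277 | 1.279 | 1.143 | 1.091 | 1.088 | 1.294 | 1.025 | 1.240 | 1.262 |
| | 10 | 1.249 | 1.240 | 1.246 | 1.251 | 1.195 | 1.054 | 1.054 | 1.245 | 1.046 | 1.227 | 1.244 |
| | 12 | 1.242 | 1.244 | 1.237 | 1.238 | 1.211 | 1.221 | 1.222 | 1.274 | 1.057 | 1.238 | 1.231 |
| Economy | 6 | 0.017 | 0.028 | 0.032 | 0.033 | 14.673 | 14.901 | 13.798 | 0.024 | 0.048 | 0.020 | 0.018 |
| | 8 | 0.017 | 0.020 | 0.024 | 0.023 | 1.001 | 1.064 | 1.045 | 0.027 | 0.029 | 0.018 | 0.017 |
| | 10 | 0.018 | 0.029 | 0.026 | 0.025 | 4.957 | 2.423 | 2.443 | 0.027 | 0.036 | 0.019 | 0.017 |
| | 12 | 0.018 | 0.040 | 0.025 | 0.021 | 14.385 | 13.308 | 10.977 | 0.020 | 0.047 | 0.023 | 0.018 |
| Energy | 12 | 0.108 | 7.963 | 0.108 | 0.108 | 1.590 | 0.203 | 0.210 | 0.114 | 0.115 | 0.108 | 0.105 |
| | 24 | 0.210 | 0.319 | 0.220 | 0.223 | 4.487 | 0.446 | 0.475 | 0.241 | 0.220 | 0.215 | 0.220 |
| | 36 | 0.306 | 0.289 | 0.297 | 0.297 | 0.719 | 0.572 | 0.673 | 0.328 | 0.314 | 0.292 | 0.284 |
| | 48 | 0.404 | 34.576 | 0.423 | 0.416 | 1.305 | 0.948 | 0.954 | 0.429 | 0.401 | 0.384 | 0.408 |
| Environment | 48 | 0.480 | 95645.1 | 0.575 | 1.234 | 24986.4 | 0.722 | 0.802 | 0.490 | 0.587 | 0.921 | 0.482 |
| | 96 | 0.533 | 58063.8 | 1.102 | 0.865 | 37111.5 | 0.980 | 0.934 | 0.535 | 0.757 | 0.731 | 0.524 |
| | 192 | 0.568 | 347050.2 | 0.713 | 0.736 | 3030.7 | 0.649 | 0.647 | 0.573 | 1.251 | 0.822 | 0.567 |
| | 336 | 0.530 | 81048.2 | 0.713 | 0.577 | 18262.9 | 1.021 | 0.913 | 0.529 | 0.547 | 0.613 | 0.529 |
| Public Health | 12 | 1.035 | 7.698 | 0.955 | 0.961 | 3.428 | 1.023 | 1.018 | 1.198 | 1.313 | 0.941 | 0.952 |
| | 24 | 1.396 | 1.163 | 1.166 | 1.197 | 1.859 | 1.235 | 1.234 | 1.530 | 1.391 | 1.154 | 1.412 |
| | 36 | 1.624 | 1.300 | 1.292 | 1.287 | 2.163 | 1.624 | 1.686 | 1.806 | 1.394 | 1.256 | 1.629 |
| | 48 | 1.733 | 1.406 | 1.384 | 1.390 | 2.316 | 1.943 | 2.020 | 1.877 | 1.442 | 1.372 | 1.724 |
| Security | 6 | 104.6 | 97.318 | 103.5 | 103.5 | 134.4 | 138.9 | 139.8 | 104.2 | 107.2 | 103.4 | 105.1 |
| | 8 | 109.4 | 116.9 | 111.1 | 111.0 | 111.0 | 112.1 | 112.1 | 105.7 | 110.2 | 111.0 | 106.6 |
| | 10 | 108.9 | 107.9 | 108.8 | 108.8 | 117.8 | 118.4 | 118.3 | 110.1 | 113.2 | 109.1 | 108.5 |
| | 12 | 111.0 | 102.5 | 110.6 | 110.6 | 139.1 | 137.6 | 137.4 | 115.1 | 114.9 | 110.8 | 111.2 |
| Social Good | 6 | 0.852 | 0.955 | 0.800 | 0.800 | 3.409 | 3.506 | 3.506 | 0.848 | 0.787 | 0.869 | 0.838 |
| | 8 | 0.922 | 1.255 | 0.893 | 0.892 | 0.992 | 1.000 | 1.026 | 0.926 | 0.797 | 0.879 | 0.921 |
| | 10 | 0.991 | 1.137 | 0.964 | 0.966 | 1.102 | 1.231 | 1.149 | 0.971 | 0.912 | 0.886 | 0.992 |
| | 12 | 1.083 | 1.011 | 0.994 | 0.994 | 3.020 | 2.967 | 2.947 | 1.071 | 0.898 | 0.973 | 1.080 |
| Traffic | 6 | 0.231 | 0.294 | 0.220 | 0.221 | 0.295 | 0.506 | 0.479 | 0.215 | 0.218 | 0.215 | 0.202 |
| | 8 | 0.212 | 0.216 | 0.214 | 0.214 | 0.205 | 0.219 | 0.221 | 0.226 | 0.211 | 0.205 | 0.204 |
| | 10 | 0.219 | 0.221 | 0.221 | 0.220 | 0.314 | 0.300 | 0.333 | 0.258 | 0.200 | 0.210 | 0.209 |
| | 12 | 0.274 | 0.321 | 0.279 | 0.279 | 1.013 | 0.515 | 0.613 | 0.274 | 0.263 | 0.278 | 0.262 |

Table 63: Comparison of multimodal fusion strategies with DLinear (+ Doc2Vec).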
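TS Model: DLinear

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.094 | 0.129 | 0.132 | 0.120 | 1.911 | 2.861 | 2.080 | 0.075 | 0.142 | 0.096 | 0.070 |
| | 8 | 0.324 | 0.167 | 0.184 | 0.185 | 3.309 | 3.914 | 2.666 | 0.206 | 0.174 | 0.214 | 0.088 |
| | 10 | 0.164 | 0.195 | 0.205 | 0.203 | 0.312 | 0.280 | 0.276 | 0.151 | 0.147 | 0.191 | 0.135 |
| | 12 | 0.217 | 0.240 | 0.240 | 0.240 | 0.381 | 0.288 | 0.289 | 0.162 | 0.406 | 0.178 | 0.164 |
| Climate | 6 | 1.140 | 1.263 | 1.209 | 1.198 | 1.213 | 1.218 | 1.227 | 1.271 | 1.054 | 1.087 | 1.087 |
| | 8 | 1.295 | 1.283 | 1.290 | 1.273 | 1.196 | 1.074 | 1.100 | 1.277 | 1.074 | 1.108 | 1.112 |
| | 10 | 1.155 | 1.300 | 1.284 | 1.268 | 1.141 | 1.118 | 1.112 | 1.330 | 1.112 | 1.167 | 1.109 |
| | 12 | 1.175 | 1.239 | 1.213 | 1.182 | 1.115 | 1.111 | 1.107 | 1.191 | 1.100 | 1.137 | 1.096 |
| Economy | 6 | 0.095 | 0.087 | 0.097 | 0.097 | 13.039 | 12.557 | 12.846 | 0.107 | 0.097 | 0.046 | 0.023 |
| | 8 | 0.234 | 0.102 | 0.142 | 0.144 | 10.613 | 9.270 | 10.797 | 0.130 | 0.105 | 0.150 | 0.038 |
| | 10 | 0.086 | 0.115 | 0.132 | 0.130 | 0.493 | 0.468 | 0.482 | 0.140 | 0.038 | 0.140 | 0.045 |
| | 12 | 0.109 | 0.129 | 0.131 | 0.129 | 1.149 | 0.433 | 0.396 | 0.062 | 0.256 | 0.092 | 0.054 |
| Energy | 12 | 0.116 | 0.141 | 0.144 | 0.132 | 0.150 | 0.142 | 0.143 | 0.126 | 0.113 | 0.123 | 0.104 |
| | 24 | 0.222 | 0.255 | 0.240 | 0.252 | 0.427 | 0.295 | 0.289 | 0.241 | 0.225 | 0.210 | 0.204 |
| | 36 | 0.300 | 0.332 | 0.309 | 0.320 | 0.758 | 0.332 | 0.341 | 0.315 | 0.334 | 0.300 | 0.293 |
| | 48 | 0.412 | 0.482 | 0.472 | 0.465 | 0.641 | 0.447 | 0.449 | 0.426 | 0.389 | 0.398 | 0.407 |
| Environment | 48 | 0.491 | 0.940 | 1.113 | 0.777 | 0.773 | 0.592 | 0.577 | 0.494 | 0.833 | 1.290 | 0.492 |
| | 96 | 0.568 | 0.792 | 0.694 | 0.577 | 1.158 | 0.819 | 0.764 | 0.590 | 1.061 | 0.978 | 0.568 |
| | 192 | 0.592 | 0.682 | 1.314 | 0.730 | 0.886 | 0.782 | 0.649 | 0.608 | 1.252 | 0.900 | 0.592 |
| | 336 | 0.538 | 0.600 | 0.523 | 0.533 | 0.828 | 0.555 | 0.744 | 0.554 | 0.646 | 0.784 | 0.537 |
| Public Health | 12 | 1.357 | 1.336 | 1.428 | 1.362 | 1.438 | 1.573 | 1.618 | 1.581 | 1.350 | 1.349 | 1.249 |
| | 24 | 1.593 | 1.582 | 1.671 | 1.620 | 1.707 | 1.775 | 1.720 | 1.624 | 1.483 | 1.546 | 1.500 |
| | 36 | 1.636 | 1.647 | 1.531 | 1.508 | 1.664 | 1.670 | 1.642 | 1.691 | 1.596 | 1.617 | 1.595 |
| | 48 | 1.683 | 1.708 | 1.627 | 1.591 | 1.864 | 1.745 | 1.753 | 1.734 | 1.686 | 1.678 | 1.672 |
| Security | 6 | 104.3 | 104.5 | 104.7 | 104.6 | 167.4 | 165.1 | 165.5 | 105.6 | 104.4 | 104.6 | 104.8 |
| | 8 | 109.2 | 108.1 | 107.6 | 107.8 | 150.9 | 146.2 | 145.5 | 110.3 | 106.5 | 108.4 | 108.9 |
| | 10 | 110.5 | 111.6 | 109.7 | 109.9 | 111.4 | 108.4 | 111.0 | 110.5 | 110.9 | 110.0 | 109.8 |
| | 12 | 111.8 | 112.2 | 111.8 | 112.1 | 113.3 | 112.8 | 113.2 | 113.4 | 111.3 | 111.5 | 111.6 |
| Social Good | 6 | 0.847 | 0.814 | 0.828 | 0.818 | 3.013 | 1.549 | 1.576 | 0.904 | 0.821 | 0.786 | 0.785 |
| | 8 | 0.878 | 0.912 | 0.897 | 0.897 | 2.101 | 1.804 | 1.926 | 0.889 | 0.862 | 0.863 | 0.869 |
| | 10 | 0.949 | 0.983 | 1.022 | 1.001 | 0.944 | 0.958 | 0.936 | 0.977 | 0.982 | 0.972 | 0.958 |
| | 12 | 1.015 | 1.000 | 1.037 | 0.997 | 1.001 | 1.012 | 0.998 | 1.015 | 1.004 | 1.018 | 1.002 |
| Traffic | 6 | 0.252 | 0.262 | 0.299 | 0.261 | 0.472 | 0.507 | 0.461 | 0.296 | 0.345 | 0.235 | 0.224 |
| | 8 | 0.328 | 0.291 | 0.325 | 0.301 | 0.549 | 0.867 | 0.618 | 0.268 | 0.292 | 0.305 | 0.223 |
| | 10 | 0.238 | 0.267 | 0.315 | 0.289 | 0.349 | 0.270 | 0.263 | 0.227 | 0.259 | 0.268 | 0.219 |
| | 12 | 0.288 | 0.308 | 0.334 | 0.316 | 0.365 | 0.407 | 0.277 | 0.281 | 0.309 | 0.306 | 0.262 |

Table 64: Comparison of multimodal fusion strategies with PatchTST (+ Doc2Vec).

TS Model: PatchTST

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.060 | 0.060 | 0.058 | 0.059 | 0.060 | 0.059 | 2.663 | 0.060 | 0.058 | 0.061 | 0.057 |
| | 8 | 0.080 | 0.080 | 0.078 | 0.075 | 0.080 | 0.079 | 2.320 | 0.078 | 0.077 | 0.079 | 0.079 |
| | 10 | 0.099 | 0.100 | 0.101 | 0.093 | 0.100 | 0.101 | 0.502 | 0.099 | 0.101 | 0.101 | 0.099 |
| | 12 | 0.129 | 0.129 | 0.129 | 0.119 | 0.129 | 0.130 | 0.589 | 0.129 | 0.128 | 0.130 | 0.126 |
| Climate | 6 | 1.255 | 1.237 | 1.238 | 1.238 | 1.237 | 1.238 | 1.283 | 1.238 | 1.205 | 1.251 | 1.234 |
| | 8 | 1.261 | 1.248 | 1.227 | 1.249 | 1.248 | 1.258 | 1.143 | 1.226 | 1.252 | 1.255 | 1.243 |
| | 10 | 1.280 | 1.263 | 1.261 | 1.263 | 1.263 | 1.242 | 1.104 | 1.260 | 1.246 | 1.277 | 1.267 |
| | 12 | 1.238 | 1.245 | 1.251 | 1.253 | 1.245 | 1.254 | 1.090 | 1.251 | 1.260 | 1.246 | 1.248 |
| Economy | 6 | 0.016 | 0.016 | 0.017 | 0.017 | 0.016 | 0.017 | 17.750 | 0.017 | 0.016 | 0.017 | 0.017 |
| | 8 | 0.017 | 0.017 | 0.017 | 0.020 | 0.017 | 0.017 | 9.522 | 0.018 | 0.016 | 0.017 | 0.017 |
| | 10 | 0.017 | 0.017 | 0.016 | 0.019 | 0.017 | 0.017 | 1.590 | 0.016 | 0.017 | 0.018 | 0.018 |
| | 12 | 0.018 | 0.017 | 0.017 | 0.020 | 0.017 | 0.018 | 1.435 | 0.017 | 0.018 | 0.017 | 0.017 |
| Energy | 12 | 0.109 | 0.111 | 0.114 | 0.110 | 0.111 | 0.113 | 0.222 | 0.117 | 0.112 | 0.109 | 0.116 |
| | 24 | 0.221 | 0.211 | 0.222 | 0.225 | 0.211 | 0.216 | 0.606 | 0.232 | 0.231 | 0.228 | 0.231 |
| | 36 | 0.316 | 0.313 | 0.326 | 0.318 | 0.313 | 0.329 | 1.446 | 0.326 | 0.330 | 0.325 | 0.314 |
| | 48 | 0.401 | 0.406 | 0.444 | 0.421 | 0.406 | 0.426 | 0.575 | 0.445 | 0.439 | 0.441 | 0.429 |
| Environment | 48 | 0.465 | 0.463 | 0.518 | 0.500 | 0.463 | 0.470 | 0.558 | 0.500 | 0.462 | 0.462 | 0.462 |
| | 96 | 0.494 | 0.497 | 0.501 | 0.795 | 0.497 | 0.493 | 0.639 | 0.522 | 0.507 | 0.509 | 0.522 |
| | 192 | 0.543 | 0.545 | 0.517 | 0.603 | 0.545 | 0.541 | 0.837 | 0.536 | 0.547 | 0.530 | 0.562 |
| | 336 | 0.503 | 0.515 | 0.514 | 0.560 | 0.515 | 0.530 | 0.844 | 0.515 | 0.510 | 0.493 | 0.525 |
| Public Health | 12 | 0.925 | 0.876 | 0.928 | 0.838 | 0.876 | 0.949 | 0.913 | 0.949 | 1.034 | 0.906 | 0.896 |
| | 24 | 1.386 | 1.359 | 1.390 | 1.267 | 1.359 | 1.265 | 1.622 | 1.386 | 1.381 | 1.390 | 1.425 |
| | 36 | 1.612 | 1.600 | 1.630 | 1.389 | 1.600 | 1.593 | 2.223 | 1.637 | 1.614 | 1.623 | 1.612 |
| | 48 | 1.749 | 1.755 | 1.737 | 1.567 | 1.755 | 1.736 | 1.747 | 1.732 | 1.761 | 1.759 | 1.771 |
| Security | 6 | 103.8 | 104.7 | 102.8 | 105.7 | 104.7 | 103.3 | 145.1 | 103.3 | 104.0 | 104.2 | 104.6 |
| | 8 | 107.6 | 107.2 | 105.9 | 110.7 | 107.2 | 106.4 | 135.5 | 106.5 | 106.7 | 107.1 | 107.6 |
| | 10 | 111.5 | 112.0 | 108.2 | 109.2 | 112.0 | 108.4 | 120.1 | 108.4 | 110.6 | 107.9 | 111.2 |
| | 12 | 110.9 | 111.5 | 110.8 | 111.7 | 111.5 | 111.1 | 120.7 | 110.8 | 113.3 | 111.8 | 112.1 |
| Social Good | 6 | 0.806 | 0.794 | 0.795 | 0.818 | 0.794 | 0.796 | 3.828 | 0.794 | 0.798 | 0.794 | 0.795 |
| | 8 | 0.905 | 0.900 | 0.885 | 0.865 | 0.900 | 0.907 | 2.397 | 0.884 | 0.904 | 0.901 | 0.890 |
| | 10 | 1.008 | 1.028 | 1.046 | 1.017 | 1.028 | 1.050 | 1.062 | 1.000 | 1.023 | 1.032 | 1.039 |
| | 12 | 1.131 | 1.080 | 1.090 | 0.970 | 1.080 | 1.076 | 1.053 | 1.091 | 1.064 | 1.107 | 1.083 |
| Traffic | 6 | 0.177 | 0.174 | 0.180 | 0.177 | 0.174 | 0.179 | 0.481 | 0.169 | 0.186 | 0.172 | 0.186 |
| | 8 | 0.182 | 0.186 | 0.179 | 0.189 | 0.186 | 0.181 | 0.299 | 0.185 | 0.185 | 0.179 | 0.181 |
| | 10 | 0.193 | 0.185 | 0.187 | 0.198 | 0.185 | 0.192 | 0.214 | 0.187 | 0.183 | 0.183 | 0.194 |
| | 12 | 0.237 | 0.249 | 0.234 | 0.243 | 0.249 | 0.247 | 0.321 | 0.236 | 0.246 | 0.225 | 0.239 |

Table 65: Comparison of multimodal fusion strategies with FiLM (+ Doc2Vec).

TS Model: FiLM

| Dataset | H | Unimodal | Additive First | Additive Middle | Additive Last | Concat First | Concat Middle | Concat Last | Constrained Orthogonal | Constrained FiLM | Constrained Gating | Constrained CFA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 6 | 0.060 | 0.088 | 0.061 | 0.059 | 7.138 | 0.062 | 2.962 | 0.063 | 0.062 | 0.061 | 0.061 |
| | 8 | 0.082 | 0.311 | 0.082 | 0.076 | 5.513 | 0.084 | 1.729 | 0.080 | 0.079 | 0.079 | 0.079 |
| | 10 | 0.104 | 0.168 | 0.104 | 0.100 | 0.672 | 0.104 | 0.684 | 0.104 | 0.105 | 0.102 | 0.103 |
| | 12 | 0.137 | 1.725 | 0.133 | 0.124 | 0.317 | 0.137 | 0.551 | 0.134 | 0.136 | 0.135 | 0.135 |
| Climate | 6 | 1.354 | 1.254 | 1.268 | 1.256 | 1.537 | 1.272 | 1.284 | 1.250 | 1.244 | 1.234 | 1.300 |
| | 8 | 1.279 | 1.271 | 1.278 | 1.266 | 1.055 | 1.281 | 1.128 | 1.295 | 1.273 | 1.277 | 1.266 |
| | 10 | 1.282 | 2.865 | 1.296 | 1.278 | 1.666 | 1.290 | 1.095 | 1.269 | 1.285 | 1.270 | 1.271 |
| | 12 | 1.270 | 2.392 | 1.279 | 1.285 | 1.687 | 1.286 | 1.101 | 1.269 | 1.267 | 1.278 | 1.266 |
| Economy | 6 | 0.018 | 0.042 | 0.021 | 0.041 | 23.612 | 0.018 | 21.375 | 0.018 | 0.019 | 0.017 | 0.018 |
| | 8 | 0.019 | 0.109 | 0.018 | 0.019 | 11.206 | 0.019 | 11.501 | 0.017 | 0.018 | 0.017 | 0.017 |
| | 10 | 0.019 | 0.083 | 0.018 | 0.022 | 0.137 | 0.020 | 1.474 | 0.018 | 0.018 | 0.017 | 0.018 |
| | 12 | 0.019 | 0.508 | 0.019 | 0.040 | 0.036 | 0.020 | 1.632 | 0.018 | 0.018 | 0.018 | 0.018 |
| Energy | 12 | 0.116 | 13.430 | 0.114 | 0.128 | 0.353 | 0.150 | 0.280 | 0.103 | 0.105 | 0.104 | 0.104 |
| | 24 | 0.221 | 0.326 | 0.239 | 0.217 | 0.673 | 0.246 | 0.810 | 0.209 | 0.226 | 0.231 | 0.217 |
| | 36 | 0.320 | 44.538 | 0.326 | 0.321 | 1.590 | 0.348 | 1.401 | 0.300 | 0.303 | 0.329 | 0.313 |
| | 48 | 0.445 | 0.623 | 0.437 | 0.428 | 0.680 | 0.459 | 0.648 | 0.432 | 0.421 | 0.423 | 0.425 |
| Environment | 48 | 0.491 | 65958.3 | 0.486 | 0.638 | 0.652 | 0.533 | 0.566 | 0.490 | 0.516 | 0.493 | 0.488 |
| | 96 | 0.545 | 247802.2 | 0.545 | 1.312 | 0.695 | 0.635 | 0.775 | 0.549 | 0.568 | 0.546 | 0.542 |
| | 192 | 0.572 | 284870.7 | 0.576 | 0.855 | 0.602 | 0.576 | 0.716 | 0.570 | 0.601 | 0.573 | 0.570 |
| | 336 | 0.530 | 7.076 | 0.564 | 0.631 | 692.9 | 0.532 | 0.771 | 0.529 | 0.537 | 0.531 | 0.528 |
| Public Health | 12 | 1.159 | 30.068 | 1.176 | 0.983 | 1.950 | 1.305 | 1.131 | 1.124 | 1.145 | 1.131 | 1.129 |
| | 24 | 1.491 | 1.428 | 1.510 | 1.338 | 2.024 | 1.567 | 1.745 | 1.431 | 1.485 | 1.505 | 1.458 |
| | 36 | 1.612 | 1.806 | 1.691 | 1.695 | 2.242 | 1.660 | 2.087 | 1.661 | 1.663 | 1.652 | 1.669 |
| | 48 | 1.705 | 2.023 | 1.772 | 1.866 | 1.987 | 1.760 | 1.873 | 1.716 | 1.768 | 1.718 | 1.738 |
| Security | 6 | 127.3 | 97.900 | 102.1 | 101.1 | 214.1 | 103.2 | 186.0 | 102.8 | 102.1 | 104.4 | 107.0 |
| | 8 | 113.8 | 116.9 | 107.9 | 116.6 | 148.7 | 107.0 | 160.8 | 112.1 | 104.1 | 106.8 | 105.1 |
| | 10 | 117.0 | 132.9 | 109.9 | 133.6 | 99.472 | 108.9 | 109.8 | 109.5 | 108.8 | 110.2 | 115.2 |
| | 12 | 119.0 | 106.4 | 109.9 | 108.8 | 104.0 | 112.0 | 111.0 | 120.5 | 109.5 | 112.0 | 112.8 |
| Social Good | 6 | 0.921 | 1.596 | 0.834 | 0.828 | 4.119 | 0.871 | 3.622 | 0.829 | 0.847 | 0.831 | 0.820 |
| | 8 | 1.053 | 12.330 | 0.988 | 1.050 | 2.416 | 1.010 | 2.589 | 1.014 | 0.934 | 0.963 | 0.963 |
| | 10 | 1.106 | 1.547 | 1.122 | 1.198 | 5.796 | 1.092 | 1.095 | 1.014 | 1.113 | 1.082 | 1.107 |
| | 12 | 1.207 | 8.133 | 1.110 | 1.104 | 3.560 | 1.193 | 1.088 | 1.227 | 1.201 | 1.138 | 1.164 |
| Traffic | 6 | 0.228 | 0.276 | 0.238 | 0.249 | 1.073 | 0.236 | 0.440 | 0.240 | 0.234 | 0.231 | 0.242 |
| | 8 | 0.223 | 0.224 | 0.220 | 0.221 | 0.319 | 0.230 | 0.469 | 0.232 | 0.213 | 0.221 | 0.234 |
| | 10 | 0.216 | 0.226 | 0.217 | 0.216 | 0.413 | 0.219 | 0.407 | 0.238 | 0.214 | 0.217 | 0.215 |
| | 12 | 0.253 | 5.092 | 0.259 | 0.267 | 0.687 | 0.264 | 0.386 | 0.283 | 0.244 | 0.274 | 0.250 |

Table 66: Comparison of multimodal fusion strategies with TiDE (+ Doc2Vec).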
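| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TiDE | Agriculture | 6 | 0.072 | 0.070 | 0.086 | 0.070 | 3.204 | 0.069 | 3.675 | 0.060 | 0.063 | 0.061 | 0.060 |
| | | 8 | 0.087 | 0.084 | 0.090 | 0.093 | 1.532 | 0.091 | 1.582 | 0.083 | 0.081 | 0.082 | 0.081 |
| | | 10 | 0.112 | 0.266 | 0.112 | 0.114 | 0.168 | 0.121 | 0.461 | 0.103 | 0.104 | 0.103 | 0.103 |
| | | 12 | 0.151 | 0.496 | 0.146 | 0.131 | 1.262 | 0.150 | 1.014 | 0.135 | 0.134 | 0.133 | 0.134 |
| | Climate | 6 | 1.443 | 1.533 | 1.553 | 1.528 | 1.573 | 1.433 | 1.409 | 1.259 | 1.235 | 1.238 | 1.261 |
| | | 8 | 1.506 | 1.448 | 1.384 | 1.461 | 1.111 | 1.546 | 1.138 | 1.262 | 1.278 | 1.250 | 1.366 |
| | | 10 | 1.397 | 1.558 | 1.402 | 1.423 | 1.417 | 1.546 | 1.177 | 1.279 | 1.270 | 1.258 | 1.254 |
| | | 12 | 1.468 | 1.518 | 1.422 | 1.390 | 1.251 | 1.472 | 1.132 | 1.255 | 1.274 | 1.262 | 1.269 |
| | Economy | 6 | 0.033 | 0.046 | 0.039 | 0.048 | 20.458 | 0.050 | 21.635 | 0.019 | 0.017 | 0.016 | 0.018 |
| | | 8 | 0.039 | 0.046 | 0.038 | 0.039 | 11.799 | 0.031 | 11.915 | 0.018 | 0.018 | 0.017 | 0.017 |
| | | 10 | 0.029 | 0.050 | 0.032 | 0.038 | 0.148 | 0.040 | 1.679 | 0.018 | 0.017 | 0.018 | 0.018 |
| | | 12 | 0.033 | 0.042 | 0.031 | 0.034 | 1.603 | 0.038 | 1.593 | 0.018 | 0.018 | 0.018 | 0.017 |
| | Energy | 12 | 0.135 | 4.723 | 0.138 | 0.134 | 0.842 | 0.135 | 0.210 | 0.106 | 0.102 | 0.105 | 0.106 |
| | | 24 | 0.242 | 0.435 | 0.238 | 0.236 | 0.829 | 0.249 | 0.818 | 0.218 | 0.219 | 0.211 | 0.204 |
| | | 36 | 0.326 | 53.277 | 0.332 | 0.318 | 21.779 | 0.335 | 1.655 | 0.292 | 0.302 | 0.289 | 0.295 |
| | | 48 | 0.444 | 0.530 | 0.437 | 0.416 | 0.760 | 0.441 | 0.793 | 0.406 | 0.408 | 0.414 | 0.403 |
| | Environment | 48 | 0.484 | 1.118 | 0.483 | 1.019 | 0.804 | 0.483 | 0.781 | 0.483 | 0.483 | 0.484 | 0.482 |
| | | 96 | 0.543 | 1.341 | 0.543 | 1.262 | 0.661 | 0.539 | 0.668 | 0.541 | 0.537 | 0.542 | 0.543 |
| | | 192 | 0.576 | 151.7 | 0.577 | 0.805 | 11910.0 | 0.577 | 0.757 | 0.577 | 0.576 | 0.578 | 0.576 |
| | | 336 | 0.533 | 6258.2 | 0.533 | 0.539 | 2568.8 | 0.532 | 0.660 | 0.533 | 0.533 | 0.535 | 0.533 |
| | Public Health | 12 | 1.323 | 2.924 | 1.314 | 1.156 | 2.002 | 1.279 | 1.307 | 1.107 | 1.078 | 1.083 | 1.089 |
| | | 24 | 1.537 | 2.307 | 1.508 | 1.481 | 1.799 | 1.606 | 1.827 | 1.450 | 1.478 | 1.439 | 1.426 |
| | | 36 | 1.676 | 10.500 | 1.669 | 1.600 | 12.984 | 1.702 | 2.238 | 1.637 | 1.638 | 1.609 | 1.623 |
| | | 48 | 1.772 | 2.218 | 1.776 | 1.747 | 1.847 | 1.768 | 1.956 | 1.715 | 1.772 | 1.722 | 1.727 |
| | Security | 6 | 146.5 | 123.0 | 134.9 | 123.4 | 197.7 | 122.0 | 188.9 | 104.4 | 106.6 | 107.2 | 108.1 |
| | | 8 | 129.5 | 133.4 | 132.5 | 133.7 | 152.8 | 129.0 | 154.9 | 107.9 | 108.5 | 112.4 | 108.5 |
| | | 10 | 132.2 | 138.2 | 135.1 | 132.1 | 99.922 | 135.3 | 117.1 | 112.6 | 110.1 | 113.5 | 112.7 |
| | | 12 | 145.2 | 142.5 | 139.1 | 134.4 | 103.0 | 142.3 | 120.1 | 112.5 | 112.8 | 112.8 | 111.8 |
| | Social Good | 6 | 0.968 | 0.914 | 1.043 | 0.916 | 4.189 | 0.915 | 4.017 | 0.836 | 0.884 | 0.862 | 0.825 |
| | | 8 | 1.140 | 1.078 | 1.059 | 1.025 | 2.249 | 1.057 | 2.311 | 0.952 | 0.943 | 0.952 | 0.950 |
| | | 10 | 1.127 | 2.480 | 1.184 | 1.187 | 3.021 | 1.264 | 0.986 | 1.057 | 1.062 | 1.064 | 1.078 |
| | | 12 | 1.351 | 6.833 | 1.185 | 1.266 | 1.430 | 1.216 | 1.060 | 1.192 | 1.160 | 1.156 | 1.142 |
| | Traffic | 6 | 0.298 | 0.303 | 0.286 | 0.308 | 0.730 | 0.316 | 0.763 | 0.236 | 0.235 | 0.237 | 0.233 |
| | | 8 | 0.303 | 0.300 | 0.277 | 0.298 | 0.351 | 0.284 | 0.391 | 0.232 | 0.229 | 0.233 | 0.225 |
| | | 10 | 0.248 | 0.861 | 0.245 | 0.297 | 1.311 | 0.296 | 0.475 | 0.223 | 0.231 | 0.225 | 0.228 |
| | | 12 | 0.319 | 1.090 | 0.308 | 0.324 | 0.496 | 0.334 | 0.316 | 0.288 | 0.288 | 0.284 | 0.290 |

Table 67: Comparison of multimodal fusion strategies with FEDformer (+ Doc2Vec).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FEDformer | Agriculture | 6 | 0.055 | 0.058 | 0.060 | 0.054 | 0.063 | 0.060 | 3.212 | 0.058 | 0.057 | 0.062 | 0.061 |
| | | 8 | 0.072 | 0.076 | 0.069 | 0.070 | 0.080 | 0.073 | 2.878 | 0.069 | 0.075 | 0.073 | 0.068 |
| | | 10 | 0.086 | 0.088 | 0.094 | 0.094 | 0.101 | 0.087 | 0.787 | 0.095 | 0.092 | 0.088 | 0.088 |
| | | 12 | 0.112 | 0.128 | 0.126 | 0.118 | 0.128 | 0.119 | 0.642 | 0.121 | 0.125 | 0.114 | 0.117 |
| | Climate | 6 | 1.084 | 1.093 | 1.129 | 1.058 | 1.055 | 1.084 | 1.191 | 1.069 | 1.103 | 1.083 | 1.081 |
| | | 8 | 1.097 | 1.060 | 1.048 | 1.065 | 1.034 | 1.064 | 0.965 | 1.062 | 1.056 | 1.063 | 1.059 |
| | | 10 | 1.113 | 1.108 | 1.097 | 1.102 | 1.049 | 1.081 | 0.926 | 1.099 | 1.077 | 1.070 | 1.069 |
| | | 12 | 1.109 | 1.076 | 1.104 | 1.095 | 1.092 | 1.090 | 0.913 | 1.107 | 1.111 | 1.091 | 1.071 |
| | Economy | 6 | 0.038 | 0.036 | 0.027 | 0.040 | 0.037 | 0.039 | 9.961 | 0.030 | 0.046 | 0.034 | 0.038 |
| | | 8 | 0.050 | 0.047 | 0.045 | 0.053 | 0.028 | 0.041 | 6.877 | 0.027 | 0.051 | 0.050 | 0.043 |
| | | 10 | 0.039 | 0.042 | 0.034 | 0.047 | 0.043 | 0.035 | 1.150 | 0.035 | 0.031 | 0.036 | 0.042 |
| | | 12 | 0.051 | 0.033 | 0.048 | 0.052 | 0.041 | 0.035 | 1.164 | 0.032 | 0.037 | 0.036 | 0.043 |
| | Energy | 12 | 0.097 | 0.105 | 0.092 | 0.095 | 0.114 | 0.093 | 0.189 | 0.093 | 0.096 | 0.092 | 0.093 |
| | | 24 | 0.177 | 0.242 | 0.190 | 0.176 | 0.193 | 0.180 | 0.868 | 0.198 | 0.187 | 0.181 | 0.186 |
| | | 36 | 0.257 | 0.270 | 0.259 | 0.247 | 0.285 | 0.261 | 1.438 | 0.256 | 0.253 | 0.274 | 0.264 |
| | | 48 | 0.365 | 0.376 | 0.369 | 0.378 | 0.500 | 0.382 | 0.861 | 0.362 | 0.380 | 0.363 | 0.384 |
| | Environment | 48 | 0.461 | 0.478 | 0.466 | 0.518 | 0.493 | 0.448 | 0.662 | 0.503 | 0.608 | 0.510 | 0.470 |
| | | 96 | 0.480 | 0.563 | 0.514 | 0.708 | 0.497 | 0.470 | 0.463 | 0.543 | 0.505 | 0.523 | 0.493 |
| | | 192 | 0.484 | 0.508 | 0.526 | 0.585 | 0.569 | 0.518 | 0.691 | 0.511 | 0.495 | 0.514 | 0.505 |
| | | 336 | 0.470 | 0.485 | 0.464 | 0.535 | 0.468 | 0.482 | 0.697 | 0.465 | 0.481 | 0.486 | 0.485 |
| | Public Health | 12 | 1.070 | 1.110 | 1.090 | 1.062 | 1.129 | 1.099 | 1.025 | 1.083 | 1.088 | 1.078 | 1.108 |
| | | 24 | 1.311 | 1.412 | 1.398 | 1.415 | 1.406 | 1.383 | 1.480 | 1.346 | 1.353 | 1.305 | 1.402 |
| | | 36 | 1.557 | 1.550 | 1.544 | 1.536 | 1.489 | 1.536 | 1.961 | 1.502 | 1.500 | 1.495 | 1.466 |
| | | 48 | 1.558 | 1.600 | 1.575 | 1.541 | 1.592 | 1.497 | 1.604 | 1.604 | 1.513 | 1.524 | 1.519 |
| | Security | 6 | 105.9 | 109.3 | 106.7 | 113.2 | 106.1 | 105.0 | 161.1 | 106.6 | 105.4 | 108.7 | 106.9 |
| | | 8 | 107.9 | 107.1 | 107.3 | 109.3 | 110.2 | 105.4 | 141.2 | 107.2 | 110.5 | 109.8 | 110.0 |
| | | 10 | 113.7 | 112.1 | 111.7 | 114.6 | 111.5 | 115.2 | 117.5 | 111.7 | 112.2 | 109.3 | 114.4 |
| | | 12 | 117.4 | 118.5 | 114.5 | 117.6 | 114.1 | 122.6 | 118.5 | 114.5 | 113.6 | 111.0 | 114.1 |
| | Social Good | 6 | 0.905 | 0.844 | 0.841 | 0.806 | 0.891 | 0.860 | 2.253 | 0.817 | 0.815 | 0.867 | 0.859 |
| | | 8 | 0.890 | 0.850 | 0.850 | 0.863 | 0.856 | 0.837 | 2.710 | 0.850 | 0.812 | 0.895 | 0.865 |
| | | 10 | 0.948 | 0.924 | 0.902 | 0.900 | 1.091 | 0.894 | 1.080 | 0.902 | 0.884 | 0.960 | 0.900 |
| | | 12 | 1.043 | 1.006 | 0.982 | 1.003 | 1.092 | 0.960 | 1.066 | 0.980 | 0.962 | 0.985 | 0.964 |
| | Traffic | 6 | 0.152 | 0.152 | 0.152 | 0.151 | 0.145 | 0.147 | 0.653 | 0.152 | 0.152 | 0.150 | 0.152 |
| | | 8 | 0.155 | 0.160 | 0.157 | 0.151 | 0.152 | 0.156 | 0.347 | 0.156 | 0.153 | 0.158 | 0.155 |
| | | 10 | 0.166 | 0.164 | 0.162 | 0.165 | 0.147 | 0.163 | 0.206 | 0.163 | 0.168 | 0.164 | 0.162 |
| | | 12 | 0.221 | 0.243 | 0.229 | 0.222 | 0.212 | 0.225 | 0.231 | 0.227 | 0.217 | 0.211 | 0.219 |

Table 68: Comparison of multimodal fusion strategies with Autoformer (+ Doc2Vec).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Autoformer | Agriculture | 6 | 0.071 | 0.070 | 0.066 | 0.065 | 0.068 | 0.061 | 3.357 | 0.063 | 0.065 | 0.065 | 0.065 |
| | | 8 | 0.088 | 0.089 | 0.084 | 0.083 | 0.084 | 0.083 | 1.849 | 0.079 | 0.090 | 0.074 | 0.092 |
| | | 10 | 0.095 | 0.109 | 0.107 | 0.104 | 0.112 | 0.098 | 1.030 | 0.110 | 0.103 | 0.098 | 0.108 |
| | | 12 | 0.136 | 0.130 | 0.128 | 0.133 | 0.144 | 0.133 | 0.637 | 0.127 | 0.128 | 0.126 | 0.120 |
| | Climate | 6 | 1.241 | 1.204 | 1.235 | 1.224 | 1.173 | 1.119 | 1.297 | 1.260 | 1.220 | 1.165 | 1.221 |
| | | 8 | 1.201 | 1.215 | 1.204 | 1.156 | 1.247 | 1.157 | 1.008 | 1.201 | 1.170 | 1.223 | 1.147 |
| | | 10 | 1.211 | 1.238 | 1.196 | 1.188 | 1.154 | 1.215 | 1.082 | 1.195 | 1.151 | 1.146 | 1.277 |
| | | 12 | 1.183 | 1.165 | 1.189 | 1.185 | 1.217 | 1.171 | 0.970 | 1.193 | 1.158 | 1.125 | 1.188 |
| | Economy | 6 | 0.058 | 0.059 | 0.051 | 0.051 | 0.072 | 0.061 | 13.506 | 0.053 | 0.058 | 0.068 | 0.042 |
| | | 8 | 0.063 | 0.067 | 0.062 | 0.069 | 0.063 | 0.055 | 7.793 | 0.047 | 0.052 | 0.055 | 0.067 |
| | | 10 | 0.061 | 0.069 | 0.055 | 0.045 | 0.074 | 0.039 | 1.772 | 0.054 | 0.062 | 0.070 | 0.047 |
| | | 12 | 0.055 | 0.049 | 0.051 | 0.080 | 0.049 | 0.032 | 1.602 | 0.043 | 0.047 | 0.053 | 0.059 |
| | Energy | 12 | 0.128 | 0.179 | 0.147 | 0.167 | 0.215 | 0.132 | 0.361 | 0.140 | 0.170 | 0.186 | 0.165 |
| | | 24 | 0.300 | 0.331 | 0.337 | 0.301 | 0.328 | 0.307 | 0.888 | 0.316 | 0.324 | 0.299 | 0.288 |
| | | 36 | 0.368 | 0.373 | 0.399 | 0.410 | 0.381 | 0.368 | 1.497 | 0.380 | 0.366 | 0.367 | 0.395 |
| | | 48 | 0.494 | 0.476 | 0.484 | 0.496 | 0.485 | 0.478 | 0.759 | 0.463 | 0.481 | 0.473 | 0.488 |
| | Environment | 48 | 0.523 | 0.596 | 0.559 | 0.827 | 0.618 | 0.572 | 0.558 | 0.523 | 0.615 | 0.554 | 0.540 |
| | | 96 | 0.541 | 0.718 | 0.756 | 0.620 | 0.599 | 0.563 | 1.372 | 0.746 | 0.595 | 0.649 | 0.577 |
| | | 192 | 0.568 | 0.583 | 0.624 | 0.896 | 0.613 | 0.658 | 0.915 | 0.694 | 0.950 | 0.889 | 0.634 |
| | | 336 | 0.514 | 0.555 | 0.537 | 0.673 | 0.543 | 0.605 | 0.723 | 0.561 | 0.609 | 0.566 | 0.568 |
| | Public Health | 12 | 1.506 | 1.579 | 1.748 | 1.558 | 1.708 | 1.389 | 1.596 | 1.538 | 1.529 | 1.513 | 1.603 |
| | | 24 | 1.882 | 2.028 | 2.024 | 1.890 | 1.895 | 1.950 | 1.610 | 1.827 | 1.765 | 1.899 | 1.870 |
| | | 36 | 1.867 | 1.907 | 1.983 | 1.876 | 2.201 | 1.910 | 1.833 | 1.877 | 1.863 | 1.826 | 1.852 |
| | | 48 | 1.953 | 1.892 | 2.059 | 1.850 | 1.906 | 1.852 | 1.664 | 1.925 | 1.820 | 1.867 | 1.921 |
| | Security | 6 | 105.8 | 108.2 | 108.3 | 105.5 | 104.4 | 109.0 | 154.2 | 108.0 | 108.0 | 109.3 | 109.8 |
| | | 8 | 110.5 | 108.7 | 110.9 | 111.2 | 111.0 | 111.6 | 134.2 | 108.3 | 106.8 | 107.4 | 114.7 |
| | | 10 | 115.2 | 112.8 | 115.5 | 115.0 | 112.5 | 112.7 | 114.4 | 115.5 | 111.5 | 116.4 | 114.6 |
| | | 12 | 117.2 | 116.9 | 115.4 | 115.3 | 114.0 | 118.3 | 118.6 | 115.4 | 111.6 | 116.8 | 113.1 |
| | Social Good | 6 | 0.896 | 0.812 | 0.854 | 0.847 | 0.819 | 0.910 | 4.220 | 0.837 | 0.857 | 0.860 | 0.835 |
| | | 8 | 0.972 | 0.983 | 0.957 | 1.022 | 1.010 | 0.937 | 2.715 | 0.992 | 1.033 | 0.950 | 0.988 |
| | | 10 | 0.983 | 1.009 | 1.075 | 1.042 | 1.105 | 0.989 | 1.034 | 1.006 | 1.041 | 0.999 | 1.037 |
| | | 12 | 1.092 | 1.122 | 1.067 | 1.125 | 1.194 | 1.136 | 1.135 | 1.104 | 1.190 | 1.089 | 1.059 |
| | Traffic | 6 | 0.173 | 0.192 | 0.170 | 0.158 | 0.167 | 0.164 | 0.520 | 0.167 | 0.170 | 0.174 | 0.196 |
| | | 8 | 0.186 | 0.188 | 0.198 | 0.200 | 0.245 | 0.180 | 0.429 | 0.192 | 0.195 | 0.194 | 0.187 |
| | | 10 | 0.192 | 0.191 | 0.185 | 0.210 | 0.215 | 0.171 | 0.197 | 0.195 | 0.189 | 0.208 | 0.185 |
| | | 12 | 0.241 | 0.260 | 0.234 | 0.238 | 0.233 | 0.227 | 0.287 | 0.233 | 0.250 | 0.235 | 0.224 |

Table 69: Comparison of multimodal fusion strategies with Crossformer (+ Doc2Vec).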
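| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Crossformer | Agriculture | 6 | 0.152 | 0.117 | 0.195 | 0.151 | 0.130 | 0.226 | 0.192 | 0.155 | 0.118 | 0.139 | 0.173 |
| | | 8 | 0.198 | 0.248 | 0.281 | 0.149 | 0.743 | 0.295 | 0.239 | 0.283 | 0.261 | 0.254 | 0.167 |
| | | 10 | 0.265 | 0.215 | 0.247 | 0.346 | 0.261 | 0.395 | 0.378 | 0.268 | 0.299 | 0.192 | 0.251 |
| | | 12 | 0.380 | 0.357 | 0.363 | 0.401 | 0.307 | 0.509 | 0.538 | 0.396 | 0.282 | 0.326 | 0.459 |
| | Climate | 6 | 1.063 | 1.053 | 1.027 | 1.049 | 1.024 | 1.053 | 1.081 | 1.093 | 1.067 | 1.024 | 1.050 |
| | | 8 | 1.079 | 1.109 | 1.081 | 1.115 | 1.136 | 1.072 | 1.094 | 1.060 | 1.088 | 1.081 | 1.079 |
| | | 10 | 1.079 | 1.097 | 1.075 | 1.105 | 1.084 | 1.097 | 1.080 | 1.105 | 1.085 | 1.101 | 1.101 |
| | | 12 | 1.099 | 1.096 | 1.111 | 1.116 | 1.129 | 1.090 | 1.080 | 1.072 | 1.094 | 1.090 | 1.106 |
| | Economy | 6 | 0.477 | 0.077 | 0.409 | 0.229 | 0.353 | 0.697 | 0.519 | 0.218 | 0.247 | 0.138 | 0.180 |
| | | 8 | 0.366 | 0.296 | 0.162 | 0.179 | 5.881 | 0.666 | 0.326 | 0.180 | 0.167 | 0.255 | 0.515 |
| | | 10 | 0.374 | 0.797 | 0.688 | 0.588 | 0.347 | 0.764 | 1.015 | 0.146 | 0.364 | 0.441 | 0.251 |
| | | 12 | 0.722 | 0.581 | 0.568 | 1.073 | 0.559 | 0.648 | 1.065 | 0.376 | 0.295 | 0.561 | 0.168 |
| | Energy | 12 | 0.130 | 0.131 | 0.151 | 0.171 | 0.177 | 0.151 | 0.162 | 0.151 | 0.132 | 0.148 | 0.114 |
| | | 24 | 0.256 | 0.238 | 0.238 | 0.265 | 0.451 | 0.288 | 0.288 | 0.283 | 0.280 | 0.265 | 0.249 |
| | | 36 | 0.344 | 0.336 | 0.359 | 0.340 | 0.346 | 0.351 | 0.351 | 0.335 | 0.344 | 0.332 | 0.355 |
| | | 48 | 0.440 | 0.449 | 0.433 | 0.417 | 0.427 | 0.466 | 0.470 | 0.440 | 0.453 | 0.459 | 0.428 |
| | Environment | 48 | 0.492 | 0.580 | 0.511 | 0.573 | 0.549 | 0.504 | 0.571 | 0.499 | 0.581 | 0.531 | 0.529 |
| | | 96 | 0.542 | 0.530 | 0.639 | 0.751 | 0.603 | 0.591 | 0.573 | 0.584 | 0.637 | 0.606 | 0.533 |
| | | 192 | 0.521 | 0.590 | 0.649 | 0.686 | 0.658 | 0.619 | 0.608 | 0.588 | 0.616 | 0.591 | 0.584 |
| | | 336 | 0.528 | 0.540 | 0.593 | 0.662 | 0.530 | 0.558 | 0.619 | 0.587 | 0.587 | 0.577 | 0.543 |
| | Public Health | 12 | 1.172 | 1.180 | 1.106 | 1.132 | 1.247 | 1.136 | 1.126 | 1.136 | 1.132 | 1.044 | 1.023 |
| | | 24 | 1.356 | 1.377 | 1.350 | 1.359 | 1.686 | 1.444 | 1.463 | 1.354 | 1.337 | 1.315 | 1.349 |
| | | 36 | 1.391 | 1.392 | 1.389 | 1.286 | 1.362 | 1.463 | 1.433 | 1.405 | 1.388 | 1.292 | 1.366 |
| | | 48 | 1.480 | 1.438 | 1.452 | 1.523 | 1.504 | 1.432 | 1.540 | 1.471 | 1.483 | 1.472 | 1.427 |
| | Security | 6 | 120.5 | 120.7 | 121.4 | 117.5 | 120.3 | 120.6 | 121.7 | 120.3 | 120.1 | 119.7 | 119.4 |
| | | 8 | 121.8 | 124.1 | 124.9 | 124.6 | 133.0 | 126.9 | 126.9 | 123.3 | 121.8 | 122.8 | 122.0 |
| | | 10 | 124.1 | 125.8 | 125.6 | 125.6 | 124.2 | 125.4 | 125.2 | 126.0 | 124.5 | 125.9 | 122.4 |
| | | 12 | 127.2 | 121.3 | 128.6 | 128.3 | 128.5 | 125.1 | 127.8 | 129.2 | 124.9 | 128.3 | 126.0 |
| | Social Good | 6 | 0.719 | 0.729 | 0.738 | 0.745 | 0.742 | 0.730 | 0.724 | 0.742 | 0.737 | 0.754 | 0.733 |
| | | 8 | 0.817 | 0.819 | 0.817 | 0.800 | 1.472 | 0.795 | 0.821 | 0.832 | 0.821 | 0.812 | 0.813 |
| | | 10 | 0.879 | 0.902 | 0.851 | 0.850 | 0.875 | 0.863 | 0.872 | 0.880 | 0.867 | 0.869 | 0.883 |
| | | 12 | 0.928 | 0.940 | 0.926 | 0.931 | 0.943 | 0.911 | 0.915 | 0.948 | 0.900 | 0.912 | 0.929 |
| | Traffic | 6 | 0.231 | 0.217 | 0.207 | 0.212 | 0.215 | 0.216 | 0.220 | 0.222 | 0.229 | 0.204 | 0.209 |
| | | 8 | 0.210 | 0.233 | 0.228 | 0.219 | 0.324 | 0.220 | 0.201 | 0.210 | 0.213 | 0.222 | 0.229 |
| | | 10 | 0.220 | 0.219 | 0.223 | 0.225 | 0.224 | 0.215 | 0.208 | 0.212 | 0.215 | 0.222 | 0.217 |
| | | 12 | 0.243 | 0.250 | 0.253 | 0.250 | 0.269 | 0.245 | 0.243 | 0.247 | 0.243 | 0.248 | 0.256 |

Table 70: Comparison of multimodal fusion strategies with Reformer (+ Doc2Vec).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reformer | Agriculture | 6 | 0.189 | 0.183 | 0.277 | 0.181 | 0.205 | 0.140 | 0.322 | 0.265 | 0.181 | 0.199 | 0.183 |
| | | 8 | 0.217 | 0.275 | 0.215 | 0.173 | 0.377 | 0.235 | 0.417 | 0.265 | 0.295 | 0.197 | 0.383 |
| | | 10 | 0.489 | 0.419 | 0.471 | 0.273 | 0.408 | 0.332 | 0.732 | 0.371 | 0.294 | 0.502 | 0.341 |
| | | 12 | 0.414 | 0.585 | 0.333 | 0.366 | 0.565 | 0.618 | 0.601 | 0.349 | 0.390 | 0.259 | 0.406 |
| | Climate | 6 | 0.983 | 1.013 | 1.014 | 1.045 | 1.007 | 1.065 | 0.933 | 1.004 | 1.093 | 0.974 | 1.054 |
| | | 8 | 1.083 | 1.046 | 1.105 | 1.044 | 1.073 | 1.051 | 0.946 | 1.097 | 1.065 | 1.078 | 1.042 |
| | | 10 | 1.096 | 1.047 | 1.080 | 1.112 | 1.098 | 1.069 | 0.907 | 1.075 | 1.102 | 1.113 | 1.073 |
| | | 12 | 1.077 | 1.046 | 1.107 | 1.068 | 1.121 | 1.146 | 1.052 | 1.093 | 1.041 | 1.100 | 1.063 |
| | Economy | 6 | 0.268 | 0.332 | 0.337 | 0.377 | 0.643 | 0.316 | 0.649 | 0.175 | 0.302 | 0.199 | 0.344 |
| | | 8 | 0.422 | 0.213 | 0.096 | 0.536 | 0.379 | 0.709 | 1.053 | 0.340 | 0.221 | 0.326 | 0.260 |
| | | 10 | 0.447 | 0.703 | 0.834 | 0.531 | 1.110 | 0.817 | 2.016 | 0.615 | 0.441 | 0.633 | 0.747 |
| | | 12 | 0.367 | 1.166 | 0.300 | 0.280 | 1.543 | 1.564 | 0.943 | 0.435 | 0.922 | 0.458 | 0.381 |
| | Energy | 12 | 0.267 | 0.201 | 0.156 | 0.209 | 0.205 | 0.200 | 0.145 | 0.167 | 0.223 | 0.197 | 0.219 |
| | | 24 | 0.436 | 0.410 | 0.440 | 0.408 | 0.317 | 0.383 | 0.333 | 0.401 | 0.401 | 0.372 | 0.400 |
| | | 36 | 0.503 | 0.462 | 0.480 | 0.462 | 0.370 | 0.483 | 0.470 | 0.389 | 0.481 | 0.436 | 0.425 |
| | | 48 | 0.600 | 0.557 | 0.527 | 0.513 | 0.640 | 0.536 | 0.583 | 0.510 | 0.488 | 0.523 | 0.518 |
| | Environment | 48 | 0.411 | 0.533 | 0.466 | 0.454 | 0.403 | 0.462 | 0.561 | 0.463 | 0.445 | 0.453 | 0.453 |
| | | 96 | 0.472 | 0.575 | 0.497 | 0.475 | 0.558 | 0.438 | 0.686 | 0.565 | 0.492 | 0.438 | 0.450 |
| | | 192 | 0.437 | 0.567 | 0.468 | 0.485 | 0.544 | 0.457 | 0.709 | 0.453 | 0.485 | 0.456 | 0.457 |
| | | 336 | 0.446 | 0.468 | 0.434 | 0.510 | 0.576 | 0.458 | 0.575 | 0.470 | 0.446 | 0.466 | 0.507 |
| | Public Health | 12 | 1.176 | 0.931 | 1.251 | 1.098 | 1.015 | 1.094 | 1.148 | 1.107 | 1.175 | 1.188 | 1.086 |
| | | 24 | 1.272 | 1.341 | 1.238 | 1.324 | 1.327 | 1.286 | 1.363 | 1.317 | 1.215 | 1.206 | 1.285 |
| | | 36 | 1.325 | 1.435 | 1.365 | 1.410 | 1.266 | 1.252 | 1.394 | 1.249 | 1.282 | 1.280 | 1.334 |
| | | 48 | 1.415 | 1.298 | 1.550 | 1.429 | 1.442 | 1.449 | 1.531 | 1.600 | 1.431 | 1.440 | 1.375 |
| | Security | 6 | 124.9 | 124.5 | 125.5 | 123.9 | 118.9 | 119.2 | 121.6 | 125.0 | 123.8 | 123.5 | 122.7 |
| | | 8 | 124.9 | 123.0 | 123.3 | 125.6 | 123.2 | 123.5 | 124.5 | 123.3 | 123.3 | 124.2 | 124.8 |
| | | 10 | 126.5 | 127.1 | 125.6 | 124.8 | 129.0 | 124.7 | 127.1 | 125.5 | 123.6 | 124.8 | 127.3 |
| | | 12 | 125.1 | 123.4 | 126.5 | 125.9 | 126.0 | 125.5 | 127.1 | 126.5 | 124.8 | 127.7 | 125.6 |
| | Social Good | 6 | 0.804 | 0.782 | 0.808 | 0.800 | 0.796 | 0.860 | 0.774 | 0.847 | 0.800 | 0.821 | 0.807 |
| | | 8 | 0.912 | 0.886 | 0.880 | 0.872 | 0.897 | 0.899 | 0.918 | 0.899 | 0.899 | 0.902 | 0.858 |
| | | 10 | 0.954 | 0.950 | 0.970 | 0.976 | 0.970 | 0.948 | 0.958 | 0.971 | 0.964 | 0.933 | 0.958 |
| | | 12 | 0.993 | 1.007 | 0.939 | 0.963 | 1.007 | 0.980 | 0.991 | 0.927 | 0.998 | 0.984 | 1.007 |
| | Traffic | 6 | 0.232 | 0.244 | 0.260 | 0.253 | 0.275 | 0.297 | 0.205 | 0.246 | 0.224 | 0.216 | 0.258 |
| | | 8 | 0.240 | 0.261 | 0.249 | 0.279 | 0.245 | 0.270 | 0.241 | 0.249 | 0.265 | 0.258 | 0.270 |
| | | 10 | 0.265 | 0.257 | 0.239 | 0.233 | 0.223 | 0.312 | 0.223 | 0.225 | 0.241 | 0.276 | 0.246 |
| | | 12 | 0.260 | 0.252 | 0.247 | 0.244 | 0.262 | 0.292 | 0.237 | 0.257 | 0.232 | 0.242 | 0.240 |

Table 71: Comparison of multimodal fusion strategies with Transformer (+ Doc2Vec).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Transformer | Agriculture | 6 | 0.138 | 0.160 | 0.138 | 0.095 | 0.097 | 0.143 | 0.285 | 0.126 | 0.162 | 0.151 | 0.143 |
| | | 8 | 0.155 | 0.223 | 0.179 | 0.218 | 0.152 | 0.281 | 0.446 | 0.183 | 0.201 | 0.253 | 0.181 |
| | | 10 | 0.202 | 0.317 | 0.236 | 0.195 | 0.115 | 0.181 | 0.677 | 0.245 | 0.192 | 0.184 | 0.197 |
| | | 12 | 0.195 | 0.273 | 0.274 | 0.224 | 0.146 | 0.373 | 0.534 | 0.268 | 0.207 | 0.205 | 0.324 |
| | Climate | 6 | 1.023 | 0.990 | 1.023 | 1.019 | 0.971 | 1.009 | 1.091 | 1.058 | 1.080 | 1.032 | 1.087 |
| | | 8 | 1.048 | 1.060 | 1.106 | 1.023 | 1.057 | 1.050 | 1.068 | 1.104 | 1.040 | 1.061 | 1.041 |
| | | 10 | 1.094 | 1.061 | 1.067 | 1.099 | 1.097 | 0.985 | 1.040 | 1.067 | 1.048 | 1.106 | 1.070 |
| | | 12 | 1.021 | 1.007 | 1.051 | 1.067 | 1.088 | 1.082 | 1.023 | 1.039 | 1.037 | 1.060 | 1.105 |
| | Economy | 6 | 0.252 | 0.244 | 0.235 | 0.415 | 0.108 | 0.284 | 0.526 | 0.248 | 0.447 | 0.160 | 0.309 |
| | | 8 | 0.259 | 0.538 | 0.208 | 0.331 | 0.062 | 0.460 | 0.735 | 0.266 | 0.428 | 0.148 | 0.244 |
| | | 10 | 0.288 | 0.711 | 0.217 | 0.189 | 0.110 | 0.104 | 1.452 | 0.251 | 0.191 | 0.098 | 0.110 |
| | | 12 | 0.265 | 0.342 | 0.556 | 0.186 | 0.088 | 0.567 | 0.937 | 0.476 | 0.052 | 0.513 | 0.392 |
| | Energy | 12 | 0.085 | 0.132 | 0.102 | 0.143 | 0.117 | 0.109 | 0.124 | 0.095 | 0.101 | 0.098 | 0.106 |
| | | 24 | 0.204 | 0.222 | 0.200 | 0.303 | 0.269 | 0.232 | 0.224 | 0.218 | 0.204 | 0.185 | 0.241 |
| | | 36 | 0.357 | 0.422 | 0.366 | 0.308 | 0.314 | 0.278 | 0.270 | 0.335 | 0.346 | 0.312 | 0.298 |
| | | 48 | 0.479 | 0.490 | 0.405 | 0.485 | 0.399 | 0.402 | 0.422 | 0.369 | 0.435 | 0.441 | 0.481 |
| | Environment | 48 | 0.407 | 0.524 | 0.430 | 0.448 | 0.465 | 0.442 | 0.501 | 0.428 | 0.466 | 0.410 | 0.410 |
| | | 96 | 0.487 | 0.466 | 0.454 | 0.465 | 0.423 | 0.448 | 0.523 | 0.470 | 0.465 | 0.433 | 0.491 |
| | | 192 | 0.445 | 0.528 | 0.452 | 0.519 | 0.460 | 0.468 | 0.651 | 0.463 | 0.442 | 0.452 | 0.442 |
| | | 336 | 0.449 | 0.478 | 0.464 | 0.477 | 0.538 | 0.477 | 0.555 | 0.474 | 0.408 | 0.458 | 0.434 |
| | Public Health | 12 | 0.968 | 1.069 | 1.119 | 1.161 | 1.087 | 1.040 | 1.062 | 1.287 | 1.068 | 0.961 | 0.962 |
| | | 24 | 1.285 | 1.288 | 1.221 | 1.352 | 1.153 | 1.293 | 1.449 | 1.208 | 1.322 | 1.296 | 1.207 |
| | | 36 | 1.318 | 1.321 | 1.294 | 1.345 | 1.282 | 1.413 | 1.198 | 1.265 | 1.295 | 1.288 | 1.277 |
| | | 48 | 1.427 | 1.432 | 1.300 | 1.280 | 1.391 | 1.432 | 1.521 | 1.408 | 1.389 | 1.416 | 1.576 |
| | Security | 6 | 126.6 | 127.3 | 126.5 | 126.0 | 122.3 | 124.0 | 124.4 | 125.9 | 124.6 | 128.0 | 124.5 |
| | | 8 | 126.1 | 127.5 | 128.3 | 125.9 | 123.9 | 127.6 | 128.5 | 128.4 | 126.9 | 126.7 | 130.4 |
| | | 10 | 129.3 | 129.3 | 130.7 | 128.8 | 124.1 | 128.6 | 129.9 | 131.2 | 130.1 | 129.8 | 130.7 |
| | | 12 | 128.4 | 126.5 | 131.3 | 129.9 | 126.7 | 130.4 | 131.1 | 130.9 | 129.2 | 129.0 | 118.5 |
| | Social Good | 6 | 0.698 | 0.737 | 0.769 | 0.781 | 0.795 | 0.743 | 0.724 | 0.770 | 0.765 | 0.741 | 0.764 |
| | | 8 | 0.786 | 0.779 | 0.856 | 0.855 | 0.843 | 0.813 | 0.769 | 0.829 | 0.868 | 0.815 | 0.841 |
| | | 10 | 0.888 | 0.831 | 0.904 | 0.855 | 0.864 | 0.925 | 0.853 | 0.929 | 0.835 | 0.899 | 0.913 |
| | | 12 | 0.868 | 0.896 | 0.874 | 0.852 | 0.894 | 0.914 | 0.867 | 0.983 | 0.942 | 0.862 | 0.940 |
| | Traffic | 6 | 0.161 | 0.162 | 0.163 | 0.167 | 0.173 | 0.171 | 0.158 | 0.163 | 0.185 | 0.152 | 0.166 |
| | | 8 | 0.165 | 0.164 | 0.176 | 0.169 | 0.196 | 0.159 | 0.160 | 0.176 | 0.162 | 0.176 | 0.167 |
| | | 10 | 0.181 | 0.174 | 0.192 | 0.172 | 0.199 | 0.194 | 0.168 | 0.192 | 0.190 | 0.180 | 0.208 |
| | | 12 | 0.213 | 0.227 | 0.218 | 0.209 | 0.237 | 0.220 | 0.209 | 0.217 | 0.215 | 0.217 | 0.214 |

Table 72: Comparison of multimodal fusion strategies with TSMixer (+ Doc2Vec).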
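| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TSMixer | Agriculture | 6 | 0.085 | 0.098 | 0.091 | 0.100 | 0.204 | 0.084 | 0.143 | 0.099 | 0.137 | 0.102 | 0.077 |
| | | 8 | 0.124 | 0.157 | 0.158 | 0.311 | 0.205 | 0.300 | 0.167 | 0.171 | 0.118 | 0.128 | 0.174 |
| | | 10 | 0.259 | 0.199 | 0.374 | 0.245 | 0.503 | 0.489 | 0.213 | 0.364 | 0.224 | 0.291 | 0.205 |
| | | 12 | 0.606 | 0.809 | 0.573 | 0.616 | 0.907 | 0.644 | 0.567 | 0.514 | 1.412 | 0.685 | 0.249 |
| | Climate | 6 | 1.058 | 1.066 | 1.077 | 1.074 | 1.087 | 1.077 | 1.079 | 1.071 | 1.069 | 1.062 | 1.048 |
| | | 8 | 1.092 | 1.088 | 1.083 | 1.068 | 1.102 | 1.095 | 1.100 | 1.104 | 1.095 | 1.085 | 1.098 |
| | | 10 | 1.106 | 1.104 | 1.110 | 1.091 | 1.105 | 1.117 | 1.112 | 1.113 | 1.109 | 1.098 | 1.111 |
| | | 12 | 1.104 | 1.114 | 1.117 | 1.110 | 1.118 | 1.097 | 1.096 | 1.115 | 1.130 | 1.113 | 1.091 |
| | Economy | 6 | 0.045 | 0.061 | 0.056 | 0.097 | 0.133 | 0.049 | 0.128 | 0.195 | 0.067 | 0.047 | 0.036 |
| | | 8 | 0.036 | 0.109 | 0.126 | 0.071 | 0.346 | 0.213 | 0.208 | 0.244 | 0.090 | 0.054 | 0.093 |
| | | 10 | 0.358 | 0.416 | 0.174 | 0.239 | 0.165 | 0.971 | 0.203 | 0.572 | 0.297 | 0.223 | 0.158 |
| | | 12 | 0.792 | 0.736 | 1.217 | 0.876 | 1.525 | 0.485 | 0.440 | 0.226 | 1.956 | 0.835 | 0.111 |
| | Energy | 12 | 0.157 | 0.172 | 0.150 | 0.170 | 0.167 | 0.137 | 0.178 | 0.127 | 0.144 | 0.143 | 0.149 |
| | | 24 | 0.264 | 0.272 | 0.289 | 0.330 | 0.276 | 0.271 | 0.276 | 0.251 | 0.276 | 0.304 | 0.278 |
| | | 36 | 0.361 | 0.360 | 0.377 | 0.403 | 0.363 | 0.348 | 0.366 | 0.338 | 0.370 | 0.355 | 0.333 |
| | | 48 | 0.454 | 0.456 | 0.474 | 0.465 | 0.446 | 0.437 | 0.459 | 0.438 | 0.484 | 0.469 | 0.459 |
| | Environment | 48 | 0.539 | 0.587 | 0.716 | 0.769 | 0.576 | 0.516 | 0.718 | 0.521 | 0.543 | 0.663 | 0.518 |
| | | 96 | 0.606 | 0.647 | 0.738 | 1.042 | 0.648 | 0.625 | 0.732 | 0.592 | 0.664 | 0.637 | 0.565 |
| | | 192 | 0.660 | 0.718 | 0.680 | 0.697 | 0.674 | 0.633 | 1.066 | 0.592 | 0.693 | 0.674 | 0.588 |
| | | 336 | 0.586 | 0.660 | 0.663 | 0.670 | 0.567 | 0.595 | 0.892 | 0.569 | 0.631 | 0.643 | 0.580 |
| | Public Health | 12 | 1.074 | 1.116 | 1.208 | 1.222 | 1.062 | 1.127 | 1.101 | 1.163 | 1.121 | 1.106 | 1.050 |
| | | 24 | 1.425 | 1.504 | 1.389 | 1.497 | 1.432 | 1.503 | 1.563 | 1.404 | 1.513 | 1.395 | 1.366 |
| | | 36 | 1.488 | 1.447 | 1.434 | 1.392 | 1.493 | 1.536 | 1.509 | 1.523 | 1.481 | 1.465 | 1.565 |
| | | 48 | 1.605 | 1.644 | 1.626 | 1.602 | 1.759 | 1.705 | 1.683 | 1.541 | 1.669 | 1.575 | 1.526 |
| | Security | 6 | 128.2 | 128.2 | 129.8 | 127.2 | 124.9 | 125.7 | 128.8 | 123.1 | 124.1 | 125.9 | 127.9 |
| | | 8 | 127.8 | 123.5 | 127.8 | 128.6 | 122.3 | 122.7 | 130.2 | 126.7 | 126.2 | 127.7 | 125.9 |
| | | 10 | 129.9 | 129.7 | 130.2 | 131.9 | 123.7 | 124.6 | 124.9 | 131.9 | 130.1 | 130.1 | 127.9 |
| | | 12 | 130.2 | 133.3 | 132.1 | 133.5 | 132.2 | 127.8 | 128.9 | 133.7 | 130.5 | 129.7 | 129.7 |
| | Social Good | 6 | 0.744 | 0.748 | 0.782 | 0.738 | 0.745 | 0.730 | 0.730 | 0.824 | 0.744 | 0.772 | 0.733 |
| | | 8 | 0.883 | 0.914 | 1.065 | 0.934 | 0.889 | 0.857 | 0.891 | 0.934 | 0.817 | 0.831 | 0.823 |
| | | 10 | 0.890 | 0.851 | 0.912 | 0.841 | 0.897 | 0.932 | 0.893 | 0.900 | 0.837 | 0.892 | 0.886 |
| | | 12 | 0.881 | 0.888 | 0.926 | 0.913 | 1.267 | 0.905 | 1.230 | 0.976 | 0.986 | 0.887 | 0.888 |
| | Traffic | 6 | 0.214 | 0.201 | 0.228 | 0.204 | 0.206 | 0.228 | 0.226 | 0.230 | 0.226 | 0.207 | 0.187 |
| | | 8 | 0.212 | 0.220 | 0.223 | 0.213 | 0.226 | 0.222 | 0.235 | 0.275 | 0.276 | 0.201 | 0.188 |
| | | 10 | 0.222 | 0.229 | 0.243 | 0.225 | 0.222 | 0.292 | 0.218 | 0.266 | 0.304 | 0.231 | 0.207 |
| | | 12 | 0.289 | 0.289 | 0.297 | 0.304 | 0.451 | 0.274 | 0.268 | 0.347 | 0.504 | 0.259 | 0.262 |

Table 73: Comparison of multimodal fusion strategies with Informer (+ Doc2Vec).

| TS Model | Dataset | H | Unimodal | Additive (First) | Additive (Middle) | Additive (Last) | Concat (First) | Concat (Middle) | Concat (Last) | Constrained (Orthogonal) | Constrained (FiLM) | Constrained (Gating) | Constrained (CFA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Informer | Agriculture | 6 | 0.228 | 0.468 | 0.342 | 0.249 | 0.240 | 0.306 | 0.270 | 0.244 | 0.212 | 0.200 | 0.218 |
| | | 8 | 0.307 | 0.398 | 0.204 | 0.276 | 0.233 | 0.199 | 0.390 | 0.254 | 0.264 | 0.184 | 0.232 |
| | | 10 | 0.369 | 0.267 | 0.392 | 0.450 | 0.572 | 0.256 | 0.411 | 0.382 | 0.363 | 0.371 | 0.341 |
| | | 12 | 0.570 | 0.609 | 0.584 | 0.625 | 0.463 | 0.553 | 0.485 | 0.274 | 0.358 | 0.352 | 0.517 |
| | Climate | 6 | 1.091 | 1.045 | 1.093 | 1.063 | 1.029 | 1.062 | 1.068 | 1.066 | 1.051 | 1.026 | 1.016 |
| | | 8 | 1.127 | 1.030 | 1.062 | 1.094 | 1.003 | 1.055 | 1.065 | 1.058 | 1.082 | 1.041 | 1.032 |
| | | 10 | 1.052 | 1.061 | 1.050 | 1.084 | 0.883 | 1.076 | 1.068 | 1.076 | 1.058 | 1.083 | 1.091 |
| | | 12 | 1.058 | 1.084 | 1.071 | 1.074 | 0.907 | 1.024 | 1.069 | 1.065 | 1.064 | 1.069 | 1.103 |
| | Economy | 6 | 0.552 | 0.667 | 0.726 | 0.404 | 0.543 | 0.603 | 0.943 | 0.192 | 0.510 | 0.526 | 0.381 |
| | | 8 | 0.704 | 0.852 | 0.700 | 0.904 | 0.271 | 0.812 | 0.450 | 0.618 | 0.464 | 0.387 | 0.720 |
| | | 10 | 0.864 | 0.741 | 0.662 | 0.727 | 1.073 | 0.928 | 0.813 | 0.757 | 0.940 | 0.597 | 0.276 |
| | | 12 | 1.015 | 0.894 | 0.254 | 1.273 | 0.515 | 1.101 | 0.767 | 0.979 | 1.081 | 1.011 | 1.057 |
| | Energy | 12 | 0.190 | 0.177 | 0.208 | 0.216 | 0.214 | 0.183 | 0.174 | 0.203 | 0.225 | 0.200 | 0.205 |
| | | 24 | 0.276 | 0.297 | 0.408 | 0.294 | 0.263 | 0.359 | 0.255 | 0.341 | 0.323 | 0.374 | 0.286 |
| | | 36 | 0.371 | 0.401 | 0.452 | 0.374 | 0.364 | 0.418 | 0.381 | 0.431 | 0.320 | 0.411 | 0.481 |
| | | 48 | 0.490 | 0.480 | 0.510 | 0.520 | 0.481 | 0.508 | 0.516 | 0.483 | 0.457 | 0.463 | 0.501 |
| | Environment | 48 | 0.412 | 0.593 | 0.424 | 0.509 | 0.502 | 0.436 | 0.506 | 0.455 | 0.459 | 0.474 | 0.449 |
| | | 96 | 0.411 | 0.501 | 0.505 | 0.527 | 0.478 | 0.456 | 0.527 | 0.429 | 0.461 | 0.455 | 0.441 |
| | | 192 | 0.445 | 0.458 | 0.482 | 0.588 | 0.576 | 0.480 | 0.486 | 0.468 | 0.460 | 0.452 | 0.448 |
| | | 336 | 0.436 | 0.512 | 0.478 | 0.489 | 0.500 | 0.455 | 0.548 | 0.493 | 0.458 | 0.452 | 0.461 |
| | Public Health | 12 | 1.061 | 1.124 | 1.109 | 1.116 | 1.103 | 0.994 | 1.004 | 1.097 | 1.105 | 1.125 | 1.091 |
| | | 24 | 1.293 | 1.334 | 1.409 | 1.338 | 1.359 | 1.416 | 1.348 | 1.405 | 1.288 | 1.326 | 1.287 |
| | | 36 | 1.436 | 1.416 | 1.415 | 1.534 | 1.228 | 1.403 | 1.548 | 1.461 | 1.443 | 1.451 | 1.509 |
| | | 48 | 1.556 | 1.657 | 1.634 | 1.671 | 1.332 | 1.669 | 1.574 | 1.665 | 1.772 | 1.618 | 1.646 |
| | Security | 6 | 124.2 | 127.0 | 126.6 | 128.0 | 125.7 | 128.0 | 124.4 | 128.6 | 127.1 | 127.8 | 127.2 |
| | | 8 | 122.3 | 127.7 | 128.0 | 129.0 | 128.8 | 128.4 | 127.3 | 130.1 | 126.7 | 127.5 | 128.8 |
| | | 10 | 129.5 | 130.2 | 125.6 | 127.0 | 131.3 | 129.2 | 128.8 | 129.4 | 129.0 | 128.4 | 129.9 |
| | | 12 | 130.8 | 131.3 | 130.9 | 129.5 | 127.4 | 126.5 | 130.4 | 131.1 | 127.3 | 132.3 | 129.9 |
| | Social Good | 6 | 0.725 | 0.754 | 0.790 | 0.727 | 0.752 | 0.757 | 0.766 | 0.711 | 0.747 | 0.770 | 0.756 |
| | | 8 | 0.801 | 0.769 | 0.788 | 0.801 | 0.809 | 0.800 | 0.808 | 0.824 | 0.803 | 0.834 | 0.816 |
| | | 10 | 0.849 | 0.838 | 0.801 | 0.855 | 0.783 | 0.861 | 0.798 | 0.850 | 0.837 | 0.884 | 0.898 |
| | | 12 | 0.911 | 0.862 | 0.940 | 0.860 | 0.902 | 0.870 | 0.870 | 0.931 | 0.884 | 0.883 | 0.885 |
| | Traffic | 6 | 0.180 | 0.178 | 0.192 | 0.148 | 0.157 | 0.154 | 0.159 | 0.164 | 0.177 | 0.155 | 0.156 |
| | | 8 | 0.174 | 0.169 | 0.167 | 0.185 | 0.158 | 0.170 | 0.168 | 0.193 | 0.177 | 0.163 | 0.182 |
| | | 10 | 0.208 | 0.181 | 0.165 | 0.193 | 0.200 | 0.177 | 0.178 | 0.187 | 0.177 | 0.173 | 0.193 |
| | | 12 | 0.213 | 0.221 | 0.227 | 0.210 | 0.206 | 0.212 | 0.211 | 0.220 | 0.215 | 0.221 | 0.229 |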