Paper deep dive

From Digital Twins to World Models: Opportunities, Challenges, and Applications for Mobile Edge General Intelligence

Jie Zheng, Dusit Niyato, Changyuan Zhao, Jiawen Kang, Jiacheng Wang

Year: 2026 · Venue: arXiv preprint · Area: cs.AI · Type: Preprint · Embeddings: 117

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 92%

Last extracted: 3/22/2026, 5:52:49 AM

Summary

This paper provides a systematic survey on the transition from traditional digital twins to world models to enable Edge General Intelligence (EGI). It highlights the shift from physics-based, system-centric replicas to data-driven, agent-centric internal models, detailing how world models support perception, latent state representation, and imagination-based planning in resource-constrained edge environments.

Entities (5)

Digital Twins · technology · 95%
Edge General Intelligence · concept · 95%
World Models · technology · 95%
6G · communication-standard · 90%
Edge Computing · infrastructure · 90%

Relation Signals (3)

World Models → enables → Edge General Intelligence

confidence 95% · This paper presents a systematic survey of the transition from digital twins to world models and discusses its role in enabling edge general intelligence (EGI).

Digital Twins → complementedby → World Models

confidence 90% · To complement and enhance the capabilities of digital twins, world models have gradually emerged and been adopted for similar tasks.

Edge General Intelligence → deployedin → Edge Computing

confidence 90% · EGI represents a new class of intelligent systems deployed close to the physical world... in wireless and edge computing environments.

Cypher Suggestions (2)

Find all technologies that enable Edge General Intelligence · confidence 90% · unvalidated

MATCH (t:Technology)-[:ENABLES]->(e:Concept {name: 'Edge General Intelligence'}) RETURN t.name

Map the relationship between Digital Twins and World Models · confidence 85% · unvalidated

MATCH (a:Technology {name: 'Digital Twins'})-[r]->(b:Technology {name: 'World Models'}) RETURN type(r)

Abstract

The rapid evolution toward 6G and beyond communication systems is accelerating the convergence of digital twins and world models at the network edge. Traditional digital twins provide high-fidelity representations of physical systems and support monitoring, analysis, and offline optimization. However, in highly dynamic edge environments, they face limitations in autonomy, adaptability, and scalability. This paper presents a systematic survey of the transition from digital twins to world models and discusses its role in enabling edge general intelligence (EGI). First, the paper clarifies the conceptual differences between digital twins and world models and highlights the shift from physics-based, centralized, and system-centric replicas to data-driven, decentralized, and agent-centric internal models. This discussion helps readers gain a clear understanding of how this transition enables more adaptive, autonomous, and resource-efficient intelligence at the network edge. The paper reviews the design principles, architectures, and key components of world models, including perception, latent state representation, dynamics learning, imagination-based planning, and memory. In addition, it examines the integration of world models and digital twins in wireless EGI systems and surveys emerging applications in integrated sensing and communications, semantic communication, air-ground networks, and low-altitude wireless networks. Finally, this survey provides a systematic roadmap and practical insights for designing world-model-driven edge intelligence systems in wireless and edge computing environments. It also outlines key research challenges and future directions toward scalable, reliable, and interoperable world models for edge-native agentic AI.


Full Text

116,344 characters extracted from source content.

From Digital Twins to World Models: Opportunities, Challenges, and Applications for Mobile Edge General Intelligence

Jie Zheng, Dusit Niyato, Fellow, IEEE, Changyuan Zhao, Jiawen Kang, Jiacheng Wang

Abstract: The rapid evolution toward 6G and beyond communication systems is accelerating the convergence of digital twins and world models at the network edge. Traditional digital twins provide high-fidelity representations of physical systems and support monitoring, analysis, and offline optimization. However, in highly dynamic edge environments, they face limitations in autonomy, adaptability, and scalability. This paper presents a systematic survey of the transition from digital twins to world models and discusses its role in enabling edge general intelligence (EGI). First, the paper clarifies the conceptual differences between digital twins and world models and highlights the shift from physics-based, centralized, and system-centric replicas to data-driven, decentralized, and agent-centric internal models. This discussion helps readers gain a clear understanding of how this transition enables more adaptive, autonomous, and resource-efficient intelligence at the network edge. The paper reviews the design principles, architectures, and key components of world models, including perception, latent state representation, dynamics learning, imagination-based planning, and memory. In addition, it examines the integration of world models and digital twins in wireless EGI systems and surveys emerging applications in integrated sensing and communications, semantic communication, air–ground networks, and low-altitude wireless networks. Finally, this survey provides a systematic roadmap and practical insights for designing world-model-driven edge intelligence systems in wireless and edge computing environments. It also outlines key research challenges and future directions toward scalable, reliable, and interoperable world models for edge-native agentic AI.
Index Terms: Digital Twins; World Models; Edge General Intelligence

J. Zheng is with State-Province Joint Engineering and Research Center of Advanced Networking and Intelligent Information Services, College of Computer, Northwest University, Xi'an, 710127, Shaanxi, China (jzheng@nwu.edu.cn). D. Niyato, C. Zhao, R. Zhang and J. Wang are with the College of Computing and Data Science, Nanyang Technological University, Singapore 639798 (dniyato@ntu.edu.sg, zhao0441@e.ntu.edu.sg, jiacheng.wang@ntu.edu.sg). J. Kang is with the School of Automation, Guangdong University of Technology, Guangzhou 510006, China (kavinkang@gdut.edu.cn).

I. INTRODUCTION

A. Background

Edge computing is undergoing a paradigm shift from task-specific edge artificial intelligence (Edge AI) to edge general intelligence (EGI) [1]. EGI represents a new class of intelligent systems deployed close to the physical world, capable of long-term autonomous operation across multiple tasks, environments, and time scales. EGI enables low-latency or even ultra-low-latency inference while reducing system energy consumption and operational costs, making it well suited for resource-constrained devices and diverse vertical applications [2]. Traditional Edge AI is generally considered to have emerged from the development needs of interconnected ecosystems. Its primary objective is to enable the execution of local algorithms near data sources or edge servers, thus supporting applications with stringent requirements of low latency and high data efficiency, such as in-vehicle communication for autonomous driving [3]. Unlike traditional Edge AI, which focuses on predefined inference tasks, EGI agents can autonomously perceive, reason, and act within dynamic and partially observable environments. By executing algorithms directly on edge devices, edge intelligence supports localized data processing and reduces reliance on cloud infrastructure.
This approach improves latency and enhances privacy and security, as sensitive data remain within the local environment [4]. Key application scenarios include UAV-enabled communication and sensing networks [5], intelligent transportation systems with autonomous vehicles [6], and industrial infrastructures for smart factories and energy grids [7]. In these scenarios, EGI is typically required to operate in close coupling with physical systems under resource constraints and in real time to address dynamic environments. This need for a closed loop of perception, decision-making, and action gives rise to clear embodied characteristics, aligning edge intelligence with the core principles of embodied artificial intelligence. Embodied intelligence emphasizes autonomous behavior achieved through perception and interaction with the physical world, and relies on physical simulators and world models to support training, environment representation, and predictive planning, thus advancing agents toward higher levels of autonomy [8].

Achieving high levels of autonomy requires capabilities beyond simple reactive inference. Policies based only on instantaneous perception-to-action mappings, typically implemented through end-to-end models, are often fragile when faced with environmental changes, partial observability, and action delays. Thus, EGI systems must incorporate world models, which are internal representations of the external environment [9]. World models are commonly regarded as essential tools for understanding the current state of an environment and predicting its future evolution. By modeling state transitions, reasoning under uncertainty, and simulating and comparing different action sequences, world models support long-term planning and decision evaluation.
World models can simulate complex real-world dynamics [10], while edge intelligence without such models often relies on short-term reactive decisions, making it challenging to ensure reliability and generalization in dynamic or unknown environments.

arXiv:2603.17420v1 [cs.AI] 18 Mar 2026

[Fig. 1: Conceptual architecture from digital twins to world models for edge general intelligence. Panels: Introduction; World Models; From Digital Twin to World Model; Applications; Open Resources Project; Future Directions; Conclusion.]

B. Motivation and Contribution

From early conceptual proposals to recent systematic developments, digital twins have become a main paradigm for representing and modeling physical objects in networked cyber–physical systems [11]. Digital twins are typically implemented in software as high-fidelity virtual counterparts of physical systems, supporting contextual modeling, real-time state synchronization, and secure modular interactions through application programming interfaces (APIs). Thus, digital twins can facilitate industrial analysis and decision-making tasks [12].
Built on explicit physical laws, domain-specific equations, and expert rules, digital twins provide substantial value for offline engineering analysis and system design.

In recent years, digital twins have been widely applied in fields such as robotics, healthcare monitoring, and wireless communications. For example, RoboTwin integrates three-dimensional generative foundation models with large language models (LLMs) to generate diverse expert data for dual-arm robotic manipulation, leading to significant performance gains [13]. Digital-twin-based models for smart home healthcare monitoring support visual monitoring, health state prediction, and intelligent control [14]. In wireless communication and edge computing systems, digital twins are used for network planning, configuration validation, and what-if analysis. They also enable joint optimization of communication and computing under low-latency and energy-constrained conditions while meeting the model accuracy requirements [15].

However, digital twin technology still faces several challenges, including complex communication and data integration, limited data availability for machine learning training, high computational costs for high-fidelity modeling, strong dependence on interdisciplinary collaboration, and the lack of unified development and validation frameworks [16]. Traditional digital twin paradigms, which are mainly designed for offline system engineering, are fundamentally misaligned with the requirements of online and autonomous EGI. This mismatch is further amplified in heterogeneous operational technology environments, where infrastructure differences and missing standards hinder large-scale deployment [11]. Moreover, digital twins often rely heavily on predefined rules and prior assumptions, which limits their generalizability.
High-fidelity modeling also struggles to meet the real-time constraints of resource-limited edge devices, and the system-centric modeling perspective tends to overlook the agent’s perception, decision-making, and action processes. As a result, many existing digital twins remain in the static replica stage, lacking dynamic evolution and intelligent decision-making capabilities [17].

To complement and enhance the capabilities of digital twins, world models have gradually emerged and been adopted for similar tasks [18]. Their objective shifts from high-fidelity replication of the physical world to capturing environment evolution that is relevant to decision-making. This reflects a broader transition from world replication to task-relevant abstraction, from rule-driven to data-driven dynamics, and from a system-centric to an agent-centric modeling perspective.

World models typically use modern representation learning methods, such as variational autoencoders, to compress high-dimensional sensory input into low-dimensional latent states [9]. Action-conditioned state transitions are then learned in the latent space to support imaginative prediction of future environment dynamics. This paradigm aligns well with the core requirements of EGI, including resource efficiency, imagination-based planning, and close integration with reinforcement learning and control algorithms. Existing studies have explored world model design from multiple perspectives, including multi-scale variation modeling, controllable prediction, structured reasoning, and dynamics modeling [10]. For example, DriveDreamer-2 generates diverse predictions covering long-tail scenarios in autonomous driving [19]. GLAM improves model-based reinforcement learning by jointly modeling global and local state variations [20]. Drive-OccWorld applies a vision-centric 4D world model to end-to-end autonomous driving planning [21]. SWAP uses world-model-driven entailment graphs for structured reasoning [22][23], and MoSim supports long-horizon physical state prediction through motion dynamics modeling [24]. Thus, these studies show that by modeling only the environment dynamics related to task objectives and perceptual capabilities, world models can provide key support for long-term autonomy in resource-constrained edge environments.

[Fig. 2: A conceptual framework illustrating how digital twins and world models enable EGI. (A): Digital twin offline physics replica feeds online monitoring. (B): EGI agents compress data and pick actions via reward. (C): world model hierarchy bridges offline planning to online action. Panels: A. Digital Twin (DT); B. Edge General Intelligence (EGI); C. World Model (WM), with encoder, dynamics model (RNN/RSSM), and decoder.]

This survey reviews the evolution from digital twins to world models through the lens of EGI. Figure 1 outlines the structure of this paper, covering motivation, world models, comparison with digital twins, technical evolution, applications, open resources, and future directions.
Building on the background and motivation discussed above, the contributions of this work are as follows:

• We provide the first comprehensive review on world models for EGI in wireless and edge systems, and contrast them with traditional digital-twin-centric designs. We clearly distinguish world-model-centric EGI from conventional digital twin approaches, and outline a new direction for decision-centric modeling and autonomy at the network edge.
• We provide a systematic conceptual comparison and a unifying perspective on digital twins and world models as two complementary, yet fundamentally different, paradigms for modeling the physical world in edge systems. We highlight key shifts from world replication to decision-oriented abstraction, from rule-driven to data-driven dynamics, and from system-centric to agent-centric modeling, and explain how these shifts effectively match the requirements of EGI.
• We establish a taxonomy of world models tailored to wireless and edge scenarios, and decompose them into core components, including representation learning, dynamics modeling, observation and action interfaces, and imagination-based planning. We review representative methods from machine learning, robotics, and control, and reinterpret them in terms of edge deployment constraints, communication awareness, and integration with existing digital twin infrastructures.
• We demonstrate the potential of world models in edge systems by mapping classical digital twin applications in integrated sensing, communication, and computing (ISCC), semantic communication, air-to-ground networks, low-altitude platforms, and industrial edge infrastructures to their corresponding world-model-based counterparts.
We identify key open challenges and outline future research directions, including hybrid physics–data-driven world models, federated world modeling at the edge, multi-agent and multi-scale modeling, and explainable and trustworthy world models for safety-critical EGI.

By organizing existing knowledge and open questions along these dimensions, this survey offers both a conceptual foundation and a practical roadmap for researchers and practitioners moving from digital-twin-centric design toward world-model-centric EGI.

II. WORLD MODELS

This section introduces the basic concepts of digital twins and world models, and then discusses their main differences. We further explain how world models can serve as a core enabler for EGI. The detailed framework is shown in Figure 2.

TABLE I: Differences between digital twins and world models

World models (Refs. [9], [26], [1], [27])
Description:
• Learns compact latent representations from multi-modal sensory inputs
• Models temporal state transitions to capture environment physics
• Forecasts future latent states conditioned on planned actions
Technical foundation:
• Representation learning
• Dynamics models
• Self-supervised learning
• Latent-state compression
Advantages:
• Resource-efficient
• Self-adaptive
• Foresighted
• Task-centric

Digital twins (Refs. [11], [16], [28], [29])
Description:
• Constructs physics-based virtual replicas of physical assets
• Maintains bidirectional synchronization between physical and virtual states
• Propagates state changes via real-time data streaming
Technical foundation:
• IoT sensory ingestion
• Bidirectional synchronization
• Physics-based simulation
• Edge-cloud computing
Advantages:
• Cyber-physical fidelity
• Life cycle governance
• Anticipatory risk mitigation
• Holistic visualization

A. Introduction to Digital Twins

Digital twin technology has emerged as a dominant paradigm for modeling physical systems in cyber-physical environments [11].
A digital twin is typically a high-fidelity virtual replica of a physical asset, system, or process governed by physical laws, domain-specific equations, and expert-crafted rules. Digital twins support detailed simulation, performance evaluation, fault diagnosis, and system optimization.

In wireless and edge systems, digital twins have been widely adopted for tasks such as network planning, configuration validation, interference analysis, and what-if performance evaluation under known operational conditions [14]. For example, network operators use digital twins to explore different base-station deployment strategies, predict coverage and capacity, and test control policies before applying them to the live network. These applications show that explicit physics-based and rule-based models can provide valuable insights into complex edge environments [25].

However, the characteristics that make digital twins powerful for offline engineering analysis do not directly meet the needs of online, autonomous EGI agents that must learn, adapt, and act continuously in the field. This gap motivates a critical reassessment of how world modeling should be approached for EGI.

B. Overview of World Models

World models are internal simulations of environmental dynamics constructed by intelligent systems [26]. By learning from real-world data and implicit laws, they capture key dynamic properties of the environment, predict future states, and provide agents with the ability to understand, reason about, and plan for the physical world. This capability of learning and representing physical dynamics enables world models to excel in computer vision tasks such as video generation. For instance, by internalizing spatiotemporal relationships from driving data, world models can generate realistic 4D scenes where objects follow consistent physical trajectories and spatial layouts over time, demonstrating their potential as physical simulators for visual content creation [30].
World models are not high-fidelity reproductions of the physical world. Instead, they focus on dynamic aspects that are directly related to agent behavior and build an internal cognitive framework for imaginative reasoning [9]. This abstraction avoids the computational cost of full-scale replication and makes world models well suited to resource-constrained scenarios such as EGI, where they can offer efficient decision support.

Data-driven dynamic modeling is central to the adaptability of world models. Unlike traditional models that depend on preset physical formulas or rule bases, world models automatically extract implicit environmental laws, including physical regularities and temporal correlations, via unsupervised or self-supervised learning from multi-modal interaction data, such as sensor measurements, action feedback, and environmental observations [18]. For example, in UAV flight scenarios, they can learn the mapping between actions and states from data that reflect channel variations and meteorological interference [18]. Even when facing previously unseen low-altitude weather conditions, they can still make reasonable predictions based on learned general laws, thereby overcoming the generalization limits of traditional models and flexibly adapting to the dynamics and uncertainty of edge environments.

Prospective decision-making and planning are the core objectives of world models [9]. Rather than responding passively, they follow a prediction–imagination–decision pipeline that gives agents forward-looking cognitive capabilities. World models can simulate multi-step future evolutions of the environment based on the current state and candidate actions, assess the benefits and risks of different action sequences, and optimize decision strategies to avoid short-sighted behavior.
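The prediction–imagination–decision pipeline described above can be illustrated with a small planning loop. This is a minimal sketch under stated assumptions, not the survey's method: `toy_dynamics`, `reward`, `GOAL`, and the discrete action set are hypothetical stand-ins for a learned transition model and a task reward, and exhaustive enumeration stands in for whatever search a practical planner would use.

```python
from itertools import product

GOAL = 5.0  # hypothetical target position for the toy task


def toy_dynamics(state: float, action: float) -> float:
    """Stand-in for a learned transition model: the action shifts a 1-D state."""
    return state + action


def reward(state: float) -> float:
    """Reward is higher (less negative) the closer the state is to GOAL."""
    return -abs(GOAL - state)


def imagined_return(state: float, actions) -> float:
    """Roll the model forward without touching the real environment; sum rewards."""
    total = 0.0
    for action in actions:
        state = toy_dynamics(state, action)
        total += reward(state)
    return total


def plan(state: float, horizon: int = 4, action_set=(-1.0, 0.0, 1.0)):
    """Score every action sequence of length `horizon` in imagination; keep the best."""
    return max(product(action_set, repeat=horizon),
               key=lambda seq: imagined_return(state, seq))


best = plan(0.0)
print(best, imagined_return(0.0, best))
```

Enumerating all sequences is only feasible for tiny action sets and short horizons; it is used here because it makes the comparison of candidate futures explicit and deterministic.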
This property makes them well aligned with the latency, energy, and privacy constraints that define EGI in practice [1], and provides the core support for long-term autonomous and adaptive decision-making at the edge. This adaptability comes from three key design characteristics of world models for edge deployment.

The efficiency of the world model comes from latent-space abstraction. World models extract decision-relevant information without reconstructing the full environment in high fidelity, thus significantly reducing computation, storage, and communication overhead, which is critical for edge devices [18]. The agent-centric nature focuses only on the environmental dynamics that are relevant to the observations and actions of the agent, forming a localized task-oriented cognitive model and avoiding the use of limited resources on redundant information [31]. The generative imagination capability uses generative AI techniques to infer the trajectories of scenarios that have not yet occurred, improving sample efficiency and further adapting to the practical requirements of edge scenarios [18]. These characteristics complement the data-driven nature and prospective decision-making objective of world models and together strengthen their applicability in edge environments.

Thus, world models learn compact representations of environmental dynamics from data and, through generative imagination and prospective reasoning, provide efficient, autonomous, and adaptive decision support for resource-constrained edge agents. They are becoming an indispensable cognitive pillar in the realization of EGI.

TABLE II: From digital twins to world models: Enabling edge general intelligence

Scope: Edge general intelligence

Ref. [32]
Key insight:
• Semi-synchronous edge intelligence using digital twins
• Virtual-physical interaction for real-time decisions
From digital twins to world models: Digital twins continuously supply boundary data, laying an online learning foundation for later integration with world models.

Ref. [18]
Key insight:
• Lightweight recurrent state-space model with fast reasoning
• Real-time prediction for autonomous edge nodes
From digital twins to world models: World models give edge nodes real-time look-ahead reasoning ability, providing a decision brain for subsequent connection to digital twins.

Ref. [33]
Key insight:
• Compress LLM knowledge into world models
• Digital twins calibrate for trustworthy edge AI
From digital twins to world models: World models compress and offload large model capabilities to the edge, while digital twins correct physical errors in real time, making edge general intelligence both efficient and trustworthy.

C. Differences Between World Models and Digital Twins

Both world models and digital twins aim to model environmental dynamics and support decision-making, but their core logic and application focus are fundamentally different.

A world model is an internal cognitive framework of an agent. Rather than replicating reality, it distills laws of environmental dynamics that are relevant to agents’ actions and decisions [27]. Without relying on preset physical formulas or rule bases, a world model learns implicit environmental laws in an unsupervised or self-supervised way from multi-modal interaction data and is inherently designed to provide efficient decision support across diverse tasks [34]. To fit resource-constrained settings in EGI, world models reduce information dimensionality via latent-space abstraction, greatly lowering computation, storage, and communication overhead and avoiding resource use on redundant information. With a focus on optimizing future decisions, they enable agents to perform autonomous, adaptive, and long-horizon decision-making in dynamic and uncertain edge environments, and have already shown clear advantages in navigation and related tasks that require accurate perception and decision-making [35].
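The world-model side of this comparison can be written compactly in the notation commonly used for latent world models. This is a hedged sketch rather than the survey's own formulation: the symbols $o_t$ (observation), $a_t$ (action), $z_t$ (latent state), $r$ (reward), the horizon $H$, the discount $\gamma$, and the parameter sets $\phi$, $\theta$ are assumptions introduced here for illustration.

```latex
\begin{align}
  z_t &\sim q_\phi(z_t \mid o_t)
      && \text{(encoder: compress the observation into a latent state)} \\
  z_{t+1} &\sim p_\theta(z_{t+1} \mid z_t, a_t)
      && \text{(action-conditioned latent dynamics)} \\
  a^{\ast}_{t:t+H-1} &= \arg\max_{a_{t:t+H-1}}
      \mathbb{E}\!\left[\sum_{k=1}^{H} \gamma^{k-1}\, r(z_{t+k})\right]
      && \text{(planning over imagined rollouts)}
\end{align}
```

Under these assumptions, the resource saving comes from choosing $\dim(z_t) \ll \dim(o_t)$, so that multi-step rollouts are evaluated in the small latent space rather than in observation space.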
By contrast, the core of a digital twin is to build a high-fidelity virtual mirror of a physical entity, pursuing accurate correspondence between the virtual and actual systems [28]. Its modeling process heavily depends on preset physical formulas, rule bases, and rich sensor data, and aims to restore the real system states through physics-based modeling and data fusion [29]. This makes digital twins very effective for intelligent asset management. However, such high-fidelity requirements lead to massive data transmission, complex simulations, and large storage demands, impose strict requirements on hardware resources, and typically rely on cloud or high-performance servers. The value of digital twins resides in real-time monitoring of physical entities, reconstruction of historical processes, and diagnosis of potential faults, with an emphasis on knowing the current real-world states. Thus, digital twins are more suitable for Industry 4.0 manufacturing, urban operation and maintenance, and other scenarios that need fine-grained management, rather than highly resource-constrained edge scenarios.

The essential difference is that digital twins emphasize restoring reality, making the states and changes of physical entities observable and knowable, whereas world models emphasize predicting the future, making agent decisions more forward-looking and near-optimal through law distillation and imaginative reasoning [27], [33]. This difference in core orientation allows world models to overcome the strong resource dependence of digital twins and better match the latency, energy, and privacy constraints of edge scenarios, while digital twins remain indispensable in scenarios that require high-fidelity reconstruction.

D. World-Model-Enabled Edge General Intelligence

By leveraging four core capabilities, namely imagination, prediction, planning, and reasoning, world models meet the autonomous decision-making needs of EGI under resource constraints and dynamic uncertainty [36].

• Imagination: Imagination empowers world models to synthesize novel scenarios beyond observed reality, enabling agents to mentally explore hypothetical futures without physical interaction [26]. This capability supports creative problem-solving and risk-free experimentation in resource-constrained edge environments. For example, by leveraging learned visual priors, navigation agents can imagine trajectories in unfamiliar environments from a single input image, predicting future visual observations to plan safe and efficient paths without prior exploration [30].
• Prediction: Prediction refers to inferring future environmental states by learning implicit laws from multi-modal interaction data [18]. In EGI scenarios, a world model can foresee channel-quality fluctuations based on historical channel data and device trajectories, providing time margins for mode switching and power adjustment, and thus helping to avoid link outages [20]. In low-altitude networks, it can fuse historical and real-time weather data to predict short-term changes in wind speed and direction, provide early warnings for UAV flight safety, and mitigate the timeliness limitations of traditional weather forecasts [37].
• Planning: Planning uses predicted outcomes to simulate multi-step scenario evolution, evaluate candidate actions, and optimize decisions [38]. When UAVs perform delivery or inspection tasks, the world model can combine weather and channel predictions to plan flight trajectories that balance safety, energy efficiency, and throughput. In scenarios with multiple UAVs or clusters of edge nodes, it can dynamically allocate limited computing, storage, and communication resources and adjust allocations based on task load and remaining resources, preventing overload or waste and improving overall system efficiency [38]. EGI requires autonomous decision-making in uncertain environments, and world models enhance this ability through latent-space simulation. For example, Zhang et al. [39] propose a generative AI and multi-agent policy optimization framework for satellite communication, achieving efficient modeling and adaptive optimization, thereby improving the planning capability of EGI.
• Reasoning: Reasoning allows the world model to use learned general laws to handle unknowns and uncertainties at the edge [40]. In vehicular communications, when sudden road works or traffic jams occur, the onboard edge nodes can infer appropriate communication routes and trajectory adjustments by combining real-time road conditions with link states, thus maintaining continuous data delivery and driving safety [41]. In industrial edge-device maintenance, a world model can analyze abnormal sensor readings, distinguish between actual faults, interference, and false alarms, and provide accurate support for operations and maintenance decisions, reducing unnecessary production loss [40].

EGI systems face challenges such as time-varying channels and dynamic node appearance caused by high-speed mobility. The prediction, planning, and reasoning capabilities of world models help ensure link stability, resource optimization, and rapid response to emergencies, thus meeting the low-latency and high-reliability requirements of wireless EGI networks [42]. In low-altitude cooperative UAV scenarios, these capabilities jointly support flight safety and mission efficiency, and enable adaptation to the complex conditions of edge environments [20].

[Fig. 3: Evolution from Digital Twin to World Model for EGI. (A) From world replication to world abstraction, (B) From rule-driven to data-driven, (C) From passive simulation to active imagination, (D) From system-centric to agent-centric.]

III. FROM DIGITAL TWIN TO WORLD MODEL

This section first introduces the core components of world models and then discusses the key technical shifts that drive the evolution from digital twins to world models.

A. Core Components of World Models

The perception–prediction–decision loop of world models is based on the coordinated operation of three core modules: the encoder, the dynamics model, and the decoder. These modules are tightly coupled and together form an efficient closed-loop workflow [18].
• Encoder: The encoder is the front-end information processing module of the world model and is mainly responsible for compressing high-dimensional data and extracting key features [18]. It receives multi-modal, high-dimensional perceptual data such as images and radar signals, removes redundant details via feature extraction algorithms, and preserves core information relevant to agent decision-making. In this way, it transforms high-dimensional raw data into compact low-dimensional latent states [43]. This processing not only greatly reduces the computational overhead of downstream modules, which is important for edge devices, but also provides high-quality input for subsequent structure learning and prediction.

• Dynamics model: The dynamics model is the core engine that allows a world model to learn environmental evolution laws and perform temporal prediction [26]. Its main role is to learn the state transition function of the environment so that it can accurately predict the next latent state given the current latent state and the agent's action [44]. Dynamics models can be broadly divided into deterministic models, such as RNN-based models, and stochastic models, such as the recurrent state-space model (RSSM), which uses both deterministic and stochastic hidden variables. The type can be selected according to scenario requirements. The dynamics model takes the encoder's latent state as input, combines it with the agent's action, and predicts multi-step latent state trajectories, thereby providing crucial trend information to support decision-making [24].

• Decoder: The decoder is the key output module in the closed loop, responsible for mapping the latent states predicted by the dynamics model back to observable quantities [9]. It can reconstruct future environmental observation frames and output reward signals.
These outputs can be used to visually verify prediction quality and assess the validity of latent states, and they also serve as an important basis for policy evaluation, helping the system select better decision strategies. In addition, the decoder's outputs can be used as feedback signals to update the encoder and the dynamics model, continuously improving feature extraction and state prediction accuracy. This closes the perception–prediction–decision–iteration loop and ensures the integrity and effectiveness of the entire pipeline [19].

Thus, the progression from digital twin to world model for EGI involves a transition from rule-driven, system-centric world replication to a data-driven, agent-centric abstraction capable of active imagination, as shown in Figure 3 and Table I.

B. From World Replication to World Abstraction

The core principle of digital twins is world replication. The goal is to construct virtual systems that are highly consistent with the physical world in structure, state, and mechanisms, to support system-level simulation and monitoring [16].

The modeling objective of world models is world abstraction. Instead of pursuing full physical fidelity, world models retain only those environmental dynamics that affect an agent's future cumulative rewards and emphasize decision-relevant information through abstract representations. For example, the world models framework proposed in [26] uses a variational auto-encoder (VAE) to learn low-dimensional latent representations and shows that such latent states alone can support complex control tasks with performance close to that achieved via direct interaction with the real environment. The Dreamer family proposed in [27] further demonstrates that imaginative rollouts in a learned world model can significantly improve the sample efficiency of policy optimization.
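To make the interplay of the three modules described above more concrete, the following is a minimal sketch of an encoder, a deterministic dynamics model, and a decoder, followed by an imagined rollout in latent space. It is an illustration under stated assumptions, not the architecture of [26] or [27]: the linear maps are randomly initialized stand-ins for trained networks, and all names and dimensions (`encode`, `step`, `decode`, `imagine`, `OBS_DIM`, `LATENT_DIM`, `ACTION_DIM`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these values come from the paper.
OBS_DIM, LATENT_DIM, ACTION_DIM = 64, 4, 2

# Randomly initialised linear maps stand in for trained networks (assumption).
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
A = 0.9 * np.eye(LATENT_DIM)
B = rng.normal(scale=0.1, size=(LATENT_DIM, ACTION_DIM))
W_dec = rng.normal(scale=0.1, size=(OBS_DIM, LATENT_DIM))
w_rew = rng.normal(scale=0.1, size=LATENT_DIM)


def encode(obs):
    """Encoder: compress a high-dimensional observation into a latent state."""
    return np.tanh(W_enc @ obs)


def step(z, action):
    """Dynamics model: predict the next latent state from (latent, action)."""
    return np.tanh(A @ z + B @ action)


def decode(z):
    """Decoder: map a latent state back to an observation and a reward."""
    return W_dec @ z, float(w_rew @ z)


def imagine(obs, actions):
    """Roll the dynamics forward in latent space without touching the environment."""
    z = encode(obs)
    latents, rewards = [], []
    for a in actions:
        z = step(z, a)
        _, r = decode(z)
        latents.append(z)
        rewards.append(r)
    return np.stack(latents), np.array(rewards)


obs = rng.normal(size=OBS_DIM)
actions = rng.normal(size=(5, ACTION_DIM))
latents, rewards = imagine(obs, actions)
print(latents.shape, rewards.shape)  # (5, 4) (5,)
```

Note that the rollout touches only 4-dimensional latent vectors after the single call to `encode`, which is the property that makes this style of prediction attractive on edge hardware. A stochastic model such as the RSSM would replace `step` with a sampled transition.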
When the modeling objective shifts toward online decision-making and control for EGI, the criteria for evaluating state representations differ fundamentally from those in traditional systems engineering. The objective of EGI is to bring general, adaptive, and context-sensitive cognitive capabilities to resource-constrained edge devices. It emphasizes long-term autonomous decision-making in dynamic, uncertain, and partially observable environments, under tight computation, storage, and communication budgets [48].

In EGI scenarios, maintaining a globally accurate, real-time synchronized replica of the physical world leads to heavy computational and communication overhead and, because of over-detailed physical descriptions, introduces large amounts of information that are irrelevant to current decisions [18]. This mismatch reduces the efficiency of decisions per unit of resource. Furthermore, traditional digital twins rely heavily on prior physical models, which are difficult to maintain in complex and unknown edge environments and often lack sufficient generalization and adaptability. Existing studies indicate that agent decision-making depends on how the environment evolves under the influence of actions rather than on complete reconstruction of physical states. Full replication of the physical world therefore does not produce an optimal state representation for decision-making [49].

Abstraction-based world models show clear advantages for EGI. The core requirement of EGI is to capture the mapping between scheduling or control actions and system performance, rather than to reconstruct the full physical state. By discarding details that are irrelevant to decision-making, world models can significantly reduce computation and communication costs on the edge [50].
More importantly, by learning environmental dynamics directly from interaction data, world models reduce dependence on precise prior physical models and lay the foundation for strong generalization and long-term autonomy in unknown environments [8], [51].

The evolution from digital twins to world models thus reflects a decision-centered shift in design philosophy. The modeling goal changes from building a high-fidelity replica of the world to constructing an abstract world representation that supports efficient prediction, planning, and learning by agents.

TABLE I: Comparison between digital twin and world model

Replication to Abstraction
Digital Twin: World Replication [16]
• Achieve high-fidelity mapping that reflects precise physical reality
• Maintain structural isomorphism to ensure geometric consistency with objects
• Reproduce intricate physical details and geometry for total synchronization
World Model: World Abstraction [26]
• Abstract complex environments into compact, task-oriented representations
• Retain only the critical information necessary for reward prediction
• Extract low-dimensional latent features to simplify environmental complexity

Rule-based to Data-driven
Digital Twin: Rule-driven [45]
• Operate based on deterministic rules to ensure predictable outcomes
• Models are strictly governed by explicit physical laws and mechanisms
• Utilize complex mechanism equations to simulate exact physical behaviors
World Model: Data-driven [27]
• Learn versatile environment representations through advanced generative AI
• Evolve through continuous learning from massive environmental interaction data
• Capture implicit state transition dynamics without explicit physical formulas

Passive to Active
Digital Twin: Passive Simulation [46]
• Run simulations based on fixed parameters to verify designs
• Simulations rely on specific initial states to predict results
• Treat agent actions as external inputs within a static system
World Model: Active Imagination [18]
• Predict future environment states conditioned on specific agent actions
• Imagine virtual trajectories within latent space to evaluate plans
• Enable autonomous exploration through internal "what-if" active simulations

System-centric to Agent-centric
Digital Twin: System-centric [47]
• Provide comprehensive global modeling for the entire system landscape
• Create a complete virtual replica encompassing all system components
• Ensure full coverage of complex system behaviors and interactions
World Model: Agent-internal cognitive component [9]
• Focus on ego-centric local modeling relevant to the agent
• Dynamically adapt modeling to match the agent's real-time perception
• Function as the internal cognitive brain for autonomous decision-making

C. From Rule-Driven Dynamics to Data-Driven World Evolution

Traditional digital twin technology is grounded in physics-based mechanistic models, in which environmental evolution is governed by predefined physical laws and system-level equations. This rule-driven paradigm provides deterministic descriptions of physical entities through high-fidelity virtual replicas [16].

World models adopt a data-driven modeling paradigm. Using generative AI techniques, they compress environmental dynamics into compact latent spaces. By analyzing interaction data between agents and their environments, world models can autonomously learn implicit laws of state transitions and predict future environmental states [26].

A key characteristic of EGI is the need for long-horizon autonomous decision-making in dynamic, unstructured, and highly uncertain physical environments [52]. In practical EGI scenarios, such as UAV networks and intelligent transportation systems [53], fixed rule-based models often fail to capture the long-term evolution of complex systems with sufficient accuracy. This rule-driven nature limits their generalization capability.
Since a digital twin operates strictly according to predefined formulas and assumptions, it cannot flexibly adjust its evolution logic when facing unmodeled physical phenomena or unexpected disturbances beyond its design scope. In addition, the complex numerical solvers required for high-fidelity physical modeling impose a substantial computational load that conflicts with the limited resources of edge devices.

Data-driven world evolution modeling therefore becomes a key direction for EGI. EGI agents can continuously refine their internal world models using local interaction data, thus adapting to environmental changes without redesigning physical equations [18]. By performing evolution inference in low-dimensional latent spaces, world models avoid pixel-level simulation of high-dimensional physical entities and significantly reduce computational latency and resource consumption. For example, in UAV networks, the aerial network world model, trained on large-scale aerial sequences, trajectories, and semantic labels, can predict semantically plausible long-range scenarios even under unseen meteorological conditions. This supports the dynamic generation of navigation paths that balance obstacle avoidance and high-level semantic objectives [37].

The shift from rule-driven to data-driven modeling represents a fundamental evolution of EGI rather than a simple technical improvement. This transition allows the system to move beyond passive reproduction toward proactive prediction within open-world environments.

D. Action-Conditioned Prediction and Imagination

As EGI evolves from basic perception to complex autonomous decision-making, the objective of modeling shifts from describing current states to reasoning about the consequences of actions [54].
From this viewpoint, digital twins and world models represent two distinct predictive paradigms: the former focuses on parameter-driven offline simulation, whereas the latter supports action-conditioned online imagination.

Traditional digital twins are designed mainly for system monitoring and performance evaluation. Their predictive mechanisms are typically based on passive simulation of environmental dynamics. Given an initial state and boundary conditions, a digital twin carries out deterministic simulations to forecast system trajectories. Agent actions are often treated as external inputs or configuration parameters, rather than as intrinsic drivers of system evolution [55].

By contrast, world models use an action-conditioned prediction mechanism that explicitly models the agent's actions as key variables in state transitions. Since trial-and-error interaction in the physical world is often costly, world models allow agents to conduct imaginative trajectory rollouts in a compact latent space. Without directly interacting with the real environment, an agent can rapidly simulate multiple candidate action sequences inside the model and evaluate their cumulative long-term rewards [9].

In EGI scenarios, edge devices are constrained by limited computing, storage, and communication resources, while EGI tasks often involve long time horizons and limited rewards [56]. Agents must therefore compare many potential decision paths under stringent cost constraints, which is difficult to achieve efficiently with prediction schemes based on high-fidelity simulation.

World models perform action-conditioned prediction and imagination in low-dimensional latent spaces. This avoids pixel-level simulation of high-dimensional physical states, greatly reducing computational complexity while preserving essential dynamics for decision-making.
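The evaluation of multiple candidate action sequences by imagined rollouts, as described above, can be sketched with a simple random-shooting planner. This is a hedged illustration rather than the method of any cited work: `latent_step` and `latent_reward` are hypothetical hand-written stand-ins for a learned transition and reward model, and the horizon, candidate count, and discount factor are arbitrary choices.

```python
import numpy as np


def latent_step(z, a):
    """Hypothetical learned transition: the action nudges a 2-D latent state."""
    return 0.95 * z + 0.1 * a


def latent_reward(z, goal):
    """Hypothetical learned reward: negative distance to a goal in latent space."""
    return -float(np.linalg.norm(z - goal))


def plan_by_imagination(z0, goal, horizon=10, n_candidates=256, gamma=0.99, seed=0):
    """Score random action sequences with imagined rollouts; return the best one."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, z0.shape[0]))
    best_return, best_actions = -np.inf, None
    for actions in candidates:
        z, ret = z0.copy(), 0.0
        for t, a in enumerate(actions):
            z = latent_step(z, a)                        # imagined transition
            ret += (gamma ** t) * latent_reward(z, goal)  # discounted return
        if ret > best_return:
            best_return, best_actions = ret, actions
    return best_actions, best_return


z0 = np.zeros(2)
goal = np.array([0.5, -0.5])
best_actions, best_return = plan_by_imagination(z0, goal)
print(best_actions.shape, round(best_return, 3))
```

Every candidate is scored entirely inside the model, so the cost of comparing 256 decision paths is 2,560 small vector operations and no real-world interaction. In a receding-horizon setting, only the first action of `best_actions` would be executed before replanning.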
In autonomous driving, for example, FSDrive [57] employs a world model as a predictor to generate future imagined scenes in latent space that jointly capture spatial and temporal structure, thereby enabling trajectory planning and policy evaluation without explicit pixel-level physical simulation.

By shifting prediction from system-level reproduction to action-driven evaluation of future outcomes, world models provide a more efficient and scalable decision-support paradigm for edge intelligence.

E. Agent-Centric and Localized World Modeling

Digital twin technology originated in large-scale system engineering contexts such as Industry 4.0 and smart cities, and its design philosophy is inherently system-centric [46]. By constructing a global virtual replica of the entire physical system, digital twins aim to reproduce the states of physical entities, their structural relationships, and their evolution from a holistic perspective, thereby supporting system-level analysis, simulation, and management.

World models adopt an agent-centric and localized modeling paradigm. Agent-centric modeling means that a world model does not try to reconstruct the full objective physical world. Instead, it serves as an internal cognitive component of the agent [58]. The model is defined by the agent's sensing capabilities, action space, and task objectives, and captures only the environmental dynamics that are relevant to the agent's observations and actions. Through localized modeling, world models learn implicit evolution patterns and retain only local dynamics that directly affect current decisions [59]. Thus, the size of the model is decoupled from the overall environmental complexity, allowing resource-constrained edge devices to perform complex environment modeling tasks.
In EGI scenarios that require continuous autonomous decision-making and online adaptation, the primary challenge of an agent is not only to precisely reconstruct the environment, but also to quickly understand local conditions, predict the consequences of actions, and make effective decisions under partial observability [60]. For edge-deployed agents, sensing range, communication capability, and computational resources are limited, and the global state of the environment is generally neither fully observable nor practically maintainable [48].

The agent-centric and localized nature of world models closely matches the fundamental requirements of EGI [61]. This modeling paradigm reduces the complexity of model construction and maintenance, making online learning and continuous updates on edge devices more feasible. It supports a deep integration with decision-making methods such as reinforcement learning and model predictive control, allowing the world model to function as an internal part of the agent's decision process rather than an external simulation tool. Localized world models offer better scalability and robustness in typical EGI scenarios characterized by multi-agent operation, non-stationary environments, and incomplete information [62].

By shifting from system-centric to agent-centric modeling, world models focus on local dynamics directly relevant to individual decisions. This shift not only reduces resource consumption at the edge, but also provides essential internal support for efficient prediction, reasoning, and autonomous decision-making in dynamic physical environments.

F. Summary of the World Model Paradigm

This section has examined the paradigm shift from digital twins to world models. This transition redefines physical-world modeling from fidelity-oriented replication to decision-oriented abstraction, establishing a more suitable cognitive modeling framework for resource-constrained EGI.
We summarize the key characteristics of the world model paradigm in four dimensions.

• Decision-Oriented Abstract Modeling: Unlike the high-fidelity physical replication of digital twins, world models abstract the environment in a decision-oriented manner by learning compact latent representations that preserve only the dynamics relevant to an agent's future rewards. This abstraction reduces model dimensionality and computational overhead, making world models well suited for resource-constrained edge devices [26].

• Data-Driven Evolutionary Learning: Unlike rule-driven digital twins that rely on predefined physical equations, world models use data-driven generative modeling to learn state transition dynamics from agent–environment interactions. This approach captures complex dynamics that are difficult to describe analytically and supports continual learning, enabling the model to adapt to environmental changes and unexpected disturbances [18].

• Action-Conditioned Imaginative Reasoning: World models incorporate agent actions into state transitions through action-conditioned prediction. By iterating one-step predictions, they enable imagination-based reasoning in latent space to evaluate future trajectories under different actions. This allows agents to assess long-term effects without real-world interaction, supporting forward planning and improving sample efficiency [51].

• Agent-Centric Local Modeling: As an internal cognitive component, a world model is shaped by the agent's perception, action space, and task goals, and captures only the local dynamics the agent can observe and influence. This agent-centric design decouples model complexity from the global environment, reduces the reliance on global state synchronization, and supports heterogeneous and collaborative deployment in distributed edge systems [9].

In EGI, the world model paradigm provides a coherent and practical framework for environmental modeling. Rather than replacing digital twins in system-level analysis, world models complement them by supporting the internal cognition and decision-making of agents. This enables edge agents to achieve continuous autonomy and long-term adaptation in dynamic, partially observable, and resource-constrained environments, marking a shift from task-specific automation to more general autonomous intelligence.

IV. APPLICATIONS OF FROM DIGITAL TWINS TO WORLD MODELS

This section discusses the application of world models to integrated sensing, communication, and computing (ISCC), semantic communication, air-to-ground networks, and low-altitude wireless networks. Table IV compares digital twins and world models based on their main functions and related studies in these scenarios.

Fig. 4: Evolution from digital twin to world model for EGI. (A) From world replication to world abstraction, (B) From rule-driven to data-driven, (C) From passive simulation to active imagination, (D) From system-centric to agent-centric. [Panel titles in the figure: A. Integrated Sensing, Communication, and Computing; B. Semantic Communication; C. Air-to-Ground Networks; D. Low-Altitude Wireless Networks.]

A. Integrated Sensing, Communication, and Computing

ISCC is a key application in edge intelligence. Its objective is to orchestrate the end-to-end optimization of sensing, communication, and computing resources under strict edge constraints, thereby maximizing system performance in connectivity-centric scenarios [63]. Digital twins and world models jointly support deep synergy among the three ISCC pillars.
The digital twin provides offline verification for sensing-data calibration, communication-link planning, and computational-resource pre-allocation, whereas the world model dynamically adjusts sensing frequency, communication-protocol parameters, and computation-offloading policies online. The two mechanisms are functionally complementary and together form a cohesive co-design paradigm, as illustrated in Fig. 5.

The principal value of the digital twin lies in its ability to perform offline, system-level modeling and validation of sensing accuracy, link quality, and resource allocation [64]. By constructing a high-fidelity virtual replica of the physical system, the twin exploits predefined physical models and rules to run simulations. In industrial-edge deployment scenarios, it accurately reproduces sensor coverage, wireless signal attenuation, and compute-node capacity distributions, enabling ex-ante evaluation of various resource-scheduling policies [65]. This high-fidelity emulation offers reliable evidence for offline resource allocation and task scheduling, eliminates the risks and overhead of direct experimentation on real hardware, and constitutes an indispensable tool during the ISCC design phase.
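As a rough illustration of the offline, rule-based evaluation described above, the sketch below scores a handful of candidate offloading fractions against predefined physical formulas before anything is deployed. It is a toy model under loudly stated assumptions: the log-distance path-loss model and the Shannon rate are standard textbook formulas chosen here for illustration, not taken from [64] or [65], and every parameter value, task size, and function name is hypothetical.

```python
import math


def path_loss_db(distance_m, pl0_db=40.0, exponent=3.0, d0_m=1.0):
    """Log-distance path loss: PL(d) = PL(d0) + 10 * n * log10(d / d0)."""
    return pl0_db + 10.0 * exponent * math.log10(distance_m / d0_m)


def link_rate_bps(distance_m, tx_dbm=20.0, noise_dbm=-90.0, bandwidth_hz=1e6):
    """Shannon rate for the SNR implied by the path-loss model."""
    snr_db = tx_dbm - path_loss_db(distance_m) - noise_dbm
    return bandwidth_hz * math.log2(1.0 + 10.0 ** (snr_db / 10.0))


def task_latency_s(task, fraction, distance_m, local_hz=1e9, edge_hz=10e9):
    """Latency of one task when `fraction` of its work is offloaded to the edge."""
    bits, cycles = task
    local = (1.0 - fraction) * cycles / local_hz
    remote = fraction * bits / link_rate_bps(distance_m) + fraction * cycles / edge_hz
    return max(local, remote)  # local and offloaded parts are assumed to run in parallel


def evaluate_policies(tasks, distance_m, fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Ex-ante comparison: mean latency of each candidate offloading fraction."""
    return {
        f: sum(task_latency_s(t, f, distance_m) for t in tasks) / len(tasks)
        for f in fractions
    }


# (input bits, CPU cycles) per task; made-up workload for illustration.
tasks = [(2e6, 5e8), (1e6, 1e9), (4e6, 2e8)]
scores = evaluate_policies(tasks, distance_m=50.0)
best = min(scores, key=scores.get)
print(best, scores[best])
```

Because every quantity comes from fixed formulas and fixed parameters, the result is reproducible and can be checked before deployment, but it says nothing about how the channel or load will drift at run time, which is the gap the world model is meant to fill.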
Fig. 5: A conceptual framework illustrating how the optimized edge ISCC system enables intelligent edge computing and sensing through multi-modal data processing, world model-driven perception, and dynamic edge co-optimization.

The world model focuses on online, dynamic co-optimization of sensing, communication, and computation, thereby accommodating the volatility and uncertainty inherent to edge environments. Its core advantage is prospective planning and real-time adaptation conditioned on the instantaneous system state [66]. The world model employs data-driven approaches to learn the latent dependencies between the three functions using multi-modal interaction data [67]. To achieve this, an encoder first compresses high-dimensional data into low-dimensional latent states, extracting critical information to reduce overhead. Next, a dynamics module tracks environmental changes and captures temporal dependencies. Finally, a decoder maps these predicted states back to observable quantities, completing the sense-predict-decide loop.
In ISCC scenarios for UAV swarms, the model forecasts communication traffic and computational-resource variations induced by alternative trajectory adjustments, enabling proactive cooperative strategies [68]. In industrial-edge contexts, it adaptively tunes sensor sampling rates, communication protocol configurations, and offloading fractions, dynamically balancing competing objectives [69]. This online adaptability compensates for the inability of offline digital-twin optimization to cope with environmental dynamics. In physical layer security, the prediction and generation abilities of world models have shown great potential. The APEG framework [70] uses generative AI and diffusion models to achieve high-accuracy and adaptive physical layer authentication in dynamic environments.

The synergy between digital twins and world models propels ISCC from offline static optimization to online dynamic self-adaptive optimization [18]. Exploiting high-fidelity physical mapping and offline deduction, the digital twin delivers precise calibration benchmarks for sensing, dependable link-planning evidence for communication, and rational resource pre-allocation schemes for computation. Leveraging data-driven predictive modeling and online decision making, the world model continuously refines sensing frequency, communication parameter configurations, and computation-offloading strategies at run time [71], achieving online co-alignment of sensing, communication, and computation. Collectively, the integration of digital twins and world models breaks down siloed subsystem optimization, endowing ISCC with both the reliability of offline planning and the agility of online adaptation.
Specifically, digital twin-integrated frameworks achieve a 33% reduction in end-to-end latency and a 30% reduction in tail latency at an energy cost of only 2–7% [64], while world models enable dynamic parameter adjustment that lowers execution latency by 15–20% [69], ultimately enhancing system stability and efficiency in complex edge scenarios.

B. Semantic Communication

Semantic communication is widely recognized as a key technique for next-generation 6G networks because it focuses on the efficient transmission of semantic essence and core intent [86]. By conveying semantic meaning rather than raw bits, it breaks the throughput ceiling of conventional schemes and remains efficient under limited bandwidth or hostile channels, thus providing a key enabler for resource-constrained intelligent networking [87]. The digital twin is responsible for offline pre-optimization and parameter calibration of semantic policies, whereas the world model performs online adaptation by tracking end-to-end dynamics and continuously refining encoding, transmission, and decoding. The two therefore operate in a complementary loop of offline design and online evolution, as illustrated in Fig. 6.

The digital twin, operating within semantic communication, concentrates on noise-prone channels, semantic prioritization, and the stringent requirement for guaranteed recognition accuracy, thereby enabling offline pre-optimization and calibration of core transmission parameters [72]. By learning intrinsic semantic features and their interaction rules, the digital twin proactively simulates the suitability of candidate encoding or transmission schemes. The digital twin tests the semantic-loss rate of a specific code under complex noise and measures the delivery delay of critical intent messages, then refines semantic recognition and reconstruction algorithms accordingly.
Priority assignment is calibrated against latency budgets [72], while channel adaptation is tuned to counteract time-varying channel fading [73]. This targeted offline optimization improves reliability and efficiency, mitigates semantic distortion and intent loss, and lays a stable foundation for deployment, especially in scenarios where business demands are static and semantic rules are well defined [74].

The world model focuses on the real-time capture of dynamic variations along the entire semantic link. Without assuming preset channel parameters, it autonomously learns the latent relationship between semantic content, instantaneous channel state, and user demand from massive live data and performs end-to-end optimization [75], [76]. The world model first compresses semantic data in the encoder to eliminate redundancy. It then adapts modulation and power during transmission to mitigate fading and, finally, corrects semantic distortion at the decoder to preserve the original intent. Its lightweight design enables deployment on edge nodes, facilitating an efficient and robust cloud-edge-terminal architecture [77].

TABLE IV: Functional comparison of digital twin and world model in edge intelligence

| Scenario | Paradigm | References | Key Functions and Contributions |
|---|---|---|---|
| ISCC | Digital Twin | [18], [64], [65] | Supports high-fidelity simulation for offline sensing calibration, link configuration, and resource pre-allocation. |
| ISCC | World Model | [66]–[69] | Enables online coordination via latent dynamics learning and adaptive sampling and offloading. |
| Semantic Communication | Digital Twin | [72]–[74] | Provides semantic policy calibration through priority setting and channel loss simulation. |
| Semantic Communication | World Model | [75]–[77] | Supports dynamic semantic adaptation through implicit channel learning and encoding adjustment. |
| A2G Networks | Digital Twin | [78]–[80] | Facilitates global multi-layer scheduling and cross-layer system optimization. |
| A2G Networks | World Model | [37], [81] | Enables low-latency local control via action–link modeling and latent rollout. |
| LAWNs | Digital Twin | [82], [83] | Supports 3D environment modeling and trajectory optimization for coverage planning. |
| LAWNs | World Model | [35], [84], [85] | Enables onboard real-time control with obstacle avoidance and channel prediction. |

Fig. 6: A conceptual framework depicting the integration of world model processing, digital twin pre-optimization, and real-time adaptation for enabling robust and intelligent semantic communication. [Figure panels: World Model Process; Digital Twin Pre-Optimization (Offline); Real-Time Adaptation (Online); Supporting Semantic Communication.]

In the offline phase, the digital twin generates semantic priority tables, coding and modulation settings, and channel protection parameters [74]. During online operation, the world model dynamically updates these parameters to adapt to real-time noise, traffic, and resource conditions. This interaction ensures that the end-to-end system maintains near-optimal performance automatically, thus delivering the intended information with stability and efficiency [75].
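The offline/online split described above can be sketched with a deliberately simple stand-in. We assume a binary symmetric channel with bit-flip probability `flip_prob`, an odd-length repetition code with majority voting, and we approximate "semantic loss" as any residual bit error in a message. None of these choices come from [72]–[77]; the priority classes, loss targets, and function names are hypothetical. The offline step builds a priority table from a nominal channel, and the online step rebuilds it when the estimated channel worsens.

```python
from math import comb


def bit_error_after_voting(flip_prob: float, repeat: int) -> float:
    """Residual bit-error probability of an odd-length repetition code with majority voting."""
    majority = repeat // 2 + 1
    return sum(comb(repeat, k) * flip_prob**k * (1 - flip_prob)**(repeat - k)
               for k in range(majority, repeat + 1))


def message_loss_rate(flip_prob: float, repeat: int, msg_bits: int) -> float:
    """Probability that at least one bit of the message is decoded incorrectly."""
    return 1.0 - (1.0 - bit_error_after_voting(flip_prob, repeat)) ** msg_bits


def build_priority_table(targets, flip_prob, msg_bits, candidates=(1, 3, 5, 7, 9)):
    """Pick, per priority class, the smallest repetition factor meeting its loss target.

    If no candidate meets the target, the strongest available code is used as a fallback.
    """
    table = {}
    for name, target in targets.items():
        feasible = [r for r in candidates
                    if message_loss_rate(flip_prob, r, msg_bits) <= target]
        table[name] = feasible[0] if feasible else candidates[-1]
    return table


# Hypothetical loss targets per semantic priority class.
TARGETS = {"critical": 0.005, "normal": 0.05, "background": 0.2}

# Offline (digital twin): calibrate against a nominal channel.
offline_table = build_priority_table(TARGETS, flip_prob=0.05, msg_bits=16)

# Online (world model): the estimated channel is noisier, so the table is refreshed.
online_table = build_priority_table(TARGETS, flip_prob=0.10, msg_bits=16)
print(offline_table)
print(online_table)
```

Stricter classes receive stronger protection, and every class is protected at least as strongly once the channel estimate degrades. A practical system would replace the repetition code with real semantic codecs and would measure loss at the level of recovered intent.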
Semantic communication empowered by world models thus provides a scalable and reliable architecture for next-generation intelligent networks.

C. Air-to-Ground Networks

Air-to-Ground (A2G) networks integrate satellites, high-altitude platforms, and UAVs to form a three-dimensional space–air–ground (SAG) communication and computing architecture, providing fundamental support for wide-area connectivity, low-latency services, and edge intelligence. However, highly dynamic aerial topologies, heterogeneous network resources, and stringent ultra-low-latency requirements pose significant challenges to efficient resource allocation and task offloading [88]. Digital twins focus on global modeling of the multi-layer air–space–ground network structure and its resource constraints, enabling system-level optimization for cross-layer resource scheduling and task offloading. Fig. 7 depicts the integrated digital twin–world model framework, illustrating the hierarchical interaction between global planning and local adaptive control in A2G networks.

Digital twins support system-level resource management and computation offloading in A2G scenarios by constructing high-fidelity virtual replicas of SAG networks. Hevesli et al. [78] developed a digital twin-based framework for air–ground cooperation in the 6G industrial Internet of Things, where a virtual network replica is used for real-time state prediction. Gong et al.
[79] proposed a SAG digital twin-integrated blockchain architecture for air–space–ground heterogeneous networks, enabling centralized digital twins to perform global resource scheduling and ensure globally optimal task allocation.

Fig. 7: Overview of world model–enabled framework for LEO–A2G networks: Latent simulation and causal learning with digital twin–edge hierarchical optimization supporting mobility, low-latency operations, and link adaptation. [Figure panels: World Model Internal Process; Supporting LEO A2G Applications; Digital Twin (Global Planning); Real-Time LEO Edge Adaptation (Online); Hierarchical Optimization Loop.]

As local environment simulators for A2G edge nodes, world models emphasize the relationships between aerial agent actions and local link and airspace states, supporting short-term prediction under partial observability. Zhang et al. [37] proposed an aerial network world model that encodes historical link states, flight trajectories, and control actions into compact latent representations, allowing path selection to be simulated in latent space. Lu et al.
[81] integrate edge interaction data with aerial and satellite sensing information, enabling world models to learn action-conditioned spatial state evolution and perform predictive inference before link fluctuations occur, thereby reducing the dependence of aerial nodes on frequent synchronization with centralized controllers. Zhang et al. [89] propose an integrated air–ground edge–cloud framework that improves multi-modal AI inference under limited bandwidth and outperforms both cloud-only and edge-only schemes.

In A2G networks, digital twins and world models exhibit a clear functional separation. Digital twins act as global operational mappings and planning centers, addressing how the overall system should be configured [80]. World models serve as local cognition and decision engines for aerial edge agents, determining which actions should be taken under current local airspace and link conditions. Together, they form a hierarchical optimization framework that spans from system-level planning to link-level adaptive control.

D. Low-Altitude Wireless Networks

Low-altitude wireless networks (LAWNs) are composed of platforms such as UAVs and electric vertical take-off and landing (eVTOL) aircraft, typically operating below an altitude of 3000 m. In these networks, highly unstructured environments, strong uncertainty, and strict onboard computation and energy constraints make global planning and fast local response equally important [90]. Digital twins enable global awareness of aerial platform states by constructing high-fidelity virtual replicas of LAWNs and provide macroscopic guidance for large-scale resource scheduling. World models act as embodied cognition onboard aerial platforms. Using imagination in latent space, they empower UAVs with real-time local control capabilities, such as obstacle avoidance, short-term channel prediction, and action refinement in complex environments [91]. As shown in Fig. 8, the framework combines offline digital twin guidance with online world modeling in a hierarchical closed-loop architecture.

Fig. 8: Overview of the world model–driven framework for LAWNs: Onboard latent prediction, physics-aware control, and digital twin–guided edge adaptation. [Figure panels: Onboard World Model Process; Digital Twin (Global/Offline Guidance); Real-Time Edge Adaptation Loop (Online); Physics-Based Modeling & Edge Decision.]

In LAWNs, constrained airspace, dense node deployment, and complex interference conditions motivate the use of digital twins for global situational awareness [82], [83]. By integrating virtual replicas of urban 3D models, building distributions, wireless propagation environments, and UAV operational states, digital twins support system-level optimization of UAV trajectory planning, task assignment, and resource scheduling. Xie et al.
[82] proposed a UAV application framework in which multiple tasks in the digital twin are coordinated by a task manager and interact with physical UAVs, enabling intelligent operation and management of real UAV networks. Wang et al. [83] developed a digital replica of aerial networks to jointly design power control, task partitioning, and computation resource allocation.

World models act as local embodied decision-makers at the UAV edge. They enable real-time prediction of channel dynamics and environmental disturbances to generate fine-grained control policies. Bar et al. [35] proposed a navigation world model based on the conditional diffusion transformer architecture, which predicts future visual observations conditioned on navigation actions, allowing UAVs to imagine flight trajectories in unfamiliar environments from a single input image and to perform online path planning under complex dynamic constraints. For more challenging dynamic obstacle avoidance tasks, Diehl et al. [84] proposed the 4D occupancy flow world model, which estimates and decomposes scene occupancy flow from sparse LiDAR observations, completes instance shapes, and predicts their temporal evolution. This predictive capability enables UAVs to anticipate the motion of low-altitude objects and make sub-second safe avoidance decisions while maintaining communication link stability, without relying on frequent synchronization with centralized digital twins. In addition, reconfigurable intelligent surfaces (RIS) introduce extra degrees of freedom for signal reflection in LAWNs. By learning implicit mappings among position, phase configuration, and channel quality, world models support end-to-end policy optimization without explicitly constructing analytical channel models [85].

TABLE V: A summary of frameworks for digital twin, world model, and edge general intelligence

| Field | Method | Characteristic | Related Resource Link |
|---|---|---|---|
| Digital Twin | FaceChain [93] | Training-free and compatible | https://github.com/modelscope/facechain |
| Digital Twin | PsyDT [94] | LLMs and psychological counseling | https://github.com/scutcyr/SoulChat2.0 |
| Digital Twin | DTTD2 [95] | Robust and object tracking | https://github.com/augcog/DTTD2 |
| Digital Twin | DTaaS [96] | Management and services | https://github.com/INTO-CPS-Association/DTaaS |
| Digital Twin | Ditto [97] | PointNet++ and articulated object | https://github.com/UT-Austin-RPL/Ditto |
| World Model | LWM [98] | Context understanding and training | https://github.com/LargeWorldModel/LWM |
| World Model | DreamerV3 [27] | RL and imagination-based planning | https://github.com/danijar/dreamerv3 |
| World Model | LingBot-World [99] | High-fidelity and long-horizon | https://github.com/Robbyant/lingbot-world |
| World Model | IRIS [100] | Data-efficient and sequence-modeling | https://github.com/eloialonso/iris |
| World Model | GigaBrain-0 [101] | Policy robustness and spatial reasoning | https://github.com/open-gigaai/giga-brain-0 |
| Edge General Intelligence | LotteryFL [102] | Personalized and low-communication FL | https://github.com/charleslipku/LotteryFL |
| Edge General Intelligence | Neurosurgeon [103] | Fine-grained and layer-wise | https://github.com/Tjyy-1223/Neurosurgeon |
| Edge General Intelligence | FedCache [104] | Device-fit and personalized | https://github.com/wuzhiyuan2000/FedCache |
| Edge General Intelligence | ORRIC [105] | Adaptive inference and retraining | https://github.com/caihuaiguang/ORRIC |
| Edge General Intelligence | pFedSD [106] | Faster personalization and robustness | https://github.com/CGCL-codes/pFedSD |

In practical LAWNs deployments, digital twins provide offline or quasi-online trajectory planning and resource configuration from a global network perspective, offering macroscopic guidance that defines task boundaries and long-term objectives. World models operate onboard aerial nodes, generating real-time policies based on local perception and action feedback to dynamically adjust navigation maps and predefined trajectories [92].
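The idea of refining a predefined trajectory through short imagined rollouts can be illustrated with a toy planner. In this sketch a constant-velocity predictor stands in for a learned world model such as the occupancy-flow model of [84]. The 2D geometry, the candidate actions, the horizon, and all names are hypothetical and chosen only for illustration.

```python
from math import hypot, inf


def min_clearance(uav, uav_vel, obstacle, obstacle_vel, horizon=10, dt=0.1):
    """Smallest predicted UAV-obstacle distance over the rollout horizon."""
    closest = inf
    for step in range(1, horizon + 1):
        t = step * dt
        ux, uy = uav[0] + uav_vel[0] * t, uav[1] + uav_vel[1] * t
        ox, oy = obstacle[0] + obstacle_vel[0] * t, obstacle[1] + obstacle_vel[1] * t
        closest = min(closest, hypot(ux - ox, uy - oy))
    return closest


def choose_action(uav, obstacle, obstacle_vel, actions, planned, safe_radius):
    """Keep the planned action if its imagined rollout is safe; otherwise pick the
    candidate with the largest predicted clearance."""
    if min_clearance(uav, actions[planned], obstacle, obstacle_vel) >= safe_radius:
        return planned
    return max(actions,
               key=lambda name: min_clearance(uav, actions[name], obstacle, obstacle_vel))


# Candidate velocity commands in m/s; "keep" is the digital twin's predefined plan.
# Here +y is arbitrarily labeled "left".
ACTIONS = {"keep": (10.0, 0.0), "left": (10.0, 8.0), "right": (10.0, -8.0)}

# An oncoming object slightly to the left of the planned path.
oncoming = choose_action((0.0, 0.0), (10.0, 0.5), (-10.0, 0.0),
                         ACTIONS, planned="keep", safe_radius=3.0)

# The same object far from the path: the predefined plan is retained.
distant = choose_action((0.0, 0.0), (10.0, 50.0), (-10.0, 0.0),
                        ACTIONS, planned="keep", safe_radius=3.0)
print(oncoming, distant)
```

The planner deviates from the global plan only when the imagined rollout predicts a violation of the safety radius, which mirrors the division of labor between macroscopic guidance and local adaptation described above.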
This collaborative architecture ensures both continuous network coverage and efficient resource utilization at the system level, while enabling edge nodes to autonomously adapt to transient environmental changes, forming a hierarchical control framework that combines global topology optimization with local link-level adaptation.

V. OPEN RESOURCES PROJECT

This section surveys representative open-source projects for digital twins, world models, and EGI across various fields.

A. Digital Twin Framework

Digital twins have advanced in multiple dimensions, including generation, behavior modeling, robust tracking, platform support, and object reconstruction. The following works highlight representative frameworks and methods that illustrate progress in these areas.

Deep Learning Tools for Digital Twin Generation: Deep learning has become a key tool for digital twin generation. In this context, FaceChain [93] is a deep learning framework for generating identity-preserving human portraits. It uses decoupled training and face-related perceptual understanding models to extract ID features, combined with Classifier-Free Guidance and models such as DamoFD [107] and M2FP [108]. The framework supports high-quality controllable portrait generation with custom style training and pose control, producing realistic outputs.

LLM-enhanced Digital Twin Frameworks: Large language models provide new capabilities for building digital twins with personalized behaviors. PsyDT [94] is an LLM-based framework for building digital twins of psychological counselors with personalized styles. It uses dynamic one-shot learning to capture a counselor's linguistic patterns and therapy techniques, synthesizing multi-turn dialogues to fine-tune the model for personalized counseling behavior.

Robust Digital Twin Tracking: Robust pose estimation under sensor noise is essential for reliable digital twin tracking in mobile environments.
DTTDNet [95] addresses this problem by introducing a robust six-degrees-of-freedom (6DoF) pose estimation network for mobile environments. Built on a Transformer, it uses geometric feature filtering and a Chamfer Distance loss to enhance robustness to depth noise. Experiments on DTTD-Mobile, a new digital-twin dataset captured with mobile devices, show that DTTDNet achieves 60.74 on the ADD metric, outperforming existing methods by at least 4.32 while remaining stable across noise levels.

Composable and Reusable Digital Twin Platform: Platform-based solutions reduce complexity and enhance reusability in digital twin systems. DTaaS [96] offers a composable platform that centrally manages assets (models, data, functions, and tools) and supports building digital twins as services, integrating storage, computing, communication, monitoring, and task execution. Case studies show its effectiveness for development and service-oriented deployment.

Digital Twins for Articulated Object Reconstruction and Modeling: Recent studies have explored digital twins to model the structure and motion of articulated objects. Ditto [97] constructs digital twins of real-world articulated objects through interactive perception. Using visual observations before and after interaction, it reconstructs part-level geometry and estimates articulation models with implicit neural representations. The method is category-agnostic and supports real-world reconstruction and physical simulation.

These works demonstrate advances in digital twins across generation, personalized behavior, robust tracking, platform-based management, and modeling of complex objects. They improve the accuracy, controllability, and reusability of digital twins, supporting the application of EGI in perception, reasoning, and autonomous decision-making.

B. World Model Framework

World models provide a framework for representing and predicting complex environments.
They combine multimodal perception, decision-making, planning, and simulation capabilities. These models help agents act and learn efficiently across long tasks, high-dimensional spaces, and multiple domains, supporting applications in AI and robotics.

LLM-enhanced World Model Frameworks: Multimodal models with extended context have shown significant progress in recent years. Building on these advances, LWM [98] enables cross-modal understanding and generation of text, images, and videos while handling long-context inputs. It excels in long-text retrieval, long-video understanding, and text-to-image/video tasks, while retaining short-context capabilities and providing an open-source training pipeline.

World Model-Driven RL for Multi-Domain Tasks: General-purpose reinforcement learning (RL) has advanced rapidly in recent years. DreamerV3 [27] is a world model-based algorithm with robustness techniques such as normalization, Kullback-Leibler (KL) divergence balancing, and symlog/symexp transformations. It adapts to 150+ tasks across eight domains without tuning, learns efficiently via unsupervised reconstruction and actor–critic methods, outperforms proximal policy optimization (PPO) and MuZero [109] on benchmarks, and offers flexible model size and replay ratio for practical cross-domain use.

Cross-Domain Applications of World Models: Interactive simulation platforms have advanced rapidly with video generation and world modeling. LingBot-World [99] provides an open-source platform for multi-domain simulations. It integrates a hierarchical semantic engine, multi-stage training, and a mixture-of-experts (MoE) architecture to deliver high-fidelity, long-horizon, low-latency environments, supporting prompt-driven events, agent training, and 3D reconstruction.

Visual and Long-Horizon World Models: Recent RL research has focused on world model-driven approaches for sample-efficient learning.
IRIS [100] builds a world model with a discrete autoencoder and an autoregressive Transformer for learning in imagination. Its world model is fitted to real interaction data, while the policy is trained on imagined trajectories. It performs pixel-level prediction together with reward and termination estimation, adapts to complex visual environments, and surpasses human performance on several games of the Atari 100k benchmark.

Efficient Robot Learning with World Models: Efficient robot learning depends on scalable data and robust task generalization. GigaBrain-0 [101] uses data generated by world models to build a vision-language-action (VLA) foundation model, reducing the reliance on real robot data. With RGB-depth (RGBD) input modeling and embodied chain-of-thought supervision, it reasons about geometry, object states, and long-horizon dependencies, achieving robust performance in dexterous and mobile tasks.

These studies highlight the role of world models in supporting EGI. By enabling reasoning, planning, simulation, and robot learning, they improve efficiency and generalization, supporting robust perception, decision-making, and autonomous action in complex, multi-domain environments.

C. Edge Artificial Intelligence Framework

EGI faces the challenge of efficient, personalized learning and inference across distributed heterogeneous devices. Integrating world models, personalized federated learning, resource-aware scheduling, and knowledge distillation improves efficiency and generalization and supports diverse intelligent applications.

EGI with Communication-Efficient Federated Learning: Reducing communication and enabling personalization are critical for federated learning on edge devices. LotteryFL [102] uses the Lottery Ticket hypothesis to train and transmit client-specific subnetworks, lowering communication overhead while supporting personalized models. Experiments on non-independent and identically distributed (non-IID) datasets show improved accuracy and efficiency, with real-time deployment demonstrated on edge devices.
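To make the communication saving of client-specific subnetworks concrete, the sketch below applies one-shot magnitude pruning to a flattened weight vector and counts what a client would upload. This is a generic lottery-ticket-style illustration under our own assumptions; it is not the LotteryFL procedure of [102], whose pruning schedule and aggregation rule may differ. All names and values are hypothetical.

```python
def magnitude_mask(weights, keep_ratio):
    """Boolean mask that keeps the largest-magnitude fraction of the weights."""
    keep = max(1, round(len(weights) * keep_ratio))
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [i in kept for i in range(len(weights))]


def sparse_payload(weights, mask):
    """(index, value) pairs a client would upload for its retained subnetwork."""
    return [(i, w) for i, (w, m) in enumerate(zip(weights, mask)) if m]


# Hypothetical flattened layer of one client.
client_weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]

mask = magnitude_mask(client_weights, keep_ratio=0.5)
payload = sparse_payload(client_weights, mask)
print(mask)
print(f"uploading {len(payload)} of {len(client_weights)} parameters")
```

Because each client derives its mask from its own weights, the retained subnetwork is client-specific. A real deployment would also account for the cost of transmitting the indices or the mask itself.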
Resource-Aware EGI Frameworks: EGI applications require low latency and high energy efficiency. To address these challenges, Neurosurgeon [103] uses layer-level neural network partitioning to coordinate edge and cloud computing resources. The framework predicts per-layer performance and dynamically adapts to hardware, architecture, and network conditions, achieving improved efficiency and throughput.

Communication-Efficient FL for EGI: Efficient and personalized learning is critical for edge devices. FedCache [104] uses a knowledge cache on the server to provide relevant information to client models and applies ensemble distillation. The framework supports heterogeneous devices and asynchronous interactions, achieving significant communication efficiency gains while maintaining performance comparable to state-of-the-art methods.

Balancing Model Drift and Inference in EGI: Managing computation and maintaining accuracy are key to practical edge intelligence. In response, ORRIC [105] models the resource competition between retraining and inference and dynamically adapts resource allocation. It improves long-term inference accuracy while balancing drift-related losses and optimizing computational resources and latency.

Self-Distilled FL for Edge Devices: Edge computing applications require efficient and personalized model training. pFedSD [106] is a personalized federated learning framework for edge computing. It uses self-knowledge distillation to retain each client's historical models and guide local training, improving personalization and convergence while supporting non-IID data and heterogeneous models and preserving privacy with low system overhead.

These studies highlight the importance of communication-efficient, resource-aware, and personalized strategies in edge environments. By reducing overhead, managing model drift, optimizing inference, and supporting diverse devices, they provide a solid basis for efficient and reliable EGI systems.

VI. FUTURE DIRECTIONS

Advancing world models for EGI requires realism, adaptability, and efficiency in dynamic and constrained environments. Edge systems need models that combine physical knowledge with data-driven learning, support continual adaptation, and operate at different spatial and temporal scales. The following directions outline key areas for developing robust, explainable, and cooperative world models.

A. Hybrid Physics-Driven and Data-Driven World Models

Future work should investigate hybrid architectures that combine explicit physical knowledge (e.g., radio propagation models, mobility laws, power constraints) with learned latent dynamics. Purely data-driven world models often struggle with extrapolation under sparse, non-stationary, or shifted data, and their generalization and scalability in complex real-world robotic scenarios remain in question [110]. In contrast, purely physics-based digital twins can be brittle and computationally expensive. Promising directions include physics-informed latent spaces, generative models regularized by conservation laws [111], and differentiable simulators coupled with neural world models. For EGI, such hybrid designs can improve robustness and interpretability while remaining lightweight enough for deployment at the edge.

B. Federated World Modeling at the Edge

EGI requires world models that evolve as environments, traffic patterns, and hardware conditions change. Future research should address continual and lifelong learning of world models under strict resource and privacy constraints, so that intelligent systems can retain existing knowledge while continuously acquiring and integrating new information [112]. This includes mechanisms for online adaptation without catastrophic forgetting, efficient model versioning across heterogeneous edge nodes, and federated learning protocols tailored to world-model training (e.g., regularizing latent dynamics to improve agent behavior [113]).
Handling non-IID data, enabling communication-efficient aggregation, and supporting privacy-preserving updates will be central challenges.

C. Multi-Agent and Multi-Scale World Models

Edge scenarios such as 6G, air-to-ground networks, and low-altitude operations inherently involve many interacting agents (devices, base stations, UAVs, vehicles) evolving across multiple temporal and spatial scales. Future research should explore world models that capture multi-agent interactions (e.g., via graph-structured agent-level interaction modules [114]) and multi-scale processes (fast wireless-channel fluctuations vs. slower mobility and traffic patterns). Agent-centric models with interactive perception capabilities can support decentralized collaboration among distributed edge nodes, enable the prediction of emergent behaviors, and facilitate cooperative planning. Decentralized agents may also collude through covert communication [115], highlighting the need to move beyond single-agent, local-view world models and to develop mechanisms for effective decentralized planning.

D. Explainable and Trustworthy World Models

As EGI systems are deployed in high-stakes environments, it is crucial that their decisions are understandable, reliable, and trustworthy. Current world models, often based on deep neural networks, tend to act as black boxes, making failures difficult to interpret and diagnose [116]. A key research direction is the development of explainable artificial intelligence (XAI) techniques tailored to world models. Potential approaches include methods to visualize the model's imagined futures, to identify which environmental features are most influential for its predictions, and to quantify uncertainty in its forecasts.
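One widely used way to quantify forecast uncertainty is the disagreement among an ensemble of predictors, and a threshold on that disagreement can gate whether an imagined outcome is used for planning. The sketch below is a minimal illustration of that idea under our own assumptions. The three hand-written one-step models, the scalar state, and the threshold are hypothetical stand-ins for learned world-model ensemble members.

```python
from statistics import mean, pvariance


def ensemble_forecast(models, state, action):
    """Mean prediction and disagreement (population variance) across ensemble members."""
    predictions = [model(state, action) for model in models]
    return mean(predictions), pvariance(predictions)


def plan_or_fallback(models, state, action, max_variance):
    """Use the imagined outcome only when the ensemble agrees closely enough."""
    prediction, disagreement = ensemble_forecast(models, state, action)
    if disagreement > max_variance:
        return "fallback", None
    return "plan", prediction


# Hypothetical one-step models of a scalar link-quality state. They agree for
# small actions and diverge for large actions they were never fitted on.
ENSEMBLE = [
    lambda s, a: 0.90 * s + 0.10 * a,
    lambda s, a: 0.88 * s + 0.12 * a,
    lambda s, a: 0.92 * s + 0.05 * a * a,
]

familiar = plan_or_fallback(ENSEMBLE, state=1.0, action=1.0, max_variance=0.01)
unfamiliar = plan_or_fallback(ENSEMBLE, state=1.0, action=6.0, max_variance=0.01)
print(familiar, unfamiliar)
```

When the ensemble disagrees strongly, the caller receives a `"fallback"` decision and can switch to a conservative or rule-based controller instead of trusting the imagined rollout.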
Building trust also requires mechanisms to detect out-of-distribution conditions and to decide when the world model should not be used for planning. Such detection is a key component of trusted machine learning systems [117] and enables graceful degradation, conservative fallback strategies, or safe human or rule-based intervention.

VII. CONCLUSION

This survey has outlined a unified perspective on the transition from digital twins to world models for EGI. Digital twins remain indispensable for high-fidelity engineering analysis, lifecycle management, and system-level optimization. However, their reliance on explicit modeling, centralized computation, and continuous synchronization limits their suitability for autonomous and real-time operations at the edge. World models address these limitations by learning compact, action-conditioned representations of the environment and by enabling imagination-based planning and self-supervised adaptation. This survey has reviewed core architectures and algorithms for world models, their coupling with communication, sensing, and control, and their emerging role in future wireless networks and cyber–physical systems. We have also highlighted open issues, including hybrid physics–data integration, federated and continual world modeling under non-IID edge data, multi-agent and multi-scale modeling, as well as safety, explainability, and standardization. These challenges define a rich research agenda for the coming years. Overall, the synergistic use of digital twins and world models is expected to provide a key technological pillar for robust, efficient, and intelligent edge systems in 6G and beyond.

REFERENCES

[1] H. Chen, W. Deng, S. Yang, J. Xu, Z. Jiang, E. C. H. Ngai, J. Liu, and X. Liu, “Towards edge general intelligence via large language models: Opportunities and challenges,” IEEE Network, vol. 39, no. 5, p. 263–271, 2025.
[2] N. Syed, A. Anwar, Z. Baig, and S.
Zeadally, “Artificial intelligence as a service (AIaaS) for cloud, fog and the edge: State-of-the-art practices,” vol. 57, no. 8, Mar. 2025.
[3] D. Katare, D. Perino, J. Nurmi, M. Warnier, M. Janssen, and A. Y. Ding, “A survey on approximate edge AI for energy efficient autonomous driving services,” IEEE Communications Surveys & Tutorials, vol. 25, no. 4, p. 2714–2754, 2023.
[4] X. Wang, Z. Tang, J. Guo, T. Meng, C. Wang, T. Wang, and W. Jia, “Empowering edge intelligence: A comprehensive survey on on-device AI models,” ACM Comput. Surv., vol. 57, no. 9, Apr. 2025.
[5] G. K. Pandey, D. S. Gurjar, S. Yadav, Y. Jiang, and C. Yuen, “UAV-assisted communications with RF energy harvesting: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 27, no. 2, p. 782–838, 2025.
[6] B. Liu, H. Shi, D. Jia, E. Wang, W. Han, K. Zhong, L. Wu, S. Chen, C. Qiao, and J. Wang, “Collaborative sensing and communication for intelligent connected vehicles: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 28, p. 3125–3164, 2026.
[7] S. V. Balkus, H. Wang, B. D. Cornet, C. Mahabal, H. Ngo, and H. Fang, “A survey of collaborative machine learning using 5G vehicular communications,” IEEE Communications Surveys & Tutorials, vol. 24, no. 2, p. 1280–1303, 2022.
[8] X. Long, Q. Zhao, K. Zhang, Z. Zhang, D. Wang, Y. Liu, Z. Shu, Y. Lu, S. Wang, X. Wei et al., “A survey: Learning embodied intelligence from physical simulators and world models,” arXiv preprint arXiv:2507.00917, 2025.
[9] J. Ding, Y. Zhang, Y. Shang, Y. Zhang, Z. Zong, J. Feng, Y. Yuan, H. Su, N. Li, N. Sukiennik et al., “Understanding world or predicting future? A comprehensive survey of world models,” ACM Computing Surveys, vol. 58, no. 3, p. 1–38, 2025.
[10] M. Goff, G. Hogan, G. Hotz, A. du Parc Locmaria, K. Raczy, H. Schäfer, A. Shihadeh, W. Zhang, and Y. Yousfi, “Learning to drive from a world model,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, p.
1964–1973.
[11] P. Empl, D. Koch, M. Dietz, and G. Pernul, “Digital twins in security operations: State of the art and future perspectives,” ACM Comput. Surv., vol. 58, no. 1, Sep. 2025.
[12] D. Li, D. Han, N. Crespi, R. Minerva, S. M. Raza, R. Farahbakhsh, W. Liang, and Z. Zheng, “Blockchain in the digital twin context: A comprehensive survey,” ACM Comput. Surv., vol. 58, no. 6, Dec. 2025.
[13] Y. Mu, T. Chen, Z. Chen, S. Peng, Z. Lan, Z. Gao, Z. Liang, Q. Yu, Y. Zou, M. Xu, L. Lin, Z. Xie, M. Ding, and P. Luo, “Robotwin: Dual-arm robot benchmark with generative digital twins,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, p. 27 649–27 660.
[14] J. Chen, W. Wang, B. Fang, Y. Liu, K. Yu, V. C. M. Leung, and X. Hu, “Digital twin empowered wireless healthcare monitoring for smart home,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 11, p. 3662–3676, 2023.
[15] Z. Yang, M. Chen, Y. Liu, and Z. Zhang, “A joint communication and computation framework for digital twin over wireless networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 18, no. 1, p. 6–17, 2024.
[16] S. Mihai, M. Yaqoob, D. V. Hung, W. Davis, P. Towakel, M. Raza, M. Karamanoğlu, B. S. Barn, D. Shetve, R. V. Prasad, H. Venkataraman, R. Trestian, and H. X. Nguyen, “Digital twins: A survey on enabling technologies, challenges, trends and future prospects,” IEEE Communications Surveys & Tutorials, vol. 24, p. 2255–2291, 2022.
[17] G. Dagnaw, R. Capuano, and H. Muccini, “Digital twins for cultural heritage: A systematic analysis of the state of the art,” ACM Comput. Surv., 2026, just accepted.
[18] C. Zhao, R. Zhang, J. Wang et al., “World models for cognitive agents: Transforming edge intelligence in future networks,” arXiv preprint arXiv:2506.00417, 2025.
[19] G. Zhao, X. Wang, Z. Zhu, X. Chen, G. Huang, X. Bao, and X.
Wang, “Drivedreamer-2: Llm-enhanced world models for diverse driving video generation,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 10, p. 10 412–10 420, Apr. 2025.
[20] Q. He, W. Liang, C. Hao, G. Sun, and J. Tian, “Glam: Global-local variation awareness in mamba-based world model,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 16, p. 17 105–17 113, Apr. 2025.
[21] Y. Yang, J. Mei, Y. Ma, S. Du, W. Chen, Y. Qian, Y. Feng, and Y. Liu, “Driving in the occupancy world: Vision-centric 4d occupancy forecasting and planning via world models for autonomous driving,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 9, p. 9327–9335, Apr. 2025.
[22] S. Xiong, A. Payani, Y. Yang, and F. Fekri, “Deliberate reasoning in language models as structure-aware planning with an accurate world model,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vienna, Austria: Association for Computational Linguistics, Jul. 2025, p. 31 900–31 931.
[23] Q. Jia, J. Zheng, L. Gao, J. Niu, R. Cao, and J. Ren, “Satellite-aided low-altitude uav service migration with semantic extraction and generated graphs,” IEEE Transactions on Cognitive Communications and Networking, vol. 12, p. 5136–5147, 2026.
[24] C. Hao, W. Lu, Y. Xu, and Y. Chen, “Neural motion simulator pushing the limit of world models in reinforcement learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, p. 27 608–27 617.
[25] X. Huang, H. Yang, C. Zhou, M. He, X. Shen, and W. Zhuang, “When digital twin meets generative AI: Intelligent closed-loop network management,” IEEE Network, 2025, to appear.
[26] D. Ha and J. Schmidhuber, “Recurrent world models facilitate policy evolution,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 31, Montréal, Canada, 2018.
[27] D. Hafner, J. Pasukonis, J. Ba, and T.
Lillicrap, “Mastering diverse control tasks through world models,” Nature, vol. 640, no. 8059, p. 647–653, 2025.
[28] F. Tao, J. Qi, L. Zhang et al., “Digital twin modeling: A systematic literature review and meta-analysis,” ACM Computing Surveys, vol. 57, no. 3, p. 1–34, 2025.
[29] L. Tao, Y. Zheng, J. Cao, and F. Tao, “Digital twin-driven smart manufacturing: A review and future directions,” IEEE Transactions on Industrial Informatics, vol. 21, p. 1–11, 2025.
[30] Y. Zhang et al., “DriveDreamer4D: World models are effective data machines for 4D driving scene representation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, to appear.
[31] Y. Guan, Liao et al., “World models for autonomous driving: An initial survey,” IEEE Transactions on Intelligent Vehicles, p. 1–17, 2024.
[32] M. Tao, L. Liao, R. Xie, Y. Zhang, G. Min, and Y. Zhang, “SAEI-DT: Semi-asynchronous edge intelligence for industrial digital twin networks in 6g,” IEEE Network, vol. 39, no. 5, p. 138–144, Sep. 2025.
[33] R. Zhou, D. Chen, Z. Jia et al., “Digital twin AI: Opportunities and challenges from large language models to world models,” arXiv preprint arXiv:2601.01321, 2026. [Online]. Available: https://arxiv.org/abs/2601.01321
[34] N. Hansen, Su et al., “TD-MPC2: Scalable, robust world models for continuous control,” in 12th International Conference on Learning Representations, ICLR 2024, 2024.
[35] A. Bar, G. Zhou, D. Tran, T. Darrell, and Y. LeCun, “Navigation world models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, p. 15 791–15 801.
[36] L. He, L. Fan, X. Lei, P. Fan, A. Nallanathan, and G. K. Karagiannidis, “The road toward general edge intelligence: Standing on the shoulders of foundation models,” IEEE Communications Magazine, vol. 63, no. 9, p. 1–9, 2025.
[37] W. Zhang, P. Tang, X. Zeng, F. Man, S. Yu, Z. Dai, B. Zhao, H. Chen, Y. Shang, W. Wu, C. Gao, X. Chen, X. Wang, Y. Li, and W.
Zhu, “Aerial world model for long-horizon visual generation and navigation in 3d space,” arXiv preprint arXiv:2512.21887, 2025. [Online]. Available: https://arxiv.org/abs/2512.21887
[38] Y. Wu, L. Ma, R. Zhang et al., “Towards edge general intelligence: Knowledge distillation for mobile agentic AI,” arXiv preprint arXiv:2511.19947, 2025. [Online]. Available: https://arxiv.org/abs/2511.19947
[39] R. Zhang, H. Du, Y. Liu, D. Niyato, J. Kang, Z. Xiong, A. Jamalipour, and D. In Kim, “Generative ai agents with large language model for satellite networks via a mixture of experts transmission,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 12, p. 3581–3596, 2024.
[40] T. Susnjak, T. R. McIntosh, A. L. C. Barczak et al., “Over the edge of chaos? excess complexity as a roadblock to artificial general intelligence,” IEEE Transactions on Cybernetics, vol. 56, no. 1, p. 1–12, 2025.
[41] Y. Tang, J. Yu, K. Gai et al., “Missing target-relevant information prediction with world model for accurate zero-shot composed image retrieval,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, p. 24 785–24 795.
[42] Z. Li, J. Lou, Z. Tang, J. Guo, T. Wang, W. Jia, and W. Zhao, “Online layer-aware joint request scheduling, container placement, and resource provision in edge computing,” IEEE Transactions on Services Computing, vol. 18, no. 1, p. 328–341, 2025.
[43] Y. Shen, H. Liu, K. Pei et al., “MetaWorld: Skill transfer and composition in a hierarchical world model for grounding high-level instructions,” arXiv preprint arXiv:2601.17507, 2026.
[44] R. Worden, “AI and world models,” arXiv preprint arXiv:2601.17796, 2026.
[45] W. Liu, Y. Fu, Z. Shi, and H. Wang, “When digital twin meets 6g: Concepts, obstacles, and research prospects,” IEEE Communications Magazine, vol. 63, p. 16–22, 2024.
[46] F. Tao, H. Zhang, and C. Zhang, “Advancements and challenges of digital twins in industry,” Nature Computational Science, vol. 4, p.
169–177, 2024.
[47] J. Li, S. Guo, W. Liang, J. Wang, Q. Chen, Y. Zeng, B. Ye, and X. Jia, “Digital twin-enabled service provisioning in edge computing via continual learning,” IEEE Transactions on Mobile Computing, vol. 23, p. 7335–7350, 2024.
[48] R. Zhang, G. Liu, Y. Liu, C. Zhao, J. Wang, Y. Xu, D. Niyato, J. Kang, Y. Li, S. Mao, S. Sun, X. Shen, and D. I. Kim, “Toward edge general intelligence with agentic ai and agentification: Concepts, technologies, and future directions,” IEEE Communications Surveys & Tutorials, vol. 28, p. 4285–4318, 2025.
[49] T. Meuser, L. Lovén, M. H. Bhuyan, S. G. Patil, S. Dustdar, A. Aral, S. Bayhan, C. Becker, E. de Lara, A. Y. Ding, J. Edinger, J. Gross, N. Mohan, A. D. Pimentel, E. Rivière, H. Schulzrinne, P. Simoens, G. Solmaz, M. Welzl, and S. Dustdar, “Revisiting edge ai: Opportunities and challenges,” IEEE Internet Computing, vol. 28, p. 49–59, 2024.
[50] Y. Zheng, P. Yang, Z. Xing, Q. Zhang, Y. Zheng, Y. Gao, P. Li, T. Zhang, Z. Xia, P. Jia, and D. Zhao, “World4drive: End-to-end autonomous driving via intention-aware physical latent world model,” 2025.
[51] Q. Fang, W. Du, H. Wang, and J. Zhang, “Towards unraveling and improving generalization in world models,” ArXiv, vol. abs/2501.00195, 2024.
[52] S. Deng, H. Zhao, J. Yin, S. Dustdar, and A. Y. Zomaya, “Edge intelligence: The confluence of edge computing and artificial intelligence,” IEEE Internet of Things Journal, vol. 7, p. 7457–7469, 2019.
[53] E. Baccour, N. Mhaisen, A. A. Abdellatif, A. Erbad, A. Mohamed, M. Hamdi, and M. Guizani, “Pervasive ai for iot applications: A survey on resource-efficient distributed artificial intelligence,” IEEE Communications Surveys & Tutorials, vol. 24, no. 4, p. 2366–2418, 2022.
[54] M. A. Ali and F. Dornaika, “Edge artificial intelligence: A systematic review of evolution, taxonomic frameworks, and future horizons,” ArXiv, vol. abs/2510.01439, 2025.
[55] Q. He, J. Lin, H. Fang, X. Wang, M. Huang, X. shuang Yi, and K.
Yu, “Integrating iot and 6g: Applications of edge intelligence, challenges, and future directions,” IEEE Transactions on Services Computing, vol. 18, p. 2471–2488, 2025.
[56] Z. Liu, X. Chen, H. Wu, Z. Wang, X. Chen, D. T. Niyato, and K. Huang, “Integrated sensing and edge ai: Realizing intelligent perception in 6g,” IEEE Communications Surveys & Tutorials, vol. 28, p. 2725–2770, 2025.
[57] S. Zeng, X. Chang, M. Xie, X. Liu, Y. Bai, Z. Pan, M. Xu, and X. Wei, “Futuresightdrive: Thinking visually with spatio-temporal cot for autonomous driving,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
[58] L. Baraldi, Z. Zeng, C. Zhang, A. Nayak, H. Zhu, F. Liu, Q. Zhang, P. Wang, S. Liu, Z. Hu, A. Cangelosi, and L. Baraldi, “The safety challenge of world models for embodied ai agents: A review,” ArXiv, vol. abs/2510.05865, 2025.
[59] X. Li, X. He, L. Zhang, and Y. Liu, “A comprehensive survey on world models for embodied ai,” ArXiv, vol. abs/2510.16732, 2025.
[60] H. Luo, Y. Liu, R. Zhang, J. Wang, G. Sun, D. Niyato, H. Yu, Z. Xiong, X. Wang, and X. Shen, “Toward edge general intelligence with multiple-large language model (multi-llm): Architecture, trust, and orchestration,” IEEE Transactions on Cognitive Communications and Networking, vol. 11, p. 3563–3585, 2025.
[61] X. Wang and W. Jia, “Optimizing edge ai: A comprehensive survey on data, model, and system strategies,” ArXiv, vol. abs/2501.03265, 2025.
[62] P. Yang, B. Lu, Z. Xia, C. Han, Y. Gao, T. Zhang, K. Zhan, X. Lang, Y. Zheng, and Q. Zhang, “Worldrft: Latent world model planning with reinforcement fine-tuning for autonomous driving,” ArXiv, vol. abs/2512.19133, 2025.
[63] D. Wen, Y. Zhou, X. Li, Y. Shi, K. Huang, and K. B. Letaief, “A survey on integrated sensing, communication, and computation,” IEEE Communications Surveys & Tutorials, vol. 27, no. 5, 2025.
[64] B. Li, W. Liu, W. Xie, N. Zhang, and Y.
Zhang, “Adaptive digital twin for UAV-assisted integrated sensing, communication, and computation networks,” IEEE Transactions on Green Communications and Networking, vol. 7, no. 4, p. 1996–2009, Aug. 2023.
[65] Y. Li, W. Liang, Z. Xu, W. Xu, and X. Jia, “Budget-constrained digital twin synchronization and its application on fidelity-aware queries in edge computing,” IEEE Transactions on Mobile Computing, vol. 24, no. 1, p. 165–182, Jan. 2025.
[66] J. Del Ser et al., “World models in artificial intelligence: Sensing, learning, and reasoning like a child,” Mar. 2025.
[67] X. Chen, K. Huang et al., “Distributed integrated sensing and edge AI exploiting prior information,” 2025.
[68] Y. Ma, B. Ai, J. Li et al., “Integrated sensing, communication, computing, and control meets UAV swarms in 6g,” 2025.
[69] C. Deng et al., “Integrated sensing, communication, and computation with adaptive DNN splitting in multi-UAV networks,” IEEE Transactions on Wireless Communications, 2024, early access.
[70] X. Cheng, R. Meng, X. Xu, H. Gao, P. Zhang, and D. Niyato, “Apeg: Adaptive physical layer authentication with channel extrapolation and generative ai,” IEEE Transactions on Information Forensics and Security, vol. 21, p. 1257–1272, 2026.
[71] D. Wen et al., “Integrated sensing, communication, and computation for over-the-air federated edge learning,” IEEE Transactions on Wireless Communications, vol. 25, p. 2748–2762, 2026.
[72] B. Li, H. Cai, L. Liu, and Z. Fei, “Delay-aware digital twin synchronization in mobile edge networks with semantic communications,” IEEE Transactions on Vehicular Technology, vol. 74, no. 7, p. 10 974–10 983, Jul. 2025.
[73] S. D. Okegbile, H. Gao, and J. Cai, “A novel secure split federated semantic learning framework and its optimization for digital twin network evolution,” IEEE Transactions on Mobile Computing, vol. 25, no. 1, p. 1302–1319, Jan. 2026.
[74] F. Tang, L. Luo, Z. Guo, M. Zhao, and N.
Kato, “Semantic twin network: Bridging real-world and virtual networks with semantics,” IEEE Wireless Communications, vol. 32, no. 6, p. 224–232, Dec. 2025.
[75] P. Jiang, J. Guo, C.-K. Wen, S. Jin, and J. Zhang, “Semantic communications with world models,” arXiv preprint arXiv:2510.24785, 2025.
[76] S. Tan et al., “SceneDiffuser++: City-scale traffic simulation via a generative world model,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[77] Y. Yang et al., “Integrated sensing, computing, and semantic communication with fluid antenna for metaverse,” arXiv preprint arXiv:2504.07656, Apr. 2025.
[78] M. Hevesli, A. M. Seid, A. M. Erbad, and M. M. Abdallah, “Task offloading optimization in digital twin assisted mec-enabled air–ground iiot 6g networks,” IEEE Transactions on Vehicular Technology, vol. 73, p. 17 527–17 542, 2024.
[79] Y. Gong, H. Yao, Z. Xiong, C. L. P. Chen, and D. Niyato, “Blockchain-aided digital twin offloading mechanism in space-air-ground networks,” IEEE Transactions on Mobile Computing, vol. 24, p. 183–197, 2025.
[80] Z. Lin, Z. Feng, K. Guo, A. Nauman, D. T. Niyato, and J. Wang, “Ai-driven seamless and massive access in space-air-ground integrated networks,” IEEE Wireless Communications, vol. 32, p. 72–79, 2025.
[81] Y. Lu, B. Wu, Z. Li, K. Li, C. Huang, H. Wang, Q. Lan, R. Chen, L. Chen, and B. Liang, “Remote sensing-oriented world model,” ArXiv, vol. abs/2509.17808, 2025. [Online]. Available: https://api.semanticscholar.org/CorpusID:281420476
[82] W. Xie, F. Qi, L. Liu, and Q. Liu, “Radar imaging based uav digital twin for wireless channel modeling in mobile networks,” IEEE Journal on Selected Areas in Communications, vol. 41, p. 3702–3710, 2023.
[83] C. Wang, Y. Han, L. Zhang, Z. Jia, H. Zhang, C. S. Hong, and Z. Han, “Computing power in the sky: Digital twin-assisted collaborative computing with multi-uav networks,” IEEE Transactions on Vehicular Technology, vol. 74, p. 14 466–14 482, 2025.
[84] C.
Diehl, Q. Sykora, B. Agro, T. Gilles, S. Casas, and R. Urtasun, “Dio: Decomposable implicit 4d occupancy-flow world model,” 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), p. 27 456–27 466, 2025.
[85] S. Chen, J. Gu, W. Duan, M. Wen, G. Zhang, and P.-H. Ho, “Hybrid near- and far-field communications for ris-uav system: Novel beamfocusing design,” IEEE Transactions on Intelligent Transportation Systems, vol. 26, p. 17 866–17 878, 2025.
[86] C. Chaccour, W. Saad, M. Debbah, Z. Han, and H. V. Poor, “Less data, more knowledge: Building next-generation semantic communication networks,” IEEE Communications Surveys & Tutorials, vol. 27, no. 1, p. 37–76, Feb. 2025.
[87] J. Zheng, B. Du, H. Du, J. Kang, D. Niyato, and H. Zhang, “Energy-efficient resource allocation in generative ai-aided secure semantic mobile networks,” IEEE Transactions on Mobile Computing, vol. 23, no. 12, p. 11 422–11 435, 2024.
[88] C. Hu, R. Zhang, B. Li, X. Jiang, N. Zhao, M. D. Renzo, D. Niyato, A. Nallanathan, and G. K. Karagiannidis, “Generative ai-empowered secure communications in space–air–ground integrated networks: A survey and tutorial,” IEEE Communications Surveys & Tutorials, vol. 28, p. 4156–4194, 2025.
[89] S. Zhang, Q. Liu, K. Chen, B. Di, H. Zhang, W. Yang, D. Niyato, Z. Han, and H. V. Poor, “Large models for aerial edges: An edge-cloud model evolution and communication paradigm,” IEEE Journal on Selected Areas in Communications, vol. 43, no. 1, p. 21–35, 2025.
[90] L. Cai, J. Wang, R. Zhang, Y. Zhang, T. Jiang, D. Niyato, X. Wang, A. Jamalipour, and X. Shen, “Secure physical layer communications for low-altitude economy networking: A survey,” IEEE Communications Surveys & Tutorials, vol. 28, p. 2497–2530, 2025.
[91] B. Zhao, R. Tang, M.-M. Jia, Z. Wang, F. Man, X. Zhang, Y. Shang, W. Zhang, W. Wu, C. Gao, X. Chen, and Y.
Li, “Airscape: An aerial generative world model with motion controllability,” Proceedings of the 33rd ACM International Conference on Multimedia, 2025.
[92] J. Wu, Y. Yang, W. Yuan, W. Liu, J. Wang, T. Mao, L. Zhou, Y. Cui, F. Liu, G. Sun, Y. Ma, N. Wu, D. Zheng, J. Xu, N. Ma, Z. Feng, W. Xu, D. Niyato, C. Yuen, X. Jing, Z. Shi, Y. Liang, B. Ai, S. Jin, D. I. Kim, J. Wang, P. Zhang, H. Yin, and J. Zhang, “Low-altitude wireless networks: A comprehensive survey,” 2025.
[93] Y. Liu, C. Yu, L. Shang, Z. Wu, X. Wang, Y. Zhao, L. Zhu, C. Cheng, W. Chen, C. Xu, H. Xie, Y. Yao, W. Zhou, C. Yingda, X. Xie, and B. Sun, “Facechain: A playground for identity-preserving portrait generation,” arXiv preprint arXiv:2308.14256, 2023.
[94] H. Xie, Y. Chen, X. Xing, J. Lin, and X. Xu, “PsyDT: Using LLMs to construct the digital twin of psychological counselor with personalized counseling style for psychological counseling,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Jul. 2025, p. 1081–1115.
[95] Z. Huang, K. Yao, Z. Zhao, C. Pan, and A. Yang, “Dttdnet: Robust 6dof pose estimation against depth noise and a comprehensive evaluation on a mobile dataset,” in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, June 2025, p. 1848–1857.
[96] P. Talasila, C. Gomes, L. B. Vosteen, H. Iven, M. Leucker, S. Gil, P. H. Mikkelsen, E. Kamburjan, and P. G. Larsen, “Composable digital twins on digital twin as a service platform,” SIMULATION, p. 00375497241298653, 2024.
[97] Z. Jiang, C.-C. Hsu, and Y. Zhu, “Ditto: Building digital twins of articulated objects from interaction,” in arXiv preprint arXiv:2202.08227, 2022.
[98] H. Liu, W. Yan, M. Zaharia, and P. Abbeel, “World model on million-length video and language with ringattention,” arXiv preprint, 2024.
[99] R. Team, Z. Gao, Q. Wang, Y. Zeng, J. Zhu, K. L. Cheng, Y. Li, H. Wang, Y. Xu, S.
Ma, Y. Chen, J. Liu, Y. Cheng, Y. Yao, J. Zhu, Y. Meng, K. Zheng, Q. Bai, J. Chen, Z. Shen, Y. Yu, X. Zhu, Y. Shen, and H. Ouyang, “Advancing open-source world models,” arXiv preprint arXiv:2601.20540, 2026.
[100] V. Micheli, E. Alonso, and F. Fleuret, “Transformers are sample-efficient world models,” in The Eleventh International Conference on Learning Representations, 2023.
[101] GigaAI, “Gigabrain-0: A world model-powered vision-language-action model,” arXiv, 2025.
[102] A. Li, J. Sun, B. Wang, L. Duan, S. Li, Y. Chen, and H. Li, “Lotteryfl: Empower edge intelligence with personalized and communication-efficient federated learning,” in 2021 IEEE/ACM Symposium on Edge Computing (SEC), 2021, p. 68–79.
[103] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Computer Architecture News, vol. 45, no. 1, p. 615–629, 2017.
[104] Z. Wu, S. Sun, Y. Wang, M. Liu, K. Xu, W. Wang, X. Jiang, B. Gao, and J. Lu, “Fedcache: A knowledge cache-driven federated learning architecture for personalized edge intelligence,” IEEE Transactions on Mobile Computing, p. 1–15, 2024.
[105] H. Cai, Z. Zhou, and Q. Huang, “Online resource allocation for edge intelligence with colocated model retraining and inference,” in IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 2024, p. 1900–1909.
[106] H. Jin, D. Bai, D. Yao, Y. Dai, L. Gu, C. Yu, and L. Sun, “Personalized edge intelligence via federated self-knowledge distillation,” IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 2, p. 567–580, 2023.
[107] Y. Liu, J. Deng, F. Wang, L. Shang, X. Xie, and B. Sun, “DamoFD: Digging into backbone design on face detection,” in The Eleventh International Conference on Learning Representations, 2023.
[108] B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R.
Girdhar, “Masked-attention mask transformer for universal image segmentation,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, p. 1280–1289.
[109] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel et al., “Mastering atari, go, chess and shogi by planning with a learned model,” Nature, vol. 588, no. 7839, p. 604–609, 2020.
[110] Z. Zhang, R. Chen, J. Ye, Y. Sun, H. Ren, X. Du, P. Wang, J.-C. Pang, K. Li, T.-S. Liu, H. Lin, Y. Yu, and Z.-H. Zhou, “WHALE: Towards generalizable and scalable world models for embodied decision-making,” in NeurIPS 2025 Workshop on Embodied World Models for Decision Making, 2025.
[111] U. Utkarsh, P. Cai, A. Edelman, R. Gomez-Bombarelli, and C. V. Rackauckas, “Physics-constrained flow matching: Sampling generative models with hard constraints,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
[112] J. Zheng, C. Shi, X. Cai, Q. Li, D. Zhang, C. Li, D. Yu, and Q. Ma, “Lifelong learning of large language model based agents: A roadmap,” IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1–20, 2026.
[113] T. Saanum, P. Dayan, and E. Schulz, “Simplifying latent dynamics with softly state-invariant world models,” in Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, Eds., vol. 37. Curran Associates, Inc., 2024, p. 38 355–38 382.
[114] J. Jeong, D. Park, and K.-J. Yoon, “Multi-agent long-term 3d human pose forecasting via interaction-aware trajectory conditioning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, p. 1617–1628.
[115] S. Motwani, M. Baranchuk, M. Strohmeier, V. Bolina, P. Torr, L. Hammond, and C.
Schroeder de Witt, “Secret collusion among ai agents: Multi-agent deception via steganography,” Advances in Neural Information Processing Systems, vol. 37, p. 73 439–73 486, 2024.
[116] Z. Aghababaeyan, M. Abdellatif, L. Briand, R. S, and M. Bagherzadeh, “Black-box testing of deep neural networks through test case diversity,” IEEE Transactions on Software Engineering, vol. 49, no. 5, p. 3182–3204, 2023.
[117] M. Sun, J. Zheng, H. Du, H. Zhang, D. Niyato, J. Kang, J. Wang, J. Ren, L. Gao, and Z. Wang, “Trust online over-the-air computation for wireless federated learning,” IEEE Transactions on Mobile Computing, vol. 24, no. 8, p. 7152–7170, 2025.