Frost & Sullivan releases the '2025 White Paper on the Development of World Models in China'

Published: 2025/09/23


World models are entering a critical transition toward generating complex intelligent behavior, becoming key infrastructure for integrating physical AI with virtual worlds and helping China compete for a leading position in global AI. In autonomous driving, world models are moving from R&D testing toward mass-production deployment: by generating massive volumes of high-fidelity scenarios, they drive continuous learning, autonomous verification, and rapid iterative optimization of driving systems, supporting the rollout of L3/L4 systems while significantly cutting the cost and time of on-road testing. In embodied intelligence, world models serve as synthetic-data engines that break the bottleneck of scarce physical-interaction data, providing robots with efficient, safe virtual training environments and accelerating their adaptation to real-world tasks. Both applications highlight the core value of world models: using simulation and generation to drive AI's closed-loop evolution from perception to action.

 

This white paper examines the cutting-edge AI technology of world models, analyzing their current development status, technical paths, market landscape, and future trends. World models are generative AI models that understand the dynamics of the real world, including its physical and spatial attributes. Taking text, images, videos, and motion data as input, they generate video; through learning, they come to understand the physical characteristics of real environments and can therefore represent and predict dynamics such as motion, forces, and spatial relationships in sensory data. This accelerates virtual-world generation for physical AI and produces scalable augmented data, removing data bottlenecks and enabling more efficient foundation-model training. The white paper aims to comprehensively review the development history, current status, and core technologies of world models and their applications in intelligent driving and embodied intelligence, and, by comparing the capabilities of different vendors, to explore their future development trends.

PART.01

The World Model 2025: The AI Leap from Perceiving Reality to Deciding the Future

 

World models are generative AI models centered on understanding the dynamic laws of the real world (including physical characteristics and spatial attributes). They build internal representations from multimodal inputs (text, images, videos, motion data, etc.) and generate video content, grasping the physical attributes of real environments and using generated environments and actions to simulate, guide, and ground decisions. Fei-Fei Li, founder of World Labs and professor at Stanford University, has noted: "World models should not only perceive and model the real world but also foresee possible future states, thereby providing guidance for decision-making." In their current state, however, world models remain at an early stage, mostly focused on perception-level simulation and compression, and have not yet achieved a stable closed loop of perception, prediction, and decision-making. Although pilot applications exist in autonomous driving, they rely heavily on specific environments and strong priors, lacking generality and long-horizon generalization. Future development will concentrate on three directions: first, enhancing understanding of world states through multimodal inputs; second, introducing causal modeling and controllable generation mechanisms to improve prediction accuracy and behavioral planning; and third, deeply integrating world models with embodied-intelligence systems to achieve the leap from "observing the world" to "understanding and participating in the world."

Data source: Frost & Sullivan analysis, LeadLeo research institute

 

 

PART.02

Different world-model vendors, drawing on their own strategies and technical strengths across different dimensions, have developed distinctive world-model capabilities and related products.

 

The technical capability of world models rests on four pillars:

- Causal reasoning: enables AI to answer hypothetical questions such as "What will happen if A occurs?" and to understand the deep causal relationships between actions and outcomes, strengthening autonomous decision-making in dynamic environments.

- Spatiotemporal consistency: solves the object distortion and deformation problems of traditional video generation. Through technologies such as long-term memory mechanisms, latent-space modeling, and object-centric representation, world models maintain stable spatial structure and plausible temporal evolution at a higher level, generating stable, coherent video sequences.

- Physical-rule description from multimodal data: aims to simulate complex physical phenomena such as fluid motion and object collisions. A world model predicts behavior that obeys basic 3D geometry and physical rules, modeling 3D scene structure rather than mere pixels; this avoids the "dreamlike" sense of unreality and lays the foundation for subsequent interaction.

- Execution and real-time feedback: combined with reinforcement learning, this realizes the dynamic loop of perception → modeling → planning → execution → perception update → model revision. Low-latency real-time feedback is the foundation of practical application and can be achieved through model lightweighting and latent-space state generation.
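The perception → modeling → planning → execution → perception update → model revision loop can be sketched in a few lines of Python. Everything below (the toy drift dynamics, the `ToyWorldModel` class, the candidate-action planner) is an illustrative assumption, not any vendor's actual system:

```python
class ToyWorldModel:
    """Minimal latent world model: predicts the next state from (state, action)."""
    def __init__(self):
        self.bias = 0.0  # learned correction term, revised from prediction error

    def predict(self, state, action):
        # Assumed toy dynamics: next state drifts by the action plus a learned bias.
        return state + action + self.bias

    def revise(self, predicted, observed, lr=0.5):
        # Model revision step: shrink the prediction error.
        self.bias += lr * (observed - predicted)

def plan(model, state, goal, actions=(-1.0, 0.0, 1.0)):
    # Pick the action whose predicted next state lands closest to the goal.
    return min(actions, key=lambda a: abs(model.predict(state, a) - goal))

def environment_step(state, action):
    # "Real" environment with dynamics the model must learn (constant drift 0.3).
    return state + action + 0.3

model, state, goal = ToyWorldModel(), 0.0, 5.0
for _ in range(20):
    action = plan(model, state, goal)         # planning
    predicted = model.predict(state, action)  # modeling (prediction)
    state = environment_step(state, action)   # execution + perception update
    model.revise(predicted, state)            # model revision

print(round(model.bias, 2))  # the bias converges toward the true drift 0.3
```

The point of the sketch is the loop structure: each cycle the model's prediction is compared against the newly perceived state, and the discrepancy feeds back into the model, exactly the "perception update → model revision" step described above.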

 

The industry typically evaluates world-model performance quantitatively with metrics such as FID, FVD, frame rate, video duration, and consistency. Current technical paths divide mainly into generative and non-generative approaches. International vendors such as NVIDIA (Cosmos), Google (Genie 3), and Meta (V-JEPA 2) have launched leading models. SenseTime (Jueying Kaiwu), with innovations such as the first high-resolution, sparse-control multi-view world model, matches these international giants in technical benchmark comparisons and has become a representative vendor of platform-based enablement.
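For reference, FID (Fréchet Inception Distance) compares the mean and covariance of feature distributions from real versus generated samples; lower is better. A minimal sketch, assuming plain NumPy arrays stand in for the Inception-network features used in practice:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """Fréchet Inception Distance between two sets of feature vectors.

    In real pipelines the features come from a pretrained Inception network;
    here they are plain arrays of shape (n_samples, dim).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 8))
same = rng.normal(0.0, 1.0, size=(2000, 8))  # same distribution -> FID near 0
shifted = real + 3.0                         # shifted distribution -> large FID

print(fid(real, same) < 1.0, fid(real, shifted) > 50.0)
```

FVD applies the same Fréchet distance to features from a video network, which is why it is the preferred metric for judging the temporal coherence of generated sequences.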

Data source: Frost & Sullivan analysis, LeadLeo research institute

 

 

PART.03

Currently, over 80% of autonomous driving algorithms use world models for assisted training. World models drive autonomous driving systems to continuously learn, autonomously verify, and rapidly iterate and optimize.

 

Currently, over 80% of autonomous driving algorithms use world models for assisted training. By generating scenes that combine multiple layers of complex elements, world models turn the "high-dynamics, high-uncertainty" scenarios that traditional algorithms struggle to cover into tractable problems, helping autonomous driving systems upgrade both product performance and market performance. On one hand, world models can rapidly generate massive volumes of high-fidelity scenes covering long-tail and extreme events, significantly improving system robustness and safety assurance. On the other hand, they replace real road testing with efficient simulation, eliminating the need for expensive annotation and map data; this lowers R&D costs while accelerating product iteration and market expansion. By building a closed feedback loop of real data → model training → simulation-scenario verification → model deployment, and by providing a unified representation of latent world states, they give the perception, prediction, planning, and control modules a consistent cognitive context. World models thus drive autonomous driving systems to learn continuously, verify autonomously, and iterate rapidly, markedly improving end-to-end driving performance. They are an accelerator for breaking the bottleneck of large-scale L4 deployment (such as Robotaxi) and a key foundation for autonomous driving agents to move toward human-like cognition and judgment.
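The idea of "scenes that combine multiple layers of complex elements" can be illustrated with a toy scenario enumerator: crossing a few independent layers already yields a combinatorial space, and long-tail slices of it can be generated on demand instead of waited for on real roads. The layer names and values below are invented for illustration; production systems parameterize scenarios far more richly:

```python
import itertools

# Illustrative scenario layers (hypothetical values, not a real taxonomy).
LAYERS = {
    "weather":  ["clear", "rain", "fog", "snow"],
    "lighting": ["day", "dusk", "night"],
    "actor":    ["jaywalker", "cut-in vehicle", "stalled truck", "cyclist"],
    "road":     ["straight", "curve", "intersection"],
}

def enumerate_scenarios(layers):
    """Cross all layers to get the full combinatorial scenario space."""
    keys = list(layers)
    for values in itertools.product(*(layers[k] for k in keys)):
        yield dict(zip(keys, values))

scenarios = list(enumerate_scenarios(LAYERS))
print(len(scenarios))  # 4 * 3 * 4 * 3 = 144 distinct scenario templates

# A long-tail slice that would be rare and dangerous to collect on real roads:
long_tail = [s for s in scenarios
             if s["weather"] in ("fog", "snow") and s["lighting"] == "night"]
print(len(long_tail))  # 2 * 1 * 4 * 3 = 24
```

A generative world model plays the role of rendering each such template into high-fidelity sensor data, which is what turns this cheap enumeration into usable training and validation scenes.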

Data source: Frost & Sullivan analysis, LeadLeo research institute

 

 

PART.04

The world model is the core engine reshaping the development paradigm of embodied intelligence, offering a high-quality, low-cost, scalable path to synthetic data generation that addresses today's data bottleneck. In the future, the world model will become the "cognitive core" of embodied intelligence.

 

Embodied intelligence marks AI's shift from pure information processing to interaction with the physical world. Its core pain point is a severe shortage of physical-interaction data, with the gap between available and required data exceeding 99%. The data embodied intelligence needs must integrate multi-dimensional signals such as text commands, multi-view vision, joint motion trajectories, and physical interactions, far more complex than pure text or a single visual modality, and collecting real physical-interaction data is time-consuming and costly, lagging far behind the pace of technology development. World models can generate visually realistic and physically accurate synthetic data, effectively closing the gap between traditional simulation data and the real world, while dramatically reducing the time and economic cost of data acquisition and scaling data volume with ease. Training on the massive, diverse synthetic data generated by world models can significantly improve an embodied model's adaptability and task success rate in unfamiliar environments.
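One common way to exploit such synthetic data is to mix a small amount of scarce real interaction data into every training batch, so synthetic data supplies scale and diversity while real data anchors the model to true physics. A minimal sketch; the pool sizes, the `real_fraction` parameter, and the `sample_batch` helper are all hypothetical:

```python
import random

random.seed(42)

# Hypothetical pools: real interaction episodes are scarce, synthetic ones cheap.
real_pool = [f"real_{i}" for i in range(100)]
synthetic_pool = [f"synth_{i}" for i in range(10000)]

def sample_batch(batch_size, real_fraction=0.2):
    """Mix scarce real data with world-model synthetic data in each batch.

    real_fraction is a tunable assumption: upweighting real samples keeps
    training grounded while synthetic data provides coverage of rare cases.
    """
    n_real = int(batch_size * real_fraction)
    batch = random.choices(real_pool, k=n_real)
    batch += random.choices(synthetic_pool, k=batch_size - n_real)
    random.shuffle(batch)
    return batch

batch = sample_batch(64)
print(len(batch), sum(s.startswith("real_") for s in batch))  # prints: 64 12
```

Note that the real pool is sampled with replacement at a fixed fraction per batch, which is what lets 100 real episodes keep pace with 10,000 synthetic ones throughout training.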

 

Currently, world-model applications are more mature in autonomous driving than in embodied intelligence. In the future, world models will become the "cognitive core" of embodied intelligence: they not only supply data but are reshaping the entire development paradigm. As a prediction-and-generation engine, a world-model platform seamlessly integrates the full pipeline from data synthesis and algorithm training to simulation verification, forming an efficient closed-loop iteration system. By offering an integrated toolchain, it removes the complex engineering burden of building infrastructure in-house, letting developers focus on algorithm and application innovation and markedly improving R&D efficiency. It provides safe, explainable closed-loop verification across the entire perception-decision-execution process and, by precisely simulating physical interactions, systematically improves the adaptability and reliability of intelligent agents. Deep integration with the development toolchain eliminates the efficiency losses of traditionally fragmented workflows and supports efficient development, training, and performance optimization of mainstream models.

Data source: Frost & Sullivan analysis, LeadLeo research institute

 

 

PART.05

Case Study: The comprehensive capabilities of SenseTime's "Jueying Kaiwu" world model rank at the forefront among independent third parties and OEMs

 

The comprehensive capabilities of SenseTime's "Jueying Kaiwu" world model lead among independent third parties and OEMs and are comparable to the world's top world-model vendors. In intelligent driving, SenseTime provides autonomous-driving manufacturers with low-cost massive simulation data and extreme-scenario coverage, accelerating training iteration and mass-production deployment. It has jointly built an end-to-end data factory with Zhiji Auto (IM Motors), generating high-risk long-tail scenarios to supplement training and validation data and significantly speeding the mass production of intelligent driving. It also supports the full pipeline from data to model deployment at the Shanghai Autonomous Driving Training Field, generating multi-view simulation data at scale, lowering data costs, and improving R&D efficiency.

 

In embodied intelligence, SenseTime has additionally built the core engine of a personalized intelligent-agent platform on the Kaiwu world model, achieving visual perception, precise navigation, and multimodal interaction; supported by on-device and cloud computing power, it enables intelligent agents to autonomously understand and act in real environments. The platform is the first to support a high-resolution, sparse-control multi-view world model, breaking through the bottleneck of embodied-intelligence data-synthesis technology. Its leading synthetic-data capability supports controllable generation of diverse scenes within a single pipeline, controllable coupling and arbitrary editing of scene elements, and 3D-technology-controlled generation of realistic trajectories.

Data source: Frost & Sullivan analysis, LeadLeo research institute

 

 
