Frost & Sullivan insight
OpenAI, long a champion of closed-source development, has now turned its attention to open source. On April 1st, Sam Altman, CEO of OpenAI, announced on the social media platform X that the company will release a "powerful new open-weight model" in the coming months. After years of adhering to a closed-source approach, OpenAI is re-examining the value of open source. What motivates big companies to open-source their models? Is it to build an ecosystem, attract developers, or drive technological iteration through community feedback? Will the open-sourcing of multi-modal models produce a DeepSeek of the multi-modal space? What distinguishes multi-modal models from text generation models? How should we evaluate the current open-source ecosystem for multi-modal models? What are the main challenges that multi-modal large models still need to overcome? How can open source help the technology mature? Which industries stand to benefit from open-source multi-modal models?
Tran Cui, Executive Director of Frost & Sullivan Greater China, was interviewed by 21st Century Business Herald to discuss ecosystem building and industry opportunities amid the wave of open-source multi-modal models.

21st Century Business Herald
Q: What motivates big companies to open-source their models? Is it to build an ecosystem, attract developers, or drive technological iteration through community feedback?
Tran Cui
Executive Director of Frost & Sullivan Greater China
First, when big companies open-source multi-modal models, they build a three-part ecosystem made up of developer communities, hardware adaptation solutions, and industry application cases. Such an ecosystem lowers the entry barrier for developers, enabling them to get started quickly and innovate. As the ecosystem grows, the technology advances faster and multi-modal models are tried in more scenarios, laying a solid foundation for future commercialization.
In addition, open source has a deeper significance: it promotes the popularization and democratization of technology. By open-sourcing, big companies lower the barrier to entering the multi-modal model field, allowing more small and medium-sized enterprises and developers to use advanced technology at a lower cost. This broad accessibility not only accelerates the adoption of AI technology but also injects vitality into the development of the entire industry.
At the same time, open source reflects the social responsibility of big companies. By open-sourcing, they are in effect paving the way for future technical standardization. This open attitude not only makes the technology more transparent but also supports the healthy development of the industry.
Q: Will the open-sourcing of multi-modal models produce a DeepSeek of the multi-modal space?
Tran Cui
Executive Director of Frost & Sullivan Greater China
As multi-modal models are increasingly open-sourced, enterprises like DeepSeek are expected to stand out. With their advantages in technology, cost-effectiveness, open-source strategy, and ecosystem building, such enterprises will emerge in the market and drive progress across the entire field of multi-modal large models.
Q: What distinguishes multi-modal models from text generation models? How should we evaluate the current open-source ecosystem for multi-modal models?
Tran Cui
Executive Director of Frost & Sullivan Greater China
First, multi-modal models need to handle several types of data, such as text, images, audio, and video, whereas text generation models mainly handle text. This ability to process diverse data gives multi-modal large models a wider range of application scenarios, such as environmental perception and decision-making in intelligent driving, and diagnostic assistance in medical imaging. In addition, multi-modal models are more complex to implement and must solve problems such as alignment and fusion between modalities, for example, how to effectively combine the visual information in images with the semantic information in text to achieve accurate understanding and generation. As a result, multi-modal models usually require more computing resources and more complex training strategies. In summary, compared to text generation models, multi-modal large models are characterized by diverse data processing, broad application scenarios, and greater implementation complexity.
The contribution of open-source communities to multi-modal models is growing steadily. Developers and researchers share code, datasets, and research results through community platforms, accelerating the iteration and optimization of the technology. This has led to continued breakthroughs in the performance of open-source multi-modal large models, which are gradually narrowing the gap with closed-source models and even surpassing them in some respects. At the same time, their application scenarios keep expanding, covering fields such as image generation, video editing, and voice interaction.
Q: What are the main challenges that multi-modal large models still need to overcome? How can open source help the technology mature?
Tran Cui
Executive Director of Frost & Sullivan Greater China
Currently, multi-modal large models face challenges in data processing and alignment. Multi-modal data is heterogeneous: different modalities differ in format, dimensionality, and statistical properties, which makes data fusion and processing complex. In addition, data from different modalities may be related in complex ways at the temporal, spatial, and semantic levels, which makes modal alignment and fusion difficult during training and can cause problems such as "catastrophic forgetting" of language ability. As a result, training multi-modal large models not only requires a large amount of computing resources but is also slow and inefficient.
However, open-source initiatives have lowered the barrier to using multi-modal large models, enabling more developers and enterprises to access and apply advanced technology. This has not only promoted the spread of the technology but also created an opportunity to enrich data resources and improve data quality. These improvements help train better models and further advance the technology.
Q: Which industries stand to benefit from open-source multi-modal models?
Tran Cui
Executive Director of Frost & Sullivan Greater China
Open-sourcing these models will greatly promote the development of many industries. By providing high-quality capabilities such as video generation, voice interaction, and image generation, the models will bring innovation and efficiency gains to fields such as film and television, intelligent vehicles, education, autonomous driving, finance, logistics, healthcare, creative entertainment, office tools, social entertainment, and legal consulting.
*This interview was published in 21st Century Business Herald. The reporter is Dong Jingyi, and the original title is: "Tearing off the Closed-Source Label! OpenAI 'Testing the Waters' with Open-Source, a Counterattack or Compromise?"


