NIE 2024 | SenseTime's Liu Liang: The Next Big Events in Large Model Development: Multimodality, New Interactions, and On-Device Models

On August 30, the "AI Reshaping the Digital Economy" sub-forum was held in Shanghai. It was part of the 18th Frost & Sullivan China Growth, Innovation and Leadership Summit and the 3rd New Investment Conference (hereinafter the "2024 Frost & Sullivan New Investment Conference"), hosted by the world-leading growth consulting firm Frost & Sullivan.

At the sub-forum, Liu Liang, Director of Strategic Research at SenseTime, delivered a keynote speech on the current state and key trends of large language models in China. He noted that as enterprises have accelerated their digital transformation in recent years, Chinese companies have increasingly adopted large language models. Among adopters, 53% have moved beyond exploration and pilots into production deployment, and embedding large language models into existing software is the most common deployment approach.

The following are the key points of Liu Liang's speech:

Liu Liang explained that the generative AI landscape is broad, spanning large models, engineering tools, applications and use cases, enabling technologies, and infrastructure. By one estimate, 40% of large model applications and solutions will be multimodal (text, image, audio, and video) by 2027, up from 1% in 2023. Multimodal large models can be trained simultaneously on different types of data (known as modalities), such as images, video, audio, and text, which lets them learn shared data representations that improve performance across a range of tasks.

Liu Liang then outlined the factors driving the development of multimodal large models. First, multimodality extends the technical capabilities of large models, delivering a multimodal experience by combining single-modality models. Second, multimodal large models offer a new interface for user interaction, improving the user experience. Third, as training data becomes more diverse, models will scale and generalize better. Liu Liang said that multimodal large models will gradually become the market mainstream, citing SenseTime's recent release of SenseNova 5.5, which it describes as China's first large model with streaming, native multimodal interaction capabilities. He expects more generative AI applications built on multimodal large models to emerge.

Finally, Liu Liang presented SenseTime's product portfolio, which covers computing infrastructure, large model development platforms, large model systems, and applications. On the computing side, SenseTime's AI infrastructure, SenseCore, has exceeded a total computing capacity of 20,000 PetaFLOPS, and the company is building out a computing layout based on cloud-edge-device collaboration. The SenseNova 5.5 model system has been comprehensively upgraded, including a base model with 600 billion parameters, and SenseTime has launched SenseNova 5o, which it describes as China's first "what you see is what you get" interactive model. Its on-device models have also been fully upgraded with the release of SenseNova 5.5 Lite. On the application side, SenseNova models have been deployed across multiple industries, in products such as the image-generation app SenseMirage, the Raccoon assistants for coding and office work, a human-like conversational large model, and a large model for financial services.

