NIE 2023 | Liu Liang of SenseTime: Evaluation and Comparison, an Aspect of Large Model Development That Cannot Be Ignored

Published: 2023/11/17

On September 27, the Digital Economy Sub-forum of the second Frost & Sullivan New Investment Expo and the 17th Frost & Sullivan Global Growth, Innovation and Leadership Summit (together, the 'Frost & Sullivan New Investment Conference') was held at the Shangri-La Hotel in Pudong, Shanghai.

The forum, themed 'Digital Economy and Industrial Integration: Industry-wide Transformation Accelerated by AI', invited 18 prominent guests and industry experts. It brought together industry leaders, experts, and investment institutions to focus on new financing and investment opportunities in the digital economy, and to discuss how capital and industry can help enterprises weather economic cycles.

Liu Liang, Director of Business Strategy Research at SenseTime

At the forum, Liu Liang, Director of Strategic Research at SenseTime, described the significant impact of large language models on enterprises. He emphasized that these models are gradually changing how businesses operate day to day, reshaping their existing AI strategies and giving rise to a business logic centered on artificial intelligence. This shift is accelerating enterprises' digital transformation, improving operational efficiency and innovation capability, and opening up substantial opportunities.

I. Changes and Impacts Brought about by Large Models

Liu Liang said that the development of large models has been almost 'crazy' and full of 'noise'. There are currently over 100 large model vendors, more than 200 foundation models, over 400 fine-tuned models, and over 1,000 generative AI applications. The market is still at the peak of its hype cycle and has already begun to affect the daily operations of enterprises.

After the release of ChatGPT, 45% of enterprises said they had increased investment in large model-related technologies, and one-fifth had deployed applications based on them. At the same time, large models are changing enterprises' existing AI strategies and gradually producing an AI-centered business logic. The upgraded AI strategy uses generative AI to augment human work by generating text, audio, video, code, and other content. On governance, enterprises need to clarify business responsibilities and establish a unified governance organization. On talent, enterprises need to educate all staff in the responsible use of generative AI.

II. Large Model Evaluation

Liu Liang said that the evaluation landscape for large models is a crucial part of the large model market ecosystem. This landscape divides into academic evaluation datasets and market-oriented rankings. The academic evaluation datasets comprise dozens of task-specific sets, such as MMLU, OpenBookQA, HumanEval, GSM8K, and RACE. The market-oriented rankings are published by media outlets, communities and think tanks, analyst firms, and similar organizations.

In terms of what is evaluated, assessments combine large model capabilities with vendor capabilities, and aim to be comprehensive and to evolve continuously.

Turning to the institutions shaping the evaluation landscape, Liu Liang pointed to several examples. One is a large model evaluation system and open platform that provides a complete, open-source, and reproducible evaluation framework and supports one-stop evaluation of large language models, multimodal models, and other model types. Another is SuperCLUE, a comprehensive benchmark for general-purpose Chinese large models, which focuses on four capability quadrants (language understanding and generation, professional skills and knowledge, agent intelligence, and safety), further divided into 12 basic capabilities. A third is the global consulting firm Frost & Sullivan, which was among the earliest institutions in the Chinese market to study the large model market and evaluate large model vendors. It examines not only the capabilities of the models themselves but also the overall competitiveness of the vendors.

III. Large Model Evaluation Cases

According to Liu Liang, the foundation model InternLM-123B ranks second globally in academic evaluations. Compared with GPT-4, GPT-3.5-turbo, and LLaMA-2-70B, the content it generates is more accurate and reliable. It can reflect on and correct its own errors, and it has upgraded code interpreter and plugin-calling capabilities. It can also be used flexibly to build AI agent applications. The open-source InternLM-20B model is currently available to enterprises and developers under a free commercial-use license.

Meanwhile, the Frost & Sullivan Large Model Assessment Report shows that, owing to its forward-looking build-out of large-scale AI infrastructure and its early release of the SenseNova ('日日新') large models, SenseTime holds a leading position across three dimensions. It received the highest scores in product technology and ecosystem openness, and ranked first in comprehensive competitiveness.