At present, the pace and rhythm of development in the global cloud computing market vary greatly from region to region, and the gap between domestic and overseas development paths is gradually widening. Because cloud usage differs by region, the way computing power is managed and applied as a resource also differs. Understanding and optimizing these factors is therefore the way to narrow the gap in the global cloud computing market.
If we treat computing power as a new type of resource, there are three key factors that continuously shape the development of the industry chain:
1. Resource creation and ownership
2. Resource redistribution
3. How to make better use of resources for secondary creation
# Continuous deployment of resource infrastructure
Among these three core points, resource creation and ownership should be considered together with resource redistribution:
AWS has announced plans to add more than three additional AZs to each region across its 32 geographic regions. Although the pace of infrastructure deployment has indeed slowed compared with previous years, AWS is still building out more infrastructure. The main purpose of the expansion is to reduce physical isolation, latency, and the risk of regional incidents in key areas while improving capacity and availability. In this respect it is broadly consistent with concepts currently proposed in China, such as edge cloud and 'one city, one pool'. Domestic operator clouds, meanwhile, already hold an advantage in China thanks to their existing CDN and network capabilities.
# Improving the mode of resource usage
Most of the optimization and improvement revolves around how to make better use of resources:
1. Optimization of storage layers
The release of S3 Intelligent-Tiering optimizes storage costs. Compared with other major product lines, storage, as the underlying core, has not seen many transformative new breakthroughs; in fact, reducing storage cost remains the most valuable direction for the product to evolve.
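As a minimal illustration of what opting into this tier looks like from application code (the bucket and key below are hypothetical), an upload simply specifies the Intelligent-Tiering storage class and S3 then moves the object between access tiers as its access pattern changes:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and key; the INTELLIGENT_TIERING storage class lets S3
# shift the object between frequent/infrequent/archive access tiers automatically.
s3.put_object(
    Bucket="example-analytics-bucket",
    Key="logs/2023/12/01/events.json",
    Body=b'{"event": "page_view"}',
    StorageClass="INTELLIGENT_TIERING",
)
```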
Another significant release is S3 Express One Zone. In real scenarios, the speed and difficulty of analytics workloads come from the fact that they span multiple storage infrastructures and APIs. This product is claimed to deliver the fastest object storage in the cloud and allows storage to be placed in the AZ closest to the compute. However, some domestic cloud vendors have already demonstrated similar capabilities in real customer scenarios, and innovative companies outside the cloud vendors are also focusing on high-performance object storage.
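For a sense of the developer experience, here is a rough sketch assuming a directory bucket has already been created in the AZ closest to the compute and an SDK version that supports directory buckets; the bucket name (with its AZ-ID suffix) and keys are assumptions on my part, and reads and writes still go through the familiar S3 API:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Hypothetical S3 Express One Zone directory bucket, co-located with the compute's AZ.
bucket = "analytics-scratch--use1-az4--x-s3"

# Same put/get calls as standard S3; the point is consistent low-latency access
# for latency-sensitive intermediate data in analytics jobs.
s3.put_object(Bucket=bucket, Key="shuffle/part-0001", Body=b"intermediate data")
obj = s3.get_object(Bucket=bucket, Key="shuffle/part-0001")
print(obj["Body"].read())
```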
2. Computing power optimization
AWS began developing its own server processors in 2018 and has now reached Graviton4. According to the published figures, it offers 50% more cores and 75% more memory bandwidth than Graviton3 and is about 30% faster overall, with even larger gains for database and Java workloads. So far, however, ARM-based chips have not shown the expected cost-performance advantage over traditional x86 chips for databases, so Graviton4 is expected to deliver more satisfactory results there.
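For reference, moving a workload onto Graviton is largely a matter of picking an ARM instance family and an arm64 image. A minimal boto3 sketch, where the AMI ID is a placeholder and 'r8g.large' assumes the Graviton4-based R8g family is available in the chosen region:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one Graviton-based instance. The AMI ID is a placeholder and must point
# to an arm64 image (for example an Amazon Linux 2023 ARM64 AMI in your region).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder arm64 AMI
    InstanceType="r8g.large",          # assumed Graviton4-based instance family
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```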
Compared with previous years, energy conservation and emission reduction also received special mention this year, with the goal of improving energy efficiency. On the domestic side, Huawei Cloud's Kirin series and Alibaba Cloud's Yitian series continue to invest in chip R&D, and the battle for computing power has not stopped. Beyond large-scale infrastructure deployment, improving marginal returns and the quality of computing power will also form part of the future price premium on computing resources themselves.
3. A new round of AI upgrades
On the eve of re:Invent, a series of changes at OpenAI reignited the AI conversation, and developers around the world placed even greater expectations on AWS's moves. AWS seems to have its own conviction about AI, one that remains highly consistent with its original strategic path: give developers the most useful and convenient tools, help customers reinvent themselves, and innovate at each tier of the three-layer architecture:
# 3.1 Chip layer
At the infrastructure layer, the most important issue is still the chip. Amazon has successively deployed P3 instances with NVIDIA V100 GPUs, P4 instances with NVIDIA A100 Tensor Core GPUs, and P5 instances with NVIDIA H100 GPUs. These can be interconnected in Amazon EC2 through Elastic Fabric Adapter (EFA) with 3,200 Gbit/s of network bandwidth, making it possible to scale a single cluster up to 20,000 GPUs. On this capability, domestic cloud vendors are indeed heavily constrained. According to AWS's plan, network and virtualization functions will be further integrated with the chips, which will greatly improve computing efficiency. Jensen Huang also mentioned that AWS will be the first cloud provider to bring the latest NVIDIA GH200 Grace Hopper Superchip to the cloud via the new multi-node NVLink. This cooperation undoubtedly magnifies the difficulties domestic cloud vendors face; it is not hard to see that many future customers with demanding high-performance computing needs will find that only overseas cloud vendors can offer them an option.
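To make the EFA point concrete, below is a deliberately simplified sketch of launching an H100-based instance with an EFA interface attached; the AMI, subnet, and security-group IDs are placeholders, and a real P5 deployment would also use a cluster placement group and multiple network interfaces:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Simplified: one P5 instance with a single Elastic Fabric Adapter attached.
# Production ML clusters add a cluster placement group and several EFA interfaces.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder GPU AMI (e.g. a Deep Learning AMI)
    InstanceType="p5.48xlarge",
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",                 # low-latency fabric for GPU-to-GPU traffic
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
    }],
)
print(response["Instances"][0]["InstanceId"])
```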
From the current domestic perspective, however, innovative scenarios for high-performance computing have not yet opened up, and the domestic market's progress on using AI to innovate and create commercial value still lags somewhat behind the global pace. The main reason is the lack of accumulated, solid foundational tooling. AWS's greatest advantage here is EC2 Capacity Blocks for ML: GPU capacity can be reserved and deployed in UltraClusters, interconnected over the EFA network and scaling to hundreds of GPUs in a single cluster, so that customers can plan ML workload deployments with confidence and truly pay on demand. At the same time, Trainium and Inferentia have been further optimized. Previously the focus was Inferentia2, which offers four times the throughput and ten times lower latency than Inferentia1. Trainium2, released this time, is four times faster than the first-generation chip, and the first instances based on Trainium2 are expected to come online next year.
# 3.2 Intermediate layer
Moving on to the intermediate layer, the key product released this year is Bedrock. Bedrock mainly addresses the model problem, in a way similar to the model-as-a-service idea Alibaba proposed earlier. Rather than expecting a single all-powerful model, the bet is that there will be more models of different sizes for users to choose from. From my personal perspective, diversified models will remain mainstream for a long time, because existing scenario problems and data are mostly siloed and independent. There is currently no conclusion on which model is the most powerful, and model capabilities iterate very quickly; what most users want is easier access to powerful and diverse models, and the ability to quickly build applications on them at any time.
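To show what "many models to choose from" looks like in code, here is a minimal boto3 sketch that lists the available foundation models and then invokes one; the model ID and request-body shape are assumptions (each model family on Bedrock defines its own payload format):

```python
import json
import boto3

# Control-plane client: enumerate the foundation models exposed through Bedrock.
bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])

# Runtime client: invoke one model. Model ID and body schema are provider-specific.
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = runtime.invoke_model(
    modelId="anthropic.claude-v2",              # assumed model choice
    contentType="application/json",
    body=json.dumps({
        "prompt": "\n\nHuman: Summarize our Q3 sales notes in three bullet points.\n\nAssistant:",
        "max_tokens_to_sample": 256,
    }),
)
print(json.loads(response["body"].read()))
```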
Bedrock also has an interesting capability as a starting point, with great room to grow: Agents, which can not only answer questions and interpret information but also take actions to actually complete tasks. For now this still relies on powerful third-party interfaces. Through an agent, a GenAI application can execute multi-step tasks across company systems and data sources. If an enterprise wants to deploy generative AI capabilities fully, it will need to communicate with a variety of internal and external systems, which places high demands on the stability of the overall system and requires stronger compatibility from the underlying platform.
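As a hedged sketch of what calling such an agent looks like from an application (the agent and alias IDs are placeholders, and the agent itself must be configured separately with action groups that map to real APIs or Lambda functions):

```python
import boto3

# Runtime client for Agents for Amazon Bedrock.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholder IDs; the multi-step work (looking up the order, calling internal
# systems, sending the email) is performed by the agent's configured actions.
response = agent_runtime.invoke_agent(
    agentId="AGENT1234",
    agentAliasId="ALIAS1234",
    sessionId="user-42-session-1",
    inputText="Open a return request for order 1001 and email the customer a confirmation.",
)

# The reply comes back as an event stream of text chunks.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode(), end="")
```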
One concept proposed by AWS that deserves emphasis is responsible AI. Privacy and security were stressed at this conference, chiefly the commitment that customer data will not be used to train the models. This point is very clear: customer fine-tuning is done on a private copy of the model in an isolated environment, which is a strong draw for users deciding whether they are willing to run AI services on a public cloud. On one hand this protects data as an asset; on the other it helps ensure the stability of AI applications in certain scenarios. On the security side, a product called Guardrails was released specifically to keep AI applications safe.
The most attractive application-level product this time is Amazon Q:
The release of Amazon Q offers a good entry point for the many organizations currently struggling to realize the practical value of AI: using AI for organizational management, with a powerful enterprise-grade Q suite that helps solve problems such as internal information flow and collaboration within the enterprise. Combined with the Agent capability mentioned earlier, it can act as a personal assistant in work scenarios, genuinely creating a new kind of enterprise-grade application. Judging by this capability, the first domestic product likely to implement something similar is Feishu (Lark).
# 4. Data tool reserves
With AI as the central theme, data is also in the spotlight, with two core products:
Aurora Limitless is a distributed database in the NewSQL family; structurally it is similar to Google Spanner. The biggest challenge for this type of database is how to achieve high-performance distributed transactions. On compatibility, PostgreSQL was launched first this time rather than MySQL, because the PostgreSQL server code makes it easier to peel off a routing layer and achieve optimal link matching. With this, AWS's relational database line has reached its fourth technological breakthrough, and Aurora Limitless is close to its complete form; the next core issue is how to handle workflows. No similar architecture has appeared in domestic vendors' database products so far, but next year's releases from the various companies can be expected to reference it and ship similar capabilities.
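To make the distributed-transaction point concrete: because the database presents a standard PostgreSQL interface, application code still issues an ordinary transaction, and the hard part (coordinating a commit across rows that may live on different shards) happens below the SQL surface. A minimal sketch, where the endpoint, credentials, and `accounts` table are all hypothetical:

```python
import psycopg2

# Hypothetical connection to a PostgreSQL-compatible sharded cluster endpoint.
conn = psycopg2.connect(
    host="limitless-cluster.example.us-east-1.rds.amazonaws.com",
    dbname="appdb",
    user="app",
    password="***",
)

# From the application's point of view this is one ordinary transaction; the two
# account rows may be sharded onto different storage nodes, so the engine must
# coordinate a distributed commit to keep the transfer atomic.
with conn, conn.cursor() as cur:
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,))
    cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = %s", (2,))
# leaving the `with conn:` block commits, or rolls back if an exception was raised
```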
Another product related to data is zero-ETL.
ETL has traditionally been an important data-processing step in BI scenarios: to dump data uniformly into a data warehouse and mine value from it, one must choose suitable tools to process exponentially growing data so that processing speed stays ahead of collection speed. Zero-ETL addresses the data-silo problem, enabling data migration and transformation across stores. The capability was first released at last year's conference, which delivered the integration between Aurora and Redshift: data written to Aurora can skip the S3 import step and be used directly for real-time analytics and machine learning applications, saving a great deal of time in the data-processing pipeline. Through zero-ETL, transaction processing and data analysis can be combined, connecting data across different services. This year AWS is integrating more products, such as zero-ETL integrations from Amazon Aurora PostgreSQL, Amazon DynamoDB, and Amazon RDS for MySQL to Amazon Redshift, a zero-ETL integration between Amazon DynamoDB and Amazon OpenSearch Service, and a zero-ETL integration between Amazon S3 and Amazon OpenSearch Service. This is undoubtedly a big step toward the all-in-one data strategy.
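As a rough illustration of the user-facing effect: once a zero-ETL integration is in place, rows written to the transactional store show up in Redshift without an export/transform/load pipeline and can be queried directly, for example through the Redshift Data API. The workgroup, database, and table names below are hypothetical:

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# Query transactional data replicated through the zero-ETL integration;
# no S3 export or ETL job sits between the OLTP write and this analytical read.
result = redshift_data.execute_statement(
    WorkgroupName="analytics-wg",        # hypothetical Redshift Serverless workgroup
    Database="aurora_zeroetl_db",        # hypothetical database created from the integration
    Sql="SELECT order_status, COUNT(*) FROM orders GROUP BY order_status;",
)
print(result["Id"])  # statement ID; results are fetched later with get_statement_result
```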

