Amazon Web Services 2025 re:Invent

Published: 2026/01/12

As Agentic AI transitions from a phase of “technological exploration” to one of “value realization,” enterprises must deeply understand and adapt to several pivotal shifts in strategic paradigms. In this profound wave of transformation, those who can most accurately grasp the essence of the technology and most presciently architect for the future will establish a decisive competitive advantage in the market. We have identified the following three critical trends that will shape the landscape ahead:

Paradigm Shift in AI Infrastructure (AI Infra):
The primary interaction partner of software is undergoing a fundamental transformation—shifting from being centered on human developers to being centered on AI Agents. This demands a redefinition of infrastructure-layer interfaces, performance metrics, and resource scheduling strategies to accommodate the new paradigm in which AI autonomously consumes computational resources.

The Tipping Point for Mass-Scale AI Agent Adoption:
We stand on the cusp of AI Agents’ transition from proof-of-concept experiments to full-scale commercial deployment. According to Gartner’s Hype Cycle, AI Agents are rapidly ascending the curve and are poised to become one of the most impactful technologies over the next 2–5 years. This evolution represents far more than an incremental tool upgrade—it is the core engine driving the reengineering of business processes and a fundamental shift in productivity.

Parallel Progress in the U.S. and China, China’s Edge in Scenario-Driven Innovation:
At the application layer, the U.S. and China are effectively starting from the same baseline. China, however, possesses a distinct advantage: its vast economic scale, rich industrial ecosystems, and highly dynamic digital environments create a globally unique “testing ground” for rapid experimentation, iteration, and large-scale deployment of AI Agents.

At its recent re:Invent conference, AWS clearly validated the aforementioned trends, particularly through its comprehensive embrace of agent-centric development. By thoroughly rearchitecting and upgrading its full-stack product portfolio around AI Agent development, AWS has provided users with an efficient pathway to migrate from traditional applications to generative AI architectures.

Against the backdrop of today’s rapidly flourishing open-source model ecosystem, heavy investment in foundational model training is no longer the optimal strategy. AWS has wisely pivoted toward a “platform and tools” approach, focusing on delivering users broad model choices and flexible integration capabilities. This reflects a core philosophy centered on user value first, lowering technical barriers and mitigating vendor lock-in risks to empower customers to make optimal selections within a diverse and evolving model ecosystem.

Full-Stack Agentic AI: Redefining Development and Optimizing Practice

“Full-stack Agentic AI” is fundamentally reshaping the paradigms of software development and application delivery. This demands that infrastructure and software providers continuously refine their approaches through real-world implementation, iterating toward greater efficiency, adaptability, and value creation.

 

User Base as the Moat:

 In the competition for Agent ecosystems, whoever possesses a larger user base and richer application scenarios is more likely to leverage the data flywheel effect to iteratively develop more powerful and fully-featured Agent solutions.

Redefining the Boundary of Work:

 A core challenge for infrastructure and software companies lies in clearly understanding and architecturally distinguishing the differing capabilities and operational boundaries between human developers and AI Agents. This requires building new infrastructure capable of understanding AI Agent behavior patterns.

A Dynamically Balanced Collaborative Architecture:

The future of development and operations will be defined by dynamic collaboration between human developers and AI Agents. A fundamental architectural imperative for all infrastructure and software providers is to enable elastic complementarity and real-time balance between human and agent capabilities throughout task execution.  

We must acknowledge that geopolitical tensions have intensified global technological competition, leading to a degree of fragmentation in technical stacks in the short term. Yet, over the long term, the historical tide toward integration and shared progress in technology remains irreversible.

Short-term technological competition and debates over development pathways objectively help to separate the wheat from the chaff, and they let human society better manage the pace at which general-purpose AI arrives, giving us more time to reflect on and prepare for issues of AI ethics, safety, and governance. After all, as a general-purpose technology, AI’s ultimate purpose is to serve the well-being of all humanity. We should strive to seek cooperation amid competition and build consensus despite differences, jointly guiding this powerful technology toward beneficial outcomes.

 

Stage One: Rebuilding the Infrastructure — Cost Reduction as the Prerequisite for Agent Adoption

As artificial intelligence transitions from “conversational assistants” to “digital employees,” intelligent agents (Agents) are becoming the core vehicle for enterprise intelligent transformation. However, a critical reality cannot be ignored: true Agents must remain online 24/7, continuously perceiving, reasoning, making decisions, and executing tasks. This demands stable, efficient, and low-cost computing infrastructure as foundational support. Without overcoming the bottleneck of high inference and operational costs, affordability—let alone widespread adoption—remains an empty promise. Industry consensus is now coalescing around a key insight: the large-scale deployment of Agents begins with a revolution in infrastructure, and the central theme of this revolution is “cost reduction.”

 

Custom Silicon’s Disruptive Edge: From Trainium3 to Graviton5, Redefining the Economics of AI Compute

At its 2025 re:Invent conference, AWS unveiled a “dual-chip synergy” strategy: its next-generation training chip Trainium3 and general-purpose server CPU Graviton5 together form a high-performance, cost-efficient compute foundation tailored for the Agent era. As Amazon’s first chip built on a 3nm process, Trainium3 delivers significant leaps in computational power, energy efficiency, and memory bandwidth. Compared to traditional GPU-based solutions, AI model training and inference on Trainium3 can cost up to 50% less.

Amazon’s new-generation custom server CPU, Graviton5, represents the most powerful and energy-efficient data center CPU in the company’s history. Each Graviton5 core accesses L3 cache capacity 2.6 times larger than that of Graviton4, while network and storage bandwidth have increased by 15% to 20%. Delivering up to 25% higher compute performance than its predecessor—all while maintaining industry-leading energy efficiency—Graviton5 enables customers to run applications faster, substantially lower compute costs, and advance sustainability goals.

 

Co-Optimization of Inference Engine and Models: Mantle + Nova, Building a Precision Inference Pipeline

Beyond hardware-driven cost savings, deep software-hardware co-optimization is equally crucial. AWS’s newly launched Mantle inference engine is purpose-built for large-model inference scenarios. Through kernel-level scheduling, memory pool reuse, and dynamic batch fusion, Mantle significantly reduces latency and resource fragmentation. At the service layer, the system allows customers to route requests through three distinct channels: Priority for real-time, low-latency workloads; Standard for stable and predictable performance; and Flex for background tasks prioritizing efficiency. Each customer has a dedicated queue, ensuring one customer’s traffic spikes do not impact others’ performance.
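
The effect of these tiers can be illustrated with a toy scheduler. The sketch below is not AWS's implementation; it only shows how weighted service tiers combined with per-customer queues keep one tenant's burst from starving another (all class and function names are hypothetical):

    import heapq
    from collections import defaultdict, deque
    from itertools import count

    # Relative scheduling weight per tier: lower value is served sooner.
    TIER_WEIGHT = {"priority": 0, "standard": 1, "flex": 2}

    class TieredRouter:
        """Toy router: one FIFO queue per (customer, tier) pair."""
        def __init__(self):
            self.queues = defaultdict(deque)  # (customer, tier) -> requests
            self.ready = []                   # heap of (weight, seq, key)
            self.seq = count()                # tie-breaker preserving FIFO

        def submit(self, customer, tier, request):
            key = (customer, tier)
            self.queues[key].append(request)
            heapq.heappush(self.ready, (TIER_WEIGHT[tier], next(self.seq), key))

        def next_request(self):
            # Pop the highest-priority non-empty queue; a traffic spike
            # from one customer only lengthens that customer's own queue.
            while self.ready:
                _, _, key = heapq.heappop(self.ready)
                if self.queues[key]:
                    return key[0], self.queues[key].popleft()
            return None

    router = TieredRouter()
    router.submit("acme", "flex", "nightly-batch-1")
    router.submit("acme", "priority", "chat-turn-17")
    router.submit("globex", "standard", "report-42")
    print(router.next_request())  # ('acme', 'chat-turn-17') is served first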

In generative AI inference, Amazon Web Services introduced the Amazon Nova 2 family of foundation models, designed to meet diverse application needs, from low-cost text-to-text responses to powerful multimodal capabilities, delivering exceptional performance and accuracy across the board. Coupled with the Mantle engine, the Amazon Nova model family further refines the trade-off between cost and performance. Nova 2 Sonic targets high-throughput, low-latency real-time interaction scenarios (e.g., customer service Agents or transaction risk control), leveraging structured sparsity and quantization-aware training to achieve 2.1× faster inference while retaining over 95% of the original model’s capability. Meanwhile, Nova 2 Lite focuses on edge and lightweight deployments, applying knowledge distillation and module pruning to shrink model size to 1/8 of the original, enabling hundreds of micro-Agents to run concurrently at ultra-low cost on Graviton5 instances.
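
In practice, choosing within a family like Nova 2 mostly reduces to the model ID passed to Bedrock's Converse API. Below is a minimal boto3 sketch for the text path; the Nova 2 model IDs are placeholders rather than confirmed identifiers (check the Bedrock console for your region), and a real-time voice model such as Nova 2 Sonic would be driven through Bedrock's streaming interfaces instead:

    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Placeholder model IDs: substitute the actual Nova 2 identifiers
    # published in the Bedrock console.
    MODELS = {
        "lite": "amazon.nova-2-lite-v1:0",  # lightweight/edge tier
        "pro":  "amazon.nova-2-pro-v1:0",   # higher-capability tier
    }

    def ask(tier: str, prompt: str) -> str:
        response = bedrock.converse(
            modelId=MODELS[tier],
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 256, "temperature": 0.2},
        )
        return response["output"]["message"]["content"][0]["text"]

    print(ask("lite", "Summarize this refund request in one sentence."))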

 

Hardware-Enforced Security: Confidential Computing Grants Agents a “Trusted Identity”

Beyond cost, trust is another critical barrier to enterprise Agent adoption. As Agents begin handling customer data, financial information, and even core business workflows, data security and privacy compliance become non-negotiable requirements. AWS addresses this through Confidential Computing, a technology that establishes a “zero-trust” execution environment at the hardware level. Complementing this, the all-new sixth-generation Nitro system offloads virtualization, storage, and networking tasks onto dedicated hardware, while the Nitro Isolation Engine introduced with Graviton5 enforces strict workload isolation. Built on a minimal codebase whose behavior is guaranteed by formal mathematical proofs to adhere strictly to specification, the Nitro Isolation Engine sets a new benchmark for mathematically verifiable cloud security.

 

 

Stage Two: Construction and Development — From “Writing Code” to “Orchestrating Agents”

A profound transformation is currently underway in the paradigm of artificial intelligence development. While traditional software development relies on writing deterministic sequences of instructions, the core of agent construction has shifted to orchestrating autonomous cognitive systems. This change requires developers to transition from being process coders to integrators of agent capabilities and components. The suite of tools and services launched by AWS at re:Invent 2025 is designed to systematically support this new paradigm, lowering barriers across the entire journey from proof-of-concept to large-scale deployment.

 

Core Development Frameworks for the New Paradigm

To implement the model-driven development philosophy, AWS open-sourced the Strands Agents SDK. This framework departs from the traditional approach of predefining rigid workflows. Its core principle is to trust and leverage the inherent planning, reasoning, and tool-calling capabilities of large language models. Developers can quickly build functional agents by concisely defining task objectives and available tools, achieving “configure to build.”

Lowering Barriers and Ecosystem Expansion:

The SDK fully supports Python and TypeScript and comes with over 20 out-of-the-box tools. Any function can be transformed into an agent-callable tool via a simple decorator. Its deep integration with the Model Context Protocol (MCP) further allows agents to securely access a vast array of third-party tools and services, significantly expanding their capability boundaries.
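
Taken together, this keeps a working agent very short. A minimal sketch following the SDK's quick-start pattern (the tool body is stubbed; by default the Agent uses a Bedrock-hosted model, so AWS credentials with model access are assumed):

    from strands import Agent, tool

    @tool
    def get_order_status(order_id: str) -> str:
        """Look up an order's shipping status (stubbed for this example)."""
        return f"Order {order_id} shipped on 2025-12-01."

    # Declare the objective and the available tools; the model plans when
    # and how to call get_order_status on its own.
    agent = Agent(
        system_prompt="You are a support agent. Use tools when helpful.",
        tools=[get_order_status],
    )

    agent("Where is order 12345?")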

Native Support for Complex Collaboration:

The framework natively supports multiple paradigms for multi-agent collaboration. Depending on task requirements, developers can choose the Workflow mode for strictly sequential execution, the Graph mode based on a directed acyclic graph structure, or the Swarm mode that allows for autonomous collaboration among multiple agents, enabling the construction of systems suited to varying levels of complexity.
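
The three modes differ chiefly in how control passes between agents. The Graph mode's semantics can be illustrated framework-agnostically with the standard library; this is a conceptual sketch with stubbed agents, not the Strands API itself:

    from graphlib import TopologicalSorter

    # Stub "agents": each consumes upstream results and returns its own.
    def researcher(inputs): return "facts about the customer's issue"
    def drafter(inputs):    return f"draft reply using: {inputs['researcher']}"
    def reviewer(inputs):   return f"approved: {inputs['drafter']}"

    # Graph mode: edges form a directed acyclic graph; each node runs
    # only after all of its predecessors have finished.
    GRAPH = {"researcher": set(), "drafter": {"researcher"}, "reviewer": {"drafter"}}
    AGENTS = {"researcher": researcher, "drafter": drafter, "reviewer": reviewer}

    results = {}
    for name in TopologicalSorter(GRAPH).static_order():
        parents = {p: results[p] for p in GRAPH[name]}
        results[name] = AGENTS[name](parents)

    print(results["reviewer"])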

 

Modular Core Capabilities for Production

Deploying agents in enterprise-grade production environments requires systematically addressing engineering challenges such as memory, secure integration, and trustworthy execution. The Amazon Bedrock AgentCore service adopts a modular design, offering a series of enterprise-grade building blocks that can be used independently or in combination, providing a solid foundation for the scalable application of agents.

Memory: From Fact Storage to Episodic Learning

AgentCore Memory provides hierarchical memory management. In addition to short-term memory for maintaining conversational context and long-term memory for storing user preferences, the newly launched Episodic Memory feature represents a key breakthrough. It enables agents to perform pattern extraction and experiential learning from continuous historical interactions, rather than merely retrieving static facts. For instance, an agent can autonomously infer a behavioral pattern such as “the user requires more lead time when traveling with family” and proactively apply it in future similar scenarios, thereby achieving continuously optimized personalized service.
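
AWS has not published how Episodic Memory is implemented, but the behavior described above can be pictured as pattern extraction over stored episodes rather than lookup of single facts. A deliberately simplified, hypothetical sketch:

    from statistics import mean

    # Past booking episodes for one user: (context, lead_time_in_days).
    episodes = [
        ({"trip": "family"}, 30),
        ({"trip": "solo"},   2),
        ({"trip": "family"}, 45),
        ({"trip": "family"}, 28),
    ]

    def extract_pattern(episodes, key, value, min_support=3):
        """Surface a recurring behavior for a context, if seen often enough."""
        lead_times = [t for ctx, t in episodes if ctx.get(key) == value]
        if len(lead_times) < min_support:
            return None  # not enough evidence to call it a pattern
        return f"user books ~{round(mean(lead_times))} days ahead for {value} trips"

    print(extract_pattern(episodes, "trip", "family"))
    # -> "user books ~34 days ahead for family trips"
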
Identity & Gateway, Ensuring Secure and Controlled System Access:

 Enabling agents to securely access internal and external systems as trusted entities is key to deployment. AgentCore Identity provides each agent with an independent and auditable digital identity, managing its fine-grained access permissions. The closely integrated AgentCore Gateway serves as a unified integration layer, responsible for encapsulating and managing internal enterprise APIs (e.g., databases, CRM systems) and third-party services (e.g., Slack, Jira), while securely handling complex authentication flows. This combined mechanism ensures the compliant and secure operation of agents within the enterprise environment.
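
The division of labor can be sketched in a few lines: an identity service mints short-lived, scope-limited credentials for a specific agent, and the gateway re-validates them on every backend call. All names and logic below are illustrative, not the AgentCore APIs:

    import time

    # Hypothetical registry of which scopes each agent identity may hold.
    AGENT_SCOPES = {"refund-agent-01": {"crm.read", "payments.refund"}}

    def issue_token(agent_id: str, scope: str, ttl: int = 300) -> dict:
        """Mint a short-lived, single-scope credential for one agent."""
        if scope not in AGENT_SCOPES.get(agent_id, set()):
            raise PermissionError(f"{agent_id} is not authorized for {scope}")
        return {"sub": agent_id, "scope": scope, "exp": time.time() + ttl}

    def gateway_call(token: dict, tool: str, payload: dict) -> str:
        """The gateway re-checks the token before touching any backend."""
        if token["exp"] < time.time() or token["scope"] != tool:
            raise PermissionError("expired or out-of-scope token")
        return f"called {tool} as {token['sub']} with {payload}"

    token = issue_token("refund-agent-01", "payments.refund")
    print(gateway_call(token, "payments.refund", {"order": "12345", "amount": 40}))
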
Runtime & Code Interpreter, Providing an Isolated and Reliable Execution Environment:

AgentCore Runtime is a serverless environment optimized for long-running, stateful agent tasks, supporting continuous execution for several hours and compatible with mainstream development frameworks. AgentCore Code Interpreter provides a secure sandbox environment for agents to execute code or computational logic, ensuring that any computational task is completed within an isolated space, thereby guaranteeing the stability and security of the underlying system.
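
The isolation idea reduces to a small sketch: untrusted code runs in a separate interpreter process under a hard timeout. A production sandbox such as the one behind Code Interpreter adds filesystem, network, and memory isolation that this illustrative snippet omits:

    import os, subprocess, sys, tempfile

    def run_in_sandbox(code: str, timeout: int = 5) -> str:
        """Execute untrusted code in its own interpreter process."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, "-I", path],  # -I: isolated interpreter mode
                capture_output=True, text=True, timeout=timeout,
            )
            return result.stdout or result.stderr
        finally:
            os.unlink(path)  # never leave untrusted code on disk

    print(run_in_sandbox("print(sum(range(10)))"))  # -> 45
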
Observability: Ensuring Full-Process Transparency and Management

AgentCore Observability provides comprehensive monitoring and insight capabilities. It enables operations personnel to clearly trace an agent’s internal decision-making processes, tool invocation chains, and performance metrics. This is crucial for problem diagnosis, performance evaluation, and necessary intervention within complex business flows, serving as the technical foundation for the responsible and manageable operation of agents.
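
Observability stacks in this space are typically built on OpenTelemetry-style traces, where each reasoning step and tool invocation becomes a span. An illustrative sketch with the OpenTelemetry Python SDK (pip install opentelemetry-sdk; the console exporter here would be swapped for your observability backend in practice):

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import (
        ConsoleSpanExporter,
        SimpleSpanProcessor,
    )

    trace.set_tracer_provider(TracerProvider())
    trace.get_tracer_provider().add_span_processor(
        SimpleSpanProcessor(ConsoleSpanExporter())
    )
    tracer = trace.get_tracer("agent")

    # One span per step makes the decision chain reconstructable later.
    with tracer.start_as_current_span("handle_ticket") as root:
        root.set_attribute("ticket.id", "T-981")
        with tracer.start_as_current_span("tool.lookup_order") as span:
            span.set_attribute("tool.input", "order=12345")
            span.set_attribute("tool.output", "shipped")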

 

Stage Three: Domain-Specific "Digital Employees" — Vertical Scenario Use Cases

AI Agents are evolving from "general-purpose chatbots" into "digital employees" with deep domain expertise. They no longer just answer questions; they understand specific business logic, execute complex workflows, and work alongside humans as integral team members.

The following is an analysis of case studies across three core vertical scenarios:

1. Software Development: From "Assisted Coding" to "Autonomous Delivery" (Kiro)

In software development, agents have leaped from simple "code completion" to spec-driven autonomous development partners.

Essential Logic: Developers describe goals in natural language, Kiro generates detailed technical specifications (Specs), and then agents autonomously write code and execute tests based on those specs.

Stunning Efficiency Gains: Amazon internally assessed that a large-scale refactoring project would require 30 developers working for 18 months; by fully adopting Kiro agents for autonomous development, the project was ultimately delivered by just 6 people in only 76 days.

Autonomous Backlog Resolution: As "Frontier Agents," Kiro Autonomous Agents can work independently for days, handling tasks such as bug triaging, library upgrades across 15 microservice repositories, and improving code coverage—tedious and time-consuming tasks that allow engineers to remain in a high-efficiency "flow state."

2. Security & Operations: Automated "Gatekeepers" and "Firefighters" (Security & DevOps Agent)

Security and operations agents achieve an automated closed loop of "coding-security-operations" through deep linkage with the development process.

AWS Security Agent (Security Consultant): This has changed the low-frequency model of conducting only a few penetration tests per year. It can intervene at the upstream stage, proactively reviewing design documents and scanning for vulnerabilities when code is committed to GitHub. It transforms expensive, time-consuming penetration testing into an on-demand automated process, providing instant remediation suggestions.

AWS DevOps Agent (On-call Ops Team): Acting as an experienced operations engineer, it responds instantaneously to system alerts before human intervention. By correlating observability data from sources like CloudWatch or Dynatrace, it traces root causes (e.g., discovering an error caused by a simple IAM policy change) and directly submits fix recommendations for human approval.

3. Customer Service: Personalized "Star Agents" (Amazon Connect)

Amazon Connect is reshaping contact centers into the operational backbone of enterprises through agent technology.

Autonomous Execution and Voice Evolution: After integrating the Nova Sonic voice model, agents can communicate with customers using a highly natural voice with tonal variations, autonomously handling complex refund or inquiry tasks.

Real-time Mentor for Human Agents: Agents can analyze conversation context and customer sentiment in real-time, recommending the next best action to human employees and automatically completing document preparation.

Implementation Results:

Lyft: Established "Intent Agents" for over 1 million drivers to directly handle issues such as earnings disputes using backend data, reducing average resolution time by 87%.

Credit Card Fraud Investigation: Agents can automatically analyze thousands of cross-regional transaction patterns in minutes (work that previously took days), identifying credit card fraud risks and proactively advising customers on how to keep their accounts safe while traveling.

These cases demonstrate that by deploying AI agents into specific professional domains, enterprises are moving from the "testing phase" into an industrial application phase that yields "real return on investment (ROI)".

 

Stage Four: Governance & Evaluation — Making Agents Trustworthy and Controllable

As agents shift from simple information retrieval to autonomous execution with higher privileges, corporate focus is moving from traditional "content safety" (preventing harmful speech) to "behavioral governance". When agents can conduct transactions or access sensitive data on behalf of a company, enterprises need a new management philosophy that treats AI as employees with autonomous capabilities.

If early AI governance was like a "content filter" responsible only for blocking profanity, current AgentCore governance is like establishing an "auditing system" and "financial red lines" for a senior manager. You don't need to teach them how to write every email, but you must ensure they don't have the authority to sign a million-dollar check without an audit.

The following is the core governance framework for achieving trustworthy and controllable agents:

1. Board-of-Directors-Style Governance

Authorization and Boundaries: Managers should not micromanage every step of an agent through hard coding, as this would stifle its creativity and adaptability.

Management Mode: Much like a CEO leading a team or a parent dealing with a child who has started driving, humans should set strategic goals, decision boundaries, and risk thresholds, and establish inspection mechanisms (such as "surround cameras" or application trace analysis) to ensure everything stays on track.

2. Behavioral Policy: Setting Deterministic Red Lines

To address the non-deterministic issues when agents generate code or execute actions, AWS launched Bedrock AgentCore Policy.

Defining Rules with Natural Language: Managers can use natural language to set specific "red lines." For example: "If a refund amount exceeds $1,000, it must be intercepted and require human approval."

Neuro-symbolic AI Integration: The system automatically translates natural language into Cedar, a deterministic authorization policy language.

Real-time Interception: This policy enforcement resides outside the agent's application code, acting as an interception layer between the agent and corporate tools/data, verifying the compliance of every action within milliseconds to ensure it never "exceeds its authority."
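
The layering can be sketched framework-agnostically: the enforcement point sits between the agent and its tools, so nothing in the agent's own code can bypass it. The Cedar-style rule and all function names below are illustrative, not the AgentCore Policy API:

    # Illustrative Cedar-style rule the natural-language red line compiles to:
    #   forbid (principal, action == Action::"IssueRefund", resource)
    #   when { context.amount > 1000 };

    REFUND_LIMIT = 1000  # dollars; above this, a human must approve

    def policy_check(action: str, context: dict) -> str:
        """Deterministic check evaluated outside the agent's own code."""
        if action == "IssueRefund" and context.get("amount", 0) > REFUND_LIMIT:
            return "escalate"  # intercept and route to human approval
        return "allow"

    def guarded_tool_call(action, context, tool):
        if policy_check(action, context) == "escalate":
            return f"{action} held for human approval: {context}"
        return tool(context)

    issue_refund = lambda ctx: f"refunded ${ctx['amount']}"
    print(guarded_tool_call("IssueRefund", {"amount": 1500}, issue_refund))
    # -> "IssueRefund held for human approval: {'amount': 1500}"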

3. Automated "Performance Reviews"

Traditional pre-release testing cannot fully cover an agent's non-deterministic behavior in real-world environments. AgentCore Evaluations provides an automated mechanism for continuous quality inspection.

Comprehensive Assessment Dimensions: The service provides 13 pre-built evaluators covering key metrics such as Correctness, Helpfulness, Harmfulness, and Brand Alignment.

Closed-loop Monitoring: Evaluation results are directly integrated with CloudWatch, allowing developers to monitor agent performance in production environments in real-time.

Rapid Feedback: When an agent's model is upgraded or its process modified, automatic testing can be conducted through thousands of simulated scenarios to ensure no degradation in quality—much like an "annual performance review" to ensure digital employees remain competent.
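
The loop can be pictured as a fixed scenario suite scored by evaluator functions after every model upgrade or process change. A toy sketch; the two scorers below merely echo two of the dimensions named above and stand in for AWS's pre-built evaluators:

    # Toy scenario suite: a prompt plus the behavior we expect to see.
    SCENARIOS = [
        {"prompt": "Cancel my order 12345", "expect": "cancel"},
        {"prompt": "What's your refund policy?", "expect": "policy"},
    ]

    def correctness(answer, scenario):
        return 1.0 if scenario["expect"] in answer.lower() else 0.0

    def helpfulness(answer, scenario):
        return 1.0 if len(answer.split()) > 3 else 0.0  # crude proxy

    EVALUATORS = {"correctness": correctness, "helpfulness": helpfulness}

    def evaluate(agent):
        scores = {name: 0.0 for name in EVALUATORS}
        for scenario in SCENARIOS:
            answer = agent(scenario["prompt"])
            for name, fn in EVALUATORS.items():
                scores[name] += fn(answer, scenario) / len(SCENARIOS)
        return scores  # in production, emitted as CloudWatch metrics

    stub_agent = lambda p: "Sure, I can cancel that; our policy allows it."
    print(evaluate(stub_agent))  # {'correctness': 1.0, 'helpfulness': 1.0}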

Corporate AI management is being reshaped, evolving from "monitoring conversations" to "governing behavior." By setting immediate red lines via Policies and conducting long-term quality audits via Evaluations, enterprises can finally break free from "POC Jail" and confidently deploy agents into their core business.

