The giant has woken up from his nap!
In the drive to provide its users with attractive first-party AI models and tools, especially the millions of developers working on top of Amazon Web Services’ (AWS) cloud infrastructure, it appeared for a while that Amazon was lagging behind.
However, it introduced its own proprietary foundation model family, Amazon Nova, in late 2024, which has the ability to generate text, images, and even videos. Additionally, last month, a new Amazon Alexa voice assistant was released, which was partially powered by Anthropic’s Claude family of models.
Amazon Nova Act, an experimental developer kit for creating AI agents that can navigate the web and perform tasks on their own, was then released on Monday by Amazon AGI, the artificial general intelligence division of the massive cloud and e-commerce company. It is powered by a proprietary, customized version of Amazon’s Nova large language model (LLM). Additionally, the SDK is open source under a permissive Apache 2.0 license; nevertheless, it is only intended to be used with Amazon’s proprietary Nova model, not any other third-party models.
Enabling third-party developers to create AI agents that can consistently complete tasks within web browsers is the aim.
However, how does Amazon’s Nova Act compare to other agent creation platforms available on the market, including Salesforce’s Agentforce, Microsoft’s AutoGen, and, of course, OpenAI’s freshly published open source Agents SDK?
A different, more deliberate approach to AI agents
Since large language models (LLMs) became widely known, the majority of “agent” systems have been restricted to using natural language responses or knowledge base queries to provide information.
Nova Act is a component of the broader industry trend toward action-based agents, or programs that can carry out real-world actions in digital environments on the user’s behalf. An excellent illustration of this is the new Responses API from OpenAI, which allows users to access its browser navigator on their own. Developers can incorporate this into AI agents using the OpenAI Agents SDK.
Amazon AGI highlights that although present agent systems show promise, they are not always reliable and frequently need human oversight, particularly when managing intricate or multi-step processes.
Because Nova Act offers a collection of atomic, prescriptive instructions that may be combined to create reliable workflows, it is especially made to overcome these constraints.
In a video unveiling Nova Act, Deniz Birlikci, a member of Amazon’s technical staff, outlined the overarching vision: in the near future, more AI agents than humans will be browsing the web and performing tasks for customers.
In a recent video call interview, David Luan, the Head of AGI SF Lab and VP of Amazon’s Autonomy Team, stated the objective more clearly: They have developed a novel experimental artificial intelligence model that has been trained to execute actions within a web browser. According to him, they essentially believe that agents are the fundamental unit of computation.
Luan, who was previously the CEO and co-founder of Adept AI, joined Amazon through an acqui-hire in 2024. He said he has championed AI agents for a long time. With Adept, he said, they were the first company to really begin building AI agents, and everyone is now aware of how significant agents are. “Being a little ahead of our time was pretty cool,” he continued.
Nova Act’s benefits to developers
With the help of the Nova Act SDK, developers may create web-based automation agents with a framework that breaks down natural language prompts into easily understood stages.
Standard LLM-powered agents attempt a complete workflow from a single prompt and frequently behave unreliably as a result. Nova Act is instead designed to carry out smaller, verifiable tasks one step at a time.
Some of the key features of Nova Act include:
Fine-Grained Task Decomposition: By dividing intricate digital workflows into smaller act() functions, developers can instruct the agent to carry out particular user interface operations.
Direct Browser Manipulation through Playwright: Nova Act integrates with Playwright, Microsoft’s open-source browser automation framework. Playwright lets developers control web browsers programmatically (clicking elements, filling in forms, and navigating pages) without relying entirely on AI predictions. This integration is especially helpful for sensitive tasks such as entering credit card numbers or passwords. Instead of sending sensitive data to the model, developers can, for instance, tell Nova Act to focus on a password field and then use Playwright APIs to enter the password securely without the model ever “seeing” it. This approach helps improve privacy and security when automating online interactions.
Python Integration: The SDK enables programmers to incorporate Nova Act commands into Python programs, using common Python capabilities like assertions, breakpoints, and thread pooling for parallel execution.
Structured Information Extraction: Agents can transform screen content into structured representations by using the SDK’s support for structured data extraction via Pydantic schemas.
Parallelization and Scheduling: Without constant human supervision, developers can plan automated tasks and execute many Nova Act instances at once.
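The decomposition and parallelization ideas in the list above can be sketched in plain Python. This is a minimal illustration and does not use the Nova Act SDK: `StubAgent`, its `act()` method, the `search_listing` helper, and the example URL are all hypothetical stand-ins, and the real SDK’s classes and signatures may differ. The sketch only shows the pattern of issuing narrow steps, checking each one with an ordinary assertion, and running several independent sessions on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical stand-in for a browser agent; NOT the Nova Act API."""

    start_url: str
    log: list[str] = field(default_factory=list)

    def act(self, instruction: str) -> bool:
        # A real agent would drive a browser here; the stub only records the step.
        self.log.append(instruction)
        return True


def search_listing(query: str) -> list[str]:
    """Run one small workflow as a sequence of narrow, checkable steps."""
    agent = StubAgent(start_url="https://example.com")  # placeholder URL
    steps = [
        f"type '{query}' into the search box",
        "press the search button",
        "open the first result",
    ]
    for step in steps:
        # Plain Python assertion: stop as soon as a single step fails.
        assert agent.act(step), f"step failed: {step}"
    return agent.log


queries = ["studio apartment", "one bedroom", "two bedroom"]

# Each query gets its own agent instance; the sessions run in parallel threads.
with ThreadPoolExecutor(max_workers=3) as pool:
    logs = list(pool.map(search_listing, queries))

for query, log in zip(queries, logs):
    print(query, "->", len(log), "steps")
```

Keeping each step small means a failure points at one specific instruction rather than at an entire workflow, which is the property the building-block approach relies on.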
Luan stressed that Nova Act is not a general-purpose chatbot, but rather a tool for developers. Nova Act is designed with developers in mind. It’s not a chatbot you converse with for fun. According to him, it’s made to enable developers to begin creating practical products.
One of the sample workflows in Amazon’s documentation, for instance, shows how Nova Act can automate an apartment search by scraping rental listings, calculating the biking distance from each one to a train station, and then organizing the results in a structured table.
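The last stage of that apartment search, turning extracted records into a validated, sorted table, might look roughly like the sketch below. It is an illustration only: it uses a standard-library dataclass in place of the Pydantic schemas the SDK supports, the `Listing` fields and `parse_listing` helper are hypothetical, and the records are made-up sample data rather than anything an agent actually returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical schema for one rental listing."""

    address: str
    monthly_rent_usd: int
    bike_minutes_to_station: int


def parse_listing(raw: dict) -> Listing:
    """Coerce one raw record into the schema; raises on missing or bad fields."""
    return Listing(
        address=str(raw["address"]),
        monthly_rent_usd=int(raw["monthly_rent_usd"]),
        bike_minutes_to_station=int(raw["bike_minutes_to_station"]),
    )


# Made-up records standing in for what an agent might extract from a page.
raw_records = [
    {"address": "12 Elm St", "monthly_rent_usd": "2400", "bike_minutes_to_station": 9},
    {"address": "7 Oak Ave", "monthly_rent_usd": 2100, "bike_minutes_to_station": "4"},
    {"address": "3 Pine Rd", "monthly_rent_usd": 2650, "bike_minutes_to_station": 15},
]

# Validate every record, then order the table by biking time to the station.
listings = sorted(
    (parse_listing(record) for record in raw_records),
    key=lambda item: item.bike_minutes_to_station,
)

print(f"{'Address':<12}{'Rent (USD)':>12}{'Bike (min)':>12}")
for item in listings:
    print(f"{item.address:<12}{item.monthly_rent_usd:>12}{item.bike_minutes_to_station:>12}")
```

Validating against a fixed schema before sorting means a malformed extraction fails loudly at parse time instead of silently corrupting the final table.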
To demonstrate how developers may automate repetitive digital operations in a way that seems reliable and flexible, another example that is highlighted uses Nova Act to order a particular salad from Sweetgreen every Tuesday, completely hands-free and on a schedule.
Benchmark performance with an emphasis on reliability
Reliability, not simply intelligence, is the main obstacle to mass agent use, according to Amazon’s announcement.
According to Amazon, the state-of-the-art models now in use are relatively fragile when it comes to driving AI agents, with agents generally obtaining success rates of between 30% and 60% on browser-based multi-step tasks.
Nova Act, by contrast, prioritizes a building-block approach, scoring above 90% on internal evaluations of tasks that challenge other models, such as interacting with pop-ups, date pickers, and dropdown menus.
Luan emphasized this focus on reliability. The primary focus has been on how to make agents genuinely dependable. According to him, you’re unlikely to use an agent again if you ask it to update a record in Salesforce and it erases your database one out of ten times.
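A short calculation shows why per-step reliability matters so much once steps are chained together. The sketch below assumes each step succeeds independently with the same probability, which is a simplification, and the probabilities are illustrative values, not Amazon’s measurements. The function name `workflow_success_rate` is hypothetical.

```python
def workflow_success_rate(step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success ** steps


# Illustrative per-step success probabilities and workflow lengths.
for p in (0.90, 0.99):
    for n in (1, 5, 10):
        rate = workflow_success_rate(p, n)
        print(f"per-step {p:.2f}, {n:2d} steps -> {rate:.3f}")
```

Under these assumptions, a step that works 90% of the time yields a ten-step workflow that completes only about 35% of the time, which is why small gains in per-step dependability compound into large gains for multi-step tasks.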
Amazon AGI compared Nova Act against rival models such as OpenAI’s CUA model and Anthropic’s Claude 3.7 Sonnet. In the ScreenSpot Web Text benchmark, which evaluates instruction-following on textual screen elements, Nova Act outperformed OpenAI CUA (0.883) and Claude 3.7 Sonnet (0.900) with a score of 0.939.
Nova Act outperformed the other models with a score of 0.879 on the ScreenSpot Web Icon benchmark, which concentrates on visual user interface elements.
However, on the GroundUI Web benchmark, which evaluates general UI interaction, Nova Act scored 0.805, slightly lower than its competitors.
Amazon used similar prompts and evaluation criteria to measure these ratings internally.
Amazon also emphasized preliminary findings about Nova Act’s capacity to generalize outside of typical settings.
For example, teammate Rick Liu showed how the agent effectively engaged with a web game with a pigeon theme—assigning metrics, fighting opponents, and advancing in the game—without any explicit instruction.
That capacity for generalization is essential to the long-term goal, according to Luan. They want Nova Act to be a browser-based solution that works for everyone. He stated, “We want an agent that can do anything you want on a computer for you.”
Flexible for use in different clouds, but locked to Amazon’s Nova model
Luan explained that although Nova Act is available to developers worldwide via nova.amazon.com, the system is closely linked to Amazon’s proprietary Nova foundation models.
Unlike with OpenAI’s Agents SDK and, to a lesser extent, Microsoft’s AutoGen and Salesforce’s Agentforce platforms (which permit switching among a select number of providers and model families), developers cannot plug in external LLMs such as OpenAI’s GPT-4o or Anthropic’s Claude 3.7 Sonnet.
According to him, Nova Act is a specially trained variant of the Nova model. It’s more than just a general LLM scaffolding. It has been trained to operate on your behalf on the internet.
Nova Act is not limited to AWS environments, though. The SDK is available for developers to download and use locally, in the cloud, or anywhere else they like. According to Luan, you can utilize it without being on AWS.
Therefore, Nova Act is probably not the best option for companies that want maximum flexibility in the underlying model for their agents. However, it’s definitely worth a look if you’re already in the Amazon or AWS developer environment and are looking for a model built expressly to navigate the web and carry out tasks across a broad range of websites with widely varying user interfaces (UIs).
Pricing, licensing, and security
The Nova Act SDK is released under the open source Apache License, Version 2.0 (January 2004). The license applies only to the SDK software, though.
The Nova Act model is private and still closed-source, as are its weights and training data. Luan clarified that the model is closely linked and co-trained with the SDK to achieve reliability, indicating that the method is intentional.
At launch, Nova Act is available as a free research preview. Pricing for production use has not yet been disclosed.
Developers can experiment and build with the technology during this time, according to Luan. In his view, most of the most useful agent products are still in the early stages of development. The team wants to make it possible for anyone to create a very helpful agent, whether for themselves or as a product, he stated.
Amazon intends to launch production-grade terms in the future, such as usage-based billing and scaling guarantees, but these are not yet available.
What’s next for Nova Act?
Amazon’s larger goal to establish action-oriented AI agents as a fundamental part of computing is reflected in the launch of Nova Act.
“My personal dream is that agents become the building block of computing, and the coolest new startups and products get built on top of what our team is developing,” Luan said, encapsulating the potential that lies ahead.
The Nova Act SDK is currently available on GitHub and from Amazon for testing and prototyping.







