Nothing really exciting about AI agents

There is a lot of hope in Silicon Valley riding on AI agents.

The technology can, in essence, solve problems, carry out jobs, and grow more capable as it absorbs information from its surroundings. Agents work much like the virtual assistant most workers wish they had: people already use them to make choices, gather information, compile reports, and book flights.

But agents are far from flawless: errors and hallucinations are still frequent, and they compound the more steps an agent takes.

Businesses are using agents to automate complex, multi-step processes, and new tools have sprung up to serve them. Regie AI employs "auto-pilot sales agents" to follow up with customers, generate customized emails, and locate leads automatically. Devin, an agent created by Cognition AI, performs intricate technical jobs. PwC, a Big Four professional services firm, introduced a platform called "agent OS" that lets agents communicate with one another to carry out tasks.

However, the more steps an agent takes to finish a task, the more its per-step error rate, the proportion of wrong outputs, compounds into the final result. Patronus AI, a startup that helps businesses evaluate and optimize AI technologies, says some agent processes can involve 100 steps or more.

Patronus AI calculated the risk and revenue loss resulting from AI agents' errors. Its conclusions support a familiar adage: with great power comes great responsibility.

A single mistake can cause the entire task to fail. "The more steps involved, the higher the chance something goes wrong by the end," the company wrote in a blog post. Its statistical model showed that an agent with a 1% error rate per step compounds to a 63% chance of failure by the hundredth step.
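The compounding Patronus AI describes follows from basic probability: if each step fails independently with probability p, the chance that at least one of n steps fails is 1 − (1 − p)^n. A minimal sketch (my own illustration, not Patronus AI's actual model, and assuming independent steps):

```python
def failure_probability(per_step_error: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps fails."""
    return 1 - (1 - per_step_error) ** steps

# A 1% per-step error rate over 100 steps:
print(round(failure_probability(0.01, 100), 3))  # 0.634, i.e. the 63% figure
```

Even a seemingly tiny per-step error rate overwhelms a long enough pipeline, which is why the step count matters as much as the model's accuracy.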

Quintin Au, growth lead at Scale AI, said error rates in the wild are significantly higher.

Currently, every action an AI takes carries roughly a 20% chance of error (this is how LLMs work; we can't expect 100% accuracy), he noted in a LinkedIn post last year. That leaves an agent with only about a 32% chance of correctly completing a task that requires five actions.

At a recent event, DeepMind CEO Demis Hassabis likened agent error rates to "compound interest," according to Computer Weekly. By the time an agent completes the 5,000 steps a real-world task might require, the likelihood that it is right could be little better than random.

You don't have perfect information in the real world, Hassabis said at the event, according to Computer Weekly. Because there is hidden information we are unaware of, we need AI models that can understand the environment around us.

The higher AI agents' failure rates climb, the greater the risk that businesses lose their end users.

The good news is that error rates can be reduced with guardrails: tools, rules, and filters that detect and remove erroneous output. Minor improvements "can yield outsized reductions in error probability," Patronus AI wrote in its post.

Anand Kannappan, CEO of Patronus AI, said guardrails can be as simple as extra checks to make sure agents don't malfunction mid-task. They can stop an agent from going further, he said, or prompt it to retry a step.

That is why it's crucial to measure performance thoroughly and comprehensively, said Douwe Kiela, a cofounder of Contextual AI and an advisor to Patronus AI.
