
Using AI for gym safety

OpenAI, co-founded by Elon Musk, has opened the doors of its “Safety Gym”, designed to enhance the training of reinforcement learning agents.

OpenAI describes Safety Gym as “a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.”

Basically, Safety Gym is the software equivalent of your spotter making sure you’re not going to injure yourself. And just like a good spotter, it will check your form.

“We also provide a standardised method of comparing algorithms and how well they avoid costly mistakes while learning,” says OpenAI.

“If deep reinforcement learning is applied to the real world, whether in robotics or internet-based tasks, it will be important to have algorithms that are safe even while learning—like a self-driving car that can learn to avoid accidents without actually having to experience them.”

Reinforcement learning is based on trial and error, with AIs training to get the best possible reward in the most efficient way. The problem is that an unconstrained pursuit of reward can lead to dangerous behaviour.

Taking the self-driving car example, you wouldn’t want an AI deciding to go around the roundabout the wrong way just because it’s the quickest way to the final exit.

OpenAI is promoting the use of “constrained reinforcement learning” as a possible solution. By attaching cost functions to unsafe behaviour, agents learn to trade off reward against those costs while still achieving their defined outcomes.

In a blog post, OpenAI explains the advantages of using constrained reinforcement learning with the example of a self-driving car:

“Suppose the car earns some amount of money for every trip it completes, and has to pay a fine for every collision. In normal RL, you would pick the collision fine at the beginning of training and keep it fixed forever. The problem here is that if the pay-per-trip is high enough, the agent may not care whether it gets in lots of collisions (as long as it can still complete its trips). In fact, it may even be advantageous to drive recklessly and risk those collisions in order to get the pay. We have seen this before when training unconstrained RL agents.

By contrast, in constrained RL you would pick the acceptable collision rate at the beginning of training, and adjust the collision fine until the agent is meeting that requirement. If the car is getting in too many fender-benders, you raise the fine until that behaviour is no longer incentivised.”
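The fine-adjustment loop OpenAI describes can be sketched in a few lines. The snippet below is a toy illustration, not OpenAI's implementation: the "agent" is a stand-in whose collision rate simply falls as the fine grows relative to the pay per trip, and the function names and numbers are invented for the example.

```python
def collision_rate(pay_per_trip: float, fine: float) -> float:
    """Toy behaviour model: higher fines (relative to pay) discourage collisions."""
    return pay_per_trip / (pay_per_trip + fine)

def tune_fine(pay_per_trip: float, target_rate: float,
              fine: float = 1.0, step: float = 1.5,
              max_iters: int = 100) -> float:
    """Raise the fine until the agent's collision rate meets the target."""
    for _ in range(max_iters):
        if collision_rate(pay_per_trip, fine) <= target_rate:
            break
        fine *= step  # reckless driving still pays off: raise the fine
    return fine

# Pick an acceptable collision rate (5%) and let the fine adapt to it.
fine = tune_fine(pay_per_trip=10.0, target_rate=0.05)
print(collision_rate(10.0, fine) <= 0.05)  # True: constraint satisfied
```

The point of the sketch is the inversion of control: instead of guessing a fixed penalty up front, you fix the acceptable violation rate and let the penalty adapt until that rate is met.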

Safety Gym environments require AI agents — three are included: Point, Car, and Doggo — to navigate cluttered environments to complete one of three tasks: Goal, Button, or Push. There are two levels of difficulty for each task. Every time an agent performs an unsafe action, a red warning light flashes around the agent and it incurs a cost.
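In Safety Gym the per-step safety cost is reported alongside the reward, so an algorithm can track constraint violations separately from task performance. The loop below is a hedged sketch of that interaction pattern; a tiny stub stands in for the real environment (which would come from the `safety_gym` package) so the example runs on its own, and the stub's random behaviour is purely illustrative.

```python
import random

class StubEnv:
    """Stand-in for a Safety Gym environment with a gym-style interface."""
    def reset(self):
        return [0.0]                      # initial observation
    def step(self, action):
        obs = [random.random()]
        reward = random.random()          # task reward (e.g. progress to goal)
        done = random.random() < 0.05
        # Safety Gym surfaces the safety signal in the info dict.
        info = {"cost": 1.0 if random.random() < 0.1 else 0.0}
        return obs, reward, done, info

env = StubEnv()
obs = env.reset()
total_reward, total_cost = 0.0, 0.0
for _ in range(1000):
    action = None                         # a real agent would choose an action
    obs, reward, done, info = env.step(action)
    total_reward += reward
    total_cost += info["cost"]            # accumulate constraint violations
    if done:
        obs = env.reset()
```

Tracking `total_cost` separately from `total_reward` is what lets constrained RL algorithms compare how well different agents avoid costly mistakes, not just how much reward they collect.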

Going forward, OpenAI has identified three areas of interest to improve algorithms for constrained reinforcement learning:

  1. Improving performance on the current Safety Gym environments.
  2. Using Safety Gym tools to investigate safe transfer learning and distributional shift problems.
  3. Combining constrained RL with implicit specifications (like human preferences) for rewards and costs.

OpenAI hopes that Safety Gym can make it easier for AI developers to collaborate on safety across the industry via work on open, shared systems.
