
Dynamic Programming For Frozen Lake Environment

Reinforcement learning is built on the mathematical foundation of the Markov decision process (MDP). Computing an optimal policy is central to reinforcement learning, and dynamic programming provides a collection of algorithms for constructing one. Classical dynamic programming algorithms assume a perfect model of the environment and can be computationally expensive, yet they remain the conceptual foundation for the more practical methods built on them. In a finite-state reinforcement learning environment, we can represent the state, action, and reward sets as S, A(s), and R, for s ∈ S, where the states are finite. The environment's dynamics are given by the probabilities p(s′, r | s, a) for all s ∈ S, a ∈ A(s), r ∈ ℝ, and s′ ∈ S⁺, where S⁺ is S augmented with a terminal state that ends each episode. Dynamic programming in the reinforcement learning landscape applies to both continuous and discrete state spaces. It finds good policies by computing value functions, deriving an optimal policy that satisfies the following Bellman optimality equations.

$$v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma\, v_*(s')\big]$$

$$q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma \max_{a'} q_*(s', a')\big]$$

For the prediction problem, policy evaluation computes the state-value function v_π for an arbitrary policy π:

$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma\, v_\pi(s')\big]$$

π(a|s) 🡪 probability of taking action a in state s under policy π

Computing the state-value function v_π lets us search for a better policy. Policy improvement defines a new greedy policy π′ with respect to v_π:

$$\pi'(s) = \arg\max_a \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma\, v_\pi(s')\big]$$

Policy iteration alternates policy evaluation and policy improvement, producing a sequence of monotonically improving policies until no further improvement is possible:

$$\pi_0 \xrightarrow{E} v_{\pi_0} \xrightarrow{I} \pi_1 \xrightarrow{E} v_{\pi_1} \xrightarrow{I} \pi_2 \xrightarrow{E} \cdots \xrightarrow{I} \pi_* \xrightarrow{E} v_*$$

Dynamic programming works well on gridworld-like environments. The agent's objective in the gridworld is to control the movement of a character. Some tiles of the grid are walkable, while others drop the character into the water of the frozen lake. The agent's ultimate objective is to reach the goal tile along an optimal walkable path, and it is rewarded each time it finds a walkable path to the goal.

The following are the key components to watch out for in the gridworld.

S 🡪 Starting position (Safe)

F 🡪 Frozen surface (Safe for some time)

H 🡪 Hole (Death)

G 🡪 Goal (Safe and ultimate goal).

The agent can perform the following actions in the Frozen Lake environment:

  1. Left – 0
  2. Down – 1
  3. Right – 2
  4. Up – 3

The default 4×4 Frozen Lake grid lays these tiles out as follows:

SFFF
FHFH
FFFH
HFFG

We will implement dynamic programming with PyTorch in the Frozen Lake reinforcement learning environment, since dynamic programming is well suited to gridworld-like environments, implementing value-function algorithms such as policy evaluation, policy improvement, policy iteration, and value iteration.
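As a preview of the approach, here is a minimal sketch of iterative policy evaluation with PyTorch tensors. The function name policy_evaluation, the discount factor gamma, and the convergence threshold are our own illustrative choices; env.env.P is the transition table that gym's toy-text environments expose.

```python
import torch

def policy_evaluation(env, policy, gamma=0.99, threshold=1e-4):
    """Sketch of iterative policy evaluation for a gym toy-text environment.

    `policy` is an n_state x n_action tensor of action probabilities.
    """
    n_state = env.observation_space.n
    V = torch.zeros(n_state)
    while True:
        V_new = torch.zeros(n_state)
        for state in range(n_state):
            for action, prob_a in enumerate(policy[state]):
                # env.env.P[state][action] lists (prob, next_state, reward, done)
                # tuples describing the environment's dynamics p(s', r | s, a).
                for trans_prob, next_state, reward, _ in env.env.P[state][action]:
                    V_new[state] += prob_a * trans_prob * (reward + gamma * V[next_state])
        # Stop once the largest value change falls below the threshold.
        if torch.max(torch.abs(V_new - V)) <= threshold:
            return V_new
        V = V_new
```

Evaluating the uniformly random policy, torch.full((n_state, 4), 0.25), gives the baseline state values that policy improvement then sharpens into a better policy.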

Import the gym library, created by OpenAI, an open-source toolkit for running reinforcement learning experiments. In the following step, we register the parameters for Frozen Lake, make the Frozen Lake game environment, and print the environment's observation space.

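A minimal sketch of this step, assuming the classic FrozenLake-v0 environment id (newer gym releases register it as FrozenLake-v1):

```python
import gym

# Make the 4x4 Frozen Lake environment and inspect its observation space.
env = gym.make('FrozenLake-v0')
print(env.observation_space)  # Discrete(16)
```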

Assign the observation space to a variable and print it to see the number of states available in the environment.

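Something along these lines, with n_state as our illustrative variable name:

```python
# Each tile of the 4x4 grid is one discrete state.
n_state = env.observation_space.n
print(n_state)  # 16
```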

We then sample states from the observation space over a range g; the grid indices possible in the environment run from 0 to 15.

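A sketch of that sampling loop, with the loop variable g from the text:

```python
# Draw random states; each sample is a grid index between 0 and 15.
for g in range(16):
    print(env.observation_space.sample())
```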

Then, we print the action space from which the agent must find the walkable path in the shortest time under the optimal policy.

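For example:

```python
# The four discrete actions: 0 = left, 1 = down, 2 = right, 3 = up.
print(env.action_space)  # Discrete(4)
```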

We can see the actions available to the agent in the Frozen Lake environment by sampling the action space over a range of 15.

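A sketch of that loop:

```python
# Draw 15 random actions; each sample is an integer between 0 and 3.
for _ in range(15):
    print(env.action_space.sample())
```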

We then render the environment to inspect its current state.

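For instance:

```python
# Reset to the starting tile S and print an ASCII picture of the grid,
# highlighting the agent's current position.
env.reset()
env.render()
```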

We can navigate the frozen lake gridworld by going left, executing action 0; this incurs no penalty, as there is nothing to the left of the starting tile. We can move down with action 1, and going right with action 2 causes no problem either, as the agent still stands on the frozen surface; going right twice with action 2 is safe as well. However, after going down three times, the agent falls through a hole into the frozen lake. The agent cannot realistically survive, recover, and swim back out of the hole, so for the purposes of the Frozen Lake game, the agent dies and the episode ends.

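A sketch of that walk. Note that the stock environment is slippery by default, so each step can slide sideways and repeated runs differ; env.step returns the new state, the reward, a done flag, and an info dict:

```python
# Try a few moves: left (a no-op at the start), down, then right twice.
env.reset()
for action in [0, 1, 2, 2]:
    state, reward, done, info = env.step(action)
    env.render()
    if done:
        # The agent fell into a hole (H): the episode ends with reward 0.
        print('episode finished, reward =', reward)
        break
```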

To navigate the frozen lake gridworld successfully, the agent has to move right twice, down three times, and then right once more to reach the goal.

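A sketch of the winning walk, assuming the non-slippery variant (gym.make('FrozenLake-v0', is_slippery=False)) so each move lands where intended:

```python
# Right, right, down, down, down, right takes the agent from S to G.
env.reset()
for action in [2, 2, 1, 1, 1, 2]:
    state, reward, done, info = env.step(action)
env.render()
print(reward)  # 1.0 on reaching the goal tile
```

Once the agent can reach G, the same environment becomes the testbed for the policy evaluation, policy improvement, policy iteration, and value iteration routines introduced above.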
