Actor Critic Method
Author: Apoorv Nandan
Date created: 2020/05/13
Last modified: 2024/02/22
Description: Implement the Actor Critic method in the CartPole environment.
Introduction
This script shows an implementation of the Actor Critic method in the CartPole-v0 environment.
Actor Critic Method
As an agent takes actions and moves through an environment, it learns to map the observed state of the environment to two possible outputs:
Recommended action: A probability value for each action in the action space. The part of the agent responsible for this output is called the actor.
Estimated rewards in the future: Sum of all rewards it expects to receive in the future. The part of the agent responsible for this output is the critic.
The actor and the critic learn to perform their tasks such that the recommended actions from the actor maximize the rewards.
CartPole-v0
A pole is attached to a cart placed on a frictionless track. The agent has to apply force to move the cart. It is rewarded for every time step the pole remains upright. The agent, therefore, must learn to keep the pole from falling over.
Setup
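A minimal setup sketch is shown below, assuming Gym and TensorFlow/Keras are installed. The hyperparameter values (seed, discount factor, maximum steps per episode) are illustrative assumptions rather than values fixed by the text above.

```python
import gym
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Configuration parameters for the whole setup (values are illustrative assumptions)
seed = 42
gamma = 0.99  # Discount factor for past rewards
max_steps_per_episode = 10000
eps = np.finfo(np.float32).eps.item()  # Smallest float such that 1.0 + eps != 1.0

# Create the CartPole environment (note the lowercase "v0" in the Gym id)
env = gym.make("CartPole-v0")
```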
Implement Actor Critic network
This network learns two functions:
Actor: This takes as input the state of our environment and returns a probability value for each action in its action space.
Critic: This takes as input the state of our environment and returns an estimate of total rewards in the future.
In our implementation, they share the initial layer.
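A minimal sketch of such a network with the Keras functional API is shown below. The four observation inputs and two discrete actions come from the CartPole environment; the width of the shared hidden layer (128 units) is an assumed value, not specified by the text above.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_inputs = 4     # CartPole observation: cart position/velocity, pole angle/velocity
num_actions = 2    # Two discrete actions: push the cart left or right
num_hidden = 128   # Assumed width of the shared hidden layer

inputs = layers.Input(shape=(num_inputs,))
# Shared initial layer used by both the actor and the critic
common = layers.Dense(num_hidden, activation="relu")(inputs)
# Actor head: a probability for each action in the action space
action = layers.Dense(num_actions, activation="softmax")(common)
# Critic head: an estimate of the total future rewards from this state
critic = layers.Dense(1)(common)

model = keras.Model(inputs=inputs, outputs=[action, critic])
```

Because both heads branch off the same hidden layer, features learned for choosing actions are reused for estimating future rewards, which keeps the model small for a simple environment like CartPole.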