Title: Policy-Search Methods
Research Question: How can we improve the policy of a reinforcement learning agent by planning ahead and taking into account the expected future reward?
Methodology: The authors introduce a new method called "Gradient-based Reinforcement Planning" (GREP). This method improves the policy of an agent by calculating the gradient of the expected future reward with respect to the policy parameters. They derive the exact policy gradient and confirm their ideas with numerical experiments.
Results: They provide formulas for the exact policy gradient and demonstrate how GREP can be used to improve an agent's policy before it interacts with the environment.
Implications: GREP is a novel approach to reinforcement learning that combines gradient-based learning with explicit planning. It may be particularly useful in large state spaces and in POMDP settings, where other methods may be less effective.
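Illustration: The idea of improving a policy from a model before any interaction with the environment can be sketched on a toy problem. The code below is a minimal sketch and not the paper's GREP formulas: it assumes a fully known tabular model, a softmax policy with one parameter per state-action pair, and the standard exact policy gradient for that setting, dJ/dtheta[s][a] = d(s) * pi(a|s) * (Q(s,a) - V(s)), where d is the discounted state occupancy. The two-state model, the names P, R, MU0, GAMMA, and the functions expected_return, policy_gradient, and plan are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 2-state, 2-action model (an assumption, not taken from the paper).
# P[s][a][t] is the probability of moving from state s to state t under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to state 0, action 1 stays
]
R = [[0.0, 0.0], [0.0, 1.0]]   # R[s][a]: only (state 1, action 1) pays a reward
MU0 = [1.0, 0.0]               # the agent starts in state 0
GAMMA = 0.9                    # discount factor


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def _solve(theta, P, R, mu0, gamma, iters=300):
    """Evaluate the softmax policy on the model: returns (pi, V, Q, d)."""
    n_s, n_a = len(P), len(P[0])
    pi = [softmax(theta[s]) for s in range(n_s)]
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    d = list(mu0)
    for _ in range(iters):
        # Bellman backup for the action values and state values.
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
        # Discounted state occupancy: d = mu0 + gamma * P_pi^T d.
        d = [mu0[t] + gamma * sum(d[s] * pi[s][a] * P[s][a][t]
                                  for s in range(n_s) for a in range(n_a))
             for t in range(n_s)]
    return pi, V, Q, d


def expected_return(theta, P, R, mu0, gamma):
    """Expected discounted future reward J(theta) from the start distribution."""
    _, V, _, _ = _solve(theta, P, R, mu0, gamma)
    return sum(m * v for m, v in zip(mu0, V))


def policy_gradient(theta, P, R, mu0, gamma):
    """Exact gradient of J with respect to the tabular softmax parameters."""
    pi, V, Q, d = _solve(theta, P, R, mu0, gamma)
    return [[d[s] * pi[s][a] * (Q[s][a] - V[s]) for a in range(len(P[0]))]
            for s in range(len(P))]


def plan(theta, P, R, mu0, gamma, lr=0.5, steps=200):
    """Gradient ascent on J using only the model (no environment interaction)."""
    theta = [row[:] for row in theta]
    for _ in range(steps):
        grad = policy_gradient(theta, P, R, mu0, gamma)
        theta = [[theta[s][a] + lr * grad[s][a] for a in range(len(grad[s]))]
                 for s in range(len(grad))]
    return theta


if __name__ == "__main__":
    theta0 = [[0.0, 0.0], [0.0, 0.0]]  # uniform initial policy
    planned = plan(theta0, P, R, MU0, GAMMA)
    print("J before planning:", expected_return(theta0, P, R, MU0, GAMMA))
    print("J after planning: ", expected_return(planned, P, R, MU0, GAMMA))
```

In this sketch the planning loop never samples from an environment; it only queries the model arrays, which is the sense in which the policy is improved ahead of interaction. A tabular model is used purely to keep the example short and says nothing about how the method behaves in large state spaces or POMDP settings.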
Link to Article: https://arxiv.org/abs/0111060v1
Authors:
arXiv ID: 0111060v1