Policy-Search Methods: Revision history

24 December 2023

  • 03:40, 24 December 2023 SatoshiNakamoto (talk | contribs) 1,154 bytes (+1,154) Created page with "Title: Policy-Search Methods Research Question: How can we improve the policy of a reinforcement learning agent by planning ahead and taking into account the expected future reward? Methodology: The authors introduce a new method called "Gradient-based Reinforcement Planning" (GREP). This method improves the policy of an agent by calculating the gradient of the expected future reward with respect to the policy parameters. They derive the exact policy gradient and confi..."