Minimal RL Implementations

2020, Apr 24    

OpenAI has done a fantastic job with the development of Spinning Up, an educational resource for understanding Deep Reinforcement Learning (Deep RL). Its simple code design, its well-documented, standalone, and reasonably good implementations of key algorithms, and its curated list of key papers in Deep RL have significantly lowered the barrier to entry for new researchers. Additionally, the guide on spinning up as a Deep RL researcher gives a very nice overview of the common pitfalls.

While awesome RL repos such as rllib, Baselines, Dopamine, and rlpyt are great for running state-of-the-art RL algorithms at scale, their often complicated frameworks make understanding the core concepts of Deep RL algorithms almost impossible. This is where the true benefit of Spinning Up's simple implementations comes into play.

RLBase: A Spinning Up Fork

With the PyTorch update of Spinning Up in January 2020, OpenAI officially announced that no further major updates are currently planned. The original repo implements the following six core Deep RL algorithms:

  1. Vanilla Policy Gradient (VPG)
  2. Trust Region Policy Optimization (TRPO)
  3. Proximal Policy Optimization (PPO)
  4. Deep Deterministic Policy Gradient (DDPG)
  5. Twin Delayed DDPG (TD3)
  6. Soft Actor-Critic (SAC)

The Spinning Up documentation explains the reasoning behind this selection in more detail, but briefly, these are the core methods that cover the progression of ideas in the recent history of policy-learning algorithms. As a result, some core Deep RL methods from other families, such as value function methods, Distributional RL, Hierarchical RL, and Model-based RL, have been left out.

With that said, I have recently started working on my fork of Spinning Up, RLBase, with the goal of adding some of the missing algorithms. The implementations are all in PyTorch and follow the Spinning Up code format. Note that this is a work in progress and more algorithms will be added in the coming weeks. Additionally, I will try to write a brief description of each algorithm, similar to those in Spinning Up, in subsequent blog posts! The following algorithms have already been implemented:

  1. REINFORCE
  2. Deep Q Networks (DQN)
  3. Hindsight Experience Replay (HER)
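
To give a flavor of how small the core of these methods can be, here is a minimal sketch of the discounted return computation that REINFORCE uses to weight its log-probability gradients. This is not the RLBase code; the function name `discounted_returns` and the default `gamma` are assumptions made for illustration, and it uses plain Python rather than PyTorch to stay self-contained.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t.

    `rewards` is the list of rewards from a single episode.
    Hypothetical helper for illustration only, not part of RLBase.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three steps of reward 1 with gamma = 0.5.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

In a full REINFORCE update, each return G_t would multiply the log-probability of the action taken at step t, and the negated sum would serve as the loss.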

As with any code base, there may be bugs or issues with my implementations. I greatly appreciate feedback and pull requests.

Happy Reinforcement Learning!