DeepDPG-TensorFlow
TensorFlow Implementation of Deep Deterministic Policy Gradients
Intro
Replay buffers and target networks, as used in the Atari-playing DQN paper, made it possible to train deep value networks (DQN) in complicated environments. However, DQN only works with discrete action spaces, since it relies on finding the action that maximizes the action-value function. To handle continuous action spaces, researchers from the same group proposed Deep Deterministic Policy Gradients (DDPG), a model-free, off-policy actor-critic algorithm that builds on the successes of DQN. This repository implements that algorithm in TensorFlow for continuous OpenAI Gym environments.
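To make the difference concrete, here is a minimal sketch of the two bootstrapped targets. It is not taken from this repository: `dqn_target`, `ddpg_target`, `target_actor`, and `target_critic` are hypothetical names, and the networks are stood in for by plain Python callables.

```python
def dqn_target(reward, next_q_values, gamma=0.99, done=False):
    """DQN target: takes an explicit max over a finite list of action values."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def ddpg_target(reward, next_state, target_actor, target_critic,
                gamma=0.99, done=False):
    """DDPG target: the target actor proposes the next action, so no max
    over a (possibly infinite) continuous action set is required."""
    if done:
        return reward
    next_action = target_actor(next_state)          # hypothetical callable
    return reward + gamma * target_critic(next_state, next_action)
```

The max in `dqn_target` is only tractable when the actions can be enumerated, which is why the deterministic actor takes its place in the continuous case.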
Overview
This code contains:
- Deep Q-Learning and Policy Improvement
- Easy Network Setting and Batch Normalization at Will
    - changing your network architecture reduces to editing a list
 
- Experience Replay Memory
    - lets the algorithm learn off-policy from stored transitions
 
- Target Networks for Both Action-Value and Policy Functions
    - stabilize the learning process
 
- Ornstein-Uhlenbeck Action Noise for Exploration
- It’s Modular
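The components listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in this repository: the class and function names are hypothetical, weights are represented as flat lists of floats, and the defaults (`tau=0.001`, `theta=0.15`, `sigma=0.2`) are commonly used DDPG values rather than settings read from `config.py`.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def soft_update(target_weights, online_weights, tau=0.001):
    """Move each target weight a small step tau toward its online counterpart."""
    return [(1.0 - tau) * t + tau * w
            for t, w in zip(target_weights, online_weights)]


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process: temporally correlated noise."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=None):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.rng = random.Random(seed)
        self.state = [mu] * size

    def reset(self):
        self.state = [self.mu] * len(self.state)

    def sample(self):
        # x <- x + theta * (mu - x) + sigma * N(0, 1), using a unit time step
        self.state = [
            x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

In a training loop, the noise sample would be added to the actor's output before stepping the environment, each transition would go into the buffer, and `soft_update` would be applied to both the actor and critic target weights after every gradient step.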
A Playground for Controlling OpenAI Gym
You can tune the network settings in config.py and use the same code to control other environments.
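As a rough idea of what "editing a list" can look like, the sketch below derives dense-layer shapes from a list of hidden-layer widths. The names `ACTOR_LAYERS` and `layer_shapes` and the widths `[400, 300]` are hypothetical and chosen for illustration; the actual contents of config.py may differ.

```python
# Hypothetical config.py-style setting: one entry per hidden layer.
ACTOR_LAYERS = [400, 300]


def layer_shapes(input_dim, hidden_layers, output_dim):
    """Return the (fan_in, fan_out) weight shape of every dense layer."""
    sizes = [input_dim] + list(hidden_layers) + [output_dim]
    return list(zip(sizes[:-1], sizes[1:]))


# A 3-dimensional observation and a 1-dimensional action, for example.
shapes = layer_shapes(3, ACTOR_LAYERS, 1)
```

Under this scheme, adding or removing a layer only means changing the list; the shapes of the neighbouring layers follow automatically.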
TODOS
- extend it to MuJoCo environments
- save and load checkpoints (network weights)
- make nice summaries