DeepDPG-TensorFlow

TensorFlow Implementation of Deep Deterministic Policy Gradients

Intro

Replay buffers and target networks, as first used in the Atari-playing DQN paper, made it possible to train deep value networks (DQN) on complicated environments. DQN, however, only handles discrete action spaces, since it relies on finding the action that maximizes the action-value function. To solve continuous control problems, the same group proposed this model-free, off-policy actor-critic algorithm, again putting the ingredients of DQN's success to good use. Here the exact algorithm is implemented in TensorFlow for continuous OpenAI Gym environments.
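
The key change relative to DQN is that the argmax over actions is replaced by a learned deterministic actor mu(s): the critic is regressed toward a bootstrap target that uses the target actor's action, and the actor is improved by following the critic's gradient with respect to the action. Below is a minimal TF2/Keras sketch of one update step; it is an illustration only (this repository's code, layer sizes, and hyperparameters may differ, and `state_dim`/`action_dim` are toy values):

```python
import tensorflow as tf

state_dim, action_dim = 3, 1          # toy sizes, illustration only
gamma, tau = 0.99, 0.001              # discount and soft-update rate

def make_actor():
    s = tf.keras.Input((state_dim,))
    h = tf.keras.layers.Dense(64, activation="relu")(s)
    a = tf.keras.layers.Dense(action_dim, activation="tanh")(h)  # bounded action
    return tf.keras.Model(s, a)

def make_critic():
    s = tf.keras.Input((state_dim,))
    a = tf.keras.Input((action_dim,))
    h = tf.keras.layers.Dense(64, activation="relu")(tf.keras.layers.Concatenate()([s, a]))
    q = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model([s, a], q)

actor, critic = make_actor(), make_critic()
actor_target, critic_target = make_actor(), make_critic()
actor_target.set_weights(actor.get_weights())
critic_target.set_weights(critic.get_weights())
actor_opt, critic_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-3)

def train_step(s, a, r, s2, done):
    """One DDPG update on a batch; r and done have shape (batch, 1)."""
    # Critic: regress Q(s, a) toward r + gamma * Q'(s', mu'(s')); no argmax needed.
    y = r + gamma * (1.0 - done) * critic_target([s2, actor_target(s2)])
    with tf.GradientTape() as tape:
        critic_loss = tf.reduce_mean(tf.square(critic([s, a]) - y))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))

    # Actor: maximize Q(s, mu(s)) by minimizing its negative.
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([s, actor(s)]))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))

    # Target networks slowly track the learned networks (stabilizes learning).
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for w, w_t in zip(net.weights, target.weights):
            w_t.assign(tau * w + (1.0 - tau) * w_t)
```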

Overview

This code contains:

  1. Deep Q-Learning and Policy Improvement
  2. Easy Network Configuration and Batch Normalization at Will
    • changing your network architecture reduces to editing a list in config.py (see the example under the playground section below)
  3. Experience Replay Memory
    • makes the algorithm off-policy (a minimal buffer sketch follows this list)
  4. Target Networks for Both the Action-Value and Policy Functions
    • stabilizes the learning process (see the soft-update step in the sketch above)
  5. Ornstein-Uhlenbeck Action Noise for Exploration (a sketch follows this list)
  6. It’s Modular
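
For item 3, here is a minimal replay-memory sketch (plain NumPy, illustration only; the repository's buffer class and field names may differ). Storing transitions and sampling them uniformly at random breaks the temporal correlation in the data and lets the networks learn from experience generated by older policies, which is what makes the algorithm off-policy:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-size FIFO store of (s, a, r, s', done) transitions."""
    def __init__(self, capacity, state_dim, action_dim):
        self.capacity, self.size, self.ptr = capacity, 0, 0
        self.s  = np.zeros((capacity, state_dim), dtype=np.float32)
        self.a  = np.zeros((capacity, action_dim), dtype=np.float32)
        self.r  = np.zeros((capacity, 1), dtype=np.float32)
        self.s2 = np.zeros((capacity, state_dim), dtype=np.float32)
        self.d  = np.zeros((capacity, 1), dtype=np.float32)

    def add(self, s, a, r, s2, done):
        i = self.ptr
        self.s[i], self.a[i], self.r[i], self.s2[i], self.d[i] = s, a, r, s2, done
        self.ptr = (self.ptr + 1) % self.capacity        # overwrite oldest entries
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return self.s[idx], self.a[idx], self.r[idx], self.s2[idx], self.d[idx]
```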
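
And for item 5, a small Ornstein-Uhlenbeck process for exploration noise (again an illustrative sketch; theta, sigma, and dt are typical defaults, not necessarily the values used in this repository). The noise is temporally correlated and mean-reverting, which suits physical control tasks better than independent Gaussian noise:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated, mean-reverting noise."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(action_dim, mu, dtype=np.float32)

    def reset(self):
        self.x[:] = self.mu

    def sample(self):
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape))
        self.x = self.x + dx
        return self.x
```

During rollouts the exploratory action is the actor's output plus `noise.sample()`, clipped to the environment's action bounds.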

A Playground for Controlling OpenAI Gym

You can play with and tune the network settings in config.py and use them to control other continuous Gym environments.
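
For example, the network architecture might be described by a list of hidden-layer sizes along the lines of the purely hypothetical snippet below; check config.py for the actual option names and format used here:

```python
# Hypothetical config.py entries; the real option names in this repo may differ.
ACTOR_HIDDEN_LAYERS  = [400, 300]   # one entry per hidden layer
CRITIC_HIDDEN_LAYERS = [400, 300]
USE_BATCH_NORM       = True         # toggle batch normalization
```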

TODOS

References