Introduction

Like many other areas of machine learning research, reinforcement learning (RL) is evolving at breakneck speed. As in those other areas, researchers are leveraging deep learning to achieve state-of-the-art results.

In particular, reinforcement learning has significantly outperformed prior ML techniques in game playing: it has reached human-level and even world-best performance on Atari games, beaten the human Go champion, and shown promising results in more difficult games like StarCraft II.

In this lab, you will learn the basics of reinforcement learning by building a simple game, modelled on a sample provided by OpenAI Gym.

Objectives

In this lab, you will:

Once you're ready, scroll down and follow the steps below to get your lab environment set up.

Reinforcement learning 101

Reinforcement learning (RL) is a form of machine learning in which an agent takes actions in an environment in order to maximize a cumulative objective (the reward) over a sequence of steps. Unlike in more traditional supervised learning, individual data points are not labelled; the agent only has access to "sparse" rewards.
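
To make this loop concrete, below is a minimal sketch of the agent-environment interaction using the classic OpenAI Gym API. The CartPole-v0 environment and the purely random agent are illustrative assumptions, not the game you will build in this lab.

```python
# A minimal agent-environment loop using the classic OpenAI Gym API
# (assumes the `gym` package and the CartPole-v0 environment are available).
import gym

env = gym.make("CartPole-v0")
observation = env.reset()            # initial state of the environment
total_reward = 0.0

for step in range(200):
    action = env.action_space.sample()              # a random agent: no learning yet
    observation, reward, done, info = env.step(action)
    total_reward += reward                          # rewards accumulate over the episode
    if done:                                        # episode ends (pole fell or time limit)
        break

print("Episode finished after", step + 1, "steps, total reward:", total_reward)
env.close()
```

An RL algorithm's job is to replace the random `action = env.action_space.sample()` line with a policy that is improved from the observed rewards.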

While the history of RL can be traced back to the 1950s and many RL algorithms exist, two deep RL algorithms that are easy to implement yet powerful have attracted a lot of attention recently: the deep Q-network (DQN) and the deep deterministic policy gradient (DDPG). This section briefly introduces these algorithms and some variants based on them.

https://cdn.qwiklabs.com/cDBDy0wLYFlwkAnG0PrdbCg7UAEngRYH%2BORdWseL14A%3D

A conceptual process diagram of the Reinforcement Learning problem

The deep Q-network (DQN) was introduced by Google DeepMind in a Nature paper in 2015. Encouraged by the success of deep learning in image recognition, the authors incorporated deep neural networks into Q-learning and tested their algorithm in an Atari game simulator, where the observation space is very high-dimensional.

The deep neural network acts as a function approximator that predicts the Q-values, that is, the desirability of taking each action from a given input state. DQN is therefore a value-based method: during training it updates the Q-values according to the Bellman equation, and to avoid the difficulty of fitting a moving target, it employs a second deep neural network (the target network) that provides the estimates of the target values.
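
As a rough illustration of that target computation, here is a sketch in NumPy. The `target_network` callable, the array shapes, and the discount factor are assumptions made for the example, not code from the DQN paper.

```python
# Sketch of the DQN target computation with a separate target network.
# `target_network` is an illustrative placeholder: a callable mapping a
# batch of states to per-action Q-values of shape (batch, num_actions).
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)

def dqn_targets(rewards, next_states, dones, target_network):
    """Bellman targets: r + gamma * max_a' Q_target(s', a'), or just r at episode end."""
    next_q = target_network(next_states)       # shape: (batch, num_actions)
    max_next_q = next_q.max(axis=1)             # value of the best next action
    return rewards + GAMMA * (1.0 - dones) * max_next_q
```

Training then regresses the online network's Q(s, a) toward these targets; because the target network's weights are only copied from the online network every so often, the regression target does not shift at every update.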