From 3fce6b57c93ae505fd7990bad63c57cee4f9a6c1 Mon Sep 17 00:00:00 2001
From: Alex
Date: Fri, 1 Dec 2017 15:38:01 +0900
Subject: [PATCH 1/3] Updated Readme. Changed Lambda to Gamma

---
 ...alue Function Approximation Solution.ipynb | 40 ++++++-------------
 ...ng with Value Function Approximation.ipynb | 30 +++++---------
 FA/README.md                                  |  2 +
 3 files changed, 25 insertions(+), 47 deletions(-)

diff --git a/FA/Q-Learning with Value Function Approximation Solution.ipynb b/FA/Q-Learning with Value Function Approximation Solution.ipynb
index a271d6a63..49c62ca37 100644
--- a/FA/Q-Learning with Value Function Approximation Solution.ipynb
+++ b/FA/Q-Learning with Value Function Approximation Solution.ipynb
@@ -3,9 +3,7 @@
   {
    "cell_type": "code",
    "execution_count": 1,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "%matplotlib inline\n",
@@ -31,9 +29,7 @@
   {
    "cell_type": "code",
    "execution_count": 2,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stderr",
@@ -50,9 +46,7 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -74,7 +68,7 @@
     "scaler = sklearn.preprocessing.StandardScaler()\n",
     "scaler.fit(observation_examples)\n",
     "\n",
-    "# Used to converte a state to a featurizes represenation.\n",
+    "# Used to convert a state to a featurized representation.\n",
     "# We use RBF kernels with different variances to cover different parts of the space\n",
     "featurizer = sklearn.pipeline.FeatureUnion([\n",
     "        (\"rbf1\", RBFSampler(gamma=5.0, n_components=100)),\n",
@@ -88,9 +82,7 @@
   {
    "cell_type": "code",
    "execution_count": 4,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "class Estimator():\n",
@@ -151,9 +143,7 @@
   {
    "cell_type": "code",
    "execution_count": 5,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "def make_epsilon_greedy_policy(estimator, epsilon, nA):\n",
@@ -182,9 +172,7 @@
   {
    "cell_type": "code",
    "execution_count": 14,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "def q_learning(env, estimator, num_episodes, discount_factor=1.0, epsilon=0.1, epsilon_decay=1.0):\n",
@@ -196,7 +184,7 @@
     "        env: OpenAI environment.\n",
     "        estimator: Action-Value function estimator\n",
     "        num_episodes: Number of episodes to run for.\n",
-    "        discount_factor: Lambda time discount factor.\n",
+    "        discount_factor: Gamma discount factor.\n",
     "        epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
     "        epsilon_decay: Each episode, epsilon is decayed by this factor\n",
     "    \n",
@@ -283,9 +271,7 @@
   {
    "cell_type": "code",
    "execution_count": 16,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -305,9 +291,7 @@
   {
    "cell_type": "code",
    "execution_count": 17,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -384,9 +368,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.5.1"
+   "version": "3.5.2"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
 }
diff --git a/FA/Q-Learning with Value Function Approximation.ipynb b/FA/Q-Learning with Value Function Approximation.ipynb
index e83b6bbb0..442605562 100644
--- a/FA/Q-Learning with Value Function Approximation.ipynb
+++ b/FA/Q-Learning with Value Function Approximation.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "code",
    "execution_count": 1,
    "metadata": {
-    "collapsed": false
+    "collapsed": true
    },
    "outputs": [],
    "source": [
@@ -31,9 +31,7 @@
   {
    "cell_type": "code",
    "execution_count": 2,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stderr",
@@ -50,9 +48,7 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -89,7 +85,7 @@
    "cell_type": "code",
    "execution_count": 4,
    "metadata": {
-    "collapsed": false
+    "collapsed": true
    },
    "outputs": [],
    "source": [
@@ -149,7 +145,7 @@
    "cell_type": "code",
    "execution_count": 5,
    "metadata": {
-    "collapsed": false
+    "collapsed": true
    },
    "outputs": [],
    "source": [
@@ -180,7 +176,7 @@
    "cell_type": "code",
    "execution_count": 18,
    "metadata": {
-    "collapsed": false
+    "collapsed": true
    },
    "outputs": [],
    "source": [
@@ -193,7 +189,7 @@
     "        env: OpenAI environment.\n",
     "        estimator: Action-Value function estimator\n",
     "        num_episodes: Number of episodes to run for.\n",
-    "        discount_factor: Lambda time discount factor.\n",
+    "        discount_factor: Gamma discount factor.\n",
     "        epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
     "        epsilon_decay: Each episode, epsilon is decayed by this factor\n",
     "    \n",
@@ -237,9 +233,7 @@
   {
    "cell_type": "code",
    "execution_count": 20,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -259,9 +253,7 @@
   {
    "cell_type": "code",
    "execution_count": 21,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -326,9 +318,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.4.3"
+   "version": "3.5.2"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
 }
diff --git a/FA/README.md b/FA/README.md
index fb6dd111a..579498c85 100644
--- a/FA/README.md
+++ b/FA/README.md
@@ -35,6 +35,8 @@
 
 ### Exercises
 
+- Get familiar with the [Mountain Car Playground](MountainCar%20Playground.ipynb)
+
 - Solve Mountain Car Problem using Q-Learning with Linear Function Approximation
   - [Exercise](Q-Learning%20with%20Value%20Function%20Approximation.ipynb)
   - [Solution](Q-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)

From 152dbc414cfd70d67aff46241c3fc69887256c8b Mon Sep 17 00:00:00 2001
From: Alex
Date: Wed, 6 Dec 2017 17:15:13 +0900
Subject: [PATCH 2/3] Updated link to Sutton's book

---
 DP/README.md             | 2 +-
 FA/README.md             | 4 ++--
 Introduction/README.md   | 2 +-
 MC/README.md             | 2 +-
 MDP/README.md            | 2 +-
 PolicyGradient/README.md | 2 +-
 README.md                | 2 +-
 TD/README.md             | 6 +++---
 8 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/DP/README.md b/DP/README.md
index 7a7d9389a..1c2bb768b 100644
--- a/DP/README.md
+++ b/DP/README.md
@@ -28,7 +28,7 @@
 
 **Optional:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 4: Dynamic Programming
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 4: Dynamic Programming
 
 ### Exercises
 
diff --git a/FA/README.md b/FA/README.md
index 579498c85..f50f56cef 100644
--- a/FA/README.md
+++ b/FA/README.md
@@ -25,8 +25,8 @@
 **Required:**
 
 - David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf))
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 9: On-policy Prediction with Approximation
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 10: On-policy Control with Approximation
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 9: On-policy Prediction with Approximation
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 10: On-policy Control with Approximation
 
 **Optional:**
 
diff --git a/Introduction/README.md b/Introduction/README.md
index f476fabb9..9e5b383ac 100644
--- a/Introduction/README.md
+++ b/Introduction/README.md
@@ -17,7 +17,7 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 1: The Reinforcement Learning Problem
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 1: The Reinforcement Learning Problem
 - David Silver's RL Course Lecture 1 - Introduction to Reinforcement Learning ([video](https://www.youtube.com/watch?v=2pWv7GOvuf0), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf))
 - [OpenAI Gym Tutorial](https://gym.openai.com/docs)
 
diff --git a/MC/README.md b/MC/README.md
index 2c1a512d7..9d23968c2 100644
--- a/MC/README.md
+++ b/MC/README.md
@@ -26,7 +26,7 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 5: Monte Carlo Methods
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 5: Monte Carlo Methods
 
 **Optional:**
 
diff --git a/MDP/README.md b/MDP/README.md
index 404cb141b..539799a09 100644
--- a/MDP/README.md
+++ b/MDP/README.md
@@ -25,7 +25,7 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 3: Finite Markov Decision Processes
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 3: Finite Markov Decision Processes
 - David Silver's RL Course Lecture 2 - Markov Decision Processes ([video](https://www.youtube.com/watch?v=lfHX2hHRMVQ), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf))
 
diff --git a/PolicyGradient/README.md b/PolicyGradient/README.md
index 3094fb332..1e7a1c68d 100644
--- a/PolicyGradient/README.md
+++ b/PolicyGradient/README.md
@@ -36,7 +36,7 @@
 
 **Optional:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
 - [Deterministic Policy Gradient Algorithms](http://jmlr.org/proceedings/papers/v32/silver14.pdf)
 - [Deterministic Policy Gradient Algorithms (Talk)](http://techtalks.tv/talks/deterministic-policy-gradient-algorithms/61098/)
 - [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
 
diff --git a/README.md b/README.md
index ad2abe1d3..60974e0dd 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,7 @@
 
 Textbooks:
 
-- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
 
 Classes:
 
diff --git a/TD/README.md b/TD/README.md
index ac2488167..b54bfead8 100644
--- a/TD/README.md
+++ b/TD/README.md
@@ -28,14 +28,14 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 6: Temporal-Difference Learning
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 6: Temporal-Difference Learning
 - David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf))
 - David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf))
 
 **Optional:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 7: Multi-Step Bootstrapping
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 12: Eligibility Traces
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 7: Multi-Step Bootstrapping
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 12: Eligibility Traces
 
 ### Exercises
 

From 9ee6cdd8494ff529df270d6d07658abbec0d62aa Mon Sep 17 00:00:00 2001
From: Alex
Date: Wed, 6 Dec 2017 17:16:45 +0900
Subject: [PATCH 3/3] Updated link to Sutton's book

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 60974e0dd..43a7be82a 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 This repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from
 
-- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
 - [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)
 
 Each folder in corresponds to one or more chapters of the above textbook and/or course. In addition to exercises and solution, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.
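
A note on the "Changed Lambda to Gamma" docstring fix in PATCH 1/3: in standard RL notation the discount factor is gamma, while lambda names the eligibility-trace parameter of TD(lambda) (Chapter 12 in the TD readings above), so the old wording was misleading. The sketch below shows where `discount_factor` enters the update loop of `q_learning`. It is illustrative only, not part of the patch: it assumes the `predict`/`update` interface of the notebooks' `Estimator` class and an epsilon-greedy `policy` as returned by `make_epsilon_greedy_policy`, and the helper name `q_learning_step` is invented for this example.

    import numpy as np

    def q_learning_step(env, estimator, policy, state, discount_factor=1.0):
        """Run a single Q-learning step and return (next_state, done)."""
        # Sample an action from the epsilon-greedy policy's probabilities.
        action_probs = policy(state)
        action = np.random.choice(len(action_probs), p=action_probs)

        # Take the action (gym's 4-tuple step API, as in the 2017-era notebooks).
        next_state, reward, done, _ = env.step(action)

        # TD target: reward + gamma * max_a Q(next_state, a).
        # discount_factor is this gamma; lambda would instead weight
        # eligibility traces in TD(lambda), a different mechanism.
        td_target = reward + discount_factor * np.max(estimator.predict(next_state))

        # Semi-gradient update of the function approximator toward the target.
        estimator.update(state, action, td_target)
        return next_state, done

With `discount_factor=1.0`, the default in the signature shown above, future rewards are undiscounted, which is reasonable for the episodic Mountain Car task these exercises target.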