From 14a38a7c0a9c98bea48985ea82a1b1f5ee7f0921 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:11:29 -0700
Subject: [PATCH 1/9] update introduction

1. URL for slides
2. RL vs Supervised Learning
---
 Introduction/README.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Introduction/README.md b/Introduction/README.md
index ca8897826..f40fa6e11 100644
--- a/Introduction/README.md
+++ b/Introduction/README.md
@@ -8,7 +8,8 @@
 ### Summary
 
 - Reinforcement Learning (RL) is concerned with goal-directed learning and decision-making.
-- In RL an agent learns from experiences it gains by interacting with the environment. In Supervised Learning we cannot affect the environment.
+- In RL an agent learns from experiences it gains by interacting with the environment. In Supervised Learning we cannot affect the environment.
+- Moreover, RL provides evaluative feedback whereas, supervised learning provides instructive feedback.
 - In RL rewards are often delayed in time and the agent tries to maximize a long-term goal. For example, one may need to make seemingly suboptimal moves to reach a winning position in a game.
 - An agent interacts with the environment via states, actions and rewards.
 
@@ -18,12 +19,12 @@
 **Required:**
 
 - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf) - Chapter 1: The Reinforcement Learning Problem
-- David Silver's RL Course Lecture 1 - Introduction to Reinforcement Learning ([video](https://www.youtube.com/watch?v=2pWv7GOvuf0), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf))
+- David Silver's RL Course Lecture 1 - Introduction to Reinforcement Learning ([video](https://www.youtube.com/watch?v=2pWv7GOvuf0), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/intro_RL.pdf))
 - [OpenAI Gym Tutorial](https://gym.openai.com/docs)
 
 
-**Optional:**
-N/A
+**Optional:**
+- [RL vs Supervised Learning Blog](https://www.quora.com/What-is-the-difference-between-supervised-learning-and-reinforcement-learning)
 
 
 ### Exercises

From e001b421b40c323a7fe70fc9efeaf9f23b3e66d4 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:16:04 -0700
Subject: [PATCH 2/9] update MDP

1. Update Slides URL
---
 MDP/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MDP/README.md b/MDP/README.md
index 08e73d072..80292dd69 100644
--- a/MDP/README.md
+++ b/MDP/README.md
@@ -26,7 +26,7 @@
 **Required:**
 
 - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf) - Chapter 3: Finite Markov Decision Processes
-- David Silver's RL Course Lecture 2 - Markov Decision Processes ([video](https://www.youtube.com/watch?v=lfHX2hHRMVQ), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf))
+- David Silver's RL Course Lecture 2 - Markov Decision Processes ([video](https://www.youtube.com/watch?v=lfHX2hHRMVQ), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/MDP.pdf))
 
 
 ### Exercises

From 87027cadf448391f803e30cee5ca58bbc6d91ad0 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:18:00 -0700
Subject: [PATCH 3/9] update DP

---
 DP/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/DP/README.md b/DP/README.md
index a6dabe88c..6fb96e2bc 100644
--- a/DP/README.md
+++ b/DP/README.md
@@ -24,7 +24,7 @@
 
 **Required:**
 
-- David Silver's RL Course Lecture 3 - Planning by Dynamic Programming ([video](https://www.youtube.com/watch?v=Nd1-UUMVfz4), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf))
+- David Silver's RL Course Lecture 3 - Planning by Dynamic Programming ([video](https://www.youtube.com/watch?v=Nd1-UUMVfz4), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/DP.pdf))
 
 
 **Optional:**
@@ -47,4 +47,4 @@
 
 - Implement Gambler's Problem
   - [Exercise](Gamblers%20Problem.ipynb)
-  - [Solution](Gamblers%20Problem%20Solution.ipynb)
\ No newline at end of file
+  - [Solution](Gamblers%20Problem%20Solution.ipynb)

From f6e09a697be1250e913570d844693c2d330efcc6 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:20:21 -0700
Subject: [PATCH 4/9] update MC

---
 MC/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MC/README.md b/MC/README.md
index 8f246c38d..e9e626915 100644
--- a/MC/README.md
+++ b/MC/README.md
@@ -31,8 +31,8 @@
 
 **Optional:**
 
-- David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf))
-- David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf))
+- David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/MC-TD.pdf))
+- David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/control.pdf))
 
 
 ### Exercises

From 37e844639bce76a6cbc17d63895c28a1b8eef048 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:22:25 -0700
Subject: [PATCH 5/9] update TD

---
 TD/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/TD/README.md b/TD/README.md
index 9b34caecc..eb30ac1f1 100644
--- a/TD/README.md
+++ b/TD/README.md
@@ -29,8 +29,8 @@
 **Required:**
 
 - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf) - Chapter 6: Temporal-Difference Learning
-- David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf))
-- David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf))
+- David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/MC-TD.pdf))
+- David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/control.pdf))
 
 
 **Optional:**

From c934f5f3a3d112aaf45604648942ba8099223dd9 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:24:14 -0700
Subject: [PATCH 6/9] update FA

---
 FA/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/FA/README.md b/FA/README.md
index a8456622d..439e70c89 100644
--- a/FA/README.md
+++ b/FA/README.md
@@ -24,7 +24,7 @@
 
 **Required:**
 
-- David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf))
+- David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/FA.pdf))
 - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf) - Chapter 9: On-policy Prediction with Approximation
 - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf) - Chapter 10: On-policy Control with Approximation
 

From b5f41fd22610eef469c7a68023057e18226c5537 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:25:45 -0700
Subject: [PATCH 7/9] update dqn

---
 DQN/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DQN/README.md b/DQN/README.md
index 07c887bbc..b23cfebe7 100644
--- a/DQN/README.md
+++ b/DQN/README.md
@@ -24,7 +24,7 @@
 
 - [Human-Level Control through Deep Reinforcement Learning](http://www.readcube.com/articles/10.1038/nature14236)
 - [Demystifying Deep Reinforcement Learning](https://ai.intel.com/demystifying-deep-reinforcement-learning/)
-- David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf))
+- David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/FA.pdf))
 
 
 **Optional:**

From 6bb40aaf3900df904275eb228f64b98a427d4c75 Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Fri, 9 Oct 2020 17:28:12 -0700
Subject: [PATCH 8/9] update pg

---
 PolicyGradient/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PolicyGradient/README.md b/PolicyGradient/README.md
index e8e793b77..9f04471dc 100644
--- a/PolicyGradient/README.md
+++ b/PolicyGradient/README.md
@@ -32,7 +32,7 @@
 
 **Required:**
 
-- David Silver's RL Course Lecture 7 - Policy Gradient Methods ([video](https://www.youtube.com/watch?v=KHZVXao4qXs), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/pg.pdf))
+- David Silver's RL Course Lecture 7 - Policy Gradient Methods ([video](https://www.youtube.com/watch?v=KHZVXao4qXs), [slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/pg.pdf))
 
 
 **Optional:**

From d1e9eb83599f46c64795b446253119be424a16ff Mon Sep 17 00:00:00 2001
From: Harsh Nilesh Pathak
Date: Wed, 21 Oct 2020 16:17:24 -0700
Subject: [PATCH 9/9] SL and RL

---
 Introduction/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Introduction/README.md b/Introduction/README.md
index f40fa6e11..68c0700b9 100644
--- a/Introduction/README.md
+++ b/Introduction/README.md
@@ -8,7 +8,7 @@
 ### Summary
 
 - Reinforcement Learning (RL) is concerned with goal-directed learning and decision-making.
-- In RL an agent learns from experiences it gains by interacting with the environment. In Supervised Learning we cannot affect the environment.
+- In RL an agent learns from experiences it gains by interacting with the environment. In Supervised Learning we cannot affect the environment. It learns from data.
 - Moreover, RL provides evaluative feedback whereas, supervised learning provides instructive feedback.
 - In RL rewards are often delayed in time and the agent tries to maximize a long-term goal. For example, one may need to make seemingly suboptimal moves to reach a winning position in a game.
 - An agent interacts with the environment via states, actions and rewards.
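The series adds the claim that RL receives evaluative feedback while supervised learning receives instructive feedback. Below is a minimal sketch of that distinction under assumed conditions. It uses a hypothetical three-action task in the spirit of the multi-armed bandit setting (Sutton & Barto, Chapter 2). `TRUE_MEANS`, `BEST_ACTION` and all function names are illustrative and do not come from the repository. Instructive feedback names the correct action no matter what the learner chose. Evaluative feedback only returns a noisy reward for the action actually taken, so the learner has to explore before it can discover which action is best. Here the exploration uses epsilon-greedy sample averages, which is one simple choice among several.

```python
import random

# Hypothetical 3-action task: expected reward of each action (unknown to the learner).
TRUE_MEANS = [0.2, 0.5, 0.8]
BEST_ACTION = max(range(len(TRUE_MEANS)), key=lambda a: TRUE_MEANS[a])


def instructive_feedback(chosen_action: int) -> int:
    """Supervised-style feedback: names the correct action, independent of the choice made."""
    return BEST_ACTION


def evaluative_feedback(chosen_action: int, rng: random.Random) -> float:
    """RL-style feedback: a noisy scalar reward for the chosen action only."""
    return rng.gauss(TRUE_MEANS[chosen_action], 0.1)


def learn_from_rewards(steps: int = 2000, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    """Epsilon-greedy sample-average action values learned only from evaluative feedback."""
    rng = random.Random(seed)
    estimates = [0.0] * len(TRUE_MEANS)
    counts = [0] * len(TRUE_MEANS)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(TRUE_MEANS))  # explore: try a random action
        else:
            action = max(range(len(TRUE_MEANS)), key=lambda a: estimates[a])  # exploit
        reward = evaluative_feedback(action, rng)
        counts[action] += 1
        # Incremental sample average: Q <- Q + (R - Q) / n
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


if __name__ == "__main__":
    # Instructive feedback hands over the answer directly, whatever the learner picked.
    print("instructive:", instructive_feedback(chosen_action=0))
    q = learn_from_rewards()
    print("estimates:", [round(v, 2) for v in q])
    print("greedy action after learning:", max(range(len(q)), key=lambda a: q[a]))
```

The instructive signal gives the best action (`2`) in a single call. The evaluative learner reaches the same action only after it has sampled actions and averaged their rewards. The exact estimates depend on the seed and the noise level chosen here.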