Updated link to Sutton's book. Changed Lambda to Gamma in FA #123

Merged: 3 commits, Dec 6, 2017
2 changes: 1 addition & 1 deletion DP/README.md
@@ -28,7 +28,7 @@

**Optional:**

-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 4: Dynamic Programming
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 4: Dynamic Programming


### Exercises
40 changes: 12 additions & 28 deletions FA/Q-Learning with Value Function Approximation Solution.ipynb
@@ -3,9 +3,7 @@
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
@@ -31,9 +29,7 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stderr",
@@ -50,9 +46,7 @@
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"data": {
@@ -74,7 +68,7 @@
"scaler = sklearn.preprocessing.StandardScaler()\n",
"scaler.fit(observation_examples)\n",
"\n",
"# Used to converte a state to a featurizes represenation.\n",
"# Used to convert a state to a featurizes represenation.\n",
"# We use RBF kernels with different variances to cover different parts of the space\n",
"featurizer = sklearn.pipeline.FeatureUnion([\n",
" (\"rbf1\", RBFSampler(gamma=5.0, n_components=100)),\n",
@@ -88,9 +82,7 @@
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"class Estimator():\n",
@@ -151,9 +143,7 @@
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"def make_epsilon_greedy_policy(estimator, epsilon, nA):\n",
@@ -182,9 +172,7 @@
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"def q_learning(env, estimator, num_episodes, discount_factor=1.0, epsilon=0.1, epsilon_decay=1.0):\n",
@@ -196,7 +184,7 @@
" env: OpenAI environment.\n",
" estimator: Action-Value function estimator\n",
" num_episodes: Number of episodes to run for.\n",
" discount_factor: Lambda time discount factor.\n",
" discount_factor: Gamma discount factor.\n",
" epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
" epsilon_decay: Each episode, epsilon is decayed by this factor\n",
" \n",
@@ -283,9 +271,7 @@
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -305,9 +291,7 @@
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"data": {
@@ -384,9 +368,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 1
}
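The file above implements Q-learning with linear value function approximation: an RBF feature map plus one incrementally trained linear model per action, with gamma-discounted TD targets. A condensed sketch of that approach (not the notebook's exact code; it assumes scikit-learn's RBFSampler and SGDRegressor and the classic gym API in which step() returns a 4-tuple, matching these 2017-era notebooks):

```python
import gym
import numpy as np
import sklearn.pipeline
import sklearn.preprocessing
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDRegressor

env = gym.make("MountainCar-v0")

# Fit a scaler and RBF featurizer on sampled states, as the notebook does.
examples = np.array([env.observation_space.sample() for _ in range(10000)])
scaler = sklearn.preprocessing.StandardScaler().fit(examples)
featurizer = sklearn.pipeline.FeatureUnion([
    ("rbf1", RBFSampler(gamma=5.0, n_components=100)),
    ("rbf2", RBFSampler(gamma=2.0, n_components=100)),
    ("rbf3", RBFSampler(gamma=1.0, n_components=100)),
    ("rbf4", RBFSampler(gamma=0.5, n_components=100)),
]).fit(scaler.transform(examples))

def featurize(state):
    # 2-d state -> 400-d RBF feature vector
    return featurizer.transform(scaler.transform([state]))[0]

# One linear SGD model per discrete action, trained incrementally.
models = []
for _ in range(env.action_space.n):
    m = SGDRegressor(learning_rate="constant")
    m.partial_fit([featurize(env.reset())], [0.0])  # initialize weights
    models.append(m)

def q_values(state):
    phi = featurize(state)
    return np.array([m.predict([phi])[0] for m in models])

gamma, epsilon = 1.0, 0.1  # gamma is the discount factor this PR renames
for _ in range(100):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_values(state)))
        next_state, reward, done, _ = env.step(action)  # classic 4-tuple API
        # TD target: r + gamma * max_a' Q(s', a')
        td_target = reward + gamma * np.max(q_values(next_state))
        models[action].partial_fit([featurize(state)], [td_target])
        state = next_state
```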
30 changes: 11 additions & 19 deletions FA/Q-Learning with Value Function Approximation.ipynb
@@ -4,7 +4,7 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
"collapsed": true
},
"outputs": [],
"source": [
@@ -31,9 +31,7 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stderr",
@@ -50,9 +48,7 @@
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"data": {
@@ -89,7 +85,7 @@
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
"collapsed": true
},
"outputs": [],
"source": [
@@ -149,7 +145,7 @@
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
"collapsed": true
},
"outputs": [],
"source": [
@@ -180,7 +176,7 @@
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
"collapsed": true
},
"outputs": [],
"source": [
@@ -193,7 +189,7 @@
" env: OpenAI environment.\n",
" estimator: Action-Value function estimator\n",
" num_episodes: Number of episodes to run for.\n",
" discount_factor: Lambda time discount factor.\n",
" discount_factor: Gamma discount factor.\n",
" epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
" epsilon_decay: Each episode, epsilon is decayed by this factor\n",
" \n",
@@ -237,9 +233,7 @@
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -259,9 +253,7 @@
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"data": {
@@ -326,9 +318,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 1
}
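The rename that gives this PR its title deserves a line of context: in the Q-learning target the discount factor is conventionally gamma, while lambda names the eligibility-trace parameter of TD(lambda), so "Lambda time discount factor" was misleading. A minimal illustration of where gamma enters the update, with hypothetical numbers:

```python
# Q-learning backup for a single transition (hypothetical values).
# gamma is the discount factor; lambda would refer to eligibility traces.
discount_factor = 0.99                # gamma
reward, max_q_next = -1.0, -42.0      # r and max_a' Q(s', a')
td_target = reward + discount_factor * max_q_next
print(td_target)                      # -42.58
```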
6 changes: 4 additions & 2 deletions FA/README.md
@@ -25,8 +25,8 @@
**Required:**

- David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf))
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 9: On-policy Prediction with Approximation
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 10: On-policy Control with Approximation
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 9: On-policy Prediction with Approximation
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 10: On-policy Control with Approximation

**Optional:**

@@ -35,6 +35,8 @@

### Exercises

+- Get familiar with the [Mountain Car Playground](MountainCar%20Playground.ipynb)
+
- Solve Mountain Car Problem using Q-Learning with Linear Function Approximation
- [Exercise](Q-Learning%20with%20Value%20Function%20Approximation.ipynb)
- [Solution](Q-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)
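The exercise list now begins with getting familiar with the Mountain Car Playground, which amounts to resetting and stepping the environment by hand. A minimal warm-up sketch, again assuming the classic 4-tuple gym API of this repository's era:

```python
import gym

env = gym.make("MountainCar-v0")
state = env.reset()                    # [position, velocity]
print(env.observation_space)           # Box(2,)
print(env.action_space)                # Discrete(3): push left / idle / push right

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # a random policy rarely reaches the flag
    state, reward, done, info = env.step(action)  # classic 4-tuple API
    total_reward += reward
print("episode return:", total_reward)  # -200 at the 200-step time limit
```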
2 changes: 1 addition & 1 deletion Introduction/README.md
@@ -17,7 +17,7 @@

**Required:**

-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 1: The Reinforcement Learning Problem
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 1: The Reinforcement Learning Problem
- David Silver's RL Course Lecture 1 - Introduction to Reinforcement Learning ([video](https://www.youtube.com/watch?v=2pWv7GOvuf0), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf))
- [OpenAI Gym Tutorial](https://gym.openai.com/docs)

2 changes: 1 addition & 1 deletion MC/README.md
@@ -26,7 +26,7 @@

**Required:**

-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 5: Monte Carlo Methods
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 5: Monte Carlo Methods


**Optional:**
2 changes: 1 addition & 1 deletion MDP/README.md
@@ -25,7 +25,7 @@

**Required:**

-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 3: Finite Markov Decision Processes
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 3: Finite Markov Decision Processes
- David Silver's RL Course Lecture 2 - Markov Decision Processes ([video](https://www.youtube.com/watch?v=lfHX2hHRMVQ), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf))


2 changes: 1 addition & 1 deletion PolicyGradient/README.md
@@ -36,7 +36,7 @@

**Optional:**

-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
- [Deterministic Policy Gradient Algorithms](http://jmlr.org/proceedings/papers/v32/silver14.pdf)
- [Deterministic Policy Gradient Algorithms (Talk)](http://techtalks.tv/talks/deterministic-policy-gradient-algorithms/61098/)
- [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
4 changes: 2 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

This repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from

-- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
- [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)

Each folder in corresponds to one or more chapters of the above textbook and/or course. In addition to exercises and solution, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.
@@ -50,7 +50,7 @@ All code is written in Python 3 and uses RL environments from [OpenAI Gym](https

Textbooks:

-- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)

Classes:

6 changes: 3 additions & 3 deletions TD/README.md
@@ -28,14 +28,14 @@

**Required:**

-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 6: Temporal-Difference Learning
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 6: Temporal-Difference Learning
- David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf))
- David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf))

**Optional:**

-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 7: Multi-Step Bootstrapping
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 12: Eligibility Traces
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 7: Multi-Step Bootstrapping
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 12: Eligibility Traces


### Exercises