Merge pull request #123 from BAILOOL/master
Updated link to Sutton's book. Changed Lambda to Gamma in FA
dennybritz authored Dec 6, 2017
2 parents 74d301c + 9ee6cdd commit f45bcbf
Showing 10 changed files with 37 additions and 59 deletions.
2 changes: 1 addition & 1 deletion DP/README.md
@@ -28,7 +28,7 @@

**Optional:**

- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 4: Dynamic Programming
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 4: Dynamic Programming


### Exercises
40 changes: 12 additions & 28 deletions FA/Q-Learning with Value Function Approximation Solution.ipynb
@@ -3,9 +3,7 @@
{
"cell_type": "code",
"execution_count": 1,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
@@ -31,9 +29,7 @@
{
"cell_type": "code",
"execution_count": 2,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"name": "stderr",
@@ -50,9 +46,7 @@
{
"cell_type": "code",
"execution_count": 3,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -74,7 +68,7 @@
"scaler = sklearn.preprocessing.StandardScaler()\n",
"scaler.fit(observation_examples)\n",
"\n",
- "# Used to converte a state to a featurizes represenation.\n",
+ "# Used to convert a state to a featurizes represenation.\n",
"# We use RBF kernels with different variances to cover different parts of the space\n",
"featurizer = sklearn.pipeline.FeatureUnion([\n",
" (\"rbf1\", RBFSampler(gamma=5.0, n_components=100)),\n",
@@ -88,9 +82,7 @@
{
"cell_type": "code",
"execution_count": 4,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"class Estimator():\n",
@@ -151,9 +143,7 @@
{
"cell_type": "code",
"execution_count": 5,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"def make_epsilon_greedy_policy(estimator, epsilon, nA):\n",
@@ -182,9 +172,7 @@
{
"cell_type": "code",
"execution_count": 14,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"def q_learning(env, estimator, num_episodes, discount_factor=1.0, epsilon=0.1, epsilon_decay=1.0):\n",
@@ -196,7 +184,7 @@
" env: OpenAI environment.\n",
" estimator: Action-Value function estimator\n",
" num_episodes: Number of episodes to run for.\n",
- "    discount_factor: Lambda time discount factor.\n",
+ "    discount_factor: Gamma discount factor.\n",
" epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
" epsilon_decay: Each episode, epsilon is decayed by this factor\n",
" \n",
@@ -283,9 +271,7 @@
{
"cell_type": "code",
"execution_count": 16,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"name": "stdout",
@@ -305,9 +291,7 @@
{
"cell_type": "code",
"execution_count": 17,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -384,9 +368,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.5.1"
+ "version": "3.5.2"
}
},
"nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
}
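The docstring fix above renames the discount factor from "Lambda" to "Gamma": in Q-learning, gamma is the factor that discounts the best next-state value inside the TD target. A minimal tabular sketch of that update, in pure Python with illustrative names (`q`, `alpha`, `gamma` are not taken from the notebook):

```python
def td_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.99):
    """One tabular Q-learning update: Q(s,a) += alpha * (TD target - Q(s,a))."""
    best_next = max(q[next_state].values())   # max_a' Q(s', a')
    td_target = reward + gamma * best_next    # gamma discounts the future return
    q[state][action] += alpha * (td_target - q[state][action])
    return q

# Toy example: acting from "s0" yields reward 1 and lands in "s1".
q = {"s0": {"a": 0.0}, "s1": {"a": 2.0}}
q = td_update(q, "s0", "a", reward=1.0, next_state="s1")
print(q["s0"]["a"])  # 0.5 * (1.0 + 0.99 * 2.0) = 1.49
```

The notebook's `q_learning` applies the same target, but through the function approximator's `predict`/`update` rather than a table.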
30 changes: 11 additions & 19 deletions FA/Q-Learning with Value Function Approximation.ipynb
@@ -4,7 +4,7 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {
- "collapsed": false
+ "collapsed": true
},
"outputs": [],
"source": [
@@ -31,9 +31,7 @@
{
"cell_type": "code",
"execution_count": 2,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"name": "stderr",
@@ -50,9 +48,7 @@
{
"cell_type": "code",
"execution_count": 3,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -89,7 +85,7 @@
"cell_type": "code",
"execution_count": 4,
"metadata": {
- "collapsed": false
+ "collapsed": true
},
"outputs": [],
"source": [
@@ -149,7 +145,7 @@
"cell_type": "code",
"execution_count": 5,
"metadata": {
- "collapsed": false
+ "collapsed": true
},
"outputs": [],
"source": [
@@ -180,7 +176,7 @@
"cell_type": "code",
"execution_count": 18,
"metadata": {
- "collapsed": false
+ "collapsed": true
},
"outputs": [],
"source": [
@@ -193,7 +189,7 @@
" env: OpenAI environment.\n",
" estimator: Action-Value function estimator\n",
" num_episodes: Number of episodes to run for.\n",
- "    discount_factor: Lambda time discount factor.\n",
+ "    discount_factor: Gamma discount factor.\n",
" epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
" epsilon_decay: Each episode, epsilon is decayed by this factor\n",
" \n",
@@ -237,9 +233,7 @@
{
"cell_type": "code",
"execution_count": 20,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"name": "stdout",
@@ -259,9 +253,7 @@
{
"cell_type": "code",
"execution_count": 21,
- "metadata": {
-  "collapsed": false
- },
+ "metadata": {},
"outputs": [
{
"data": {
@@ -326,9 +318,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.4.3"
+ "version": "3.5.2"
}
},
"nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
}
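Both notebooks build behavior policies with `make_epsilon_greedy_policy`. The construction can be sketched in pure Python (the notebooks use NumPy arrays; the list representation and the dummy value function below are illustrative):

```python
def make_epsilon_greedy_policy(q_values_fn, epsilon, nA):
    """Return policy(observation) -> list of nA action probabilities."""
    def policy_fn(observation):
        probs = [epsilon / nA] * nA           # explore: epsilon mass spread uniformly
        q = q_values_fn(observation)
        best_action = max(range(nA), key=lambda a: q[a])
        probs[best_action] += 1.0 - epsilon   # exploit: remaining mass on the greedy action
        return probs
    return policy_fn

policy = make_epsilon_greedy_policy(lambda obs: [0.1, 0.9, 0.3], epsilon=0.1, nA=3)
probs = policy(None)  # greedy action 1 gets 1 - epsilon + epsilon/nA, the rest epsilon/nA
```

Decaying `epsilon` each episode (the `epsilon_decay` argument in `q_learning`) shifts this distribution from exploration toward pure exploitation.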
6 changes: 4 additions & 2 deletions FA/README.md
@@ -25,8 +25,8 @@
**Required:**

- David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf))
- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 9: On-policy Prediction with Approximation
- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 10: On-policy Control with Approximation
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 9: On-policy Prediction with Approximation
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 10: On-policy Control with Approximation

**Optional:**

@@ -35,6 +35,8 @@

### Exercises

+ - Get familiar with the [Mountain Car Playground](MountainCar%20Playground.ipynb)
+
- Solve Mountain Car Problem using Q-Learning with Linear Function Approximation
- [Exercise](Q-Learning%20with%20Value%20Function%20Approximation.ipynb)
- [Solution](Q-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)
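The exercise notebooks featurize Mountain Car states with sklearn's `RBFSampler` (a random-Fourier-feature approximation). The underlying idea can be shown more directly with explicit Gaussian bumps on fixed centers; this stdlib-only sketch uses a scalar state and illustrative centers, widths, and weights, none of which come from the notebooks:

```python
import math

def rbf_features(state, centers, gamma=5.0):
    """Map a scalar state to one Gaussian RBF activation per center."""
    return [math.exp(-gamma * (state - c) ** 2) for c in centers]

centers = [-1.0, -0.5, 0.0, 0.5, 1.0]
phi = rbf_features(0.45, centers)     # strongest activation at the nearest center, 0.5

# Linear function approximation is then a dot product with learned weights:
weights = [0.0, 0.1, 0.4, 0.8, 0.2]   # illustrative, as if learned by SGD
q_hat = sum(w * f for w, f in zip(weights, phi))
```

Using several `gamma` values, as the notebooks do with `rbf1`..`rbf4` in a `FeatureUnion`, covers the state space at multiple resolutions.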
2 changes: 1 addition & 1 deletion Introduction/README.md
@@ -17,7 +17,7 @@

**Required:**

- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 1: The Reinforcement Learning Problem
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 1: The Reinforcement Learning Problem
- David Silver's RL Course Lecture 1 - Introduction to Reinforcement Learning ([video](https://www.youtube.com/watch?v=2pWv7GOvuf0), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf))
- [OpenAI Gym Tutorial](https://gym.openai.com/docs)

2 changes: 1 addition & 1 deletion MC/README.md
@@ -26,7 +26,7 @@

**Required:**

- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 5: Monte Carlo Methods
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 5: Monte Carlo Methods


**Optional:**
2 changes: 1 addition & 1 deletion MDP/README.md
@@ -25,7 +25,7 @@

**Required:**

- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 3: Finite Markov Decision Processes
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 3: Finite Markov Decision Processes
- David Silver's RL Course Lecture 2 - Markov Decision Processes ([video](https://www.youtube.com/watch?v=lfHX2hHRMVQ), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf))


2 changes: 1 addition & 1 deletion PolicyGradient/README.md
@@ -36,7 +36,7 @@

**Optional:**

- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
- [Deterministic Policy Gradient Algorithms](http://jmlr.org/proceedings/papers/v32/silver14.pdf)
- [Deterministic Policy Gradient Algorithms (Talk)](http://techtalks.tv/talks/deterministic-policy-gradient-algorithms/61098/)
- [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
4 changes: 2 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

This repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from

- - [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+ - [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
- [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)

Each folder in corresponds to one or more chapters of the above textbook and/or course. In addition to exercises and solution, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.
@@ -50,7 +50,7 @@ All code is written in Python 3 and uses RL environments from [OpenAI Gym](https

Textbooks:

- - [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+ - [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)

Classes:

6 changes: 3 additions & 3 deletions TD/README.md
@@ -28,14 +28,14 @@

**Required:**

- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 6: Temporal-Difference Learning
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 6: Temporal-Difference Learning
- David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf))
- David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf))

**Optional:**

- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 7: Multi-Step Bootstrapping
- - [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 12: Eligibility Traces
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 7: Multi-Step Bootstrapping
+ - [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 12: Eligibility Traces


### Exercises
