Existing problems in Coursera RL course

Section 4: Softmax Policy. The second equation is missing $$\tau$$.
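
To make the role of $$\tau$$ concrete, here is a minimal sketch of a softmax policy in which every action value is divided by $$\tau$$ before exponentiation. The function name `softmax_policy`, the list-based representation, and the max-subtraction step are assumptions made for illustration; they are not taken from the course notebook.

```python
import math


def softmax_policy(action_values, tau=1.0):
    """Return softmax action probabilities over action_values / tau.

    Hypothetical helper for illustration only. tau is treated as a
    positive scaling parameter: large tau flattens the distribution,
    small tau sharpens it.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Divide by tau first, so tau appears in every later step.
    preferences = [q / tau for q in action_values]
    # Subtract the maximum preference to avoid overflow in exp().
    max_pref = max(preferences)
    exps = [math.exp(p - max_pref) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(softmax_policy([1.0, 2.0, 3.0], tau=1.0))
    print(softmax_policy([1.0, 2.0, 3.0], tau=100.0))
```

Dropping $$\tau$$ from any one step (for example, subtracting the maximum of the unscaled values) would make that step inconsistent with the rest, which appears to be the kind of slip this item reports.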

Section 5: Putting the pieces together. In the pseudocode for performing the updates, in the step "Do Expected Sarsa update with $$Q_t$$", the two $$Q_{t+1}$$ terms on the right-hand side of "<--" should be replaced by $$Q_t$$.
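
The correction can be illustrated with a small tabular sketch in which every quantity on the right-hand side of the update is read from the same table, playing the role of $$Q_t$$. The dict-of-lists table, the function names, and the default values of `alpha`, `gamma`, and `tau` are all hypothetical; this is not the course's implementation.

```python
import math


def _softmax(values, tau):
    """Softmax probabilities over values / tau (illustrative helper)."""
    prefs = [v / tau for v in values]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def expected_sarsa_update(q_t, s, a, r, s_next, terminal,
                          alpha=0.1, gamma=0.99, tau=1.0):
    """Return the new value for (s, a) without modifying q_t.

    q_t maps each state to a list of action values. Only q_t is read
    on the right-hand side, matching the corrected pseudocode.
    """
    if terminal:
        expected_next = 0.0
    else:
        probs = _softmax(q_t[s_next], tau)
        # Expectation of the next-state values under the softmax policy.
        expected_next = sum(p * v for p, v in zip(probs, q_t[s_next]))
    target = r + gamma * expected_next
    return q_t[s][a] + alpha * (target - q_t[s][a])


if __name__ == "__main__":
    q_t = {0: [0.0, 0.0], 1: [1.0, 1.0]}
    print(expected_sarsa_update(q_t, 0, 0, 1.0, 1, False,
                                alpha=0.5, gamma=0.9))
```

Because the function returns the new value instead of writing into `q_t`, the caller decides where the result is stored, which keeps the left-hand and right-hand tables visibly separate.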