The best way to learn and to avoid the illusion of competence is to test yourself. This will help you to find where you need to reinforce your knowledge.
📝 Take a piece of paper and try to answer by writing, then check the solutions.
Solution
Reinforcement learning is a framework for solving control tasks (also called decision problems) by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as unique feedback.
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#a-formal-definition
At every step:
- Our Agent receives ______ from the environment
- Based on that ______ the Agent takes an ______
- Our Agent will move to the right
- The Environment goes to a ______
- The Environment gives ______ to the Agent
Solution
At every step:
- Our Agent receives state s0 from the environment
- Based on that state s0 the Agent takes an action a0
- Our Agent will move to the right
- The Environment goes to a new state s1
- The Environment gives a reward r1 to the Agent
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#the-rl-process
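The loop above can be sketched in a few lines of code. `LineWorld` is a hypothetical toy environment written for this example (it is not part of any library): the agent receives a state, takes an action, and the environment returns a new state and a reward.

```python
class LineWorld:
    """Hypothetical 1-D environment: the agent starts at cell 0 and
    must reach cell 4 by moving right (+1) or left (-1)."""

    def __init__(self):
        self.goal = 4

    def reset(self):
        self.position = 0
        return self.position  # initial state s0

    def step(self, action):
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal   # terminal state reached?
        reward = 1.0 if done else 0.0       # reward r_{t+1}
        return self.position, reward, done  # new state s_{t+1}

env = LineWorld()
state = env.reset()      # Our Agent receives state s0 from the environment
done = False
while not done:
    action = +1          # Based on that state, the Agent takes an action (move right)
    state, reward, done = env.step(action)  # Environment returns s_{t+1} and r_{t+1}
# The agent ends at the goal cell with a reward of 1.0
```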
Solution
- The state is a complete description of the state of the world (there is no hidden information); we receive it in a fully observed environment. For instance, in a chess game we receive a state from the environment, since we have access to the whole board.
- The observation is a partial description of the state; we receive it in a partially observed environment. For instance, in Super Mario Bros we only see the part of the level close to the player, so we receive an observation.
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#observationsstates-space
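The difference can be sketched with a toy grid (the grid and the `observe` helper below are hypothetical, made up for this illustration): the full state is the entire grid, while an observation is only the window of cells around the player.

```python
# Hypothetical 3x3 grid world: "P" is the player, "G" the goal.
# The full state describes every cell (no hidden information).
full_state = [
    ["P", ".", "."],
    [".", "X", "."],
    [".", ".", "G"],
]

def observe(state, row, col, radius=1):
    """Return only the cells within `radius` of the player: a partial
    description of the state, i.e. an observation."""
    return [
        r[max(0, col - radius): col + radius + 1]
        for r in state[max(0, row - radius): row + radius + 1]
    ]

# The player at (0, 0) only sees a 2x2 window, not the whole grid.
observation = observe(full_state, 0, 0)
```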
Solution
- Episodic task: we have a starting point and an ending point (a terminal state). This creates an episode: a list of States, Actions, Rewards, and new States. For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends when you're killed or you reach the end of the level.
- Continuing task: these are tasks that continue forever (no terminal state). In this case, the agent must learn how to choose the best actions while simultaneously interacting with the environment.
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#type-of-tasks
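An episodic task can be sketched as a loop that stops at a terminal state, producing a finite trajectory. `run_episode` below is a hypothetical minimal example (counting up to a goal state), not taken from any library.

```python
def run_episode(start=0, goal=3):
    """Hypothetical episodic task: the agent counts up from `start`
    until the terminal state `goal` is reached, which ends the episode.
    The result is a finite list of (state, action, reward, new_state)
    transitions -- exactly what defines an episode."""
    episode = []
    state = start
    while state != goal:                 # terminal state ends the episode
        action = +1                      # always move toward the goal
        next_state = state + action
        reward = 1.0 if next_state == goal else 0.0
        episode.append((state, action, reward, next_state))
        state = next_state
    return episode

trajectory = run_episode()
# A continuing task would be the same loop with no terminal condition.
```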
Solution
In Reinforcement Learning, we need to balance how much we explore the environment and how much we exploit what we know about the environment.
- Exploration is exploring the environment by trying random actions in order to find more information about the environment.
- Exploitation is exploiting known information to maximize the reward.
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#exploration-exploitation-tradeoff
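One common way to balance this trade-off is an epsilon-greedy strategy (an illustrative sketch of one possible method, not the only one): with probability epsilon the agent explores with a random action, otherwise it exploits the action it currently estimates as best.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action index from estimated action values `q_values`:
    with probability `epsilon`, explore (random action); otherwise
    exploit (action with the highest estimated value)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

q = [0.2, 0.8, 0.5]                      # estimated value of 3 actions
best = epsilon_greedy(q, epsilon=0.0)    # pure exploitation picks action 1
```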
Solution
- The Policy π is the brain of our Agent: it’s the function that tells us what action to take given the state we are in. So it defines the agent’s behavior at a given time.
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#the-policy-%CF%80-the-agents-brain
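In code, a deterministic policy is just a function from state to action. The corridor example below is hypothetical, made up purely to illustrate the idea.

```python
def policy(state):
    """Hypothetical deterministic policy pi(s): given the state the
    agent is in, return the action to take. Here the 'world' is a
    5-cell corridor: move right until the last cell, then stop."""
    return "right" if state < 4 else "stop"

# The policy fully defines the agent's behavior at every state.
actions = [policy(s) for s in range(6)]
```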
Solution
- Value-based methods are one of the main approaches for solving RL problems.
- In Value-based methods, instead of training a policy function, we train a value function that maps a state to the expected value of being at that state.
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#value-based-methods
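A minimal sketch of the idea: instead of a policy, we have a value function `V` (here hand-written for a hypothetical 5-cell corridor whose goal is the rightmost cell), and the agent acts by moving toward the neighboring state with the highest value.

```python
# Hypothetical value function V(s): states closer to the goal (cell 4)
# have a higher expected value.
V = {0: 0.1, 1: 0.3, 2: 0.5, 3: 0.7, 4: 1.0}

def greedy_action(state):
    """In value-based methods we don't train a policy directly: the
    behavior comes from the value function, e.g. by moving to the
    neighboring state with the highest value (a minimal sketch)."""
    left, right = max(state - 1, 0), min(state + 1, 4)
    return "right" if V[right] >= V[left] else "left"

# Acting greedily on V drives the agent toward the goal.
choice = greedy_action(2)
```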
Solution
- In Policy-Based Methods, we learn a policy function directly.
- This policy function maps each state to the best corresponding action at that state, or to a probability distribution over the set of possible actions at that state.
📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-intro#policy-based-methods
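The second case, a stochastic policy, can be sketched as a function returning a probability distribution over actions, which the agent then samples from. Both functions below are hypothetical illustrations, not library code.

```python
import random

def stochastic_policy(state):
    """Hypothetical stochastic policy pi(a|s): return a probability
    distribution over the possible actions at that state."""
    if state == "near_goal":
        return {"right": 0.9, "left": 0.1}  # strongly prefer "right"
    return {"right": 0.5, "left": 0.5}      # otherwise, no preference

def sample_action(state, rng=random):
    """Act by sampling an action from the policy's distribution."""
    dist = stochastic_policy(state)
    actions, probs = zip(*dist.items())
    return rng.choices(actions, weights=probs, k=1)[0]

dist = stochastic_policy("near_goal")  # probabilities sum to 1
```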
Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.
Keep Learning, Stay Awesome