updated @gen docstring and a few minor things
zsunberg committed Feb 11, 2022
1 parent 4e334cd commit 6adf702
Showing 3 changed files with 9 additions and 10 deletions.
2 changes: 1 addition & 1 deletion docs/src/def_pomdp.md
@@ -350,7 +350,7 @@ It is easy to see that the new methods are similar to the keyword arguments in t

In some cases, you may wish to use a simulator that generates the next state, observation, and/or reward (``s'``, ``o``, and ``r``) simultaneously. This is sometimes called a "generative model".

-For example if you are working on an autonomous driving POMDP, the car may travel for one or more seconds in between POMDP decision steps during which it may accumulate reward and observation measurements. In this case it might be very difficult to create a reward or observation function based on ``s``, ``a``, and ``s'``.
+For example if you are working on an autonomous driving POMDP, the car may travel for one or more seconds in between POMDP decision steps during which it may accumulate reward and observation measurements. In this case it might be very difficult to create a [`reward`](@ref) or [`observation`](@ref) function based on ``s``, ``a``, and ``s'`` arguments.

For situations like this, `gen` is an alternative to `transition`, `observation`, and `reward`. The `gen` function should take in state, action, and random number generator arguments and return a [`NamedTuple`](https://docs.julialang.org/en/v1/manual/types/#Named-Tuple-Types) with keys `sp` (for "s-prime", the next state), `o`, and `r`. The [mountaincar example above](@ref po-mountaincar) can be implemented with `gen` as shown below.
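As a rough illustration of this interface (a hypothetical driving-style POMDP type, not the mountaincar example referenced above), a `gen` method might look like:

```julia
using POMDPs, Random

# Hypothetical problem type for illustration only.
struct MyDrivingPOMDP <: POMDP{Float64, Symbol, Float64} end

# gen returns a NamedTuple with keys sp, o, and r sampled in one shot.
function POMDPs.gen(m::MyDrivingPOMDP, s, a, rng::AbstractRNG)
    sp = s + (a == :accelerate ? 1.0 : -1.0) + randn(rng)  # simulate one decision step
    o = sp + 0.1 * randn(rng)                              # noisy measurement of the new state
    r = -abs(sp)                                            # reward accumulated during the step
    return (sp=sp, o=o, r=r)
end
```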

11 changes: 5 additions & 6 deletions docs/src/faq.md
@@ -7,14 +7,14 @@
### For problem implementers

- [`transition`](@ref) should be implemented to define the state transition distribution, either explicitly, or, if only samples from the distribution are available, with an [`ImplicitDistribution`](@ref implicit_distribution_section).
-- [`gen`](@ref) should **only** be implemented if your simulator can only output samples of two or more of the next state, observation, and reward *at the same time*, e.g. if rewards are calculated as a robot moves from the current state to the next state so it is difficult to define the [`reward`](@ref) function separately from the state transitions.
-- [`@gen`](@ref) should **never** be implemented or modified by the problem writer; it is only used in simulators and solvers (see below).
+- [`gen`](@ref) should *only* be implemented if your simulator can only output samples of two or more of the next state, observation, and reward *at the same time*, e.g. if rewards are calculated as a robot moves from the current state to the next state so it is difficult to define the [`reward`](@ref) function separately from the state transitions.
+- [`@gen`](@ref) should *never* be implemented or modified by the problem writer; it is only used in simulators and solvers (see below).
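As a minimal sketch of the sample-only `transition` option mentioned in the first bullet (using a hypothetical `RobotMDP` type and assuming `ImplicitDistribution` is provided by POMDPModelTools):

```julia
using POMDPs, POMDPModelTools, Random

# Hypothetical problem type for illustration only.
struct RobotMDP <: MDP{Float64, Float64} end

# Only samples of s' are available, so wrap the sampling code in an
# ImplicitDistribution instead of implementing gen.
POMDPs.transition(m::RobotMDP, s, a) = ImplicitDistribution(s, a) do s, a, rng
    s + a + 0.05 * randn(rng)
end
```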

### For solver/simulator implementers

- [`@gen`](@ref) should be called whenever a sample of the next state, observation, and/or reward is needed. It automatically combines calls to `rand`, [`transition`](@ref), [`observation`](@ref), [`reward`](@ref), and [`gen`](@ref), depending on what is implemented for the problem and the outputs requested by the caller, without any overhead.
-- [`transition`](@ref) should be called **only** when you need access to the explicit transition probability distribution.
-- [`gen`](@ref) should **never** be called directly by a solver or simulator; it is only a tool for implementers (see above).
+- [`transition`](@ref) should be called *only* when you need access to the explicit transition probability distribution.
+- [`gen`](@ref) should *never* be called directly by a solver or simulator; it is only a tool for implementers (see above).
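A rough sketch of the simulator side (a hypothetical `rollout_return` helper; it assumes `m` is an `MDP` with `discount` and `isterminal` implemented and `policy` is a `Policy` with an `action` method):

```julia
using POMDPs, Random

function rollout_return(m::MDP, policy::Policy, s, rng::AbstractRNG; steps=100)
    total, disc = 0.0, 1.0
    for _ in 1:steps
        a = action(policy, s)
        sp, r = @gen(:sp, :r)(m, s, a, rng)  # sample next state and reward together
        total += disc * r
        disc *= discount(m)
        isterminal(m, sp) && break
        s = sp
    end
    return total
end
```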

## How do I save my policies?

@@ -26,8 +26,7 @@ save("my_policy.jld2", "policy", policy)
```
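Such a policy can be read back with `load` (a small sketch, assuming JLD2 is installed and the file was written with the `save` call above):

```julia
using JLD2

policy = load("my_policy.jld2", "policy")  # read the stored policy back by its key
```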
## Why is my solver producing a suboptimal policy?

-There could be a number of things that are going wrong. If you have a discrete POMDP or MDP and you're using a solver that requires the explicit transition probabilities
-(you've implemented a `pdf` function), the first thing to try is make sure that your probability masses sum up to unity.
+There could be a number of things that are going wrong. If you have a discrete POMDP or MDP and you're using a solver that requires the explicit transition probabilities, the first thing to try is make sure that your probability masses sum up to unity.
We've provided some tools in POMDPToolbox that can check this for you.
If you have a POMDP called pomdp, you can run the checks by doing the following:
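The POMDPToolbox check itself is not shown in this hunk; as a rough manual alternative, a sketch that assumes a discrete problem with `states`, `actions`, `transition`, and `pdf` (from Distributions.jl) implemented might look like:

```julia
using POMDPs, Distributions  # pdf for the transition distributions is assumed here

for s in states(pomdp), a in actions(pomdp)
    d = transition(pomdp, s, a)
    total = sum(pdf(d, sp) for sp in states(pomdp))
    isapprox(total, 1.0; atol=1e-6) || @warn "Transition masses do not sum to 1" s a total
end
```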

6 changes: 3 additions & 3 deletions src/generative.jl
@@ -42,20 +42,20 @@ function gen end
Call the generative model for a (PO)MDP `m`; Sample values from several nodes in the dynamic decision network. X is one or more symbols indicating which nodes to output.
-Solvers and simulators should usually call this rather than the `gen` function. Problem writers should implement methods of the `gen` function.
+Solvers and simulators should call this rather than the `gen` function. Problem writers should implement a method of the `transition` or `gen` function instead of altering `@gen`.
# Arguments
- `m`: an `MDP` or `POMDP` model
- `s`: the current state
- `a`: the action
-- `rng`: a random number generator (Typically a `MersenneTwister`)
+- `rng` (optional): a random number generator (Typically a `MersenneTwister`)
# Return
If `X` is a symbol, return a value sampled from the corresponding node. If `X` is several symbols, return a `Tuple` of values sampled from the specified nodes.
# Examples
Let `m` be an `MDP` or `POMDP`, `s` be a state of `m`, `a` be an action of `m`, and `rng` be an `AbstractRNG`.
-- `@gen(:sp, :r)(m, s, a, rng)` returns a `Tuple` containing the next state and reward.
+- `@gen(:sp, :r)(m, s, a)` returns a `Tuple` containing the next state and reward.
- `@gen(:sp, :o, :r)(m, s, a, rng)` returns a `Tuple` containing the next state, observation, and reward.
- `@gen(:sp)(m, s, a, rng)` returns the next state.
"""
