diff --git a/docs/src/def_pomdp.md b/docs/src/def_pomdp.md
index 08dc767d..e22cd0ea 100644
--- a/docs/src/def_pomdp.md
+++ b/docs/src/def_pomdp.md
@@ -350,7 +350,7 @@ It is easy to see that the new methods are similar to the keyword arguments in t
 In some cases, you may wish to use a simulator that generates the next state, observation, and/or reward (``s'``, ``o``, and ``r``) simultaneously. This is sometimes called a "generative model".

-For example if you are working on an autonomous driving POMDP, the car may travel for one or more seconds in between POMDP decision steps during which it may accumulate reward and observation measurements. In this case it might be very difficult to create a reward or observation function based on ``s``, ``a``, and ``s'``.
+For example if you are working on an autonomous driving POMDP, the car may travel for one or more seconds in between POMDP decision steps during which it may accumulate reward and observation measurements. In this case it might be very difficult to create a [`reward`](@ref) or [`observation`](@ref) function based on ``s``, ``a``, and ``s'`` arguments.

 For situations like this, `gen` is an alternative to `transition`, `observation`, and `reward`. The `gen` function should take in state, action, and random number generator arguments and return a [`NamedTuple`](https://docs.julialang.org/en/v1/manual/types/#Named-Tuple-Types) with keys `sp` (for "s-prime", the next state), `o`, and `r`.

 The [mountaincar example above](@ref po-mountaincar) can be implemented with `gen` as shown below.
diff --git a/docs/src/faq.md b/docs/src/faq.md
index 0faccb4c..61b54310 100644
--- a/docs/src/faq.md
+++ b/docs/src/faq.md
@@ -7,14 +7,14 @@
 ### For problem implementers

 - [`transition`](@ref) should be implemented to define the state transition distribution, either explicitly, or, if only samples from the distribution are available, with an [`ImplicitDistribution`](@ref implicit_distribution_section).
-- [`gen`](@ref) should **only** be implemented if your simulator can only output samples of two or more of the next state, observation, and reward *at the same time*, e.g. if rewards are calculated as a robot moves from the current state to the next state so it is difficult to define the [`reward`](@ref) function separately from the state transitions.
-- [`@gen`](@ref) should **never** be implemented or modified by the problem writer; it is only used in simulators and solvers (see below).
+- [`gen`](@ref) should *only* be implemented if your simulator can only output samples of two or more of the next state, observation, and reward *at the same time*, e.g. if rewards are calculated as a robot moves from the current state to the next state so it is difficult to define the [`reward`](@ref) function separately from the state transitions.
+- [`@gen`](@ref) should *never* be implemented or modified by the problem writer; it is only used in simulators and solvers (see below).

 ### For solver/simulator implementers

 - [`@gen`](@ref) should be called whenever a sample of the next state, observation, and or reward is needed. It automatically combines calls to `rand`, [`transition`](@ref), [`observation`](@ref), [`reward`](@ref), and [`gen`](@ref), depending on what is implemented for the problem and the outputs requested by the caller without any overhead.
-- [`transition`](@ref) should be called **only** when you need access to the explicit transition probability distribution.
-- [`gen`](@ref) should **never** be called directly by a solver or simulator; it is only a tool for implementers (see above).
+- [`transition`](@ref) should be called *only* when you need access to the explicit transition probability distribution.
+- [`gen`](@ref) should *never* be called directly by a solver or simulator; it is only a tool for implementers (see above).

 ## How do I save my policies?

@@ -26,8 +26,7 @@
 save("my_policy.jld2", "policy", policy)
 ```

 ## Why is my solver producing a suboptimal policy?
-There could be a number of things that are going wrong. If you have a discrete POMDP or MDP and you're using a solver that requires the explicit transition probabilities
-(you've implemented a `pdf` function), the first thing to try is make sure that your probability masses sum up to unity.
+There could be a number of things that are going wrong. If you have a discrete POMDP or MDP and you're using a solver that requires the explicit transition probabilities, the first thing to try is make sure that your probability masses sum up to unity.
 We've provide some tools in POMDPToolbox that can check this for you. If you have a POMDP called pomdp, you can run the checks by doing the following:

diff --git a/src/generative.jl b/src/generative.jl
index 07a51967..55e98e46 100644
--- a/src/generative.jl
+++ b/src/generative.jl
@@ -42,20 +42,20 @@ function gen end

 Call the generative model for a (PO)MDP `m`; Sample values from several nodes in the dynamic decision network. X is one or more symbols indicating which nodes to output.

-Solvers and simulators should usually call this rather than the `gen` function. Problem writers should implement methods of the `gen` function.
+Solvers and simulators should call this rather than the `gen` function. Problem writers should implement a method of the `transition` or `gen` function instead of altering `@gen`.

 # Arguments
 - `m`: an `MDP` or `POMDP` model
 - `s`: the current state
 - `a`: the action
-- `rng`: a random number generator (Typically a `MersenneTwister`)
+- `rng` (optional): a random number generator (Typically a `MersenneTwister`)

 # Return
 If `X`, is a symbol, return a value sample from the corresponding node. If `X` is several symbols, return a `Tuple` of values sampled from the specified nodes.

 # Examples
 Let `m` be an `MDP` or `POMDP`, `s` be a state of `m`, `a` be an action of `m`, and `rng` be an `AbstractRNG`.
-- `@gen(:sp, :r)(m, s, a, rng)` returns a `Tuple` containing the next state and reward.
+- `@gen(:sp, :r)(m, s, a)` returns a `Tuple` containing the next state and reward.
 - `@gen(:sp, :o, :r)(m, s, a, rng)` returns a `Tuple` containing the next state, observation, and reward.
 - `@gen(:sp)(m, s, a, rng)` returns the next state.
 """
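For quick reference, here is a minimal, illustrative sketch (not part of the patch above) of how a solver or simulator might call `@gen` under the updated docstring. It assumes the `POMDPModels` package is available and uses its `BabyPOMDP` purely as a stand-in problem; any (PO)MDP that implements `transition`/`observation`/`reward` or `gen` would work the same way.

```julia
# Hypothetical usage sketch -- BabyPOMDP from POMDPModels is used only as a stand-in problem.
using POMDPs, POMDPModels, Random

m = BabyPOMDP()                 # any POMDP with transition/observation/reward (or gen) defined
rng = MersenneTwister(1)
s = rand(rng, initialstate(m))  # sample a starting state
a = first(actions(m))           # pick an arbitrary action

# Sample the next state, observation, and reward in a single call, as a simulator would:
sp, o, r = @gen(:sp, :o, :r)(m, s, a, rng)

# Per the updated docstring, the rng argument is optional:
sp, r = @gen(:sp, :r)(m, s, a)
```

Routing all sampling through `@gen` like this is what lets the same solver or simulator code work regardless of whether the problem writer implemented `transition`, `observation`, and `reward` separately or a combined `gen`.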