Commit

documentation
msainsburydale committed Feb 17, 2025
1 parent b0f97fc commit b77f329
Showing 6 changed files with 308 additions and 287 deletions.
2 changes: 1 addition & 1 deletion docs/src/API/architectures.md
@@ -1,6 +1,6 @@
# Architectures

-In principle, any [`Flux`](https://fluxml.ai/Flux.jl/stable/) model can be used to construct the neural network. To integrate it into the workflow, one need only define a method that transforms $K$-dimensional vectors of data sets into matrices with $K$ columns, where the number of rows corresponds to the dimensionality of the output spaces listed in the [Overview](@ref).
+In principle, any [`Flux`](https://fluxml.ai/Flux.jl/stable/) model can be used to construct the neural network (see the [Gridded data](@ref) example). To integrate it into the workflow, one need only define a method that transforms $K$-dimensional vectors of data sets into matrices with $K$ columns, where the number of rows corresponds to the dimensionality of the output spaces listed in the [Overview](@ref).

## Modules

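As a concrete illustration of the requirement described in the changed paragraph above, here is a minimal, hypothetical sketch for unstructured (vector-valued) data; the wrapper type, dimensions, and layer widths are illustrative and not taken from the package, with only `stackarrays` and the vector-to-matrix convention drawn from the documentation:

```julia
using NeuralEstimators, Flux

# Hypothetical wrapper around an arbitrary Flux model
struct MyNetwork{C <: Chain}
    chain::C
end

# Required method: map a K-dimensional vector of data sets to a matrix with K columns,
# whose number of rows equals the dimensionality of the relevant output space
function (net::MyNetwork)(Z::AbstractVector)
    net.chain(stackarrays(Z))   # concatenate the K data sets along their last dimension, then apply the model
end

d, p = 10, 2                                              # illustrative data and parameter dimensions
net = MyNetwork(Chain(Dense(d, 64, relu), Dense(64, p)))  # illustrative MLP
Z = [rand32(d, 1) for _ in 1:100]                         # K = 100 data sets, each a d × 1 matrix
net(Z)                                                    # p × 100 matrix, as required
```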
36 changes: 27 additions & 9 deletions docs/src/workflow/examples.md
@@ -198,7 +198,7 @@ function simulate(parameters::Parameters, m = 1)
end
```

-A possible architecture is as follows. Note that deeper architectures that employ residual connections (see [ResidualBlock](@ref)) often lead to improved performance, and certain pooling layers (e.g., [GlobalMeanPool](https://fluxml.ai/Flux.jl/stable/reference/models/layers/#Flux.GlobalMeanPool)) allow the neural network to accommodate grids of varying dimension; for further discussion and an illustration, see [Sainsbury-Dale et al. (2025, Sec. S3, S4)](https://doi.org/10.48550/arXiv.2501.04330).
+A possible neural-network architecture is as follows. Note that deeper architectures that employ residual connections (see [ResidualBlock](@ref)) often lead to improved performance, and certain pooling layers (e.g., [GlobalMeanPool](https://fluxml.ai/Flux.jl/stable/reference/models/layers/#Flux.GlobalMeanPool)) allow the neural network to accommodate grids of varying dimension; for further discussion and an illustration, see [Sainsbury-Dale et al. (2025, Sec. S3, S4)](https://doi.org/10.48550/arXiv.2501.04330).

```julia
# Inner network
@@ -217,6 +217,25 @@ A possible architecture is as follows. Note that deeper architectures that employ…
network = DeepSet(ψ, ϕ)
```

+Above, we embedded our CNN within the DeepSets framework to accommodate scenarios involving replicated spatial data (e.g., when fitting models for spatial extremes). However, as noted in Step 4 of the [Overview](@ref), the package allows users to define the neural network using any Flux model. Since this example does not include independent replicates, the following CNN model is equivalent to the DeepSets architecture used above:
+
+```julia
+struct CNN{T <: Chain}
+    chain::T
+end
+function (cnn::CNN)(Z)
+    cnn.chain(stackarrays(Z))
+end
+network = CNN(Chain(
+    Conv((3, 3), 1 => 32, relu),
+    MaxPool((2, 2)),
+    Conv((3, 3), 32 => 64, relu),
+    MaxPool((2, 2)),
+    Flux.flatten,
+    Dense(256, 64, relu),
+    Dense(64, 1)
+))
+```
+
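As a quick, hedged check of the interface just described (assuming `NeuralEstimators` and `Flux` are loaded, that `network` is the CNN defined above, and that the grids are 16×16, which is consistent with `Dense(256, 64)` after two convolution and pooling stages), the wrapped model maps a vector of data sets to a matrix with one column per data set:

```julia
K = 5                                    # illustrative number of data sets
Z = [rand32(16, 16, 1, 1) for _ in 1:K]  # each data set: a 16×16 grid, 1 channel, 1 replicate
network(Z)                               # 1×K matrix: one output column per data set
```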
Next, we initialise a point estimator and a posterior credible-interval estimator:

```julia
@@ -227,12 +246,11 @@ q̂ = IntervalEstimator(network)
Now we train the estimators, here using fixed parameter instances to avoid repeated Cholesky factorisations (see [Storing expensive intermediate objects for data simulation](@ref) and [On-the-fly and just-in-time simulation](@ref) for further discussion):

```julia
-K = 10000 # number of training parameter vectors
-m = 1 # number of independent replicates in each data set
+K = 10000 # number of training parameter vectors
θ_train = sample(K)
-θ_val = sample(K ÷ 10)
-θ̂ = train(θ̂, θ_train, θ_val, simulate, m = m)
-q̂ = train(q̂, θ_train, θ_val, simulate, m = m)
+θ_val = sample(K ÷ 10)
+θ̂ = train(θ̂, θ_train, θ_val, simulate)
+q̂ = train(q̂, θ_train, θ_val, simulate)
```

Once the estimators have been trained, we assess them using empirical simulation-based methods:
@@ -253,10 +271,10 @@ plot(assessment)
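The assessment code itself is collapsed in this diff (the hunk context above shows only `plot(assessment)`). As a rough, hedged guide, this step typically looks something like the following; the test-set size is illustrative and the call signatures are assumed from the package's assessment interface rather than copied from the collapsed lines:

```julia
θ_test = sample(1000)                    # test parameter vectors
Z_test = simulate(θ_test)                # corresponding simulated data sets
assessment = assess(θ̂, θ_test, Z_test)   # compare point estimates with the true parameters
bias(assessment)                         # empirical bias
rmse(assessment)                         # empirical root-mean-squared error
plot(assessment)                         # estimates versus true values
```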
Finally, we can apply our estimators to observed data:

```julia
-θ = sample(1) # true parameter
+θ = Parameters(Matrix([0.1]')) # true parameter
Z = simulate(θ) # "observed" data
-estimate(θ̂, Z) # point estimate
-interval(q̂, Z) # 95% marginal posterior credible intervals
+estimate(θ̂, Z) # point estimate: 0.11
+interval(q̂, Z) # 95% marginal posterior credible interval: [0.08, 0.16]
```

Note that missing data (e.g., due to cloud cover) can be accommodated using the [missing-data methods](@ref "Missing data") implemented in the package.
2 changes: 1 addition & 1 deletion docs/src/workflow/overview.md
@@ -11,7 +11,7 @@ Neural inferential methods have marked practical appeal, as their implementation…
* For neural posterior estimators, the neural network is a mapping $\mathcal{Z}\to\mathcal{K}$, where $\mathcal{K}$ denotes the space of the approximate-distribution parameters $\boldsymbol{\kappa}$.
* For neural ratio estimators, the neural network is a mapping $\mathcal{Z}\times\Theta\to\mathbb{R}$.

-Any [Flux](https://fluxml.ai/Flux.jl/stable/) model can be used to construct the neural network. To integrate it into the workflow, one need only define a method that transforms $K$-dimensional vectors of data (see Step 2 above) into matrices with $K$ columns, where the number of rows corresponds to the dimensionality of the output spaces listed above. The type [`DeepSet`](@ref) serves as a convenient wrapper for embedding standard neural networks (e.g., MLPs, CNNs, GNNs) in a framework for making inference with an arbitrary number of independent replicates, and it comes with pre-defined methods for handling the transformations from a $K$-dimensional vector of data to a matrix output.
+Any [Flux](https://fluxml.ai/Flux.jl/stable/) model can be used to construct the neural network. To integrate it into the workflow, one need only define a method that transforms $K$-dimensional vectors of data (see Step 2 above) into matrices with $K$ columns, where the number of rows corresponds to the dimensionality of the output spaces listed above (see the [Gridded data](@ref) example). The type [`DeepSet`](@ref) serves as a convenient wrapper for embedding standard neural networks (e.g., MLPs, CNNs, GNNs) in a framework for making inference with an arbitrary number of independent replicates, and it comes with pre-defined methods for handling the transformations from a $K$-dimensional vector of data to a matrix output.
1. Wrap the neural network (and possibly the approximate distribution) in a [subtype of `NeuralEstimator`](@ref "Estimators") corresponding to the intended inferential method:
* For neural Bayes estimators under general, user-defined loss functions, use [`PointEstimator`](@ref);
* For neural posterior estimators, use [`PosteriorEstimator`](@ref);
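To make the `DeepSet` wrapper described in this file's changed paragraph concrete, here is a hedged sketch with illustrative dimensions; the inner network ψ, outer network ϕ, and data below are hypothetical, and only the `DeepSet(ψ, ϕ)` constructor (with its default elementwise-mean aggregation) is taken from the package:

```julia
using NeuralEstimators, Flux

d, p, w = 5, 2, 32                               # data dimension, parameter dimension, layer width
ψ = Chain(Dense(d, w, relu), Dense(w, w, relu))  # inner network, applied to each replicate
ϕ = Chain(Dense(w, w, relu), Dense(w, p))        # outer network, applied to the aggregated summary
ds = DeepSet(ψ, ϕ)                               # replicates aggregated elementwise (mean by default)

K = 100
Z = [rand32(d, rand(10:30)) for _ in 1:K]        # K data sets with varying numbers of replicates
ds(Z)                                            # p × K matrix, regardless of the replicate counts
```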
4 changes: 2 additions & 2 deletions src/Architectures.jl
@@ -124,10 +124,10 @@ X = [rand32(dₓ) for _ ∈ eachindex(Z)]
ds((Z, X))
```
"""
-struct DeepSet{T, G, K}
+struct DeepSet{T, G, K, A}
ψ::T
ϕ::G
-a::ElementwiseAggregator
+a::A
S::K
end
function DeepSet(ψ, ϕ, a::Function = mean; S = nothing)
1 change: 0 additions & 1 deletion src/NeuralEstimators.jl
@@ -87,7 +87,6 @@ end
# - Functionality: assess(estimator::PosteriorEstimator) and assess(estimator::RatioEstimator) and corresponding diagnostics.
# - Functionality: Incorporate the following package (possibly as an extension) to expand bootstrap functionality; https://github.com/juliangehring/Bootstrap.jl. Note also the "straps()" method that allows one to obtain the bootstrap distribution. I think what I can do is define a method of interval(bs::BootstrapSample). Maybe one difficulty will be how to re-sample... Not sure how the bootstrap method will know to sample from the independent replicates dimension (the last dimension) of each array.
# - Functionality: Training, option to check validation risk (and save the optimal estimator) more frequently than the end of each epoch, which would avoid wasted computation when we have very large training sets.
# - Functionality: Helper functions for censored data.
# - Functionality: Explicit learning of summary statistics.
# - Polishing: Might be better to use Plots rather than {AlgebraOfGraphics, CairoMakie}.
# - Add NeuralEstimators.jl to the list of packages that use Documenter: see https://documenter.juliadocs.org/stable/man/examples/