New interface #27
Conversation
Some update of #6 (comment): I think we should add a
I will give it a try if things haven't been moved here.
@xukai92 Agreed!
I'm slightly itchy about using a vector to implement the composition operation. As you point out @torfjelde, it will probably create some type-instability issues. Could we instead use a `Tuple`?
@willtebbutt After messing about with trying to get type-stability for VI, I think making it a
So it seems worth it to leave the "optimization" up to the user. EDIT: though it's going to need some work/thought to "unwrap" nested compositions.
@torfjelde could you explain the difference between your tuple implementation and splatted implementation please? I had imagined something like

```julia
struct Composed{Tts<:Tuple} <: Bijector
    ts::Tts
end
```

so the type of each of the things being composed should be known. Alternatively, we could just not type the stuff that's contained, e.g.

```julia
struct Composed{Tts} <: Bijector
    ts::Tts
end
```

and leave it up to the user to use their preferred iterable data structure, i.e. they could choose to use either a `Tuple` or a `Vector`.
I also thought about this but chose to use a vector only because I want an easy way to do `push!`/`pop!`.
Fair enough @xukai92. How critical is the `push!`/`pop!` functionality?
@willtebbutt The difference is in the constructor. For the tuple-impl we have to do:
for it to be type-stable. If we drop type-stability from the constructor we can do
which will
We can of course drop type-stability of the constructor but keep the
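The code snippets referred to in this comment are not shown above. As a rough, purely hypothetical illustration of the Tuple-vs-Vector trade-off the thread is discussing (the names `TupleComposed`, `VectorComposed`, `compose_stable` and `compose_dynamic` are invented for this sketch and are not the PR's actual code):

```julia
# Hypothetical sketch of the Tuple-vs-Vector trade-off; not the PR's code.
abstract type Bijector end   # stand-in for the abstract type from this PR

# Tuple-backed composition: the component types are part of the type itself,
# so iterating over `ts` is type-stable.
struct TupleComposed{Tts<:Tuple} <: Bijector
    ts::Tts
end
compose_stable(ts::Bijector...) = TupleComposed(ts)

# Vector-backed composition: iteration is not type-stable (`Bijector` is
# abstract), but the composition can be modified in place with `push!`/`pop!`.
struct VectorComposed <: Bijector
    ts::Vector{Bijector}
end
compose_dynamic(ts::Bijector...) = VectorComposed(collect(Bijector, ts))
```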
Agree; that's what I did locally to try it out.
It would work but would be a little bit trickier, as I was playing with an idea of adding or removing transformations during training. I could build a new model each time I need to change, or I can push or pop.
As I said in Slack, I feel this is a little bit of an over-optimisation. If it doesn't give much performance gain, we'd better not do this automatic change, which the user wouldn't intuitively expect.
Or maybe add a flag to trigger this behaviour.
One thing to note about this though, depending on how often you add/remove, is that with the type-stable `Composed` each change produces a new type and therefore triggers fresh compilation.
The performance gain really depends on the context: as you say, if we're working in the DL setting then this optimisation is unnecessary. However, if we're expecting to use these in other settings, then it's probably not unreasonable to ask for type stability. The other option is, again, to provide both interfaces and just expect the
Another alternative is to have two different types, which have different performance characteristics and provide different behaviour. I don't have any good suggestions for names though.
@torfjelde: thanks for the explanation of your implementations.
I agree that this is generally tricky to do in a type-stable way, but it's not clear to me why we mind if the user creates nested compositions. Is there something obvious that I'm missing?
I agree with you that this is a nice-to-have rather than something that's especially important.
I'm talking about optimizations for the type-stable implementations, i.e. the un-wrapping and getting rid of inverse operations. I completely agree that we need a type-stable version.
Is this still true?
Yes; and it has already caused some confusion, I believe. At least, I'm very much for making the composition order follow the mathematical convention.
I'm in favour of sticking to the mathematical convention. I think the current implementation already does so, right?
Internally, i.e. Currently,
I feel it's better to make
Maybe what we can do is to drop
That's not a bad idea 👍 Only thing is though,
There's another thing I've been thinking about: you can do multiple dispatch on something like
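The concrete type being dispatched on was elided above. As a purely hypothetical sketch of the kind of dispatch-based simplification being hinted at (the `Exp`, `Log`, `Identity` and `simplify` definitions below are invented stand-ins, not code from this PR):

```julia
# Hypothetical sketch of dispatching on a composition's type parameters;
# all definitions here are stand-ins for illustration only.
abstract type Bijector end
struct Exp <: Bijector end
struct Log <: Bijector end
struct Identity <: Bijector end

struct Composed{Tts<:Tuple} <: Bijector
    ts::Tts
end
compose(ts::Bijector...) = Composed(ts)

# Because the component types are part of `Composed`'s type, specific
# patterns can be collapsed purely via dispatch.
simplify(c::Composed) = c                          # generic fallback: leave as-is
simplify(::Composed{Tuple{Exp,Log}}) = Identity()  # a bijector next to its inverse
simplify(::Composed{Tuple{Log,Exp}}) = Identity()  # collapses to the identity
```

With these definitions, `simplify(compose(Exp(), Log()))` returns `Identity()` without inspecting the components at runtime.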
I guess the way to link
I didn't mean to get nested compositions via
Ah, I follow 👍 Yeah, that sounds good!
```julia
struct Log <: Bijector end

(b::Log)(x) = @. log(x)
(b::Exp)(y) = @. exp(y)
```
We often need to apply this to just one of the dimensions. Any ideas on how to go about this?
@xukai92 and I were discussing this earlier, and we feel that batch computation is something we'll have to have a proper discussion about after this PR is merged. For now, assume batch computation is not supported :/ The difficulty is that in a lot of cases it's ambiguous whether or not something is a batch. And we want to make sure that if we say we support batch computation, then it's supported for all bijectors, not just on a case-by-case basis.
I know you use it a lot though, so I'd suggest making a different branch where you try to accommodate batch computation, and then we can potentially use that when we're considering the approach we're going to take to support it :)
Discussion: #27 (comment)
@sharanry @torfjelde Let's experiment with this feature in a different branch. I'm going to merge this PR soon.
We also need to increment the version number.
Sure, 0.4 sounds good to me.
@torfjelde Can you increment the version please?
Done!
EDIT: I missed the latest suggestion from @kai in issue #6: #6 (comment). Seems like a better approach! EDIT 2: I've now tried incorporating the suggested interface.
Overview
This is a proposal for a new interface related to the discussion in issue #6. It introduces:

- `Bijector` and `Inversed{<: Bijector} <: Bijector`, which represent an invertible transform and its inverse, respectively, with the following methods:
  - `transform(b::Bijector, x)` and `transform(ib::Inversed{<: Bijector}, y)`: transforms from the original variable space to the transformed space, and vice versa.
  - `inv(b::Bijector)`: simply returns the `Inversed{<: Bijector}`.
  - `logabsdetjac(b::Bijector, x)` and `logabsdetjac(ib::Inversed{<: Bijector}, y)`: computes the `logabsdet` of the Jacobian of the transformation and its inverse. The inverse defaults to `-logabsdetjac(ib.orig, transform(ib, y))`.
  - `jacobian(b::Bijector, x)`: currently only used in the default impl of `logabsdetjac` for `ADBijector`, and thus only requires an impl if `logabsdetjac` is not implemented (beyond the default impl).
- `Composed{<: Bijector}`, which allows composing bijectors:
  - `compose(bs...)`: returns the `Composed` object of the bijectors `bs`.
  - `∘(b1::Bijector, b2::Bijector)`: results in a composition of the two bijectors (first `b2`, then `b1`).
- `UnivariateTransformed <: UnivariateDistribution{Continuous}` and `MultivariateTransformed <: MultivariateDistribution{Continuous}`, both of which implement the corresponding Distributions.jl interfaces:
  - `logpdf(td, y)`
  - `rand(td)`
  - `logpdf_with_jac(td, y)`: returns both `logpdf(td, y)` and `logabsdetjac(td, y)`, saving repeated computation in cases where both quantities are wanted. I believe this is often enough to warrant such a method.
- `transformed(d::Distribution, b::Bijector)`: returns a transformed distribution using `b` as the bijector. `transformed(d::Distribution) = transformed(d, DistributionBijector(d))` is provided, where `DistributionBijector` is a proxy for the `link` and `invlink` already in Bijectors.jl (`logabsdetjac` is provided by AD).

For the user to implement their own `UserBijector`, the easiest approach is to make `UserBijector{AD} <: ADBijector{AD}`; then they only need to implement `transform(b::UserBijector, x)` and `transform(ib::Inversed{UserBijector}, y)`.
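As a hedged sketch of what such a user-defined bijector might look like under the interface described above (`MyShift` is invented for illustration; `ADBijector`, `Inversed` and `transform` are assumed to be in scope from this PR):

```julia
# Sketch only: a user-defined bijector per the proposal above. `MyShift` is
# invented; `ADBijector`, `Inversed` and `transform` come from this PR.
struct MyShift{AD, T<:Real} <: ADBijector{AD}
    a::T
end

# Only the two `transform` methods are needed; `logabsdetjac` then falls back
# to the AD-based default that `ADBijector` provides.
transform(b::MyShift, x) = x .+ b.a
transform(ib::Inversed{<:MyShift}, y) = y .- ib.orig.a
```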
Examples
Composition
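The original composition example is not reproduced here; the following is a minimal usage sketch based on the interface described above, assuming `Exp`, `Log`, `transform`, `inv` and `logabsdetjac` from this PR are in scope:

```julia
# Usage sketch only (the PR's original example is not shown here).
b = Exp() ∘ Log()       # per the description: applies Log first, then Exp
x = [0.5, 1.5]
y = transform(b, x)     # forward transform

ib = inv(b)             # the Inversed composition
transform(ib, y)        # maps y back to x

logabsdetjac(b, x)      # log |det J| of the composition at x
```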
ADVI
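The ADVI example itself is not reproduced here; the sketch below only shows the building block it relies on, namely moving a constrained distribution to the real line with `transformed` (assuming `transformed` and `logpdf_with_jac` from this PR are in scope):

```julia
# Sketch of the reparameterisation step ADVI relies on; not the PR's ADVI code.
using Distributions

d  = Beta(2.0, 3.0)       # support on (0, 1)
td = transformed(d)       # default: transformed(d, DistributionBijector(d))

y = rand(td)              # sample now lives on the whole real line
logpdf(td, y)             # density of the transformed distribution
logpdf_with_jac(td, y)    # logpdf and logabsdetjac in one go
```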
Finite-set Stein Discrepancy (FSSD) test
This is a fun little (and accidental!) "outside-of-Turing"-application of this interface.
I have a package called `KernelGoodnessOfFit` which performs a goodness-of-fit test for a given model distribution `q <: Distribution`. The issue with the FSSD test is that it assumes the distribution to have support on the entire real line, and thus does not work for distributions such as `Beta`. With this interface for `Bijectors` we can simply plug in the `UnivariateTransformed{Beta}` and the test works! (in the sense that the false-negative/rejection ratio is less than the specified size of the test) I know this is a frequentist application; please don't shoot me!
It's of course not directly related to what we do in Turing, but it shows a very neat application that immediately follows from this interface.
Normalizing flows (thanks to @sharanry!)
Currently we have `PlanarFlow` and `RadialFlow` implemented, but implementing other flows is fairly straightforward.

Issues / Notes
- `transform(Composed{<: Bijector}, x)` is implemented by applying the transforms from left to right; should this be changed?
- Implement `logabsdetjac` "manually" rather than using AD for these known transformations.
- `Turing` for the different backend-types; probably don't want to do this. Need to decide how to handle this.
- `Composed` type.
- Should we use `logdetinvjac` or `logdetjac`? They're equivalent up to a minus sign, but inside the transformed distributions we'll mostly need the `logdetinvjac` (though we could always do `logdetjac(td.transform, transform(td.transform, y))`). The default is `logabsdetjac(ib::Inversed{<: Bijector}, y) = -logabsdetjac(ib.orig, transform(ib, y))`; implementing `logabsdetjac(b::SpecificBijector, y)` and the corresponding method for `Inversed{<:SpecificBijector}` gives both (see the sketch after this list).
- The interface is potentially a bit verbose at the moment; we could:
  - Rename the fields of the transformed distributions to `d::Distribution` and `b::Bijector`.
  - Add `transform(d::Distribution, y) = transform(d.transform, y)`, and similarly for `inverse`.
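As a hedged sketch of the `logabsdetjac` point above (the `Scale` bijector is invented for illustration; `Bijector`, `Inversed`, `transform` and `logabsdetjac` are assumed to be in scope from this PR):

```julia
# Sketch: a closed-form `logabsdetjac` for one bijector, with the inverse
# handled by the `Inversed` default shown above. `Scale` is invented.
struct Scale{T<:Real} <: Bijector
    a::T
end

transform(b::Scale, x) = b.a .* x
transform(ib::Inversed{<:Scale}, y) = y ./ ib.orig.a

# Jacobian of x ↦ a .* x is a*I, so log|det J| = length(x) * log|a|.
logabsdetjac(b::Scale, x) = length(x) * log(abs(b.a))
```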