New interface #27
Conversation
Some update of #6 (comment): I think we should add a
I will give it a try if things haven't been moved here.
@xukai92 Agreed!
I'm slightly itchy about using a vector to implement the composition operation. As you point out @torfjelde, it will probably create some type-instability issues. Could we instead use a `Tuple`?
@willtebbutt After messing about with trying to get type-stability for VI, I think making it a
So it seems worth it to leave the "optimization" up to the user. EDIT: though it's going to need some work/thought to "unwrap" nested compositions.
@torfjelde could you explain the difference between your tuple implementation and splatted implementation please? I had imagined something like

```julia
struct Composed{Tts<:Tuple} <: Bijector
    ts::Tts
end
```

so the type of each of the things being composed should be known. Alternatively, we could just not type the stuff that's contained, e.g.

```julia
struct Composed{Tts} <: Bijector
    ts::Tts
end
```

and leave it up to the user to use their preferred iterable data structure, i.e. they could choose to use either a `Tuple` or a `Vector`.
I also thought about this but chose to use a vector only because I want an easy way to do `push!`/`pop!`.
Fair enough @xukai92. How critical is the `push!`/`pop!` functionality?
@willtebbutt The difference is in the constructor. For the tuple-impl we have to do:
for it to be type-stable. If we drop type-stability from the constructor we can do
which will
We can of course drop type-stability of the constructor but keep the
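The code snippets referred to in this comment are not shown above. As a rough, purely hypothetical illustration of the Tuple-vs-Vector trade-off the thread is discussing (the names `TupleComposed`, `VectorComposed`, `compose_stable` and `compose_dynamic` are invented for this sketch and are not the PR's actual code):

```julia
# Hypothetical sketch of the Tuple-vs-Vector trade-off; not the PR's code.
abstract type Bijector end   # stand-in for the abstract type from this PR

# Tuple-backed composition: the component types are part of the type itself,
# so iterating over `ts` is type-stable.
struct TupleComposed{Tts<:Tuple} <: Bijector
    ts::Tts
end
compose_stable(ts::Bijector...) = TupleComposed(ts)

# Vector-backed composition: iteration is not type-stable (`Bijector` is
# abstract), but the composition can be modified in place with `push!`/`pop!`.
struct VectorComposed <: Bijector
    ts::Vector{Bijector}
end
compose_dynamic(ts::Bijector...) = VectorComposed(collect(Bijector, ts))
```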
Agree; that's what I did locally to try it out.
It would work but would be a little bit trickier, as I was playing with an idea of adding or removing transformations during training. I could build a new model each time I need to change, or I can push or pop.
As I said in Slack, I feel this is a little bit of an over-optimisation. If it doesn't give much performance gain, we'd better not do this automatic change, which the user wouldn't intuitively expect.
Or maybe add a flag to trigger this behaviour.
One thing to note about this though, depending on how often you add/remove, is that with the type-stable `Composed` each change produces a new type and therefore triggers fresh compilation.
The performance gain really depends on the context: as you say, if we're working in the DL setting then this optimisation is unnecessary. However, if we're expecting to use these in other settings, then it's probably not unreasonable to ask for type stability. The other option is, again, to provide both interfaces and just expect the
Another alternative is to have two different types, which have different performance characteristics and provide different behaviour. I don't have any good suggestions for names though.
@torfjelde: thanks for the explanation of your implementations.
I agree that this is generally tricky to do in a type-stable way, but it's not clear to me why we mind if the user creates nested compositions. Is there something obvious that I'm missing?
I agree with you that this is a nice-to-have rather than something that's especially important.
I'm talking about optimizations for the type-stable implementations, i.e. the un-wrapping and getting rid of inverse operations. I completely agree that we need a type-stable version.
Is this still true?
Yes; and it has already caused some confusion, I believe. At least, I'm very much for making the composition order follow the mathematical convention.
I'm in favour of sticking to the mathematical convention. I think the current implementation already does so, right?
Internally, i.e. Currently,
I feel it's better to make
Maybe what we can do is to drop
That's not a bad idea 👍 Only thing is though,
There's another thing I've been thinking about: you can do multiple dispatch on something like
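The concrete type being dispatched on was elided above. As a purely hypothetical sketch of the kind of dispatch-based simplification being hinted at (the `Exp`, `Log`, `Identity` and `simplify` definitions below are invented stand-ins, not code from this PR):

```julia
# Hypothetical sketch of dispatching on a composition's type parameters;
# all definitions here are stand-ins for illustration only.
abstract type Bijector end
struct Exp <: Bijector end
struct Log <: Bijector end
struct Identity <: Bijector end

struct Composed{Tts<:Tuple} <: Bijector
    ts::Tts
end
compose(ts::Bijector...) = Composed(ts)

# Because the component types are part of `Composed`'s type, specific
# patterns can be collapsed purely via dispatch.
simplify(c::Composed) = c                          # generic fallback: leave as-is
simplify(::Composed{Tuple{Exp,Log}}) = Identity()  # a bijector next to its inverse
simplify(::Composed{Tuple{Log,Exp}}) = Identity()  # collapses to the identity
```

With these definitions, `simplify(compose(Exp(), Log()))` returns `Identity()` without inspecting the components at runtime.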
I guess the way to link
I didn't mean to get nested compositions via
Ah, I follow 👍 Yeah, that sounds good!
```julia
struct Log <: Bijector end

(b::Log)(x) = @. log(x)
(b::Exp)(y) = @. exp(y)
```
We often need to apply this to just one of the dimensions. Any ideas on how to go about this?
@xukai92 and I were discussing this earlier, and we feel that batch computation is something we'll have to have a proper discussion about after this PR is merged. For now, assume batch computation is not supported :/ The difficulty is that in a lot of cases it's ambiguous whether or not something is a batch. And we want to make sure that if we say we support batch computation, then it's supported for all bijectors, not just on a case-by-case basis.
I know you use it a lot though, so I'd suggest making a different branch where you try to accommodate batch computation, and then we can potentially use that when we're considering the approach we're going to take to support it :)
Discussion: #27 (comment)
@sharanry @torfjelde Let's experiment with this feature in a different branch. I'm going to merge this PR soon.
We also need to increment the version number.
Sure, 0.4 sounds good to me.
@torfjelde Can you increment the version please?
Done!
EDIT: I missed the latest suggestion from @kai in issue #6: #6 (comment). Seems like a better approach! EDIT 2: I've now tried incorporating the suggested interface.
Overview
This is a proposal for a new interface related to the discussion in issue #6. It introduces:

- `Bijector` and `Inversed{<: Bijector} <: Bijector`, which represent an invertible transform and its inverse, respectively, with the following methods:
  - `transform(b::Bijector, x)` and `transform(ib::Inversed{<: Bijector}, y)`: transforms from the original variable space to the transformed space, and vice versa.
  - `inv(b::Bijector)`: simply returns the `Inversed{<: Bijector}`.
  - `logabsdetjac(b::Bijector, x)` and `logabsdetjac(ib::Inversed{<: Bijector}, y)`: computes the `logabsdet` of the Jacobian of the transformation and its inverse. The inverse defaults to `-logabsdetjac(ib.orig, transform(ib, y))`.
  - `jacobian(b::Bijector, x)`: currently only used in the default impl of `logabsdetjac` for `ADBijector`, and thus only requires an impl if `logabsdetjac` is not implemented (beyond the default impl).
- `Composed{<: Bijector}`, which allows composing bijectors:
  - `compose(bs...)`: returns the `Composed` object of the bijectors `bs`.
  - `∘(b1::Bijector, b2::Bijector)`: results in a composition of the two bijectors (first `b2`, then `b1`).
- `UnivariateTransformed <: UnivariateDistribution{Continuous}` and `MultivariateTransformed <: MultivariateDistribution{Continuous}`, both of which implement the corresponding Distributions.jl interfaces:
  - `logpdf(td, y)`
  - `rand(td)`
  - `logpdf_with_jac(td, y)`: returns both `logpdf(td, y)` and `logabsdetjac(td, y)`, saving repeated computation in cases where both quantities are wanted. I believe this is often enough to warrant such a method.
- `transformed(d::Distribution, b::Bijector)`: returns a transformed distribution using `b` as the bijector. `transformed(d::Distribution) = transformed(d, DistributionBijector(d))` is provided, where `DistributionBijector` is a proxy for the `link` and `invlink` already in Bijectors.jl (`logabsdetjac` is provided by AD).

For the user to implement their own `UserBijector`, the easiest approach is to make `UserBijector{AD} <: ADBijector{AD}`; then they only need to implement `transform(b::UserBijector, x)` and `transform(ib::Inversed{UserBijector}, y)`.
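As a hedged sketch of what such a user-defined bijector might look like under the interface described above (`MyShift` is invented for illustration; `ADBijector`, `Inversed` and `transform` are assumed to be in scope from this PR):

```julia
# Sketch only: a user-defined bijector per the proposal above. `MyShift` is
# invented; `ADBijector`, `Inversed` and `transform` come from this PR.
struct MyShift{AD, T<:Real} <: ADBijector{AD}
    a::T
end

# Only the two `transform` methods are needed; `logabsdetjac` then falls back
# to the AD-based default that `ADBijector` provides.
transform(b::MyShift, x) = x .+ b.a
transform(ib::Inversed{<:MyShift}, y) = y .- ib.orig.a
```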
Examples
Composition
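The original composition example is not reproduced here; the following is a minimal usage sketch based on the interface described above, assuming `Exp`, `Log`, `transform`, `inv` and `logabsdetjac` from this PR are in scope:

```julia
# Usage sketch only (the PR's original example is not shown here).
b = Exp() ∘ Log()       # per the description: applies Log first, then Exp
x = [0.5, 1.5]
y = transform(b, x)     # forward transform

ib = inv(b)             # the Inversed composition
transform(ib, y)        # maps y back to x

logabsdetjac(b, x)      # log |det J| of the composition at x
```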
ADVI
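The ADVI example itself is not reproduced here; the sketch below only shows the building block it relies on, namely moving a constrained distribution to the real line with `transformed` (assuming `transformed` and `logpdf_with_jac` from this PR are in scope):

```julia
# Sketch of the reparameterisation step ADVI relies on; not the PR's ADVI code.
using Distributions

d  = Beta(2.0, 3.0)       # support on (0, 1)
td = transformed(d)       # default: transformed(d, DistributionBijector(d))

y = rand(td)              # sample now lives on the whole real line
logpdf(td, y)             # density of the transformed distribution
logpdf_with_jac(td, y)    # logpdf and logabsdetjac in one go
```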
Finite-set Stein Discrepancy (FSSD) test
This is a fun little (and accidental!) "outside-of-Turing"-application of this interface.
I have a package called `KernelGoodnessOfFit` which performs a goodness-of-fit test for a given model distribution `q <: Distribution`. The issue with the FSSD test is that it assumes the distribution to have support on the entire real line, and thus does not work for distributions such as `Beta`. With this interface for `Bijectors` we can simply plug in the `UnivariateTransformed{Beta}` and the test works! (in the sense that the false-negative/rejection ratio is less than the specified size of the test) I know this is a frequentist application; please don't shoot me!
It's of course not directly related to what we do in Turing, but it shows a very neat application that immediately follows from this interface.
Normalizing flows (thanks to @sharanry!)
Currently we have `PlanarFlow` and `RadialFlow` implemented, but implementing other flows is fairly straightforward.

Issues / Notes
- `transform(Composed{<: Bijector}, x)` is implemented by applying the transforms from left to right; should this be changed?
- Implement `logabsdetjac` "manually" rather than using AD for these known transformations.
- `Turing` for the different backend-types; probably don't want to do this. Need to decide how to handle this.
- `Composed` type.
- Should we use `logdetinvjac` or `logdetjac`? They're equivalent up to a minus sign, but inside the transformed distributions we'll mostly need the `logdetinvjac` (though we could always do `logdetjac(td.transform, transform(td.transform, y))`). The default is `logabsdetjac(ib::Inversed{<: Bijector}, y) = -logabsdetjac(ib.orig, transform(ib, y))`; implementing `logabsdetjac(b::SpecificBijector, y)` and the corresponding method for `Inversed{<:SpecificBijector}` gives both (see the sketch after this list).
- The interface is potentially a bit verbose at the moment; we could:
  - Rename the fields of the transformed distributions to `d::Distribution` and `b::Bijector`.
  - Add `transform(d::Distribution, y) = transform(d.transform, y)`, and similarly for `inverse`.
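As a hedged sketch of the `logabsdetjac` point above (the `Scale` bijector is invented for illustration; `Bijector`, `Inversed`, `transform` and `logabsdetjac` are assumed to be in scope from this PR):

```julia
# Sketch: a closed-form `logabsdetjac` for one bijector, with the inverse
# handled by the `Inversed` default shown above. `Scale` is invented.
struct Scale{T<:Real} <: Bijector
    a::T
end

transform(b::Scale, x) = b.a .* x
transform(ib::Inversed{<:Scale}, y) = y ./ ib.orig.a

# Jacobian of x ↦ a .* x is a*I, so log|det J| = length(x) * log|a|.
logabsdetjac(b::Scale, x) = length(x) * log(abs(b.a))
```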