TL;DR: proposes “distilled SGLD” where SGLD is “packed into one neural net” for computational savings.
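As I understand the idea, the “one neural net” is a student model trained to match the averaged predictive distribution of a set of SGLD posterior samples (the teacher), so that only a single set of weights is needed at test time. Below is a minimal numpy sketch of that pipeline on a toy logistic-regression problem; the variable names, hyperparameters, and the logistic teacher/student are my own simplifications, not the authors’ setup (they use neural nets on MNIST).

```python
# Hedged sketch: SGLD "teacher" samples -> averaged predictions -> single "student" model.
# Everything here is illustrative; it is not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (stand-in for MNIST).
N, D = 500, 5
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = (X @ w_true + 0.5 * rng.normal(size=N) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgld_samples(X, y, n_iters=2000, batch=50, eps=1e-3, prior_var=1.0):
    """SGLD teacher: noisy minibatch gradient ascent whose iterates approximate posterior samples."""
    w = np.zeros(X.shape[1])
    samples = []
    for _ in range(n_iters):
        idx = rng.choice(len(X), size=batch, replace=False)
        p = sigmoid(X[idx] @ w)
        # Stochastic gradient of the log posterior (minibatch likelihood rescaled by N/batch, Gaussian prior).
        grad = (len(X) / batch) * X[idx].T @ (y[idx] - p) - w / prior_var
        # Langevin update: half-step along the gradient plus Gaussian noise with variance eps.
        w = w + 0.5 * eps * grad + np.sqrt(eps) * rng.normal(size=w.shape)
        samples.append(w.copy())
    return np.array(samples)

samples = sgld_samples(X, y)

# Teacher's Monte Carlo predictive probabilities, averaged over all samples.
teacher_probs = sigmoid(X @ samples.T).mean(axis=1)

# Student: one model fit to the teacher's soft labels by gradient descent
# on cross-entropy -- the "distillation" step, stored as a single weight vector.
w_student = np.zeros(D)
for _ in range(2000):
    p = sigmoid(X @ w_student)
    w_student += 1e-3 * X.T @ (teacher_probs - p)

print("student vs teacher mean abs prob diff:",
      np.abs(sigmoid(X @ w_student) - teacher_probs).mean())
```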
Their most elaborate experiment is MNIST classification, where they show that SGLD outperforms SGD. For once they report running 10 trials (not just one), though unfortunately they don’t describe the hyperparameter tuning in detail. Still, that is better than much of the related work.
Their distilled SGLD is slightly worse in accuracy (than SGLD, and possibly also SGD; I’m not sure), but that is the point: give up a small amount of accuracy in exchange for large computational and memory savings. The SGLD teacher uses thinning and burn-in (see the sketch below). Unfortunately, no code seems to be available from the authors, though the implementation is apparently built on MXNet.
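The review above doesn’t give the burn-in length or thinning interval, so this is just a guess at the bookkeeping I assume they do when collecting SGLD iterates; `burn_in` and `thin` are made-up values.

```python
# Hedged sketch of burn-in + thinning for a stream of SGLD iterates.
import numpy as np

def collect_thinned(all_iterates, burn_in=500, thin=10):
    """Drop the first `burn_in` iterates, then keep every `thin`-th one."""
    return np.asarray(all_iterates)[burn_in::thin]

# e.g. 2000 iterates -> (2000 - 500) / 10 = 150 retained posterior samples
iterates = [np.random.default_rng(0).normal(size=5) for _ in range(2000)]
kept = collect_thinned(iterates)
print(kept.shape)  # (150, 5)
```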
I don’t know the stopping condition, and I assume they averaged the predictions (i.e., an ensemble over posterior samples) for SGLD at least, and perhaps for the other methods as well. It would really help to have all of this laid out clearly in an appendix.
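For concreteness, here is what I assume the SGLD “ensemble” prediction looks like: average the predictive probabilities under each retained weight sample, then take the most probable class. This is purely my reading; the paper’s exact evaluation protocol isn’t stated in the notes above.

```python
# Hedged sketch: posterior-predictive averaging over retained SGLD samples (binary case).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgld_predict(X_test, weight_samples):
    # probs[i, s] = p(y=1 | x_i, w_s); average over samples s, then threshold.
    probs = sigmoid(X_test @ np.asarray(weight_samples).T)  # (n_test, n_samples)
    mean_probs = probs.mean(axis=1)
    return (mean_probs > 0.5).astype(int), mean_probs

rng = np.random.default_rng(1)
X_test = rng.normal(size=(10, 5))
weight_samples = rng.normal(size=(150, 5))  # stand-in for thinned SGLD samples
labels, probs = sgld_predict(X_test, weight_samples)
print(labels, probs.round(2))
```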