Machine Learning Done Bayesian

In the dark corners of the academic world, a fierce fight rages between practitioners of deep learning and researchers of Bayesian methods. This polemic article testifies to it, while firmly positioning itself as anti-Bayesian.

There is not much to object to in Bayes’ rule itself, so the hate must run deeper. I think it stems from the habit Bayesian researchers have of recasting existing methods as approximations to Bayesian ones.

Ferenc Huszár, a machine learning researcher at Twitter, describes some of these approximations:

  • L1 regularization is just Maximum A Posteriori (MAP) estimation with sparsity-inducing priors;
  • Support vector machines are just the wrong way to train Gaussian processes;
  • Herding is just Bayesian quadrature done slightly wrong;
  • Dropout is just variational inference done slightly wrong;
  • Stochastic gradient descent (SGD) is just variational inference (variational EM) done slightly wrong.
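
The first bullet is the easiest one to check by hand. Below is a minimal sketch, not taken from Huszár's post, that assumes a linear model with Gaussian noise of variance sigma2 and an independent Laplace prior of scale b on each weight. Under those assumptions the negative log posterior, rescaled by 2 * sigma2, should differ from the L1-penalized squared loss only by a constant that does not depend on the weights, provided the penalty is lam = 2 * sigma2 / b. The function names, the synthetic data, and the parameter values are all illustrative.

```python
import math
import random

def residual_sum_of_squares(w, X, y):
    """Sum of squared errors of the linear model y ~ X w."""
    return sum((yi - sum(wj * xj for wj, xj in zip(w, xi))) ** 2
               for xi, yi in zip(X, y))

def lasso_objective(w, X, y, lam):
    """Squared-error loss plus an L1 penalty on the weights."""
    return residual_sum_of_squares(w, X, y) + lam * sum(abs(wj) for wj in w)

def neg_log_posterior(w, X, y, sigma2, b):
    """Negative log of likelihood times prior (the evidence term is omitted,
    since it does not depend on w). Assumes Gaussian noise with variance
    sigma2 and a Laplace(0, b) prior on every weight."""
    n, d = len(y), len(w)
    neg_log_lik = (residual_sum_of_squares(w, X, y) / (2 * sigma2)
                   + 0.5 * n * math.log(2 * math.pi * sigma2))
    neg_log_prior = sum(abs(wj) for wj in w) / b + d * math.log(2 * b)
    return neg_log_lik + neg_log_prior

# Synthetic data; every value here is an arbitrary choice for illustration.
rng = random.Random(0)
n, d = 20, 3
sigma2, b = 0.25, 0.4
true_w = [1.5, 0.0, -2.0]
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [sum(wj * xj for wj, xj in zip(true_w, xi)) + rng.gauss(0, math.sqrt(sigma2))
     for xi in X]

# The L1 penalty strength implied by the noise variance and the prior scale.
lam = 2 * sigma2 / b

# Compare the two objectives at several random weight vectors.
gaps = []
for _ in range(5):
    w = [rng.uniform(-3, 3) for _ in range(d)]
    scaled_posterior = 2 * sigma2 * neg_log_posterior(w, X, y, sigma2, b)
    gaps.append(scaled_posterior - lasso_objective(w, X, y, lam))

# If the two objectives differ only by a constant, the spread is ~0.
spread = max(gaps) - min(gaps)
print("spread of the gap across random weights:", spread)
```

Because the gap stays the same for every weight vector up to floating-point error, both objectives are minimized by the same weights, which is all the bullet claims. The sketch says nothing about whether a point estimate deserves to be called Bayesian, which is where the real argument starts.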

Can you think of other approximations?

